This content is translated with AI. Please refer to the original Traditional Chinese version (zh-TW) for accuracy.

The story goes like this: I recently took on a project at work that uses a process with only 3 metal layers. That is very different from the processes I had used before: 0.18 um with 5 metal layers, 90 nm with 9 metal layers, and 40 nm with 10 layers (strictly speaking, only nine). This caused a lot of problems during Innovus layout, so it is worth writing an article to document the experience thoroughly.

Why use so few metal layers?

Because 1P3M is a good process? No. In a word: money. Adding a metal layer means adding two photomasks (one for the via and one for the metal layer), which naturally increases cost. From what I've researched, 1P4M is about 30% more expensive than 1P3M.

With my limited knowledge, my guess at the main photomasks for a planar MOSFET process is as follows:

  • One mask to create the gate in the MOSFET region
  • One or two masks to create the source and drain, depending on whether they need separate process steps
  • One mask for the contacts
  • Two masks per metal layer: one for the routing and one for the via
  • The topmost metal layer has no via above it, so it needs only one mask

Thus 1P3M needs roughly (3 or 4) + 2 + 2 + 1 = 8 or 9 masks. 1P4M adds two more, for 10 or 11 masks, which raises the mask count, and roughly the cost, by about 22-25% (2/9 to 2/8).
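
The estimate above can be reproduced with a few lines of plain Tcl. This is only a sketch of my guess at the mask list, not a foundry figure; the proc name mask_count and its parameters are made up for illustration.

```tcl
# Rough mask-count estimate following the list above.
# front_end: gate + source/drain + contact masks (3 or 4 in the text).
# Every metal layer costs 2 masks (metal + via) except the top one (metal only).
proc mask_count {front_end metal_layers} {
    return [expr {$front_end + 2 * ($metal_layers - 1) + 1}]
}

foreach front_end {3 4} {
    set m3 [mask_count $front_end 3]
    set m4 [mask_count $front_end 4]
    set pct [expr {100.0 * ($m4 - $m3) / $m3}]
    puts [format "front-end masks %d: 1P3M=%d, 1P4M=%d, increase=%.1f%%" \
        $front_end $m3 $m4 $pct]
}
```

With 3 front-end masks this gives 8 and 10 masks (25.0% more); with 4 it gives 9 and 11 (22.2% more).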

What to prepare during synthesis?

For designs with high computational density and few metal layers, it is advisable to disable all logic cells with 4 or more inputs during synthesis, such as OAI22, NOR4, and AND4. Why? Logic cells are typically packed densely, and cells with 4 or more inputs have too many pins per unit area, which may make routing fail during APR. Without this restriction, you would have to set a lower utilization rate, which makes the area explode: area is inversely proportional to the utilization rate, so it becomes more sensitive as the rate decreases.

The possible solutions, as described in the article "Step-by-step guide on how to solve local congestion issues in Innovus", are:

  1. Have Innovus pay more attention to congestion during placement. I haven't tried it this time, so I'm not sure of its effectiveness.
  2. Set padding values in Innovus for these high-pin-density cells to reserve more space around them.
  3. Avoid these cells altogether at the synthesis stage, which is what I recommend here.

Method 2 still requires listing and configuring the same cells in Innovus, so it is simpler not to use them from the start, as in method 3. The trade-off is a larger cell area, by approximately 8%.

The actual approach is to first review the standard cell documentation to identify cells for deactivation, then enter the following in dc_shell:

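# Load the library: use either the .lib source or the compiled .db
read_lib typical.lib
read_db typical.db
# $lib is the library name as shown by list_libs ("typical" is assumed here)
set lib typical
set lib_cells [get_lib_cells $lib/*]
# Print the full name of every cell, one per line
foreach_in_collection lib_cell $lib_cells {
  echo [get_attribute $lib_cell full_name]
}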

Either read_lib or read_db can be used. The explicit loop is cumbersome, but it is needed because the collection returned by get_lib_cells is usually far too long to be displayed in full on the screen.
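
If the listing is still too long to read in the terminal, one option is to send it to a file with dc_shell's redirect command and search it in an editor. This is a small sketch, assuming the library is already loaded and that $lib holds its name; the output file name lib_cells.txt is arbitrary.

```tcl
# Assumes the library is loaded and $lib holds its name (check with list_libs).
# Writes one cell name per line to lib_cells.txt instead of the screen.
redirect -file lib_cells.txt {
  foreach_in_collection lib_cell [get_lib_cells $lib/*] {
    echo [get_attribute $lib_cell full_name]
  }
}
```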

Then configure the corresponding cells as don't use:

set_dont_use [get_lib_cells $lib/XXX]
set_dont_use [get_lib_cells $lib/YYY]

You can also use the wildcard * when excluding cells, but be careful not to exclude cells you still want to use.
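
To reduce the risk of a wildcard catching too much, it can help to print what it matches before disabling anything. The sketch below assumes $lib holds the library name and uses NOR4* purely as a hypothetical pattern; real cell names depend on your standard cell library.

```tcl
# Hypothetical pattern: replace NOR4* with names from your own library.
set candidates [get_lib_cells $lib/NOR4*]

# Review the matches first.
foreach_in_collection c $candidates {
  echo [get_attribute $c full_name]
}

# Disable them only after the list looks right.
set_dont_use $candidates
```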

Based on my current testing, the results are roughly summarized in the table below. You can see that disabling cells significantly improves the utilization rate for 1P3M, whereas it's not particularly necessary for 1P4M.

Settings                     | APR results
-----------------------------+------------------------------------------------
1P3M without disabling cells | Utilization rate must be 0.4 to achieve routing
1P3M with cells disabled     | Utilization rate can reach 0.64 for routing
1P4M without disabling cells | Utilization rate can reach 0.77 for routing
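
Plugging these utilization rates and the roughly 8% cell-area penalty mentioned earlier into area = cell area / utilization gives a feel for the trade-off. This is a plain Tcl sketch; the normalized cell area of 1.0 and the proc name core_area are illustrative assumptions, and it assumes the 1P4M run has the same cell area as the unrestricted 1P3M run.

```tcl
# Core area needed for a given total standard-cell area and utilization rate.
proc core_area {cell_area util} {
    return [expr {double($cell_area) / $util}]
}

set base 1.0   ;# normalized cell area with all cells allowed (assumption)

set a_3m_all   [core_area $base 0.40]                 ;# 1P3M, no cells disabled
set a_3m_limit [core_area [expr {$base * 1.08}] 0.64] ;# 1P3M, cells disabled, +8% cell area
set a_4m_all   [core_area $base 0.77]                 ;# 1P4M, no cells disabled

puts [format "1P3M, all cells:      %.2f" $a_3m_all]
puts [format "1P3M, cells disabled: %.2f" $a_3m_limit]
puts [format "1P4M, all cells:      %.2f" $a_4m_all]
puts [format "saving from disabling cells on 1P3M: %.1f%%" \
    [expr {100.0 * (1.0 - $a_3m_limit / $a_3m_all)}]]
```

Under these assumptions the core area drops from about 2.50 to about 1.69 times the base cell area on 1P3M, so the 8% larger netlist is more than paid back by the higher utilization.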

Additionally, note that sequential cells such as DFFs don't need to be disabled. First, unlike combinational logic, where a complex cell can be replaced by several simpler ones, a DFF has no simpler substitute. Second, DFFs usually have a relatively large area, so their pin density stays low even though they have several pins.

The above covers the basic settings needed in Design Compiler. I thought of an interesting application: if you disable all logic cells except NAND, could Design Compiler complete "Nand2Tetris" in a second? XDD

APR Density

Below are some of the settings I used while working on the innovus layout, but since I haven't written any articles related to APR, some parts might be hard to understand.

The utilization rate is sometimes called the U rate in the industry, and the general recommendation is above 0.7. This time, the limits I reached were roughly:

  • 3M process: up to about 0.64
  • 4M process: up to about 0.77

Adding more metal layers allows a higher chip density, but not without limit, because the chip also needs room for things like clock tree and reset tree buffers, which typically take about 10% of the chip area. If the utilization rate is too high, even the buffers can't be squeezed in.

The utilization rate discussed above is a global number: the ratio of total standard cell area to the entire core area. Density problems, however, are more localized. Without additional restrictions, Innovus tries to place related cells close together, which is usually fine but in 1P3M can produce very crowded regions that cannot be routed. Congestion refers to routing density. With 3M, routing was what blocked me most often, so I discuss it at length below.

For articles related to congestion, I recommend "Interpretation of congestion reporting". Cadence's official post "Mitigating Congestion, CTS, OCV, and Other Challenges using Cadence Tools and Support" also provides relevant information.

After referring to online articles such as "How to control density and congestion in PR tools", here are the options that proved effective for me. Set them before placement and before running ECO:

set_db place_global_max_density 0.80
set_db opt_max_density 0.80
set_db place_global_uniform_density true

The first two are straightforward: they keep the density from getting too high during placement and optimization. The third tries to distribute the density more evenly across the whole layout.

These settings, in my view, are like any medication: somewhat effective, but with risks. For instance, the 0.8 in the commands above was chosen for an initial utilization rate of 0.7. When I later squeezed the area by reducing the height by another 50 um, the first ECO run in Innovus ran into problems because the baseline density had itself risen to about 0.8. No matter how Innovus moved the cells, the density would exceed 0.8, so the constraint was impossible to satisfy.
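
A quick sanity check before shrinking the floorplan is to recompute the utilization rate and compare it with the max-density setting. The sketch below is plain Tcl. The 400 um core height and the 0.02 headroom margin are hypothetical values, chosen only so that a 50 um reduction takes 0.70 to about 0.80, since the actual dimensions are not given here.

```tcl
# Utilization after shrinking the core height, with width and cell area unchanged.
proc shrunk_utilization {util height_um shrink_um} {
    return [expr {$util * double($height_um) / ($height_um - $shrink_um)}]
}

set max_density 0.80                               ;# value passed to place_global_max_density
set util_after [shrunk_utilization 0.70 400 50]    ;# 400 um height is hypothetical

puts [format "utilization after shrink: %.3f (max density %.2f)" $util_after $max_density]

# Hypothetical rule of thumb: warn when less than 0.02 of headroom remains.
if {$util_after >= $max_density - 0.02} {
    puts "Warning: little or no headroom below the max-density limit"
}
```

If the warning fires, either raise the max-density values together with the new floorplan or keep the original area.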

If there's an overwhelming number of error messages during ECO, don't hesitate to Ctrl+C and restart from scratch.

Congestion

Use the following two commands to retrieve data on hotspots and overflows:

report_congestion -hotspot
report_congestion -overflow

For this successfully taped-out layout, the hotspot data is as follows:

[hotspot] +------------+---------------+---------------+
[hotspot] | | max hotspot | total hotspot |
[hotspot] +------------+---------------+---------------+
[hotspot] | normalized | 3082.49 | 5123.15 |
[hotspot] +------------+---------------+---------------+
Local HotSpot Analysis: normalized max congestion hotspot area = 3082.49, 
normalized total congestion hotspot area = 5123.15 
(area is in unit of 4 std-cell row bins)

The layout with the hotspot data above routed successfully, but after I shrank the height by 50 um, the new layout had a max hotspot of 3421.38 and a total hotspot of 4777.11 and ultimately could not be routed. The total actually went down while the max went up, so my conclusion is that the max hotspot may matter more than the total: a single spot that is too hot cannot be routed around.

In 1P3M, congestion is the most serious issue. Despite not having hard macros or routing blockages, routing might still trigger metal shorts and DRC errors at the final stages.

I tried the following option to make placement pay more attention to congestion, but I am unsure of its effect:

set_db place_global_cong_effort high

Routing Error

In 1P3M, the most frequent problem is routing errors: with so few metal layers, wires that have to pass each other end up shorted. As in the experience described above, once max_density approached 0.8, routing became impossible. In such cases I suggest prioritizing congestion over timing during Innovus routing. The assumption is that timing is not critical for the design, since more demanding designs would probably not use such a low-end process.

In any case, it is normal to see many errors after routing in 1P3M. If the error count is under 10,000, there may still be room to maneuver. My record so far is starting from 7,000 errors after routing and eventually getting them down to zero with the techniques below.

Normally, routing with so few layers is quick. There are few choices to make: M1 only handles non-critical paths, M2 runs vertically, and M3 runs horizontally, so routing errors are resolved rapidly over the iterations.

However, exceptions exist. If the error count explodes (in my experience, more than ten thousand usually leaves no chance), or the layout contains conditions that simply cannot be solved, a single iteration can take 30 minutes and an overnight run may still not finish. I recommend setting a stop condition up front: when squeezing area with a high-density layout (especially with so few metal layers), limit routing to 5-10 iterations so that the run stops on its own after 2-3 hours, then look at the results. Otherwise, a layout that keeps accumulating unsolvable errors wastes a lot of time. Unless someone is holding a gun to your head and demanding exactly this area, I recommend going back to the floorplan and allocating slightly more area.
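
A sketch of such a stop condition in Stylus mode is shown below. The attribute name route_design_detail_end_iteration is my understanding of the Stylus counterpart of the legacy setNanoRouteMode -drouteEndIteration option; treat both names as assumptions and confirm them against the documentation for your Innovus version.

```tcl
# Assumption: Stylus attribute name; verify it in your Innovus version.
# Legacy-mode counterpart (also to be verified):
#   setNanoRouteMode -drouteEndIteration 10
set_db route_design_detail_end_iteration 10   ;# stop detail routing after 10 iterations

route_design
```

The same attribute can later be raised when you deliberately want Innovus to keep iterating.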

Below are some further ideas for reference; sometimes they really are brilliant tricks. I consider some of them a last resort. An extreme 1P3M process does not come up often, but if you do need to drive the errors to zero within a fixed area, give them a try.

1. Ignore the errors and move forward with the Post route process

I'd suggest proceeding to post-route ECO to fix timing violations first. Normal routing is not allowed to move placed cells, but the optimizer may move them to fix timing, which can resolve routing problems as a side effect. Besides, even if you fix the routing errors now, moving cells for timing later may introduce new errors, so fixing timing first can be more efficient.

2. Trust in innovus’s craft by letting it run extensively

If the errors look resolvable and a single iteration takes 5-10 minutes, let Innovus keep trying. Set routing to 30-40 iterations, more than the default stop condition, to force Innovus to keep working on the violations. I have used 40 iterations to bring the count from over a thousand down to 33 violations, which is fairly efficient. Moreover, as the error count decreases, later iterations become faster.

Personally, I don't recommend going beyond 50 iterations. First, it is too time-consuming. Second, if there is no solution within 50, there probably won't be one after 50 either.

3. Move cells that are significantly difficult to route

One situation I encountered: a cell contained a C-shaped M1 shape enclosing a contact that needed a connection to the left, and the cell happened to sit under a vertical M2 power stripe, so it could not connect directly upward to M2 and M3. Many iterations got nowhere. In the end I moved the cell to a nearby empty spot in the GUI, and after a few more iterations the error was resolved.

However, this has risks, and I recommend doing it only when the error count is in the single digits. Once, moving a cell produced more errors that were also unsolvable. In addition, after moving cells you must rerun post-route setup/hold-time analysis and ECO.

4. Directly remove problematic routes

Another discovery: Innovus sometimes leaves errors unresolved after routing because its algorithm seems to lack a way to abandon the existing path, so further iterations don't fix them. One fix that worked was deleting the problematic routes, which helps Innovus break out of its existing solution and, surprisingly, eliminated errors that had seemed unsolvable.

In practice, right-click the wire and select Copy Name, delete the net with the shortcut "d", and rerun ECO route. Note, however, that if the nets are important ones, such as those whose names start with CTS or FE (indicating clock tree routes), timing must be rechecked after the deletion.
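
The same steps can also be run from the command line. This is a sketch in Stylus mode that assumes delete_routes accepts -net and that route_eco is the Stylus ECO-route command; check both against your Innovus version. The net name is a placeholder for whatever Copy Name returned.

```tcl
# Placeholder: paste the name obtained with Copy Name in the GUI.
set net_name "XXX"

# Assumption: Stylus command names and options; verify before use.
delete_routes -net $net_name   ;# remove the existing wires of this net
route_eco                      ;# reroute only what is now open
```

After rerouting, rerun the DRC check and, for clock-related nets, the post-route timing analysis.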

Routing Error Summary

Innovus rarely gets it wrong, so if errors persist after routing, first check whether the layout contains conditions that are impossible to solve. For example, in my first attempt with a 5-metal-layer process I had not set up the blockages properly, so the M2 ring overlapped the standard cells and caused unresolvable errors. In 1P3M, when there are many errors, I start ECO route with a higher iteration limit and let it run a few more times to bring the numbers down. Once the errors are in the single digits, I move cells and delete routes.

One notable observation: in 1P4M, thanks to the extra metal layer, routing errors are resolved quickly, and setup/hold-time problems caused by cell density tend to show up before routing problems do. That might explain why Innovus has a ThreeLayerMode that switches on automatically when there are fewer than 4 metal layers.

What to do with Timing Issues?

In this project there were no timing violations after CTS, but violations appeared after routing, because the real wires add actual load. The solution is to add an ECO round after CTS. Since the violations only show up after routing, it is best to set the ECO target slack with some margin for them.

For example, if the data paths become 0.3 ns slower after routing, raise the ECO target slack from 0 to +0.3 so that routing has that much time to spare:

set_db opt_setup_target_slack 0.3

How to handle innovus not fixing Antenna Rule?

In this layout, Innovus inexplicably did not fix antenna errors during routing: it detected 5 antenna errors but only reported warnings without repairing them. This is especially severe on clock trees, probably because clock tree nets are not treated as routing objects, so the antenna check during routing does not correct them.

There are scripts online that offer solutions, for example "Digital IC backend implementation gems | Methods and Golden scripts for automated fixing of Antenna Violation in ICC2 and Innovus". However, I used a simpler, brute-force method: as above, delete the entire net that violates the antenna rule, rerun ECO route, and Innovus will fix it.

This could only be expressed as:

Cadence, do better please.

Aside on innovus stylus

While working on the 1P3M layout, I read through many settings to understand the problems, and found that Innovus's newer -stylus mode has completely overhauled the command set. For example, the settings from the density section look like this in -stylus mode:

set_db place_global_max_density 0.80
set_db opt_max_density 0.80
set_db place_global_uniform_density true

The reference article, written for the legacy mode, uses:

setPlaceMode -place_global_max_density 0.80
setPlaceMode -place_global_uniform_density true
setOptMode -max_density 0.80

The command naming has shifted from lower camel case to snake_case, for some mysterious reason, so many documents found online need a little translation or trial and error before they can be used.

Conclusion

That is my summary of working on a 1P3M layout, a process with a bare minimum of metal layers. Looking back, the 1P5M and 1P9M processes I usually worked with were luxurious: there was nothing to worry about back then, which also means those APR results were never particularly well crafted. Sure enough, pushing to the extreme reveals how many details there are to consider in APR, and how many subtleties are easy to overlook.

This article is a tribute, with the utmost respect, to all APR engineers.