Asynchronous Logic Sign-Off Flows Beyond Structural CDC

SSO Symposium Case Study by Louis Cardillo of NVIDIA

Case Study Overview

Louis Cardillo of NVIDIA presented at Real Intent’s 2023 Static Sign-Off Symposium on how NVIDIA expanded its sign-off flow beyond structural clock domain crossing (CDC) to address more complex asynchronous design challenges. Meridian CDC with Simportal and Meridian RDC were deployed.

Below are lightly edited highlights of what he presented.

Synchronous vs. Asynchronous Design Logic Verification

Scale of Async logic usage

This is an 80-billion-transistor chip, in the NVIDIA product line that we would call GPUs.

From an asynchronous design sign-off point of view, it’s not our most complicated chip. We have SoCs with much higher clock domain counts and more varied IP sources; they have a lot more async reset domains and even more clock domains. Even though they might be a fifth or a seventh of the size of these, they’re actually quite a bit harder to sign off. Most of the flow is harder, but in particular, the asynchronous design flows are harder.

Being in a company with established flows and products means we’re not creating a new flow. We have a flow that’s existed for a long time, and we’re continuously modifying it to make it better. And while we’re modifying it, we are taping out chips seven to eight times a year. There’s a huge variety of designs, so these numbers are general. We have designs that have almost no async resets, and we have designs that are almost all async resets.

NVIDIA chip

Large variation across top-level blocks:
  • 10s to 100s of clock domains
  • 100s to >100K synchronizers
  • >1% to 70% of flops with async resets

Synchronous paths are >99% of typical designs

Even a chip that has a lot of asynchronous logic is still 99 percent synchronous. That is a good thing, because synchronous logic is much easier and more predictable; you can dependably walk it through a fairly straightforward ASIC flow and sign things off.

From a metastability randomness point of view, there’s no risk in the normal flows if you are using single-cycle sign-off in a standard cell flow. Logic equivalence checking proves our netlist is the same, as long as we meet single-cycle timing.

Synchronous paths safe

Asynchronous paths are <1% of typical designs

Those flows are very dependable and work very well for synchronous logic; but depending on how it is counted, a percent or so of the paths on the chip are asynchronous.

It’s very easy to make mistakes with asynchronous logic. Many of the assumptions people make about synchronous logic — things they just take for granted — aren’t going to be true. For example, even experienced design engineers will think of asynchronous sign-off as clock domain crossing (CDC) sign-off.

However, with CDC sign-off only, a team can get 100% coverage, but the chip still isn’t going to work unless they meet a number of other requirements. I discuss some of them below.

  • Use standard code generators. There are standard RTL code generators that we use for clock domain crossings; doing that alone probably eliminates 90 percent of the bugs — including a lot of the hard-to-find bugs. This is one of the reasons we try so hard to force designers into using these standards. There’s no reason to reinvent a FIFO. It’s been designed well, and we’ve made mistakes in them in the past that we can now avoid. But it’s also evolving; sometimes designers come up with new things, so our flows must keep up with that.
  • There are a bunch of special cells that we have. The main ones are the standard cell synchronizers, so we are not expecting to build synchronizers out of ordinary flops. (A minimal sketch of the synchronizer pattern appears after this list.)
  • The RTL simulation with synchronizer randomization is something that we’ve been doing for as long as I remember — and it’s constantly evolving and constantly getting better.
  • I’ll talk more about CDC, RDC, and assertion checks later. I will also discuss downstream glitch checking.
  • Another thing we have special flows for is MTBF (mean time between failures). Essentially, it’s about deciding that our synchronizers are deep enough. If we have a million synchronizers, or 500,000 synchronizers, on a chip, really rare events become fairly common. We have a lot of flows related to verifying that across PVT. (The standard first-order MTBF model is shown after this list.)
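As an illustration of the first two items, here is a minimal sketch of the standard pattern, assuming a generic parameterizable multi-flop synchronizer. This is not NVIDIA’s actual generator output; in a production flow the flops would map to dedicated standard cell synchronizer cells, and the generator would stamp these out.

```systemverilog
// Illustrative sketch only, not NVIDIA's generator output.
// A parameterizable multi-flop synchronizer (DEPTH >= 2).
module sync_ff #(parameter int DEPTH = 2) (
  input  logic clk_dst,    // destination-domain clock
  input  logic rst_dst_n,  // destination-domain async reset, active low
  input  logic d_async,    // level signal arriving from another clock domain
  output logic q_sync      // synchronized copy
);
  logic [DEPTH-1:0] ff;

  always_ff @(posedge clk_dst or negedge rst_dst_n)
    if (!rst_dst_n) ff <= '0;
    else            ff <= {ff[DEPTH-2:0], d_async};

`ifdef SYNC_RANDOMIZE
  // Simulation-only metastability model: each output edge picks up a
  // randomly chosen zero or one extra cycle of latency, so any logic
  // that depends on exact crossing latency fails in regression.
  logic late, pick = 1'b0;
  always_ff @(posedge clk_dst or negedge rst_dst_n)
    if (!rst_dst_n) late <= 1'b0;
    else            late <= ff[DEPTH-1];
  always_ff @(posedge clk_dst)
    if (ff[DEPTH-1] != late) pick <= $urandom_range(1); // edge in flight
  assign q_sync = pick ? late : ff[DEPTH-1];
`else
  assign q_sync = ff[DEPTH-1];
`endif
endmodule
```

The `SYNC_RANDOMIZE` branch is one simple way to model what randomized synchronizer simulation does: it makes crossing latency nondeterministic, so RTL that assumes an exact arrival cycle breaks in simulation instead of in silicon.

For the MTBF flows, the usual starting point is the standard first-order model (symbols here are generic, not specific to NVIDIA’s flow):

\[ \mathrm{MTBF} = \frac{e^{t_r/\tau}}{T_0 \cdot f_{clk} \cdot f_{data}} \]

where t_r is the resolution time the synchronizer depth provides, τ and T_0 are metastability parameters of the synchronizer cell (both vary across PVT, which is why the checks run per corner), and f_clk and f_data are the destination clock rate and data toggle rate. With N synchronizers on a chip, the aggregate MTBF is roughly the per-synchronizer MTBF divided by N, which is how really rare events become fairly common.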

Meridian CDC and RDC workflows help reduce noise

I’d like to talk specifically about how engineers can use Meridian CDC and Meridian RDC sign-off flows to reduce noise.

1. Discourage waivers, encourage constraints

In general, it’s better for someone to specify that a signal is stable than to waive a violation because they know it’s stable. Tell the tool the truth and let it analyze that; the constraint often has other useful side effects.

There is a lot of value in specifying it as a constraint: the tool can sanity-check it and run consistency checks.

A Meridian Simportal flow can help validate assumptions in simulation.

2. Run at different hierarchy levels

Another common technique is running at multiple hierarchy levels. We’re building workflows for RTL designers, and they want quick feedback. We don’t want to wait until chip assembly to give them feedback. We want to be able to run it at very small unit levels, then assemble those blocks as we go up to higher levels.

Someone asked me, “When do you run CDC and who runs it?” I answered, “The designer runs it. The person running regressions runs it. We run it in two different trees, the RTL tree and the physical design tree.”

This fits the whole left-shift mentality. But left-shifting doesn’t mean we stop running it on the right; we still run it later to catch downstream changes such as DFT insertion and ECOs. We’re just trying to find the bugs earlier.

3. Combine modes when it makes sense

Multi-mode analysis is actually really helpful and really effective for a lot of things. It matches the way our constraints look in the back end with multi-clock propagation.

But how do we get the right errors, warnings, and violations to the right designer?

These teams work on totally different schedules when they sign things off. We want to give them the control just to look at the bugs that they can fix.

Simulation Checks for CDC & RDC Assumptions

RTL simulations can be augmented to help verify CDC & RDC constraints & async logic behavior

There are some general principles that we’ve started to use more and more over time. One of these is anchor points in the RTL; we can tie special simulation behavior to them. When our back-end team looks at a violation, they can see that it is on an anchor point that existed in the front end.

Pointing them to the front-end analysis of the issue helps them deal with noise. Was it waived in the front end? Why was it waived? Who waived it? We used to, and still do, communicate a lot of this through emails and bug reports, but this is a way to tie it into the design. It’s part of our RTL. It’s part of our netlist.

Augment RTL sims
Meridian Simportal simulation checkers

Meridian Simportal generates simulation checkers to help verify CDC/RDC assumptions

Meridian Simportal is an extension of this idea. Simportal takes things where we don’t have anchor points and helps to answer why something that has passed CDC sign-off still fails.

For example, suppose a person said a signal was stable in a file somewhere, in one design review, and we used that in CDC to propagate and knock out 20,000 warnings. What if that signal is not actually stable? A checker like the sketch below catches that in simulation.
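Here is a minimal sketch of the kind of checker such a flow can generate; the signal and module names (`cfg_mode`, `chk_cfg_stable`, `my_block`) are hypothetical, made up for illustration:

```systemverilog
// Hypothetical example of a checker derived from a CDC "stable"
// constraint -- names are made up, not Simportal's actual output.
module chk_cfg_stable (
  input logic clk_dst,    // destination-domain clock
  input logic rst_dst_n,  // destination-domain reset, active low
  input logic cfg_mode    // signal constrained as stable in CDC sign-off
);
  // If cfg_mode ever toggles while the destination is out of reset,
  // the constraint (and every warning it knocked out) was wrong.
  property p_stable;
    @(posedge clk_dst) disable iff (!rst_dst_n) $stable(cfg_mode);
  endproperty

  a_cfg_stable: assert property (p_stable)
    else $error("CDC constraint violated: cfg_mode toggled at %0t", $time);
endmodule

// Attached non-intrusively to the design, e.g. with SystemVerilog bind:
// bind my_block chk_cfg_stable u_chk (.clk_dst(clk), .rst_dst_n(rst_n),
//                                     .cfg_mode(cfg_mode));
```

Bound into every RTL regression, this turns a paper constraint into something that fails loudly the first time the assumption is wrong.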

Asynchronous Resets

Async resets cause an explosion of async edges

We’ve been doing RDC-like checks for quite a while, but a lot of what we did in the past were straight structural checks or design rule type checks.

We evolved into wanting to treat our RDC tools and flows more like we do CDC.

Resets are dangerous — and they’ve become a lot more dangerous because of IP reuse. Designers would think “If I get through power-on reset on my chip, that’s all I have to worry about.” But now they put that IP on another chip and hook them together. Do this with a thousand IPs and they’ve got power-on resets that happen once, but a lot of these blocks are reset hundreds of times a second all over the chip. There are resets going on all the time, for different reasons on different chips.

Asynchronous Reset

Resets can leak from one block to another if only one block is reset at a time.

In a nutshell, if it was a power-on reset, they didn’t worry so much about the assert edge. But in a chip with hundreds of these domains, the assert edge of reset is actually really dangerous. It asserts, and all those flops are pushed into their reset state; there is just a crazy explosion of asynchronous edges everywhere in the design. Every flop is sending out these asynchronous edges. If they hit any downstream flop that’s not also being reset but is being clocked, there are more async edges. Ultimately, the danger leaks out of the block that was intended to be reset.

  • There’s the equivalent of power domain clamping we can do with reset clamping
  • Reset sync sequencing so we don’t have to worry about one of the directions in that diagram if we know which reset is dominant, i.e., which one is always set while the other one is set.
  •  Stop clocks so that’s another thing that we use a lot
  • Change a lot of logic to synchronize a reset.
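As a sketch of the first item, assuming a generic clamp (illustrative only, not NVIDIA’s actual cells or methodology): force a block’s outputs to a safe constant while its reset is asserted, so the storm of async edges inside it never leaves the block.

```systemverilog
// Illustrative reset clamp, analogous to power-domain isolation:
// while blk_rst_n is asserted (low), the block's outputs are forced
// to a known constant, so the async edges from its resetting flops
// cannot propagate into neighbor blocks that are still running.
module reset_clamp #(parameter int W = 8,
                     parameter logic [W-1:0] SAFE = '0) (
  input  logic         blk_rst_n,  // reset of the block being isolated
  input  logic [W-1:0] from_blk,   // raw outputs of the resetting block
  output logic [W-1:0] clamped     // what the rest of the chip sees
);
  assign clamped = blk_rst_n ? from_blk : SAFE;
endmodule
```

The clamp enable itself has to be glitch-free and sequenced correctly relative to the reset, which is where the reset-sequencing and clock-stopping items come in.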

All those things are fine. But no matter how hard we try to make a dependable, trustworthy, correct-by-construction design, we must then verify it.

RTL bugs found by Meridian RDC

That’s why we use RDC sign-off flows. Meridian RDC has grown for us; we’re using it much more now — in many cases everywhere that we run CDC.

It’s different because it’s within a clock domain, so even some things that you might not run CDC sign-off on you could still want to run RDC on if you have multiple reset domains.

These are the kinds of bugs that we’ve actually found using Meridian RDC. A number of them we might otherwise find much later; there’s also a left-shift effect going on here, and finding them early in RTL is better.

Common things we find are async paths leaking out of a reset domain, and race conditions between resets and the logic they’re resetting. And late in the design cycle we have ECOs, so finding these early becomes quite valuable; otherwise we either miss them or don’t find them until back-end physical design. (The basic hazard pattern is sketched below.)
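For concreteness, the basic structure behind a leaking reset path looks like this (names made up for illustration):

```systemverilog
// The classic RDC hazard: same clock, two reset domains. When
// rst_a_n asserts asynchronously, qa changes on no clock edge; if
// that edge lands near clk's active edge, qb -- which is NOT being
// reset -- can go metastable even though CDC analysis is clean.
module rdc_hazard (
  input  logic clk,
  input  logic rst_a_n,  // async reset, domain A
  input  logic rst_b_n,  // async reset, domain B
  input  logic d,
  output logic qb
);
  logic qa;

  always_ff @(posedge clk or negedge rst_a_n)
    if (!rst_a_n) qa <= 1'b0;
    else          qa <= d;

  always_ff @(posedge clk or negedge rst_b_n)  // not reset by rst_a_n
    if (!rst_b_n) qb <= 1'b0;
    else          qb <= qa;  // rst_a_n's assert edge leaks in here
endmodule
```

CDC tools see nothing wrong here, since both flops are on `clk`; RDC analysis flags the `qa`-to-`qb` path because `rst_a_n` can assert asynchronously while domain B keeps clocking.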

How to Not Break a Chip

I will wrap up with a few broad statements on how to not break a chip from an async interface point of view.

  • Use standard plugins whenever possible, don’t reinvent the wheel for async interfaces
  • Run simulations with randomizing synchronizer models
  • Run CDC and RDC early and often, and after late ECOs
  • Check CDC and RDC constraints with simulation assertions

These are easy mistakes to make if you’re used to a synchronous world. People will say, “Well, it worked on the last chip, so it must be fine.” That’s almost certainly not true for any async path. The statement proves almost nothing; it’s like saying you didn’t have a hold problem at this voltage, so it must be okay.

The async paths might work or they might not work. But they’re not going to work dependably unless you’ve checked them. So, in the next chip, when the delays change a little bit, the async paths won’t work if you don’t check them.

It’s a statement we actually see quite a bit, and I think that’s because it’s pretty good for synchronous paths. It’s not a terrible statement for synchronous paths, but it’s certainly a recipe for non-working chips for async paths.

Another mistake is about glitches. In a synchronous world, people are used to the idea that if they can prove the netlist is the same as the RTL, they’re done; nothing can be bad. And glitches are not a problem with static timing sign-off, because a glitch is resolved before the end of the cycle. Whether you have a glitch or not, your chip still works.

Neither of those assumptions is true here. A glitch can break asynchronous logic, and the logic can be 100 percent logically equivalent and still not work, even though the RTL was fine. (A classic example of the pattern is sketched below.)
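A classic instance of the problem, as a hedged sketch (signal and module names are made up): a mux-style combinational expression feeding an async crossing. Logic equivalence checking sees the two versions below as identical; only the registered one is safe to send to a synchronizer.

```systemverilog
// Illustrative only: a mux whose select changes can glitch its output.
module glitch_example (
  input  logic clk_src, a, b, sel,
  output logic req_bad,   // BAD: combinational, can pulse when sel toggles
  output logic req_safe   // SAFE: registered in the source domain first
);
  // Static-1 hazard: when a = b = 1 and sel flips, (a & sel) | (b & ~sel)
  // can momentarily drop to 0, depending on gate and wire delays. If the
  // destination clock samples that pulse, it captures a phantom event.
  assign req_bad = (a & sel) | (b & ~sel);

  // Registering first means only clean, full-cycle-settled values ever
  // reach the synchronizer in the other domain.
  always_ff @(posedge clk_src)
    req_safe <= (a & sel) | (b & ~sel);
endmodule
```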

Another related one is assuming that if something’s okay in netlist it’ll still be okay in layout.

Glitches are probably the prime example of that, but there are other things too. For example, if an engineer runs skew checks across various buses at an early stage, or at one PVT corner, does that mean it will be okay in layout?

Those assumptions are not always true now, even in synchronous designs, unless you take into account cell-versus-wire delay variation and similar effects.

But engineers can come up with rules of thumb that work for synchronous paths a lot more easily than for async ones.

  • Don’t assume that correct async logic in synthesized netlists stays that way in layout
  • Check that synchronizer depths will meet MTBF goals across PVT
  • Run async timing checks to cover async skew assumptions across PVT