Nvidia: Static Sign-Off Best Practices

DAC 2019 panel presentation by John Busco of Nvidia (edited transcript)

Case Study Overview

John Busco of Nvidia presented a case study on Nvidia’s static sign-off methodology and best practices, covering RTL linting, RDC, single-mode and multimode CDC, and CDC at both RTL and gate level.


Nvidia Chips Push PPA Limits

I think you all associate Nvidia with computer graphics.  We invented the GPU, or Graphics Processing Unit.

Basically, these are among the largest chips that are done in the industry.  We’re always pushing the limits of performance, power and area.

Accordingly, we need very high capacity, very good performance, and very good accuracy from our EDA tools.

We take chips like Volta and apply them to a number of different markets. Graphics is our bread and butter, and we’ve started doing ray tracing in hardware, which is sort of the holy grail of computer graphics.

The architecture of a GPU is also applicable to several other large and fast-growing industries, such as supercomputing, deep learning for artificial intelligence, and in the future, for driving autonomous vehicles as well. We’re getting traction in all these markets.

When & Where Static Sign-Off Tools Are Used

This is a schematic representation of a chip design cycle, showing when we use various static sign-off tools.

The idea is that we’re using different combinations of the tools in different phases of the design process. When you’re starting with RTL module creation, you can already run lint.

As the chip comes together and the block is more functionally complete, you can start to run CDC (clock domain crossing) and RDC (reset domain crossing).  And then as the different blocks are assembled, you can run multi-clock CDC and RDC. 

And finally, before tapeout you want to run a gate-level check as well. As you can see, we’re applying static sign-off tools continuously throughout the design cycle.
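The phase-by-phase flow described above can be sketched as a small lookup table. This is only an illustration: the phase names, check names, and the assumption that earlier checks keep running in later phases are hypothetical and are not taken from Nvidia’s actual flow.

```python
from enum import Enum


class Phase(Enum):
    # Hypothetical phase names, ordered as in the talk.
    MODULE_CREATION = 1
    BLOCK_COMPLETE = 2
    CHIP_ASSEMBLY = 3
    PRE_TAPEOUT = 4


# Checks first introduced at each phase (illustrative names).
INTRODUCED_AT = {
    Phase.MODULE_CREATION: ["lint"],
    Phase.BLOCK_COMPLETE: ["cdc", "rdc"],
    Phase.CHIP_ASSEMBLY: ["multi_clock_cdc", "multi_clock_rdc"],
    Phase.PRE_TAPEOUT: ["gate_level_check"],
}


def checks_for(phase):
    """Return every check introduced at or before the given phase.

    Assumption: checks are cumulative, so lint keeps running after
    CDC and RDC are added, and so on.
    """
    checks = []
    for p in Phase:  # Enum members iterate in definition order
        if p.value <= phase.value:
            checks.extend(INTRODUCED_AT[p])
    return checks


print(checks_for(Phase.BLOCK_COMPLETE))  # ['lint', 'cdc', 'rdc']
```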

Problems Addressed by Static Sign-Off

So why do we do this?

Lint is great for an early vetting of the RTL. It’s quicker and easier to get feedback than running simulation or synthesis.  

CDC gives us complete coverage to ensure that our interfaces are clean. We try to minimize the potential for CDC problems by having strong design guidelines, using golden IP, and certain simulation techniques.  However, as Prakash [Prakash Narain, Real Intent] mentioned, those techniques do not cover everything; we need a solution to do so.

RDC is a similar situation. This is a class of bugs that can be quite subtle and quite hard to find, and if any escape into silicon they can be hard to debug.

So again, RDC tools provide a solution to catching any problems with your reset synchronization.

Static Sign-Off Successes

Here are just a few examples of types of problems that these tools can find.

On the lint side, it ranges from very simple things to more corner-case issues in your case statement usage or arithmetic widths. It also covers more subtle problems that simulation might miss, such as self-determined expressions: unless you have the right test case in simulation you will not see the problem, yet a lint tool can flag it.
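To make the arithmetic-width point concrete, here is a minimal Python model of what happens when a sum is evaluated at too narrow a width. It is a simplified sketch, not a full model of Verilog’s expression sizing rules; the function names and the 8-bit and 9-bit widths are assumptions chosen for illustration.

```python
def add_at_width(a, b, width):
    """Add two unsigned values and keep only the low `width` bits."""
    return (a + b) & ((1 << width) - 1)


def average_truncated(a, b):
    # Sum evaluated at 8 bits: the carry-out is dropped before the shift.
    return add_at_width(a, b, 8) >> 1


def average_widened(a, b):
    # Sum evaluated at 9 bits: the carry-out survives the shift.
    return add_at_width(a, b, 9) >> 1


# A test with small operands does not expose the problem.
print(average_truncated(10, 20), average_widened(10, 20))      # 15 15

# Only a test whose sum overflows 8 bits shows the difference.
print(average_truncated(200, 100), average_widened(200, 100))  # 22 150
```

A simulation that never drives operands large enough to produce a carry passes with either version, whereas a width check is structural and does not depend on the stimulus.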

On the CDC side, it can verify all the clock-domain crossing interfaces in your design. By running it after assembly — after all the blocks are put together — you can catch things that may have changed post-synthesis.

In this example, we inserted pipelining registers in our design — perhaps that was done by an in-house tool.  We wanted to make sure that those registers are clocked correctly. So, the CDC tool would catch that at that point.
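The pipelining-register case can be illustrated with a toy structural check. This is a sketch under strong assumptions: the `Register` type, the register names, and the clock names are hypothetical, and the single rule shown (a register-to-register path that changes clocks must end in a synchronizer) is only a small part of what a real CDC tool analyzes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Register:
    name: str
    clock: str
    is_synchronizer: bool = False


def find_unsynchronized_crossings(registers, paths):
    """Return (src, dst) pairs that cross clocks without a synchronizer.

    `paths` is a list of (src_name, dst_name) register-to-register
    connections.
    """
    by_name = {r.name: r for r in registers}
    violations = []
    for src_name, dst_name in paths:
        src, dst = by_name[src_name], by_name[dst_name]
        if src.clock != dst.clock and not dst.is_synchronizer:
            violations.append((src_name, dst_name))
    return violations


paths = [("tx", "pipe0"), ("pipe0", "rx")]

# Pipeline register inserted on the wrong clock: both hops are flagged.
bad = [Register("tx", "clk_a"), Register("pipe0", "clk_b"), Register("rx", "clk_a")]
print(find_unsynchronized_crossings(bad, paths))
# [('tx', 'pipe0'), ('pipe0', 'rx')]

# Same structure with the pipeline register on the intended clock: clean.
good = [Register("tx", "clk_a"), Register("pipe0", "clk_a"), Register("rx", "clk_a")]
print(find_unsynchronized_crossings(good, paths))
# []
```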

Finally, why do you run it at gate-level? Things happen during the design closure process. There may be more DFT logic going in and there are ECOs that are changing timing or function. So, you do a final check at gate-level before sign-off.

Best Practices

Best Practice #1 – Automation & Enforcement

As for best practices in applying these tools: much as Hamid [Hamid Shojaei, Google] mentioned, we put a lot of emphasis on automation.

We want to make it easy for our designers to run things in sort of a push-button manner, and we want the grading of the results to be clearly “Go” or “No go”.

We post the results to a dashboard so the chip management and design teams can see whether they’re clean as far as static sign-off goes. All of this is very automated and run frequently.
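A “Go” or “No go” grading of this kind could look roughly like the sketch below. The `CheckResult` record, the block and check names, and the rule that a block is “GO” only when every check reports zero open violations are assumptions made for illustration; the talk does not describe the actual grading criteria or dashboard format.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    block: str            # design block the check ran on
    check: str            # e.g. "lint", "cdc", "rdc"
    open_violations: int  # violations still outstanding


def grade(results):
    """Map each block to "GO" or "NO GO".

    Assumption: a block is "GO" only if every one of its checks
    reports zero open violations.
    """
    clean = {}
    for r in results:
        clean[r.block] = clean.get(r.block, True) and r.open_violations == 0
    return {block: "GO" if ok else "NO GO" for block, ok in clean.items()}


results = [
    CheckResult("block_a", "lint", 0),
    CheckResult("block_a", "cdc", 0),
    CheckResult("block_b", "lint", 0),
    CheckResult("block_b", "rdc", 3),
]

for block, status in grade(results).items():
    print(f"{block}: {status}")
# block_a: GO
# block_b: NO GO
```

Because the grade is computed rather than judged by hand, the same script can run on every regression and feed a dashboard without anyone interpreting raw reports.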

Static sign-off is not someone just running CDC one time, saying it’s okay, and telling their manager.

It needs to be repeated and continuous so that you know that no bugs are slipping in.  Our automation provides that capability.