Enabling Lint/CDC/RDC Sign-Off on All Check-ins
for RISC-V inference chip

SSO Symposium Case Study by Raj Khanna of Esperanto

Case Study Overview

Raj Khanna of Esperanto presented at Real Intent’s 2023 Static Sign-Off Symposium how Esperanto enabled linting, clock domain crossing (CDC), and reset domain crossing (RDC) sign-off on all their RISC-V inference chip check-ins.

Below are lightly edited highlights of what he presented. Real Intent Ascent Lint, Meridian CDC and Meridian RDC were deployed.

The video has the following chapters:

  • Esperanto AI chip
  • Challenge: vicious static checking circle
  • Esperanto’s improved RTL Linting/CDC/RDC flow
  • Esperanto static sign-off methodology results

Esperanto ES-1 RISC-V Core

Esperanto has essentially made an AI accelerator using RISC-V cores.

  • Slide 1: Esperanto’s AI Accelerator
  • Slide 2: Integrating over 1000 64b CPUs on a single chip
  • Slide 3: Esperanto RISC-V based ET-SoC-1 Performance per Watt

Challenge: Vicious Circle of Static Checking

We’ve done static sign-off on many chips. The challenge I found on past projects is that we tended to be a bit hesitant to do static checks because they could run slowly, they took a long time, and they created a lot of noise.

What resulted was a tendency, especially on large designs, to run the checks infrequently. We put them off and ran them a bit later. That led to long reviews, and by that time we were getting really close to tape-out.

By the time we got there, we had long reviews and managers would stand in the designer’s cube and say, “When are you going to be done with reviewing those million warnings and errors that you’ve got from lint?”

With so many violations and waivers to be written, we would then add automation around the waivers to try to handle all of them. And then the automation could go wrong and create errors.

This was a vicious loop that I’ve seen in many projects. The method is high effort, it can impact the schedule, and it can be very error prone because things get missed. So clearly this was a problem to be solved.

Esperanto’s Improved Lint / CDC / RDC static sign-off flow

We were actually able to solve this problem, I think pretty effectively, using the latest [Real Intent] tools. Essentially, if you have static checking tools with relatively low noise and fast run times, then you can enable lint, CDC, and other checks on essentially all check-ins.

If you are familiar with the concept of “always alive,” we’ve now extended that to “always correct,” which is amazing. All of our check-ins have static check runs, which lead to incremental reviews rather than giant piles of violations.

The feedback is direct to the designer who just wrote the code, so everything is fresh in his mind. He gets immediate feedback and is able to fix it.

And now we’re in a situation where we can actually end up with no waivers and no errors because very often it’s actually easier to fix the problem than to go analyze it and put a waiver.

Now we have the ability to fix it right at the source — by the designer who created the problem. What helps us even more is having a quick way to debug and actually visualize what the issues are, so they can be fixed.

Then as we go up the hierarchy, we add CDC and RDC, and again they work seamlessly with our “always correct” concept.

Project Resource Impact

From a project resource standpoint, on a recent project that I was on, we spent probably a month in a critical part of the project essentially doing all the static checks, getting through waivers, and so forth.

It was probably a couple of engineers, full-time. And a large part of the design team had to get involved.

All of those resources and that time are pretty much saved now, because we do it incrementally and most of the issues can be solved by the designer in five or ten minutes if they’re presented up front.

Results: Esperanto’s Improved Static Checking Methodology

When a designer decides to check in his work, say at the end of the day, that’s when static sign-off would get run. And depending on if he’s checking in a module block or full chip, whatever has changed will trigger different checks and so we run it essentially at every check-in.
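As a rough illustration of how a check-in could map "whatever has changed" to a set of checks, here is a minimal sketch. It is not Esperanto's actual flow: the directory layout (a `top` directory for full-chip RTL), the file suffixes, the function name `select_checks`, and the rule that CDC and RDC run only for top-level or constraint changes are all assumptions made for the example. The check names are plain labels, not tool invocations.

```python
from pathlib import PurePosixPath

# Hypothetical conventions, assumed for illustration only.
RTL_SUFFIXES = {".v", ".sv", ".vh", ".svh"}
CONSTRAINT_SUFFIXES = {".sdc"}
TOP_DIR = "top"  # assumed directory holding full-chip integration RTL


def select_checks(changed_files):
    """Return the sorted list of static checks to run for a check-in."""
    checks = set()
    for name in changed_files:
        path = PurePosixPath(name)
        is_rtl = path.suffix in RTL_SUFFIXES
        is_constraint = path.suffix in CONSTRAINT_SUFFIXES
        if is_rtl:
            # Any RTL change gets linted, block level or full chip.
            checks.add("lint")
        if is_constraint or (is_rtl and TOP_DIR in path.parts):
            # Higher in the hierarchy, add the crossing checks as well.
            checks.update({"cdc", "rdc"})
    return sorted(checks)


if __name__ == "__main__":
    print(select_checks(["rtl/alu/alu.sv"]))
    print(select_checks(["rtl/top/chip.sv", "docs/notes.md"]))
```

A block-level change selects only lint in this sketch, while a change under the assumed `top` directory also selects CDC and RDC, mirroring the idea of adding checks as you go up the hierarchy.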

RTL Linting

The Ascent Lint RTL linting tool was actually very helpful. It found a lot of the things that we expected: signals with mismatches, enumeration types driven to zero, etc.

Our synthesis tool would find many of these things, give us an error, and crash; then the physical design guy would come to us and say, “Hey, fix this.” Some of these issues don’t even show up in that loop until much later, because the synthesis tools will just work around them; then the design goes to place and route, and a DRC or something later catches it.

That’s a loop that we can now largely avoid. So, this is actually extremely useful, and obviously handling these issues up front is the best thing.

We were able to do this because Ascent Lint runs in about five minutes versus another tool that we used that took more like an hour. An hour is really not acceptable for check-ins but five minutes is no problem.

Clock Domain Crossing Sign-Off

With the Meridian CDC tool we found some missing synchronizers.
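To make "missing synchronizer" concrete, here is a toy model of the idea. It is not how Meridian CDC works: the `Flop` record, the `via_logic` flag, and the rule used (a crossing counts as synchronized only if the receiving flop is driven directly and feeds only same-domain flops directly, as in a two-flop synchronizer) are simplifying assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flop:
    """Hypothetical netlist record: one flop and what drives its D input."""
    name: str
    clock: str
    driver: Optional[str] = None  # name of the driving flop, if any
    via_logic: bool = False       # True if combinational logic sits on the path


def find_unsynchronized_crossings(flops):
    """Return (source, destination) pairs that cross clock domains
    without a two-flop synchronizer, under this toy model."""
    by_name = {f.name: f for f in flops}
    violations = []
    for rx in flops:
        tx = by_name.get(rx.driver)
        if tx is None or tx.clock == rx.clock:
            continue  # not a clock domain crossing
        loads = [f for f in flops if f.driver == rx.name]
        has_second_stage = bool(loads) and all(
            f.clock == rx.clock and not f.via_logic for f in loads
        )
        if rx.via_logic or not has_second_stage:
            violations.append((tx.name, rx.name))
    return violations


if __name__ == "__main__":
    design = [
        Flop("req_a", "clk_a"),
        Flop("sync1", "clk_b", driver="req_a"),
        Flop("sync2", "clk_b", driver="sync1"),
        Flop("data_a", "clk_a"),
        Flop("data_b", "clk_b", driver="data_a"),
        Flop("use_b", "clk_b", driver="data_b", via_logic=True),
    ]
    print(find_unsynchronized_crossings(design))
```

In this example the `req_a` crossing passes through two back-to-back `clk_b` flops and is accepted, while `data_b` feeds logic directly after capturing a `clk_a` signal and is reported.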

I think the best part of the tool was that there is a hierarchical flow. This meant we didn’t have to go and create black boxes, which can create other errors and other things that need to be reviewed. So, that was actually a really good part of the CDC flow.

Reset Domain Crossing Sign-Off

We ran Meridian RDC very late, but we did find some issues. We found things that looked like fundamental problems.

We actually went back and changed our reset distribution to get rid of many of those issues. It took about a month for initial setup, and about a week to review the results.

Areas for Improvement

Now that we have this improved static sign-off flow, we run it pretty much all the time. So, there are still a few things to work on. We found that reading constraints for a very large hierarchical design was a little bit slower than what we had wanted.

For the future I would like to see a voltage-aware RDC. We actually have many voltage domains on our chip. There’s a reset network that runs through many of them and we have to be very careful — especially on power-on resets — that all the power domains are actually sequencing properly in order to be able to get the job done.

Summary

To sum up, this is the game where we want to try to have the most efficient inference on the planet.

We’re proud of the fact that you can actually just use out-of-the-box RISC-V tools to program our machine, which is unusual in the AI world. Often it takes a lot of very low-level coding to do this, so getting this general-purpose programming model together with very efficient inference is really our crown jewel at this point.

We were first-time correct because we used static checks, and we used a lot of simulation and all the advanced simulation techniques.