Rivos’ Reset Domain Crossing Methodology for AI LLM chips

By Mark Pearce, Principal MTS, Rivos

Overview

A case study of Rivos’ reset domain crossing (RDC) methodology, using Real Intent Meridian RDC, for its data center chips for AI LLMs and data analytics.

Why Reset Domain Crossing (RDC) Sign-off is important to Rivos

Rivos designs data center chips for AI LLMs and data analytics. Our chips have a large number of reset domains with many complex interactions. We use Real Intent Meridian RDC to sign off that our designs have no reset domain crossing errors.

Reset domain crossings occur when flops controlled by one reset drive flops controlled by another reset. If both flops are reset simultaneously, every flop in the path resets together and there is no problem. However, asserting an asynchronous reset changes the source flop’s output independently of the clock. If the destination flop is in a different reset domain and is not in reset, it can capture that change, go metastable, or corrupt its state, potentially bringing down the system.
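The structural half of this definition can be sketched as a simple check. This minimal Python sketch assumes the netlist has already been reduced to flop-to-flop data edges plus a map from each flop to its controlling reset; all flop and reset names are hypothetical, and this is not how Meridian RDC works internally. It only lists candidate crossings: whether a candidate is a real problem depends on whether the destination is in reset when the source reset asserts.

```python
# Sketch only: assumes the design has already been reduced to flop-to-flop
# data edges plus a map from each flop to the reset that controls it.
# Flop and reset names are hypothetical.


def find_rdc_candidates(edges, reset_of):
    """Return (src, dst, src_reset, dst_reset) for each data edge whose
    source and destination flops are controlled by different resets."""
    candidates = []
    for src, dst in edges:
        src_rst, dst_rst = reset_of[src], reset_of[dst]
        if src_rst != dst_rst:  # same reset means both flops reset together
            candidates.append((src, dst, src_rst, dst_rst))
    return candidates


reset_of = {"boot_cfg_q": "rst_boot", "core_ctl_q": "rst_core", "core_dbg_q": "rst_core"}
edges = [("boot_cfg_q", "core_ctl_q"), ("core_ctl_q", "core_dbg_q")]
print(find_rdc_candidates(edges, reset_of))
# [('boot_cfg_q', 'core_ctl_q', 'rst_boot', 'rst_core')]
```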

Rivos’ chips include RISC-V cores, on-device GPUs, DRAM, and Ethernet, each with multiple resets. For example, the chips go through a reset sequence at boot: the main boot core has its own reset and then releases other parts of the chip from reset, each under a different reset. Our chips also have test reset domains and other specialized resets, and we must verify all interactions between these domains.

[Figure: Reset domain crossing]

The reset sequence is critical: asserting one reset could corrupt a flop in another reset domain. Without proper RDC verification, the system could become unstable, fail to boot, or be unable to perform warm resets.

Rivos’ Hierarchical RDC Sign-off Methodology

RDC sign-off is part of Rivos’ RTL sign-off process – we run Meridian RDC at both IP and chip levels. Meridian RDC’s hierarchical design flow analyzes lower-level blocks independently, then uses abstracted models at higher levels for efficient chip-level verification.

Rivos ensures its designs are CDC-clean before running RDC, because CDC errors can corrupt the RDC analysis.

We hand off our RTL to the backend multiple times to validate the design, but final sign-off requires RDC closure, although we do allow some waivers. For RDC closure, Rivos has its design, architecture, and firmware teams review the reset scenarios, violations, and applied constraints.

This cross-functional review validates that the constraints match our actual system operation.

[Figure: Reset domain crossing methodology]

Setting up Meridian RDC

[Figure: Meridian RDC setup]

Our initial Meridian RDC setup takes only a couple of hours because it reuses Rivos’ Meridian CDC setup. Auto-generated input decks speed up the process significantly because each block has many ports: every I/O must be defined, including its clock domain and reset domain, and writing all of those definitions by hand would be very time-consuming.
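The auto-generation idea can be sketched as follows. This is a hypothetical Python illustration: it assumes each port’s clock domain is already known from the CDC setup and, as a simplification, that each clock domain has one default reset. The record format is illustrative only and is not Meridian RDC’s input syntax. Ports that cannot be resolved are returned separately for an engineer to define by hand.

```python
# Hypothetical sketch: the record format below is illustrative only and is
# NOT Meridian RDC's input syntax. Port, clock, and reset names are made up.
from dataclasses import dataclass


@dataclass(frozen=True)
class PortEnv:
    port: str
    clock: str
    reset: str


def generate_env(ports, cdc_clock_of, reset_by_clock):
    """Build one environment record per port.

    Assumes the clock domain of each port is known from the CDC setup
    (cdc_clock_of) and that each clock domain has one default reset
    (reset_by_clock). Unresolved ports are returned for manual setup.
    """
    records, unresolved = [], []
    for port in ports:
        clock = cdc_clock_of.get(port)
        if clock is None or clock not in reset_by_clock:
            unresolved.append(port)
            continue
        records.append(PortEnv(port, clock, reset_by_clock[clock]))
    return records, unresolved


ports = ["axi_awvalid", "axi_wdata", "dbg_req"]
cdc_clock_of = {"axi_awvalid": "clk_noc", "axi_wdata": "clk_noc"}
reset_by_clock = {"clk_noc": "rst_noc_n"}
records, unresolved = generate_env(ports, cdc_clock_of, reset_by_clock)
for r in records:
    print(f"{r.port} clock={r.clock} reset={r.reset}")
print("needs manual setup:", unresolved)
```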

Rivos then starts running the tool – it takes about a week to fully verify and clean up the setup.

Part of debugging our results is determining whether each issue stems from a setup error or an actual RDC violation. Real Intent’s iDebug tool is extremely helpful here: it separates setup errors and RDC errors into distinct categories and guides our engineers toward solutions. We clean up setup errors first.

Adding scenarios to Meridian RDC to improve precision

Reset scenarios determine whether two resets are active simultaneously when flops on one reset domain drive flops on another reset domain. If the destination flop is already in reset, there’s no problem: it’s held in a known reset state and won’t go metastable. If it’s not in reset, there could be a problem. The reset scenario distinguishes these cases, so the tool reports violations only when the destination reset is not asserted.

Meridian RDC provides default reset scenarios. Rivos augments these by providing the tool with specific scenarios that define sequences of reset assertions and de-assertions. This greatly reduces violation noise, because we validate only the reset scenarios that working silicon will actually execute.
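How a scenario separates these cases can be sketched in a few lines. This Python sketch uses a simplified, hypothetical model rather than Meridian RDC’s scenario format: a scenario is an ordered list of steps, each holding the set of resets asserted at that point, and only reset assertion events are checked. Reset names are illustrative.

```python
# Sketch under simplifying assumptions: a scenario is an ordered list of
# steps, each step is the set of resets asserted at that point, and only
# reset *assertion* events are checked. Reset names are hypothetical.


def scenario_violations(crossings, scenario):
    """Report (src_reset, dst_reset, step) when src_reset goes from
    de-asserted to asserted at a step where dst_reset is not asserted."""
    violations = []
    previous = set()
    for step, asserted in enumerate(scenario):
        newly_asserted = asserted - previous
        for src, dst in crossings:
            if src in newly_asserted and dst not in asserted:
                violations.append((src, dst, step))
        previous = asserted
    return violations


crossings = [("rst_boot", "rst_core"), ("rst_core", "rst_boot")]
boot = [{"rst_boot", "rst_core"}, {"rst_core"}, set()]  # everything starts in reset
warm_core = [set(), {"rst_core"}, set()]                 # core-only warm reset
print(scenario_violations(crossings, boot))       # []
print(scenario_violations(crossings, warm_core))  # [('rst_core', 'rst_boot', 1)]
```

The same crossing list produces no report for the boot sequence and one report for the core-only warm reset, which is why checking only scenarios that silicon will execute cuts noise.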

Exclusivity Constraints

To further reduce report ‘noise’, Rivos can define constraints in reset scenarios so that the tool doesn’t assume worst-case behavior when propagating errors. We use reset scenarios to specify constraints such as reset ordering, or exclusivity (one reset will not assert while another is active).

For non-reset signals, Rivos can also define exclusive signals in the environment setup — signals that never toggle simultaneously. When one toggles, the other remains stable. While Meridian RDC’s logic analysis often determines signal exclusivity automatically, we can also explicitly specify it as an environment constraint when needed.
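The effect of ordering and exclusivity constraints on which reset sequences need to be analyzed can be sketched as a filter over candidate scenarios. This Python sketch is hypothetical: it uses a step-based model (each step is the set of asserted resets) and made-up reset names, and it does not reflect Meridian RDC’s constraint language.

```python
# Hypothetical step-based model: each scenario step is the set of resets
# asserted at that point. Constraint semantics here are illustrative only.


def first_assert(scenario, reset):
    """Index of the first step where `reset` is asserted, or None."""
    for step, asserted in enumerate(scenario):
        if reset in asserted:
            return step
    return None


def check_constraints(scenario, ordering=(), exclusive=()):
    """Return human-readable constraint violations for one scenario.

    ordering:  (earlier, later) pairs; `later` may assert only after `earlier`.
    exclusive: (a, b) pairs; `a` and `b` are never asserted at the same step.
    """
    problems = []
    for earlier, later in ordering:
        t_later = first_assert(scenario, later)
        if t_later is None:
            continue  # `later` never asserts, so the ordering is not exercised
        t_earlier = first_assert(scenario, earlier)
        if t_earlier is None or t_earlier >= t_later:
            problems.append(f"{later} asserted before {earlier}")
    for a, b in exclusive:
        for step, asserted in enumerate(scenario):
            if a in asserted and b in asserted:
                problems.append(f"{a} and {b} both asserted at step {step}")
    return problems


candidates = {
    "core_then_noc": [set(), {"rst_core"}, {"rst_core", "rst_noc"}],
    "noc_first": [set(), {"rst_noc"}, {"rst_core", "rst_noc"}],
    "scan_overlap": [set(), {"rst_core", "rst_scan"}],
}
ordering = [("rst_core", "rst_noc")]
exclusive = [("rst_core", "rst_scan")]
valid = [name for name, s in candidates.items() if not check_constraints(s, ordering, exclusive)]
print(valid)  # ['core_then_noc']
```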

[Figure: RDC scenarios with exclusivity constraints]

Mode-Specific Scenarios

[Figure: RDC methodology, mode-specific scenarios]

As part of its reset domain crossing methodology, Rivos implements exclusive signal constraints in reset scenarios—signals that never toggle simultaneously. For example, functional mode and test mode signals are mutually exclusive.

During functional mode scenarios, test signals remain stable. For DFT mode analysis, Rivos runs separate scenarios with functional signals de-asserted and test signals active. Meridian RDC then identifies mode-specific reset domain crossing issues.
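A toy model of mode-specific scenarios is sketched below in Python. The signal names, the 0/1 encoding, and the idea of tagging each crossing with the signal that enables it are all hypothetical simplifications for illustration; they are not Meridian RDC constructs. Each mode holds the other mode’s signals de-asserted, so a crossing that exists only through test logic is reported only in the DFT scenario.

```python
# Toy model with hypothetical signal names. In each mode the other mode's
# signals are held de-asserted (0) and stable.
FUNCTIONAL_SIGNALS = {"func_en"}
TEST_SIGNALS = {"scan_mode", "test_rst_sel"}


def mode_scenario(mode):
    """Return fixed signal values for one operating mode."""
    if mode == "functional":
        active, held = FUNCTIONAL_SIGNALS, TEST_SIGNALS
    elif mode == "dft":
        active, held = TEST_SIGNALS, FUNCTIONAL_SIGNALS
    else:
        raise ValueError(f"unknown mode: {mode}")
    values = {sig: 1 for sig in active}
    values.update({sig: 0 for sig in held})
    return values


def modes_exclusive(values):
    """True unless functional and test signals are active together."""
    func_on = any(values.get(s, 0) for s in FUNCTIONAL_SIGNALS)
    test_on = any(values.get(s, 0) for s in TEST_SIGNALS)
    return not (func_on and test_on)


def crossings_in_mode(crossings, values):
    """Keep crossings whose (hypothetical) enabling signal is active."""
    return [name for name, enable in crossings if values.get(enable, 0)]


crossings = [("cfg_to_core", "func_en"), ("scan_rst_to_core", "scan_mode")]
print(crossings_in_mode(crossings, mode_scenario("functional")))  # ['cfg_to_core']
print(crossings_in_mode(crossings, mode_scenario("dft")))         # ['scan_rst_to_core']
```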

Blocking Signals & Constraints

Rivos implements blocking signal constraints in its designs. When a path exists from a flop on reset A to a flop on reset B, one way to fix the resulting RDC issue is to add RTL logic that asserts a blocking signal before reset A is asserted. This blocking signal gates the data path between the two reset domains.

Reset scenarios validate these constraints, and Meridian RDC verifies the blocking signals adequately protect all paths. The tool propagates constraints from pins through flops and synchronizers, revealing setup errors when engineers accidentally connect the wrong reset or when clocks aren’t propagating.
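The protection a blocking signal provides can be checked against a scenario, as in this hypothetical Python sketch. It assumes each crossing is annotated with the blocker that gates it (or `None`), and treats a crossing as protected only if its blocker was already active in the step before the source reset asserts and is still active when it does. The model and names are illustrative, not Meridian RDC’s analysis.

```python
# Hypothetical model: each scenario step records asserted resets and active
# blocking signals. Names are illustrative only.
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    resets: frozenset = frozenset()
    blockers: frozenset = frozenset()


def unprotected_crossings(crossings, steps):
    """crossings: (src_reset, dst_reset, blocker-or-None) tuples.

    Report (src, dst, step) when src_reset newly asserts, dst_reset is not
    asserted, and the blocker was not raised *before* the reset (active in
    the previous step) and held while it asserts (active in this step)."""
    found = []
    prev = Step()
    for i, cur in enumerate(steps):
        for src, dst, blocker in crossings:
            asserting = src in cur.resets and src not in prev.resets
            if not asserting or dst in cur.resets:
                continue
            blocked = (blocker is not None
                       and blocker in prev.blockers and blocker in cur.blockers)
            if not blocked:
                found.append((src, dst, i))
        prev = cur
    return found


crossings = [("rst_a", "rst_b", "blk_a2b")]
good = [Step(), Step(blockers=frozenset({"blk_a2b"})),
        Step(frozenset({"rst_a"}), frozenset({"blk_a2b"}))]
late = [Step(), Step(frozenset({"rst_a"}), frozenset({"blk_a2b"}))]  # blocker too late
print(unprotected_crossings(crossings, good))  # []
print(unprotected_crossings(crossings, late))  # [('rst_a', 'rst_b', 1)]
```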

[Figure: RDC methodology, blocking signals]

Defining Reset Scenarios

[Figure: Defining reset scenarios]

Setting up scenarios as part of our reset domain crossing methodology is straightforward – Rivos’ designers understand the valid reset sequences for their blocks. Rivos works with architects and firmware teams to define the specific sequences our firmware will use.

Reset scenarios define the order of reset assertion and de-assertion, along with system constraints: clock states (on/off), blocking signals, and signal requirements such as test mode signals. Different scenario types cover different use cases (boot, warm reset, etc.). For RDC closure, Rivos validates not only these real-world scenarios but also an individual scenario for each reset.
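The ingredients listed above map naturally onto a small data structure. This Python sketch is a hypothetical container whose field names are illustrative and are not Meridian RDC syntax; it also generates one individual scenario per reset (assert and release that reset alone) alongside a multi-step boot scenario with an assumed ordering.

```python
# Hypothetical container; field names are illustrative, not Meridian RDC syntax.
from dataclasses import dataclass, field


@dataclass
class ResetScenario:
    name: str
    sequence: list                                     # ordered (action, reset) tuples
    clocks_on: set = field(default_factory=set)        # clocks running in this scenario
    blocking: set = field(default_factory=set)         # blocking signals asserted
    fixed_signals: dict = field(default_factory=dict)  # e.g. {"scan_mode": 0}


def per_reset_scenarios(resets, clocks_on, fixed_signals):
    """One scenario per reset: assert and release that reset alone while
    every other reset stays de-asserted."""
    return [
        ResetScenario(
            name=f"only_{r}",
            sequence=[("assert", r), ("deassert", r)],
            clocks_on=set(clocks_on),
            fixed_signals=dict(fixed_signals),
        )
        for r in resets
    ]


# Boot: all resets start asserted; the boot core leaves reset first and the
# rest of the chip is released afterwards (hypothetical ordering).
boot = ResetScenario(
    name="boot",
    sequence=[("deassert", "rst_boot"), ("deassert", "rst_core"), ("deassert", "rst_noc")],
    clocks_on={"clk_core", "clk_noc"},
    fixed_signals={"scan_mode": 0},
)
singles = per_reset_scenarios(["rst_boot", "rst_core", "rst_noc"],
                              clocks_on={"clk_core", "clk_noc"},
                              fixed_signals={"scan_mode": 0})
for s in [boot] + singles:
    print(s.name, s.sequence)
```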

Reset Domain Crossing Bugs Found

Clock Gater Bug

[Figure: Clock gater bug]

Meridian RDC identified a bug in which a clock gater (clock gating cell) was incorrectly reset: an asynchronously reset flop drove the clock gater’s enable input. When the asynchronous reset asserts, the flop output changes independently of any clock edge. Because static timing analysis checks paths from clock edge to clock edge, this created an untimed path that can propagate metastability.

When this untimed path drives a clock gating cell, it creates a clock glitch. Even though downstream flops aren’t being reset, they receive a corrupted clock signal – potentially a short pulse or glitch. This can cause system failures. Our reset methodology requires using resets whose assertion and deassertion are both synchronous to the clock being gated. The tool found cases where this methodology was not being followed correctly.
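The methodology rule above can be expressed as a structural check. This is a hypothetical Python sketch, not Meridian RDC’s algorithm: it assumes each clock gater’s enable is driven directly by a single flop, and that each reset is annotated with the clock its assertion and de-assertion are synchronized to (`None` meaning asynchronous). All names are illustrative.

```python
# Hypothetical structural check; names and annotations are illustrative.


def check_clock_gaters(gaters, flop_reset, reset_props):
    """gaters:      {gater: (gated_clock, enable_driver_flop)}
    flop_reset:  {flop: reset name, or None for a non-reset flop}
    reset_props: {reset: (assert_sync_clock, deassert_sync_clock)};
                 None means that edge of the reset is asynchronous.

    Flag gaters whose enable comes from a flop whose reset assertion or
    de-assertion is not synchronous to the gated clock."""
    issues = []
    for gater, (gated_clk, driver) in gaters.items():
        rst = flop_reset.get(driver)
        if rst is None:
            continue  # non-reset driver: no reset-induced untimed path
        assert_clk, deassert_clk = reset_props[rst]
        if assert_clk != gated_clk or deassert_clk != gated_clk:
            issues.append((gater, driver, rst))
    return issues


reset_props = {
    "rst_async": (None, "clk_core"),        # async assert, sync de-assert
    "rst_sync": ("clk_core", "clk_core"),   # both edges synchronous
}
gaters = {"cg_core": ("clk_core", "en_q"), "cg_dma": ("clk_core", "dma_en_q")}
flop_reset = {"en_q": "rst_async", "dma_en_q": "rst_sync"}
print(check_clock_gaters(gaters, flop_reset, reset_props))
# [('cg_core', 'en_q', 'rst_async')]
```

A reset with asynchronous assertion and synchronous de-assertion is still flagged, because the rule requires both edges to be synchronous to the gated clock.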

Blocking Signal Bug

[Figure: Blocking signal bug]

Meridian RDC identified cases where incorrect RTL prevented the blocking signal from functioning properly. The tool flagged these issues by showing the constraint couldn’t propagate to protect the affected paths.

Mode Exclusivity Error Detection

Meridian RDC identifies violations where functional mode and test mode signals aren’t properly constrained as mutually exclusive. By defining these signals as exclusive and running separate scenarios for each mode, the tool catches reset domain crossing issues specific to DFT mode that wouldn’t occur during functional operation.

iDebug for Debugging

Real Intent’s iDebug tool is invaluable for debugging RDC violations. Its guided wizard suggests potential fixes, typically offering three options to consider. Each error type includes detailed subcategories—allowing engineers to pinpoint the exact variant of the issue and access targeted guidance rather than generic error descriptions.

iDebug links directly to relevant documentation for additional clarity. Below are some of the specific elements we find useful.

Schematic Visualization

iDebug displays pruned schematics showing the violation’s source and observable endpoint, with lightning bolt icons marking problem points. Our engineers can quickly determine whether issues stem from tool setup errors, actual RDC violations, or constraint propagation problems.

[Figure: RDC schematic visualization in iDebug, Meridian RDC]

The schematics are expandable, so we can trace signals backward through the design rather than navigating a flat hierarchy searching for specific flops. This significantly accelerates root cause analysis.
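Tracing a signal backward is essentially a breadth-first walk of the fan-in cone. This Python sketch is an illustration of that idea on a hypothetical netlist represented as a `signal -> list of driving signals` map; it is not how iDebug is implemented. Sequential elements listed in `stop_at` are recorded but not expanded, and an optional depth limit mimics expanding a schematic one level at a time.

```python
# Illustrative backward trace on a hypothetical netlist:
# drivers maps each signal to the list of signals that drive it.
from collections import deque


def fanin_cone(drivers, start, stop_at=frozenset(), max_depth=None):
    """Return {signal: depth} for everything reachable backward from `start`.
    Signals in `stop_at` (e.g. flops) are recorded but not expanded."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        sig = queue.popleft()
        if sig != start and sig in stop_at:
            continue
        if max_depth is not None and depth[sig] >= max_depth:
            continue
        for d in drivers.get(sig, ()):
            if d not in depth:
                depth[d] = depth[sig] + 1
                queue.append(d)
    return depth


drivers = {"dst_q": ["and1"], "and1": ["src_q", "blk_n"],
           "src_q": ["src_d"], "blk_n": ["blk_q"]}
print(fanin_cone(drivers, "dst_q", stop_at={"src_q", "blk_q"}))
print(fanin_cone(drivers, "dst_q", max_depth=1))
```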

Waveform Debugging

Meridian RDC generates waveforms showing the sequence of events leading to each violation. Because Meridian RDC is a static tool, these are not true simulation waveforms, but they show the key signals and enable step-by-step debugging of each issue.

[Figure: Waveform debugging in iDebug]

These waveforms help our engineers determine whether violations indicate actual design bugs or setup errors. For example, if a waveform shows an impossible sequence, such as two mutually exclusive signals asserting simultaneously, the issue is a missing constraint. If the sequence is valid, it reveals a genuine RTL bug requiring design changes.
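The triage rule in this paragraph can be sketched as a small classifier. This Python sketch assumes a hypothetical representation of the tool’s waveform as a list of per-step `{signal: 0/1}` dicts, plus a list of signal pairs that the design intent says are never both high; the signal names are illustrative.

```python
# Hypothetical waveform representation: one {signal: 0/1} dict per step.


def triage_waveform(waveform, exclusive_pairs):
    """If the sequence co-asserts a pair the design intent says is exclusive,
    it cannot happen in silicon, which points to a missing constraint.
    Otherwise the sequence is valid and should be reviewed as an RTL bug."""
    for step, values in enumerate(waveform):
        for a, b in exclusive_pairs:
            if values.get(a) == 1 and values.get(b) == 1:
                return f"missing constraint: {a}/{b} both high at step {step}"
    return "valid sequence: review as a possible RTL bug"


pairs = [("func_mode", "scan_mode")]
wf_bad = [{"func_mode": 1, "scan_mode": 0}, {"func_mode": 1, "scan_mode": 1}]
wf_ok = [{"func_mode": 1, "scan_mode": 0}, {"func_mode": 1, "scan_mode": 0}]
print(triage_waveform(wf_bad, pairs))
print(triage_waveform(wf_ok, pairs))
```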

Environment Setup & Pin Constraints

RDC requires defining the environment for input pins, similar to CDC verification. In our flow, each pin is assigned a clock domain and reset domain. The pins may also need additional constraints, for example: stable, constant, or exclusive.

Meridian RDC automatically infers the relevant relationships when constraint logic exists within the design being analyzed. However, for logic driving input pins from outside the analyzed block, our engineers must explicitly define pin behavior. Mode pins that select different operating modes, for example, often require specification as stable or constant during functional operation.
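A completeness check on pin definitions can be sketched as follows. This hypothetical Python sketch uses an illustrative dictionary format (not Meridian RDC’s environment syntax): every input pin needs a clock domain and a reset domain, any extra constraint must be one of stable, constant, or exclusive, and pins declared as mode pins must be stable or constant.

```python
# Hypothetical environment format: {pin: {"clock": ..., "reset": ..., "constraint": ...}}
ALLOWED_CONSTRAINTS = {"stable", "constant", "exclusive"}


def validate_pin_env(pins, env, mode_pins=frozenset()):
    """Return a list of problems with the input-pin environment."""
    problems = []
    for pin in pins:
        entry = env.get(pin)
        if entry is None:
            problems.append(f"{pin}: no environment definition")
            continue
        for key in ("clock", "reset"):
            if not entry.get(key):
                problems.append(f"{pin}: missing {key} domain")
        constraint = entry.get("constraint")
        if constraint is not None and constraint not in ALLOWED_CONSTRAINTS:
            problems.append(f"{pin}: unknown constraint {constraint!r}")
        if pin in mode_pins and constraint not in ("stable", "constant"):
            problems.append(f"{pin}: mode pin should be 'stable' or 'constant'")
    return problems


env = {
    "req": {"clock": "clk_a", "reset": "rst_a"},
    "mode_sel": {"clock": "clk_a", "reset": "rst_a"},
    "dbg_en": {"clock": "clk_a", "constraint": "quiet"},
}
pins = ["req", "mode_sel", "dbg_en", "ack"]
for problem in validate_pin_env(pins, env, mode_pins={"mode_sel"}):
    print(problem)
```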

Grouping Violations

[Figure: Grouping RDC violations in iDebug, Meridian RDC]

Meridian RDC groups violations by root cause within the iDebug window. The text file reports may contain hundreds of items with the same error ID – all representing variations of the same underlying issue, either starting from or converging at a single metastable or observable flop.

iDebug dramatically reduces the debugging workload by consolidating these into violation groups – one fix may resolve all error instances in the group.
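Grouping by a shared endpoint can be sketched with a dictionary of lists. This Python sketch assumes violation records are plain dicts with hypothetical `id`, `src`, and `dest` fields and a made-up error ID; it illustrates the grouping idea, not iDebug’s implementation. Groups are keyed by (error ID, shared flop) and sorted largest first, so one fix can be aimed at the biggest cluster.

```python
# Hypothetical violation records; field names and error IDs are illustrative.
from collections import defaultdict


def group_violations(violations, by="dest"):
    """Group records by (error ID, shared flop). Use by="dest" for paths
    converging on one metastable/observable flop, or by="src" for paths
    starting from one flop. Returns (key, members) pairs, largest first."""
    groups = defaultdict(list)
    for v in violations:
        groups[(v["id"], v[by])].append(v)
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)


violations = [
    {"id": "ERR_A", "src": "s1_q", "dest": "d_meta_q"},
    {"id": "ERR_A", "src": "s2_q", "dest": "d_meta_q"},
    {"id": "ERR_A", "src": "s3_q", "dest": "d_meta_q"},
    {"id": "ERR_A", "src": "s1_q", "dest": "d_other_q"},
]
for key, members in group_violations(violations):
    print(key, len(members))
```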

Spreadsheet-Style Filtering

iDebug’s spreadsheet-like interface displays violations in sortable, searchable columns. Our team can filter by source flop, destination flop, reset domains, logic regions, and more. This enables strategic debugging, such as isolating all violations between specific reset domains or within particular design blocks.

This filtering capability is essential when initial runs produce high violation counts. Our engineers systematically narrow the list to determine whether each violation represents a setup error or a genuine design bug requiring RTL changes.
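The kind of column filtering described above can be sketched over plain records. This Python sketch uses hypothetical field names (`src_reset`, `dst_reset`, `block`) and is an illustration of the filtering idea, not iDebug’s interface. Each criterion is either a single value or a set of acceptable values, and a record must match every criterion.

```python
# Hypothetical violation records; field names are illustrative only.


def filter_violations(violations, **criteria):
    """Keep records matching every criterion. A criterion value may be a
    single value or a set/frozenset of acceptable values."""
    def matches(v):
        for field_name, wanted in criteria.items():
            value = v.get(field_name)
            if isinstance(wanted, (set, frozenset)):
                if value not in wanted:
                    return False
            elif value != wanted:
                return False
        return True
    return [v for v in violations if matches(v)]


violations = [
    {"src": "a_q", "dst": "x_q", "src_reset": "rst_boot", "dst_reset": "rst_core", "block": "noc"},
    {"src": "b_q", "dst": "y_q", "src_reset": "rst_boot", "dst_reset": "rst_core", "block": "dma"},
    {"src": "c_q", "dst": "z_q", "src_reset": "rst_core", "dst_reset": "rst_boot", "block": "noc"},
    {"src": "d_q", "dst": "w_q", "src_reset": "rst_boot", "dst_reset": "rst_core", "block": "pcie"},
]
boot_to_core = filter_violations(violations, src_reset="rst_boot", dst_reset="rst_core")
print([v["src"] for v in boot_to_core])
print([v["src"] for v in filter_violations(boot_to_core, block={"noc", "dma"})])
```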

[Figure: Reset domain crossing violation filtering]

Conclusion

Overall, we find Meridian RDC’s analysis very good and the value it provides high.

Real Intent treats us like a partner. We’ve run experiments on Rivos designs, sharing run-time and memory usage statistics to help Real Intent optimize their tools. The resulting updates have significantly improved our experience with their tools.

Further, we’ve had great application engineering support. We always get a response within 24 hours — usually less. This fast response rate is important to us given our tight delivery schedules.