Western Digital: Advanced Clock & Reset Verification of Complex Memory Controller SoCs

Executive Summary: DAC 2020 presentation by Mukesh Panda, Western Digital

Case Study Presentation Overview

Mukesh Panda of Western Digital presented a case study on Western Digital’s clock and reset static verification methodology utilizing Real Intent CDC static sign-off tools.

Mukesh Panda presented results showing how the new advanced sign-off methodology enabled them to sharply reduce noise and debug effort for CDC static sign-off by making Real Intent’s CDC tools aware of Western Digital’s clock and reset architectures.

Because their new approach only flagged real violations, Western Digital was able to catch corner-case reset synchronization issues and successfully sign off its complex memory controller SoC.

Motivation & Goals

Mukesh presented a recent study showing that the two main flaws contributing to ASIC design failures are:

  • Functional bugs
  • Clocking

In particular, clocking and reset aspects are very complex for a memory controller.

Western Digital’s memory controller SoC has multiple hosts with asynchronous clocks and a multitude of resets. Thus, they needed an accurate methodology to verify complex interactions between clocks and resets to avoid ASIC respins.

Western Digital’s Clock & Reset Architectures

Western Digital has a central unit that:

  • Manages the clock and reset distribution across their IPs.
  • Enables or disables the clocks based upon the reset architecture requirements.

Their memory controller SoCs utilize IP from multiple sources, both internal and third party.

Their approach is to group the IP based on the SoC’s reset architecture requirements.

The incoming resets to an IP are typically asynchronous to the incoming clock domains. To eliminate metastability issues, the central unit must thus ensure that the resets and clocks are enabled or disabled at the appropriate times.

Clock & Reset Verification Challenges

With regard to their central unit, which manages clocks and resets across the SoC, Mukesh said that verifying that the complex clock and reset interactions do not lead to any metastability or CDC issues is a major challenge.

Mukesh then reviewed the specific functionality of the unit more deeply.

  • It receives a free incoming clock.
  • During the assertion of the reset to a specific IP, the unit disables the clock for this IP.
  • Because the clock is disabled, they can ensure there are no metastability issues, even though reset is asynchronous to the clock.
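The behavior listed above can be illustrated with a toy, sample-based model. This is only a sketch under stated assumptions: the function names, the waveforms, and the idea of forcing the gated clock low are all hypothetical illustrations, not Western Digital's implementation, and the model ignores glitch-free gating cells and timing margins that a real design would need.

```python
# Toy model (hypothetical): a central unit that holds an IP's clock low
# for as long as that IP's reset is asserted.

def gate_clock(free_clk, reset_asserted):
    """Return the gated clock: forced to 0 whenever reset is asserted."""
    return [0 if rst else clk for clk, rst in zip(free_clk, reset_asserted)]

def rising_edges(clk):
    """Return the sample indices where the clock goes from 0 to 1."""
    return [i for i in range(1, len(clk)) if clk[i - 1] == 0 and clk[i] == 1]

# Free-running incoming clock and an asynchronous reset window (samples 2..5).
free_clk = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
reset_asserted = [False, False, True, True, True, True, False, False, False, False]

gated_clk = gate_clock(free_clk, reset_asserted)

# Flops in the IP only sample on rising edges of the gated clock, so an
# empty list here means no flop can capture data while reset is asserted.
edges_during_reset = [i for i in rising_edges(gated_clk) if reset_asserted[i]]

print(gated_clk)
print(edges_during_reset)
```

In this model the flops never see an active clock edge inside the reset window, which is the property that lets the asynchronous reset be treated as safe.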

The challenge with this approach is that if the CDC static sign-off tool is not aware of the above architecture, it can start producing a large number of violations that are just noise.

It would be close to impossible to verify all the permutations and combinations using dynamic verification. So, Western Digital believes that static sign-off technology is the best way to comprehensively verify this kind of architecture for reliable sign-off.

Their best approach is to enhance the CDC methodology to understand these architectures.

Enhanced CDC Methodology & Results

Western Digital worked towards enhancing their CDC methodology to handle the various clock and reset scenarios.

To do so, they created architecture-specific assumptions and constraints, such as:

“Clock through clk_blk_a is off when reset through rst_blk_a is asserted.”

Western Digital partnered with Real Intent to enhance Real Intent’s Meridian CDC tool to understand these architectural assumptions. Independently, they worked on formal and dynamic means to verify the above assumptions and constraints.
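As an illustration of what checking such an assumption by dynamic means could look like, here is a minimal sketch that scans a sampled waveform for clock activity while reset is asserted. The checker function and the traces are hypothetical; only the signal names clk_blk_a and rst_blk_a come from the example constraint, and this is not Real Intent's or Western Digital's actual flow.

```python
# Hypothetical checker for the assumption:
# "clock through clk_blk_a is off when reset through rst_blk_a is asserted".

def clock_toggles_during_reset(clk, rst_asserted):
    """Return the sample indices where clk changes value while reset is asserted."""
    return [
        i
        for i in range(1, len(clk))
        if rst_asserted[i] and clk[i] != clk[i - 1]
    ]

# Reset through rst_blk_a is asserted on samples 5..7 in both traces.
rst_blk_a = [False] * 5 + [True] * 3 + [False] * 4

# Trace 1: the clock is stopped before reset asserts and restarted afterwards.
clk_blk_a_good = [0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1]

# Trace 2: the clock keeps running through the reset window.
clk_blk_a_bad = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

good_result = clock_toggles_during_reset(clk_blk_a_good, rst_blk_a)
bad_result = clock_toggles_during_reset(clk_blk_a_bad, rst_blk_a)

print(good_result)  # assumption holds on this trace
print(bad_result)   # samples where the assumption is broken
```

A check like this only covers the stimulus that was simulated, which is one reason to pair it with formal verification of the same assumption.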

This enhanced methodology allowed them to identify any incorrect reset synchronization.

By reducing the noise significantly in the CDC analysis, they could focus on actual violations. It also allowed them to catch reset & clock connectivity issues when the IP didn’t connect to the correct signals from the central unit.
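The effect of architecture awareness on noise can be pictured as a simple filter. The sketch below is purely illustrative: the Crossing record, the instance paths, the clk_blk_b, rst_blk_b, and clk_free names, and the filtering function are hypothetical and do not describe how Meridian CDC works internally.

```python
# Hypothetical illustration: reported reset crossings are dropped only when
# they are covered by a verified architectural assumption.
from typing import NamedTuple

class Crossing(NamedTuple):
    flop: str    # destination flop (hypothetical instance path)
    clock: str   # clock reaching the flop
    reset: str   # asynchronous reset reaching the flop

# Assumed (clock, reset) pairs for which the clock is off while reset is asserted.
VERIFIED_ASSUMPTIONS = {
    ("clk_blk_a", "rst_blk_a"),
    ("clk_blk_b", "rst_blk_b"),
}

def real_violations(crossings, assumptions):
    """Keep only the crossings that no verified assumption covers."""
    return [c for c in crossings if (c.clock, c.reset) not in assumptions]

reported = [
    Crossing("blk_a/u_fifo/q_reg", "clk_blk_a", "rst_blk_a"),
    Crossing("blk_b/u_ctrl/state_reg", "clk_blk_b", "rst_blk_b"),
    # Mis-connected IP: clocked by a free-running clock, not the gated one.
    Crossing("blk_b/u_dbg/cnt_reg", "clk_free", "rst_blk_b"),
]

remaining = real_violations(reported, VERIFIED_ASSUMPTIONS)
for crossing in remaining:
    print(crossing.flop, crossing.clock, crossing.reset)
```

Only the flop whose clock does not come from the expected central-unit output stays flagged, which mirrors how a connectivity mistake would stand out once the noise is removed.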

They signed off using this methodology the first time. The design had:

  • ~6 million gates
  • ~64 blocks with different resets

Without the enhanced methodology, Western Digital used to get around 800K reset-related CDC violations on this SoC. With the enhanced methodology:

  • They had zero violations
  • They missed none of the issues

Western Digital also used netlist simulations to confirm there were no real reset-related CDC violations on the design.

Conclusion

The clock and reset architecture complexity is increasing for Western Digital’s SoCs. Their WDC memory controllers have non-trivial reset and clock interactions.

Mukesh needed enhancements to the design team’s incumbent CDC methodology to improve efficiency.

Western Digital collaborated with Real Intent to develop an enhanced clock and reset verification methodology that makes the Meridian CDC static sign-off tool aware of Western Digital’s clock and reset architectures.

The result was they were able to:

  • Sharply reduce the CDC sign-off noise and debug effort.
  • Catch corner-case reset synchronization issues because only real violations were flagged with the enhanced methodology.

Western Digital verified their assumptions using formal and dynamic means, and signed off successfully on the chip.