Posts Tagged ‘RTL testbench’

Power7 Verification: It’s Not Rocket Science (It’s More Advanced)

Thursday, May 20th, 2010 by saturday

By Hemendra Talesara

Complexity

In his recent presentation discussing verification of the Power7 processor, John Ludden of IBM opened with a quote from an IBM exec more than a decade ago. “it’s not rocket science”- a perception held by some members of the management and design communities at that time.

However, designs have become a whole lot more complex over time. The Power7 processor at 45nm has 1.2B transistors on a 567 sq. mm die, supporting 8 cores with 4 threads each, an on-chip eDRAM, 3 levels of caches and 2 DDR memory controllers. Yet as verification complexity multiplies in this multi-threaded design, it’s very helpful to have some of the more advanced tools and methodology at your disposal.

Tools and Methodology

Fortunately for Ludden and the Power7 team, IBM has invested in verification technology for years (in spite the quote from the exec). The company continues to develop and rely on in-house tools for many of the advanced verification technologies for processor-specific testing. These include the test-bench, multi-thread test generators, hardware accelerators, formal and semi-formal tools, micro-architecture checkers (API based), cache coherency checkers and coverage tools. Exercisers
originally developed for post-silicon validation were used to exploit the hardware acceleration platform. Forty-five thousand coverage points were organized to assist with big picture and were used to re-direct the test generator and exercisers for accelerators.

To support corner case testing for events that occur rarely, especially in multi-threaded scenarios, software irritator threads were used. These irritators are capable of creating the worst possible contentions. Through their application, twenty-three high quality bugs were revealed hiding in the corners.

A methodical application of these tools and technology clearly captured and advanced the industry best practices.

Designing for Verification

Designing for Verification was an important element in managing the overall risk to verification time line. IBM minimized the risks by maintaining a tight interaction between the specification and verification teams during the design phase and allowing the verification team to maintain architectural changes. “Chicken switches” were placed in silicon that allowed verification team to back-off an area considered risky or possible of otherwise compromising the verification effort. These switches provide workarounds, with some small impact on performance but no functional change, for accessing difficult to verify micro architectural features. Hardware irritators were also used to enable stress testing of corner cases in both pre-silicon and post-silicon testing.

Conclusion

The Power7 draws many architectural features from the Power5 and 6 designs, although it is a much more complex and powerful processor with a much shorter verification cycle. Ludden and the Power7 team accomplished this remarkable feat with a lot of foresight in planning, metrics collection and careful execution. Tight interlocking between metrics collected and verification plan was key part of tracking mechanism and functional closure. This project should serve as an example of how to plan for and manage risks in a complex verification project.

Kudos to John and the IBM team. His full presentation can be downloaded here.

Knowing When Verification is Complete

Friday, March 27th, 2009 by admin

Introduction

This article presents an overview of functional design verification using a coverage driven methodology while attempting to answer the question of how much testing is enough. The part being verified in this case will be a general purpose microprocessor, such as those found in mobile computing devices. Note that an approach of this magnitude is not always required. Designs with very limited instruction sets or highly restricted functionalities may be satisfied by simply writing directed assembly code tests to verify their intended functionalities.

Comparison of Simple and Complex Architectures

Figure 1 depicts a simple architecture as compared to a complex one. Note that the number of corner cases and unpredictability of the verification space increases as the architecture gains complexity. Thus, the complexity of the architecture determines how much testing will need to be accomplished to properly verify the component’s function.

Figure 1. Comparison of Verification Spaces

Comparison of Verification Spaces

Measuring Verification Progress

Coverage metrics are the dominant method for measuring verification progress in the industry today. Coverage points are normally designated by the design engineers looking at the logic of their block and by verification or system engineers looking at the functional definition of the part. Both of these are critical insights into the required verification coverage of the design.

Coverage points, indicated by the red dots in Figure 2, are deliberately chosen with respect to placement and density according to design knowledge and risk assessment.

Figure 2. Distribution of Coverage Points

Distribution of Coverage Points

Directed Testing

In the past, directed tests were typically written to hit coverage points. Because directed tests are by their very nature highly targeted and relatively inflexible, this resulted in much of the design not being tested as is shown by the ratio of red to gray in Figure 5. In addition to the low overall coverage that results from this approach, creation of directed tests is time consuming and requires highly skilled engineers. In this approach, testbench checkers that detect hits to coverage points are often overlooked with the assumptions that the engineers writing the tests know how to hit the required coverage points and that human errors will not be significantly problematic. In addition, as the design changes over the course of development, the directed test may lose track of its target coverage point. Without coverage monitors, these types of errors will not be detected and the design will not be as thoroughly verified as it appears to be on paper.

Using a Random Test Generator to Close Coverage

As processor designs became more complex, the need to hit more coverage points became apparent. Once the grid has been established, large numbers of purely random tests may be incorporated to begin closing coverage. Some of these tests may hit points on the coverage grid while others will not.

Figure 3. Intersection of Coverage Grid and Pure Random

Intersection of Coverage Grid and Pure Random

Approaching the problem of hitting coverage points from a random test generator viewpoint, a single engineer begins by writing a few generator templates and then generates tests using those templates. The generated tests are then run on a testbench which incorporates coverage monitors. The coverage monitors report all coverage points that are hit by the tests. As long as tests generated from the templates continue to hit new coverage points, the templates are kept in the nightly suite. As the rate of hitting new coverage points declines, new generator templates are created to target coverage holes. This approach requires skilled engineers to write checkers for the testbench but less skilled engineers to run the test generator.

Directed-random templates are created around points not hit by the purely random templates. We now begin to see the coverage grid closing more tightly (around 95%), and the verification process comes closer to completion.

Figure 4. Coverage Grid, Directed Random and Pure Random

Coverage Grid, Directed Random and Pure Random

Hitting Corner Cases

Not all coverage points will be hit by fully random or directed random templates. Some coverage points require a long series of events before the targeted behavior takes place. In this case, there are two possible approaches: write directed tests and write directed templates. Probably both of these approaches should be used. Directed tests can get to these most difficult coverage points more quickly but prove only one or a few cases around that point. Directed templates take more time to create but can be written to allow as much random behavior around the coverage point as possible.

Figure 5. Review Templates and Relax Restrictions

Review Templates and Relax Restrictions

Finally, existing tests are reviewed, and as much directed behavior as possible is removed before the tests are run again. Coverage then reaches full closure, and these tests are run until the schedule no longer permits.

Processor Design Verification Overview

Friday, February 20th, 2009 by admin

Anyone who has worked on a microprocessor design in recent years knows that verification has become a larger and larger share of the effort to bring a product to market. Designs are becoming increasingly complex and this complexity is often magnified in verification. Since completed verification is always the final stage in the design cycle, the additional time, effort and resources required are in the critical path to get the design out the door. For this reason, selection of verification tools has a direct effect on the total cost of the design and on meeting the time-to market criteria.

Through a series of upcoming postings, we will seek to outline several usage models and technologies while highlighting their strengths and weaknesses and providing a platform to judge the best verification strategy for a given product.

The goal of verification is to achieve bug-free first silicon on schedule. However, determining the absence of implementation flaws is no easy task; engineers must find an undetermined number of design flaws in an infinite space.

Determining when verification has reached completion is also a difficult task. This is typically based on heuristics, statistical measurements and experience. On all but the simplest designs, there is no point where the verification lead can say that the design has been completely tested and is known not to contain any hidden errors. Instead, verification engineers rely on proven methodologies, apply legacy test suites, and create sample applications to simulate real world conditions to the best of their abilities.

Functional verification typically involves running a large number of assembly level tests in RTL simulation. Figure 1 depicts a simple environment in which an external stimulus is applied to the device under test (DUT). The more random tests that are run in RTL pre-silicon, the greater the chance the DV team has of finding all the bugs.

Figure 1. A Simple Verification Testbench Environment

A simple RTL Testbench Environment