Posts Tagged ‘Complex Architectures’

Power7 Verification: It’s Not Rocket Science (It’s More Advanced)

Thursday, May 20th, 2010 by saturday

By Hemendra Talesara

Complexity

In his recent presentation discussing verification of the Power7 processor, John Ludden of IBM opened with a quote from an IBM exec more than a decade ago. “it’s not rocket science”- a perception held by some members of the management and design communities at that time.

However, designs have become a whole lot more complex over time. The Power7 processor at 45nm has 1.2B transistors on a 567 sq. mm die, supporting 8 cores with 4 threads each, an on-chip eDRAM, 3 levels of caches and 2 DDR memory controllers. Yet as verification complexity multiplies in this multi-threaded design, it’s very helpful to have some of the more advanced tools and methodology at your disposal.

Tools and Methodology

Fortunately for Ludden and the Power7 team, IBM has invested in verification technology for years (in spite the quote from the exec). The company continues to develop and rely on in-house tools for many of the advanced verification technologies for processor-specific testing. These include the test-bench, multi-thread test generators, hardware accelerators, formal and semi-formal tools, micro-architecture checkers (API based), cache coherency checkers and coverage tools. Exercisers
originally developed for post-silicon validation were used to exploit the hardware acceleration platform. Forty-five thousand coverage points were organized to assist with big picture and were used to re-direct the test generator and exercisers for accelerators.

To support corner case testing for events that occur rarely, especially in multi-threaded scenarios, software irritator threads were used. These irritators are capable of creating the worst possible contentions. Through their application, twenty-three high quality bugs were revealed hiding in the corners.

A methodical application of these tools and technology clearly captured and advanced the industry best practices.

Designing for Verification

Designing for Verification was an important element in managing the overall risk to verification time line. IBM minimized the risks by maintaining a tight interaction between the specification and verification teams during the design phase and allowing the verification team to maintain architectural changes. “Chicken switches” were placed in silicon that allowed verification team to back-off an area considered risky or possible of otherwise compromising the verification effort. These switches provide workarounds, with some small impact on performance but no functional change, for accessing difficult to verify micro architectural features. Hardware irritators were also used to enable stress testing of corner cases in both pre-silicon and post-silicon testing.

Conclusion

The Power7 draws many architectural features from the Power5 and 6 designs, although it is a much more complex and powerful processor with a much shorter verification cycle. Ludden and the Power7 team accomplished this remarkable feat with a lot of foresight in planning, metrics collection and careful execution. Tight interlocking between metrics collected and verification plan was key part of tracking mechanism and functional closure. This project should serve as an example of how to plan for and manage risks in a complex verification project.

Kudos to John and the IBM team. His full presentation can be downloaded here.

Knowing When Verification is Complete

Friday, March 27th, 2009 by admin

Introduction

This article presents an overview of functional design verification using a coverage driven methodology while attempting to answer the question of how much testing is enough. The part being verified in this case will be a general purpose microprocessor, such as those found in mobile computing devices. Note that an approach of this magnitude is not always required. Designs with very limited instruction sets or highly restricted functionalities may be satisfied by simply writing directed assembly code tests to verify their intended functionalities.

Comparison of Simple and Complex Architectures

Figure 1 depicts a simple architecture as compared to a complex one. Note that the number of corner cases and unpredictability of the verification space increases as the architecture gains complexity. Thus, the complexity of the architecture determines how much testing will need to be accomplished to properly verify the component’s function.

Figure 1. Comparison of Verification Spaces

Comparison of Verification Spaces

Measuring Verification Progress

Coverage metrics are the dominant method for measuring verification progress in the industry today. Coverage points are normally designated by the design engineers looking at the logic of their block and by verification or system engineers looking at the functional definition of the part. Both of these are critical insights into the required verification coverage of the design.

Coverage points, indicated by the red dots in Figure 2, are deliberately chosen with respect to placement and density according to design knowledge and risk assessment.

Figure 2. Distribution of Coverage Points

Distribution of Coverage Points

Directed Testing

In the past, directed tests were typically written to hit coverage points. Because directed tests are by their very nature highly targeted and relatively inflexible, this resulted in much of the design not being tested as is shown by the ratio of red to gray in Figure 5. In addition to the low overall coverage that results from this approach, creation of directed tests is time consuming and requires highly skilled engineers. In this approach, testbench checkers that detect hits to coverage points are often overlooked with the assumptions that the engineers writing the tests know how to hit the required coverage points and that human errors will not be significantly problematic. In addition, as the design changes over the course of development, the directed test may lose track of its target coverage point. Without coverage monitors, these types of errors will not be detected and the design will not be as thoroughly verified as it appears to be on paper.

Using a Random Test Generator to Close Coverage

As processor designs became more complex, the need to hit more coverage points became apparent. Once the grid has been established, large numbers of purely random tests may be incorporated to begin closing coverage. Some of these tests may hit points on the coverage grid while others will not.

Figure 3. Intersection of Coverage Grid and Pure Random

Intersection of Coverage Grid and Pure Random

Approaching the problem of hitting coverage points from a random test generator viewpoint, a single engineer begins by writing a few generator templates and then generates tests using those templates. The generated tests are then run on a testbench which incorporates coverage monitors. The coverage monitors report all coverage points that are hit by the tests. As long as tests generated from the templates continue to hit new coverage points, the templates are kept in the nightly suite. As the rate of hitting new coverage points declines, new generator templates are created to target coverage holes. This approach requires skilled engineers to write checkers for the testbench but less skilled engineers to run the test generator.

Directed-random templates are created around points not hit by the purely random templates. We now begin to see the coverage grid closing more tightly (around 95%), and the verification process comes closer to completion.

Figure 4. Coverage Grid, Directed Random and Pure Random

Coverage Grid, Directed Random and Pure Random

Hitting Corner Cases

Not all coverage points will be hit by fully random or directed random templates. Some coverage points require a long series of events before the targeted behavior takes place. In this case, there are two possible approaches: write directed tests and write directed templates. Probably both of these approaches should be used. Directed tests can get to these most difficult coverage points more quickly but prove only one or a few cases around that point. Directed templates take more time to create but can be written to allow as much random behavior around the coverage point as possible.

Figure 5. Review Templates and Relax Restrictions

Review Templates and Relax Restrictions

Finally, existing tests are reviewed, and as much directed behavior as possible is removed before the tests are run again. Coverage then reaches full closure, and these tests are run until the schedule no longer permits.