Software-Reliability-Engineered Testing

John D. Musa, AT&T Bell Laboratories and James Widmaier,
National Security Agency



Software-reliability-engineered testing (SRET) is system testing designed and guided by quantitative software reliability objectives and by the expected field usage and criticality of different operations. It can be applied both to custom software in development and to off-the-shelf software of any size, used alone or integrated into larger systems. Ada and other object-oriented (OO) languages that tout reuse should now take advantage of this quantitative reliability metric: software modules or systems with known reliability are more likely to be reused than those of unknown or unspecified reliability. Reliability here is defined as the probability of execution without failure for some specified interval, generally called mission time. The definition is thus compatible with that for hardware reliability, even though the failure mechanisms differ. SRET includes feature, load, and regression testing. This article describes the standard, proven practice employed on numerous successful communications software-based systems at American Telephone and Telegraph Co. (AT&T). The National Security Agency is working to merge this technology with security engineering for its software-based products.

Introduction: A Valuable Standard Practice

SRET is currently being practiced on a substantial number of communications software-based systems at AT&T Bell Laboratories. It is based on AT&T's best current practice of software reliability engineering (SRE), which is incorporated in a software development process currently undergoing International Organization for Standardization certification. In the Operations Technology Center of the Network Services Division, developers have used SRE on over 20 projects. This was the primary development center for the AT&T business unit that won the Malcolm Baldrige National Quality Award in 1994. Another area of AT&T that used SRE reported a twofold reduction in test interval and cost, a tenfold increase in reliability, and a tenfold reduction in maintenance.

Many other software development organizations are beginning to apply SRET to software systems that require high reliability (or security). The National Security Agency is now embracing the technology to build communications system software where security and reliability are logical and required ingredients. The spread of this technology will soon be enhanced by the McGraw-Hill and IEEE (Institute of Electrical and Electronics Engineers) Computer Society Press handbook [2].

The SRET Process

Before detailing some aspects of SRET, it would be helpful to describe the two types of SRET: development testing and certification testing. Development testing is the testing of a software-based system that you are constructing and debugging as you test. The metric estimated and tracked, which is equivalent to reliability, is failure intensity [1] (failures per unit execution time). System testers use failure intensity to guide the fault, i.e., bug, correction process.

Certification testing, on the other hand, does not involve fault removal to resolve failures but enables an overall accept or reject decision. Certification testing is typically used for acquired software such as commercial off-the-shelf components or object libraries. (Ada and OO libraries must satisfy this requirement to be viable for reuse.)

The SRET process consists of the five principal activities shown in Figure 1.

Figure 1: SRET Process.

Note that execute tests and interpret failure data occur simultaneously in the Component and System Testing Cycle. The activities in the latter three steps are primarily conducted by system testers. Defining reliability and developing operational profiles, the first two activities, are ordinarily performed by system engineers and architects. However, system testers are strongly motivated to see that these activities are done well. Hence, they add value when they are included as part of the systems engineering and systems architecture teams.

Step 1: Define Necessary Reliability

To define the necessary reliability for each system for product development, we must complete three specific activities: determine operational modes, define system failure, and set failure-intensity objectives. These activities are explained below.

Determine Operational Modes. An operational mode is a distinct pattern of system usage that needs separate testing because it is likely to stimulate different failures or because it occurs rarely and needs more testing than natural occurrences would provide. Factors that yield different modes may be day of the week or time of day, i.e., prime hours and off hours, special conditions (partial operation, overload, etc.), or rare critical events (battlefield, field exercise, etc.). Although more modes may increase realism of test, they will increase effort and cost of test case selection and execution.

Define System Failure with Severity Classes. A failure is a departure of program behavior in execution from user requirements (a user-oriented concept). A fault, on the other hand, is a defect in the software that causes the failure when executed (a developer-oriented concept). A severity class is a set of failures that share the same degree of impact on users. Common classification criteria include impact on human life, cost impact, and service impact. Each of these classes may be further subdivided but should be separated by an order of magnitude in impact.

Set Failure-Intensity Objectives. As previously mentioned, failure intensity is an alternative way of expressing reliability in terms of failures per unit execution time; for example, six failures per 1,000 central processing unit (CPU) hours. One sets failure-intensity objectives (FIO) based on analysis of specific user needs, existing system reliability, and the capabilities of competing systems. The objectives must be set for the different operational modes and severity classes. Normally, one determines the failure intensities, in clock hours, of the hardware and the acquired software components of the system. Then, one subtracts these from the system FIO to obtain the FIO required for the developed software. The result is converted into an FIO in terms of CPU hours.
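The allocation arithmetic described above can be sketched in a few lines. All numbers here are hypothetical, chosen only to illustrate the subtraction and the clock-hour-to-CPU-hour conversion:

```python
# Hypothetical FIO allocation; every number below is illustrative,
# not taken from the article.

# System-level objective and expected component failure intensities,
# all in failures per 1,000 clock hours.
system_fio = 6.0          # overall objective for the delivered system
hardware_fi = 2.0         # expected hardware failure intensity
acquired_sw_fi = 1.0      # expected failure intensity of acquired software

# The developed software must fit within whatever budget remains.
developed_fio_clock = system_fio - hardware_fi - acquired_sw_fi  # 3.0 per 1,000 clock hr

# Convert clock hours to execution time using average utilization,
# e.g. the program accumulates 0.5 CPU hours per clock hour.
utilization = 0.5
developed_fio_cpu = developed_fio_clock / utilization            # 6.0 per 1,000 CPU hr

print(developed_fio_clock, developed_fio_cpu)
```

With these assumed figures, the developed software inherits an objective of 3 failures per 1,000 clock hours, or 6 failures per 1,000 CPU hours at 50 percent utilization.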

Step 2: Develop the Operational Profile

An operation is a complete task performed by the system. It is logical rather than physical and can be executed over several machines or be executed in noncontiguous time segments. The operational profile is simply the set of operations and their probabilities of occurrence. To develop an operational profile,

Identify the Initiators of Operations. An operation can be initiated by a user, a transaction, another system, or the system's controller. An example would be a command activated by a user, a transaction sent for processing, etc. Users are grouped into user types who use the system in similar ways. Usage commonality is the key.

Enumerate the Operations That Are Produced by Each Initiator. To enumerate operations for each initiator, consult the system requirements, draft users manuals, work process documentation, etc. Do not forget critical operations.

Determine the Occurrence Rates (per CPU Hour) of Operations. Determining occurrence rates is usually not difficult: field data from similar systems, simulation, or estimation can supply them. As will be discussed later, system usage data collection software should be built into the overall system logic.
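Once occurrence rates are in hand, the operational profile is just each rate normalized by the total. The operation names and rates below are invented for illustration:

```python
# Illustrative only: operation names and rates are invented, not from the article.
occurrence_rates = {               # occurrences per CPU hour, e.g. from field data
    "connect call": 120.0,
    "disconnect call": 110.0,
    "update billing record": 40.0,
    "recover from overload": 0.5,  # rare but critical operation
}

# The operational profile is each operation's probability of occurrence.
total = sum(occurrence_rates.values())
operational_profile = {op: rate / total for op, rate in occurrence_rates.items()}

for op, p in sorted(operational_profile.items(), key=lambda kv: -kv[1]):
    print(f"{op:24s} {p:.4f}")
```

Note how the rare critical operation receives a tiny probability; this is exactly why such operations are often given their own operational mode, so they receive more testing than the profile alone would provide.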

Step 3: Prepare Test Cases

Before discussing test case preparation, let us examine some testing terminology.

A run is a specific instance of an operation; it is characterized by its operation and its complete set of input variable values and environment. A test case, on the other hand, incorporates the operation and its input variable values but is independent of environment. Thus, a test case can be used for multiple runs. Runs consist of test cases executing in operational modes, driven by test procedures. A test procedure is the statistical specification of the set of runs associated with an operational mode, made by providing values of operation occurrence rates. The relationship among these terms is best illustrated in Figure 2.

Figure 2: Test Cases, Operational Modes, and Runs.

The area represents all possible runs for the program, each of which is represented by a point. Both operational modes include test case X.n, a test case that is part of operation X. However, the runs are different because the operational modes are different, resulting in different environments. Hence, we label the runs X.n.1 and X.n.2, corresponding to the different test procedures.
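This terminology can be captured in a small data model. The class and field names below are ours, chosen only to make the distinctions concrete:

```python
# Illustrative data model for the terminology above; the names are ours,
# not the article's.
from dataclasses import dataclass

@dataclass(frozen=True)
class TestCase:
    operation: str   # e.g. "X"
    case_id: str     # e.g. "X.n" -- fixes the input variable values
    inputs: tuple    # environment-independent input values

@dataclass(frozen=True)
class Run:
    test_case: TestCase
    operational_mode: str  # the mode supplies the environment

# The same test case executed under two operational modes yields two
# distinct runs, as with X.n.1 and X.n.2 in Figure 2.
case = TestCase("X", "X.n", inputs=(42, "normal-priority"))
run1 = Run(case, operational_mode="prime hours")
run2 = Run(case, operational_mode="overload")
print(run1 != run2, run1.test_case == run2.test_case)  # -> True True
```

The point of making `TestCase` environment-free is precisely reuse: one test case serves every operational mode, while each (test case, mode) pairing is a different run.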

The process for preparing test cases involves three steps:

1. Estimating the number of test cases. Estimating the number of test cases requires estimating the number of runs from the amount of testing scheduled and allocating them among the different operational modes. The minimum number of test cases required is equal to the maximum number of runs allocated to any one operational mode. This number is necessary to prevent duplication of runs. Such duplication wastes testing resources without yielding any new information about the system. By duplicate, we mean executed with exactly the same values for all input variables. You can make exceptions to this rule when you need to gather more data, verify that a failing run now operates successfully, or conduct a regression test. Note that having multiple operational modes means fewer test cases are needed to yield a given number of runs. If you run a test case without a test procedure, you are in effect doing feature testing.

2. Specifying the test cases. Specification of test cases involves selecting the operation with probability equal to its occurrence probability in the operational profile, then selecting the run with equal probability from all possible runs of that operation. In load testing, large numbers of test runs are executed in the context of a test procedure that models load.

3. Preparing the test case and test procedure scripts. Preparation of execution scripts should be automated whenever possible since large numbers of test cases are usual.
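Steps 1 and 2 above can be sketched together. The mode allocations and the operational profile here are invented numbers, used only to show the mechanics:

```python
# Sketch of steps 1 and 2 with invented numbers; the mode allocations and
# profile below are assumptions, not data from the article.
import random

# Runs allocated to each operational mode from the test budget (step 1).
runs_per_mode = {"prime hours": 600, "off hours": 250, "overload": 150}

# Minimum distinct test cases: the largest allocation to any single mode,
# so no mode is forced to repeat an identical run.
min_test_cases = max(runs_per_mode.values())

# Step 2: select operations with probability equal to their occurrence
# probability in the operational profile.
profile = {"connect call": 0.45, "disconnect call": 0.40, "update billing": 0.15}
rng = random.Random(1)  # fixed seed so the sketch is reproducible
selected = rng.choices(list(profile), weights=profile.values(), k=min_test_cases)

print(min_test_cases, selected[:3])
```

With 600 runs in the busiest mode, 600 distinct test cases suffice for all three modes, since each test case can be reused across modes as a different run.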

Step 4: Execute Tests

Test management systems are worth their weight in gold when it comes to setting up runs, executing test procedure scripts, capturing input and output, and cleaning up. In addition, it is advisable to build into the software system a mechanism that automatically records execution parameters and results, so that real operational results can be compared against test script results. It is important to capture each failure, when it occurred, and its severity.

Step 5: Interpret Failure Data

Failure data are interpreted differently for development testing than for certification testing. During system test for developed software, periodic estimates of failure intensity are made from failure data. The intensity estimates are typically computed from failure times or failures per period, using reliability estimation programs such as CASRE [2], which are based on software reliability models and statistical inference. The failure-intensity trend plots of Figure 3 show the maximum likelihood estimate and the 75 percent upper and lower confidence bounds. Upward trends in failure intensity commonly indicate either of two undesirable conditions: unplanned system evolution or a change in the operational profile.

Figure 3: Example Failure-Intensity Trend.
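Tools such as CASRE fit full reliability models by maximum likelihood. As a minimal illustration of the idea, the basic execution-time model [1] predicts failure intensity that falls linearly with cumulative failures experienced, so a rough fit can be had by least squares. The observation pairs below are invented for the sketch:

```python
# Minimal sketch only: real tools (e.g. CASRE) use maximum likelihood.
# The basic execution-time model gives lambda(mu) = lambda0 - (lambda0/nu0)*mu,
# a straight line in (cumulative failures mu, observed intensity lambda),
# so we fit it here by ordinary least squares. Data are invented.

def fit_basic_model(mu, lam):
    """Least-squares line through (cumulative failures, observed intensity)."""
    n = len(mu)
    mean_mu = sum(mu) / n
    mean_lam = sum(lam) / n
    slope = (sum((m - mean_mu) * (l - mean_lam) for m, l in zip(mu, lam))
             / sum((m - mean_mu) ** 2 for m in mu))
    lambda0 = mean_lam - slope * mean_mu  # initial failure intensity
    nu0 = -lambda0 / slope                # total expected failures
    return lambda0, nu0

# Per-period observations: cumulative failures, failures per CPU hour.
mu = [0, 2, 4, 6]
lam = [10.0, 8.0, 6.0, 4.0]
lambda0, nu0 = fit_basic_model(mu, lam)
print(lambda0, nu0)  # -> 10.0 10.0 for these invented data
```

Plotting successive estimates of current intensity, `lambda0 * (1 - mu/nu0)`, against execution time gives a downward trend like that of Figure 3 when testing and fault correction are going well.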

We compare failure intensities with their corresponding failure-intensity objectives, which may differ by severity class. We can thus identify "at risk" schedules or reliabilities. The comparison can also be used to guide release from component test to system test, system test to beta test, or beta test to general availability.

On the other hand, certification testing uses a reliability demonstration chart [3], illustrated in Figure 4. Failure times are normalized by multiplying by the appropriate failure-intensity objective. Each failure is plotted on this chart. Depending on the region in which it falls, you may accept or reject the software being tested or continue testing. Note that Figure 4 shows a test in which the first two failures indicate you should continue testing, and the third failure recommends that you accept the off-the-shelf software module or system under test.

Figure 4: Reliability Demonstration Chart.
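The chart's accept, reject, and continue regions come from a sequential probability ratio test. The sketch below assumes parameters the article does not state: supplier risk alpha = 0.1, consumer risk beta = 0.1, and discrimination ratio 2 are common illustrative choices:

```python
# Sketch of the sequential test underlying a reliability demonstration chart.
# Assumed parameters (not stated in the article): supplier risk alpha = 0.1,
# consumer risk beta = 0.1, discrimination ratio gamma = 2.
import math

ALPHA, BETA, GAMMA = 0.1, 0.1, 2.0

def decide(n, tau):
    """Decision at failure n, where tau = failure time x failure-intensity objective."""
    a = math.log((1 - BETA) / ALPHA)  # reject boundary constant
    b = math.log(BETA / (1 - ALPHA))  # accept boundary constant
    accept_at = (n * math.log(GAMMA) - b) / (GAMMA - 1)
    reject_at = (n * math.log(GAMMA) - a) / (GAMMA - 1)
    if tau >= accept_at:   # failure came late enough: objective is being met
        return "accept"
    if tau <= reject_at:   # failure came too early: objective is not being met
        return "reject"
    return "continue"

# Normalized failure times for three observed failures (illustrative values):
for n, tau in enumerate([0.5, 1.5, 5.0], start=1):
    print(n, decide(n, tau))
```

With these assumed risks, the first two failures land in the continue region and the third in the accept region, mirroring the scenario the article describes for Figure 4.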


Practitioners at AT&T have found that software reliability engineered testing provides a proven way to develop and manage testing of a software system whether it is uniquely developed software, offtheshelf components, or a combination of both. SRET has enabled software testing to be oriented toward a quantitative failure-intensity objective. Since the method develops quantitative values for the software reliability, these figures can be combined with those of hardware for a composite reliability figure of merit.

John D. Musa
39 Hamilton Road
Morristown, NJ 07960
Voice: 201-267-5284
Fax: 201-267-6788

James Widmaier
National Security Agency
Ft. Meade, MD 20755
Voice: 410-859-6881
Fax: 410-859-6939

About the Authors

John D. Musa is an independent consultant. He was formerly technical manager of Software Reliability Engineering at AT&T Bell Laboratories, Murray Hill, N.J., where he developed extensive software engineering-related experience. He is author of the classic text on software reliability engineering, Software Reliability: Measurement, Prediction, Application, and author of more than 80 publications in the software engineering field. He also gives seminars and short courses worldwide on software reliability engineering and has been a keynote speaker at numerous conferences. He is an IEEE fellow and is listed in Who's Who in America and American Men and Women in Science.

James Widmaier is a senior software engineer at the National Security Agency at Ft. Meade, Md. His technical career of 21 years at the agency has been focused on software engineering standards, practices, and quality assurance metrics. In particular, he has developed the concept of formal methods for secure software synthesis, automated quality assurance analysis of code, and has written a software engineering lifecycle standard for INFOSEC software. He is currently on a technical sabbatical to concentrate on software reliability engineering.


References

1. Musa, J. D., A. Iannino, and K. Okumoto, Software Reliability: Measurement, Prediction, Application, McGraw-Hill, New York, 1987.

2. Lyu, Michael, ed., Handbook of Software Reliability Engineering, McGraw-Hill and IEEE Computer Society Press, 1996.

3. Musa, J. D., op. cit., pp. 201-203.