Glossary of Common Terms

Following is a list of common terms used in the Reliability and Risk Assessment disciplines.

A

Active Device
An electronic component whose operation in a circuit relies on a semiconductor junction. Examples include integrated circuits, diodes, and transistors.

Allocation
see Reliability Allocation

Alpha Failure Model
Used in Common Cause Failure modeling in Fault Tree analyses to describe the contribution a common event makes to an event's probability of failure or failure rate.

Apportionment
Used as part of an FMEA to describe the percentage of time a particular failure mode is expected to occur. The apportionments for all failure modes associated with a particular component or function sum to 100%.

Arrhenius Temperature Model
A temperature model (equation) used in part to model the effect of temperature on the failure rate of an electronic or mechanical component. Using a supplied baseline failure rate, the ambient temperature, and a reference temperature, the equation produces an expected failure rate at the ambient temperature.
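
As a rough illustration of the Arrhenius model, the sketch below scales a baseline failure rate from a reference temperature to an ambient temperature. The function name and the default activation energy of 0.7 eV are assumptions made for this example; they are not taken from any particular prediction standard.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_failure_rate(base_rate, t_ref_c, t_amb_c, activation_energy_ev=0.7):
    """Scale base_rate (valid at t_ref_c) to the ambient temperature t_amb_c.

    Temperatures are given in degrees Celsius and converted to kelvin.
    The activation energy default is an illustrative assumption.
    """
    t_ref_k = t_ref_c + 273.15
    t_amb_k = t_amb_c + 273.15
    # Acceleration factor: greater than 1 when ambient is hotter than reference.
    accel = math.exp(
        (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_ref_k - 1.0 / t_amb_k)
    )
    return base_rate * accel
```

With equal reference and ambient temperatures the baseline rate is returned unchanged; a hotter ambient temperature raises the rate and a cooler one lowers it.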

Availability
The probability that a system, or part of a system, is operational when demanded to perform. Availability is a unitless value from 0 to 1, where 0 is a certainty of failure. Availability equations account for both failures and repairs of the system.
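
One widely used form is the steady-state availability, sketched below under the assumption of constant failure and repair rates. The function name is hypothetical, and time-dependent availability models need more than this ratio.

```python
def steady_state_availability(mtbf_hours, mttr_hours):
    """Long-run fraction of time the item is up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)
```

For example, an item with an MTBF of 990 hours and an MTTR of 10 hours is available 99% of the time under these assumptions.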

B

Barlow-Proschan
An approach to determining the importance measure of a particular event in a Fault Tree or Reliability Block Diagram.

Bellcore
An electronic reliability prediction standard developed by Bell Communications Research (Bellcore). Originally developed for telecommunications, it is now applicable to most commercial and some military prediction requirements. This prediction approach yields both failure rate and MTBF. Failure rate is given in FITS, failures per billion hours.

Beta Failure Model
Used in Common Cause Failure modeling in Fault Tree analyses to describe the contribution a common event makes to an event's probability of failure or failure rate.

Bill of Materials (BOM)
A list of electronic and other components that make up a piece of equipment or circuit board. General information such as Part Number, Description, Circuit Location, etc. is included. Reliability parameter values such as electrical stresses, temperatures, etc. are not usually included. Typically a BOM is used to populate a reliability prediction tool for further analysis.

Birnbaum
An approach to determining the importance measure of a particular event in a Fault Tree or Reliability Block Diagram.

Burn-in
A testing procedure by which equipment or individual devices are operated at elevated temperatures or loads in an attempt to produce early life failures.

C

CASS
An approach to assessing the compliance of a safety system with IEC 61508 and other standards. It provides a systematic way to assess compliance during all stages of development, manufacturing, deployment, and operation.

Chi-Squared
One of many statistical distributions used for failure models of Fault Tree events, Reliability Block Diagram blocks, and Event Tree branches.

Common Cause Failure
A single event that can cause other failures of a system. For example: a set of pumps bolted to a rack. If the rack falls apart, the pumps fail as well. The probability of the failure of the rack must be added to the probability of the pumps failing on their own. Common causes can be modeled explicitly via Repeat Events, or implicitly via Alpha and Beta failure models.
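
The implicit approach can be illustrated with the simple beta-factor split, in which a fraction beta of a component's failure rate is attributed to the common cause. This is a minimal sketch; the function name and the example value of beta are assumptions, and real analyses take beta from data or from a standard.

```python
def split_beta_factor(total_rate, beta):
    """Return (independent_rate, common_cause_rate) for one component.

    beta is the assumed fraction of the total failure rate that is
    shared with the other components in the common cause group.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be between 0 and 1")
    common = beta * total_rate
    return total_rate - common, common
```

With a total rate of 100 FPMH and beta = 0.1, each pump would carry 90 FPMH of independent failures and the group would share 10 FPMH of common cause failures.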

Component Libraries
Libraries containing manufacturer datasheet and reliability prediction information for commercial and military components. User defined libraries, populated with the most commonly used components, can greatly improve analysis efficiency.

Corrective Action
An action taken as a result of a failure or negative observation, typically associated with manufacturing or other product production processes.

Criticality
Used to rank the consequence of a failure mode and its frequency of occurrence as part of an FMECA. Criticality is a relative, unitless value based upon several parameters, including component failure rate and apportionment of the failure mode.
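
A sketch of a criticality calculation in the style of MIL-STD-1629 follows: the failure mode criticality is the product of the part failure rate, the failure mode apportionment, the conditional probability that the effect occurs, and the operating time. The function and parameter names are assumptions made for this example.

```python
def mode_criticality(failure_rate, apportionment, effect_probability, duration_hours):
    """Criticality of one failure mode (unitless, used for ranking)."""
    return failure_rate * apportionment * effect_probability * duration_hours


def item_criticality(failure_rate, modes, duration_hours):
    """Sum the mode criticalities of one item.

    modes is a list of (apportionment, effect_probability) pairs.
    """
    return sum(
        mode_criticality(failure_rate, a, b, duration_hours) for a, b in modes
    )
```

The absolute numbers matter less than the ordering they produce across failure modes.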

Cut-Set
see Minimal Cut-Set

D

Derating
Derating is a guideline used to ensure that components are operated well below their rated voltage, power, or current levels. By default, failure rates calculated within reliability prediction standards assume a higher than typical-design stress on the components, leading to conservative results. Derating guidelines point out the components in the analysis which are Nominal, Above Nominal, or Overstressed. This enables the analyst to recommend which stress levels are appropriate for the component, ensuring increased reliability and lower failure rates. There are several military and commercial derating standards available, and it is possible to define your own.

Design FMEA
A failure modes and effects analysis of the Design Process for any product.

Distribution Parameters
Each statistical distribution requires certain parameters such as mean, standard deviation, characteristic life, shape factor, bounds, etc. The parameters needed will depend on the type of distribution being used.

Dormant
A period of non-operation for a system or device. Typically it is assumed that the failure rate during this period of time is lower than when the device is in operation.

E

Event Sequence Diagram
A diagram showing a sequence of events, and their potential outcomes, which occur after an initiating event, leading up to end states and consequences. Similar to an Event Tree, but used in more advanced risk assessment methodologies.

Event Tree Analysis
Used to determine the consequences of an initial event, and the subsequent, sequential events that may or may not occur. The probability of each event is determined, and the probabilities of the various outcomes are the usual results.
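
The arithmetic behind an event tree can be sketched as follows: the probability of an outcome is the initiating event probability multiplied by the probability of each branch taken along the path. The two-barrier tree and all names here are hypothetical, and the branches are assumed to be independent.

```python
def sequence_probability(initiating_probability, branch_probabilities):
    """Multiply the initiating event probability by each branch probability."""
    result = initiating_probability
    for p in branch_probabilities:
        result *= p
    return result


def two_barrier_outcomes(p_init, p_a_fails, p_b_fails):
    """Hypothetical tree: barrier B is only challenged when barrier A fails."""
    return {
        "A works": sequence_probability(p_init, [1 - p_a_fails]),
        "A fails, B works": sequence_probability(p_init, [p_a_fails, 1 - p_b_fails]),
        "A fails, B fails": sequence_probability(p_init, [p_a_fails, p_b_fails]),
    }
```

Because the outcomes are mutually exclusive and cover every path, their probabilities sum to the initiating event probability.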

Exponential Distribution
The most widely used distribution in reliability engineering. Used for time-dependent data where the rate of event occurrence does not vary.
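
With a constant failure rate, the probability of surviving to time t follows directly from the exponential distribution. The short sketch below uses a hypothetical function name and assumes the rate and the time share the same time unit.

```python
import math


def exponential_reliability(failure_rate, t):
    """Probability of no failure up to time t for a constant failure rate."""
    return math.exp(-failure_rate * t)
```

At t equal to the MTBF (1 / failure rate), the reliability is about 0.37, so an item is more likely than not to have failed by its MTBF.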

F

Failure Mode
A specific "way" a component or function fails. The failure mode of a component or function is expected to have a direct effect on another part of the system.

Failure Mode and Effect Analysis (FMEA)
An inductive method of analyzing system design, safety, and performance. Determination of the effects of component and functional failure modes on the system is a large part of the analysis.

Failure Mode, Effects and Criticality Analysis (FMECA)
An inductive method of analyzing a system design for safety and performance. Determination of the effects of component and functional failure modes on the system is a large part of the analysis. Each failure mode is assigned a criticality factor. This unitless number is used to rank the failure modes to expose those having the most potential impact on the system.

Failure Rate
The number of failures experienced or expected divided by the total exposure time. The failure rate is the inverse of the mean time between failures (MTBF).
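
The definition and the unit conversions used elsewhere in this glossary can be sketched as below. The function names are hypothetical, and the MTBF relationship assumes a constant failure rate.

```python
def failure_rate_per_hour(failures, exposure_hours):
    """Failures divided by total exposure time, in failures per hour."""
    return failures / exposure_hours


def to_fpmh(rate_per_hour):
    """Convert failures per hour to failures per million hours (FPMH)."""
    return rate_per_hour * 1e6


def to_fits(rate_per_hour):
    """Convert failures per hour to failures per billion hours (FITS)."""
    return rate_per_hour * 1e9


def mtbf_hours(rate_per_hour):
    """MTBF as the inverse of a constant failure rate."""
    return 1.0 / rate_per_hour
```

Five failures over one million unit-hours correspond to 5 FPMH, 5000 FITS, and an MTBF of 200,000 hours.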

Fault Tree Analysis
A deductive method of analyzing a system design for safety and performance. A specific top event is defined, along with all of the events and logic in the system that will cause the top event to occur. The logical structure or paths to failure are defined graphically with AND, OR, and other types of gates. Any event, whether it is a hardware failure or a human interaction, can be accounted for in a fault tree analysis.
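
For independent input events, the basic gate arithmetic can be sketched as below. The function names are hypothetical, and repeated or dependent events need the cut-set treatment rather than this direct evaluation.

```python
from math import prod


def and_gate(probabilities):
    """Output occurs only if every input event occurs."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output occurs if at least one input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)
```

A top event such as or_gate([and_gate([0.1, 0.2]), 0.05]) nests the gates in the same way the tree is drawn.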

First Year Multiplier
Used in the Bellcore prediction standard, it is the ratio of the predicted first-year failure rate to the predicted steady-state failure rate of a component.

FITS
Failures per billion hours, or Failures In Time as defined by the Bellcore prediction standard.

Fixed
A simple value used to indicate the probability of an event occurring. A unitless number from 0 to 1, where 1 means the event is certain to occur and 0 means the event will never occur.

FPMH
Failures per million hours. Used primarily in the MIL-HDBK-217 prediction standard.

Fussell-Vesely
An approach to determining the importance measure of a particular event in a Fault Tree or Reliability Block Diagram.

G

Gaussian Distribution
see Normal Distribution.

H

Hot-swap or Hot-standby
A device that is powered up, but is intended to be a backup for the main operating device.

Hybrid
A custom electronic component that contains other components mounted to a substrate. Often these devices are sealed or otherwise enclosed to prevent tampering or reverse engineering.

I

IEC 61508
An IEC standard covering the functional safety of electrical/electronic/programmable electronic safety-related systems.

IEC 62380
An IEC reliability data handbook providing universal models for reliability prediction of electronic components.

Initiating Event
A specific event which causes a subsequent series of events to occur.

Importance Measures
Methods used to determine the relative impact an event or failed block will have on a system. See also Fussell-Vesely, Birnbaum, Barlow-Proschan.

J

Junction
The region where the two types of semiconductor material (p-type and n-type) meet.

Junction Temperature
The temperature of the semiconductor junction inside an active device. A key parameter in reliability prediction for active components.

K

K out of N (KooN)
Used to define the number of devices or blocks that must be functional (not failed or under repair) for a parallel portion of a system to be considered available.
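
For identical, independent blocks, the probability that at least K of N are functional follows the binomial distribution. The sketch below uses a hypothetical function name and does not model repair.

```python
from math import comb


def k_out_of_n_reliability(k, n, block_reliability):
    """Probability that at least k of n identical, independent blocks work."""
    r = block_reliability
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))
```

A 2-out-of-3 arrangement of blocks that are each 0.9 reliable gives 0.972, which sits between the series (3-out-of-3) and fully parallel (1-out-of-3) cases.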

L

Latin-Hypercube Sampling
A stratified sampling approach, often preferred over plain Monte Carlo simulation, whereby a distribution of plausible parameter values is generated. It ensures that all intervals of potential samples are explored. Used primarily in Uncertainty analysis.
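
The idea that all intervals are explored can be shown in one dimension: the range 0 to 1 is cut into n equal strata and exactly one sample is drawn from each. This is a minimal sketch with a hypothetical function name; multi-parameter studies repeat this per parameter and pair the shuffled columns.

```python
import random


def latin_hypercube_1d(n, rng=None):
    """Return n samples in [0, 1], one drawn uniformly from each of n equal strata."""
    rng = rng or random.Random()
    samples = [(i + rng.random()) / n for i in range(n)]
    rng.shuffle(samples)  # remove the ordering so samples can be paired across parameters
    return samples
```

The uniform samples would then be mapped through the inverse cumulative distribution of the parameter being studied.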

Life Cycle Cost (LCC)
Analysis of the cost of a product over its entire life. It includes development, production, warranty, repair, and disposal costs.

Lognormal Distribution
Similar to the Normal distribution. The logarithm of the values of the random variable, rather than the values themselves, is assumed to be normally distributed.

M

Maintainability Prediction
A method to determine the mean time to repair (MTTR) and other measures for a system or device. Specific repair tasks and the times to perform each task are taken into account to arrive at the overall downtime that can be expected from these repairs. Corrective and Preventative Maintenance cycles can be accounted for.

Mean Time Between Failures (MTBF)
The mean time expected between failures. MTBF is the inverse of the failure rate. MTBF should be used for repairable items, while MTTF (Mean Time to Failure) should be used for non-repairable items. The assumption is that over an extended period of time the fail/repair cycle will occur many times.

Mean Time to Failure (MTTF)
The mean time expected to the first failure. MTTF is the inverse of the failure rate. MTTF should be used for non-repairable items.

Mean Time to Repair (MTTR)
The mean time spent performing all corrective and/or preventative maintenance repairs.

Methods
Optional data sources used within the Bellcore prediction standard to take into account burn-in, laboratory, and field data when calculating failure rate and MTBF for components.

MIL-HDBK-217
The most commonly used reliability prediction standard, originally developed for military-related organizations. It uses mathematical reliability models for many types of electrical and electronic components, but is considered out of date with respect to current component technologies. Models are based on parameters of the components such as type of packaging, power dissipation, voltage stress, and the environment. MIL-HDBK-217 delivers both failure rate and MTBF (Mean Time Between Failures) results.

MIL-HDBK-472
A published standard for maintainability prediction analysis. Tasks are listed, along with their approximate times of completion, leading to an overall MTTR for each component, sub-assembly, and system.

MIL-STD-1629
A published standard for performing Failure Mode, Effects, and Criticality Analyses. It is perhaps the most common approach, used by both military and commercial organizations. Criticality calculations are also included for ranking failure modes.

Minimal Cut-Set
A minimal cut-set is a cut-set that has been reduced to the point where no other cut-set produced by the analysis is contained within it. This approach ensures that duplicate "paths to failure" do not exist in the analysis results. Using Boolean algebra, initial cut-sets are further analyzed to see if reduction is possible. The end result is a set of cut-sets which clearly describe the paths to failure in a system. Fault Trees, Reliability Block Diagrams, and Event Trees all produce minimal cut-sets.
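
The reduction step can be sketched with the Boolean absorption rule: any cut-set that fully contains another cut-set is redundant and is dropped. The function name and the event labels are hypothetical, and this sketch assumes the initial cut-sets have already been generated from the model.

```python
def minimal_cut_sets(cut_sets):
    """Remove duplicate cut-sets and any cut-set that contains a smaller one."""
    unique = {frozenset(cs) for cs in cut_sets}
    # "other < cs" is a strict-subset test: cs is absorbed by the smaller set.
    return [cs for cs in unique if not any(other < cs for other in unique)]
```

For the initial cut-sets {A}, {A, B}, and {B, C}, the result is {A} and {B, C}, because {A, B} adds nothing once A alone causes the top event.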

Monte Carlo Simulation
A simulation approach which performs random trials on a system to determine an approximate overall reliability and availability of the system. For a given number of samples, Monte Carlo is generally not as accurate as the Latin-Hypercube approach.
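
As a minimal sketch of the approach, the code below estimates the failure probability of a hypothetical two-unit parallel system by random trials. The function name, the independence of the two units, and the fixed seed are assumptions made for the example.

```python
import random


def simulate_parallel_failure(p_fail_a, p_fail_b, trials, seed=0):
    """Estimate the probability that both units of a parallel pair fail."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    failures = 0
    for _ in range(trials):
        a_failed = rng.random() < p_fail_a
        b_failed = rng.random() < p_fail_b
        if a_failed and b_failed:  # a parallel pair fails only if both units fail
            failures += 1
    return failures / trials
```

The estimate approaches the exact product p_fail_a * p_fail_b as the number of trials grows, which makes this simple case useful for checking a simulation before it is applied to systems with no closed-form answer.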

N

Non-Operational (NonOp)
A storage or other non-operational period a system, sub-system, or component experiences. Stresses are normally much lower than during operational periods, potentially leading to lower failure rates and fewer failure modes. However, devices can still fail, and have failure modes, during non-operational periods.

Normal Distribution
A commonly used distribution in the field of statistics and probability. The distribution is symmetric. The mean and standard deviation are its two parameters.

NSWC
Naval Surface Warfare Center. This government organization has created a reliability prediction standard for mechanical components. It is used widely in military and commercial industries.

O

Operating Environment
One of the key reliability prediction standard parameters is the assumed environment the system or devices are operating in. The predicted failure rate of a device is greatly impacted by the operating environment.

P

Parallel Operating
Redundancy in which multiple devices perform the same task and are all fully operational during the period of analysis. The devices do not necessarily need to be identical.

Passive Device
A discrete component, such as a resistor or capacitor, which does not have an active semiconductor junction.

Pi Factors
A term used to describe the parameters used in reliability prediction equations. Parameters such as stress, temperature, environment, etc. are represented using the Greek letter Pi, but the mathematical constant Pi is not used in the calculation. An unfortunate choice of Greek letters.

Pivotal event
An event which occurs after an initiating event and has only two possible outcomes: it occurs or it does not occur. These types of events are used in Event Sequence Diagrams to illustrate what might happen after a previous event.

Poisson
A commonly used statistical distribution for reliability and availability prediction.

Probabilistic Risk Assessment
An approach to assessing risk to, or as a result of, a system, using established probabilistic and statistical methods. All upsetting events and their eventual outcomes are considered, along with their probability of occurrence. The end result is a view of each "end state", the probability of it occurring, and any consequences to safety, finance, or other aspects of an organization or system.

Process FMEA
A Failure Modes and Effects Analysis of a process, rather than a physical device or system. A common analysis performed in the manufacturing or operations discipline of a company.

PRM
Process Reliability Management - the discipline of managing the reliability of a process, manufacturing or otherwise. Commonly used to ensure a process produces the same results.

Q

Qualitative
A modeling approach which considers only the elements which make up a system, and how they interact with each other logically. No effort is taken to associate probability of occurrence, or failure rate numbers to the elements of the system. However, the results of this type of analysis include possible paths to failure, and a clear picture of how the system can fail.

Quantitative
A modeling approach which is based upon a qualitative foundation, but includes probability and failure model distributions to determine numeric results for the overall system availability and other factors.

Q (Unavailability)
The probability that a system is failed at a specific point in its lifetime. Equal to 1 - Availability(t).

R

Redundancy
Having more than one piece of equipment available to perform a function within a system. In general, redundancy helps improve the reliability and availability of the system, but this may not always be the case, depending on the other elements of the system.

Reliability
The ability to perform a required function under stated conditions for a stated period of time. Reliability is expressed as a probability from 0 to 1. Assuming the system was operating at time zero, Reliability is the probability that it continues to operate until time t.

Reliability Allocation
Approaches used to allocate a given reliability goal to the various elements of a system. Weights are assigned to each element during the early phases of design, helping to determine where the designers should focus their efforts and budget.

Reliability Block Diagram (RBD)
A diagrammatic analysis methodology used to determine, in part, the reliability and availability of a system. Serial and parallel arrangements can be modeled. Failure distributions or probabilities of failure may be defined at each "block" of the system. RBD analysis is nearly identical mathematically to that of Fault Tree.
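
For independent blocks, the serial and parallel arrangements reduce to the two functions sketched below. The function names are hypothetical, and the blocks are described by fixed reliabilities rather than failure distributions to keep the example short.

```python
from math import prod


def series_reliability(reliabilities):
    """All blocks must work, so the reliabilities multiply."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """The arrangement fails only if every block fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)
```

Nesting the calls mirrors the diagram: series_reliability([0.95, parallel_reliability([0.9, 0.9])]) describes one block in series with a redundant pair.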

Reliability Centered Maintenance (RCM)
The definition of a maintenance program having reliability as a significant input parameter. MSG-3 (Maintenance Steering Group), originally developed by the aviation industry, is an early example of the methodology.

Reliability Growth
Improvement in a reliability parameter due to corrective action taken to the design or manufacture of a device.

Reliability Prediction
Calculation of component, sub-assembly, or system failure rate, and the related MTBF, is the foundation of a reliability prediction analysis. It may be based upon published standards, or engineering judgment, or a combination of both.

Repeat Events
An event in a Fault Tree that is repeated elsewhere in the tree. It is not a simple copy of an event. Rather, it indicates that a specific event can occur in multiple logical locations in the tree. Also known as "explicit common cause" modeling.

Risk Priority Number (RPN)
Used in ISO 9000 FMEAs to rank the relative importance of failures. RPN = Severity x Occurrence x Detection, a unitless, relative number.
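
The calculation and the resulting ranking can be sketched as below. The 1 to 10 scoring scale, the function names, and the example failure modes are assumptions made for illustration; organizations define their own scales.

```python
def risk_priority_number(severity, occurrence, detection):
    """RPN = Severity x Occurrence x Detection, each scored 1 to 10 here."""
    for name, score in (
        ("severity", severity),
        ("occurrence", occurrence),
        ("detection", detection),
    ):
        if not 1 <= score <= 10:
            raise ValueError(f"{name} must be between 1 and 10")
    return severity * occurrence * detection


def rank_by_rpn(failure_modes):
    """failure_modes maps a name to (severity, occurrence, detection)."""
    return sorted(
        failure_modes,
        key=lambda name: risk_priority_number(*failure_modes[name]),
        reverse=True,
    )
```

A mode scored (8, 3, 5) has an RPN of 120 and ranks above a mode scored (4, 6, 2), which has an RPN of 48.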

S

Safe Failure Fraction
Used in a Failure Modes, Effects, and Diagnostic Analysis (FMEDA, IEC 61508), the SFF displays the fraction of the system failure rate that is considered safe vs. the total failure rate. Some system failures are defined as "safe", while others are "dangerous".
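
A sketch of the calculation follows, using the form commonly quoted for IEC 61508, in which dangerous failures caught by diagnostics count toward the safe fraction alongside the safe failures. That treatment of detected dangerous failures, and the function and parameter names, are assumptions that go slightly beyond the definition above.

```python
def safe_failure_fraction(safe_rate, dangerous_detected_rate, dangerous_undetected_rate):
    """Fraction of the total failure rate that does not lead to an undetected dangerous state."""
    total = safe_rate + dangerous_detected_rate + dangerous_undetected_rate
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return (safe_rate + dangerous_detected_rate) / total
```

With 50 FITS of safe failures, 40 FITS of detected dangerous failures, and 10 FITS of undetected dangerous failures, the fraction is 0.9.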

Safety Integrity Level
The Safety Integrity Level (SIL) is a measure of risk reduction defined in IEC 61508. It has four possible levels, 1 to 4, which can be used to assess and quantify risk.

Sensitivity Analysis
Part of an overall analysis to determine how sensitive a system is to changes in reliability or failure rates of the elements it is composed of. "What-if" scenarios are constructed and results compared to determine the critical points of the system.

Series
A string of system elements, all of which must be operating for the system to function.

Single Point Failure
An element that, if failed, would cause the entire system to fail.

Software Reliability
A determination of the probability that software performs its intended function when called upon. Relative to hardware and human reliability, software reliability prediction is in its infancy with few clear methodologies.

SpareCost
An approach to determining the number of spares needed to support a set of equipment due to failures, cost, and the risk of running out of stock.

Steady-State Failure Rate
The failure rate of a component or system after any early-life period.

Success Tree Analysis (STA)
A symbolic logic model similar to a Fault Tree (FTA), but focused on the success domain rather than the failure domain. STA and FTA are complements to each other.

T

Telcordia
see Bellcore.

Time-Dependent Distribution
A distribution of failure data which is a function of time. Time is a significant parameter in all reliability calculations.

Time-Independent Distribution
see Fixed.

U

Unit
A collection or assembly of devices in a piece of equipment. Typically a unit is a replaceable item which may or may not be repaired at a later time.

Unavailability
1 - Availability(t)

Uncertainty
The degree of the lack of confidence in a result. When using distributions for reliability analysis, there is inherent uncertainty of the values used. These uncertainties add up at the system level, creating even greater uncertainty.

Unreliability
1 - Reliability(t)

V

Voltage Stress
The ratio between the Applied Voltage and the Rated Voltage of an electronic component. Used in reliability prediction analyses as a significant contributor towards the failure rate of a device.

W

Weibull Distribution
A commonly used distribution, valued for its ability to model constant, increasing, or decreasing failure rates. Two- and three-parameter Weibull distributions are available, and it is the manipulation of these parameters that gives the distribution its flexibility.
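
The effect of the parameters can be seen in the failure rate (hazard) function, sketched below. The names shape, scale, and location are the usual labels for the Weibull parameters but are assumptions here; setting location to zero gives the two-parameter form, and t must be greater than location.

```python
import math


def weibull_reliability(t, shape, scale, location=0.0):
    """Probability of surviving to time t."""
    return math.exp(-(((t - location) / scale) ** shape))


def weibull_hazard(t, shape, scale, location=0.0):
    """Instantaneous failure rate at time t."""
    return (shape / scale) * ((t - location) / scale) ** (shape - 1.0)
```

A shape below 1 gives a decreasing failure rate, a shape of exactly 1 gives a constant rate of 1 / scale, and a shape above 1 gives an increasing rate.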

Weighting
A method used to assign relative importance to an event or outcome. A unitless number arbitrarily set by the analyst.