Global markets are hugely competitive, requiring product designers and manufacturers to achieve two major objectives in order to remain successful: reduce the time-to-market for products that will work first time on delivery; and design and manufacture products that will operate reliably for as long a period as possible.
Achieving these strategic goals requires products to be designed and produced in a way that ensures they will operate reliably once delivered. This, in turn, calls for a method of assuring product reliability prior to delivery. To remain competitive, companies require techniques that are fast, cost-effective, and produce worthwhile results.
A common difficulty facing design engineers working on electronic equipment is establishing what exactly is meant by reliability. A vast amount has been written about the definition and theory of reliability.
The term reliability is internationally defined as the ability of an item to perform a required function under stated conditions for a stated period of time*. The required function includes the specification of satisfactory operation as well as unsatisfactory operation. For a complex system, unsatisfactory operation may not be the same as failure. The stated conditions are the total physical environment including mechanical, thermal and electrical conditions. The stated period of time is the time during which satisfactory operation is desired and is often called the service life of a product.
To make matters more complex, depending on the application, different measures of reliability may be more appropriate. For example, survivability is the probability that an item will perform a required function under stated conditions for a stated period of time without failure. The differences between survivability and reliability can be summarised as follows: reliability, as defined earlier, may be described qualitatively while survivability may be described quantitatively. Reliability is an ability rather than a probability. Reliability can be broadly defined to include the possibility of repair, whereas survivability applies only to applications in which failures are not routinely repaired.
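Because survivability is a probability, it can be computed once a failure model is chosen. The minimal sketch below assumes a constant failure rate (the exponential model), under which the probability of no failure over t hours is R(t) = exp(-t/MTBF). The MTBF figure is hypothetical and the function name is illustrative only.

```python
import math

def survivability(hours: float, mtbf_hours: float) -> float:
    """Probability of operating for `hours` with no failure, assuming a
    constant failure rate (exponential model): R(t) = exp(-t / MTBF)."""
    if hours < 0 or mtbf_hours <= 0:
        raise ValueError("hours must be >= 0 and mtbf_hours > 0")
    return math.exp(-hours / mtbf_hours)

# Hypothetical example: five years of continuous use against a 200,000-hour MTBF.
five_years = 5 * 365 * 24  # 43,800 hours, leap days ignored
print(f"{survivability(five_years, 200_000):.3f}")  # prints 0.803
```

Real products rarely have a truly constant failure rate (infant mortality and wear-out both violate it), so the exponential form is only a convenient starting assumption.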
For items such as telecommunications equipment, where a measure of reliability must account for repair as well as failure, the appropriate measure may in fact be availability, which applies to situations in which failures are routinely repaired. Availability is a measure of the degree to which an item is in an operable state when called upon to perform.
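One common quantitative form of this measure is steady-state (inherent) availability, A = MTBF / (MTBF + MTTR), where MTTR is the mean time to repair. The sketch below assumes that definition; the line-card figures are hypothetical and chosen only to show how a small repair time translates into annual downtime.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time an item is operable when failures are
    routinely repaired: A = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be > 0 and MTTR >= 0")
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical line card: 50,000 h between failures, 4 h to restore service.
a = steady_state_availability(50_000, 4)
downtime_min_per_year = (1 - a) * 365 * 24 * 60
print(f"A = {a:.6f}, about {downtime_min_per_year:.0f} min downtime per year")
# prints: A = 0.999920, about 42 min downtime per year
```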
To add yet another dimension, a further general measure of reliability is maintainability. This refers to the maintenance process associated with system reliability. Maintainability is the degree to which an item can be retained in, or restored to, a specified operating condition.
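Maintainability is often quantified as the probability that a repair is completed within a given time. A minimal sketch, assuming exponentially distributed repair times so that M(t) = 1 - exp(-t/MTTR); the MTTR value and function name are hypothetical.

```python
import math

def maintainability(t_hours: float, mttr_hours: float) -> float:
    """Probability that a repair is completed within t_hours, assuming
    exponentially distributed repair times: M(t) = 1 - exp(-t / MTTR)."""
    if t_hours < 0 or mttr_hours <= 0:
        raise ValueError("t_hours must be >= 0 and mttr_hours > 0")
    return 1.0 - math.exp(-t_hours / mttr_hours)

# Hypothetical: chance of restoring a unit within 8 h when the MTTR is 4 h.
print(f"{maintainability(8, 4):.3f}")  # prints 0.865
```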
The traditional approach to reliability evaluation (life cycle testing) involves tests carried out within the product's 'expected environment' or using actual operational conditions (Table 1). The test process could also include an environmental qualification based on the predicted life cycle of the product.
However, if a five-year life cycle for a product is expected, a traditional reliability evaluation programme would require testing to encompass 43,800 hours of usage. This would not only be costly, but also delay the product's entry into the marketplace. Table 2 highlights many of the factors which will need to be considered when testing a product for reliability.
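The 43,800-hour figure follows from assuming continuous operation at 8,760 hours per year (leap days ignored). The short sketch below checks it and, for a hypothetical three-week test window, shows how much ageing would need to be compressed to cover a full life cycle; the function names and window length are illustrative assumptions.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 h of continuous operation; leap days ignored

def life_cycle_hours(years: float) -> float:
    """Hours of continuous operation over the expected life cycle."""
    return years * HOURS_PER_YEAR

def required_compression(years: float, window_hours: float) -> float:
    """Factor by which ageing must be accelerated to fit a full life cycle
    into the available test window."""
    return life_cycle_hours(years) / window_hours

life = life_cycle_hours(5)   # 43,800 h
window = 3 * 7 * 24          # hypothetical three-week test window: 504 h
print(life, round(required_compression(5, window), 1))  # prints 43800 86.9
```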
In reality, the market window for most products is small and so the time available for reliability evaluation of the product is typically days, or maybe weeks at best. If this kind of testing is performed at normal operational conditions, it is not likely to yield a statistically significant number of failures unless a large number (tens of thousands) of products are tested. Indeed, for most components or sub-assemblies full life cycle testing is not practical because of the high costs involved. So if reliability testing needs to be performed in only a few weeks, then a different test methodology is required.
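A rough calculation shows why so many units are needed. Assuming a constant failure rate of 1/MTBF and a small per-unit failure probability, the expected number of failures in a test is units × test hours / MTBF. The MTBF and test length below are hypothetical; with them, even ten failures call for about 20,000 units, consistent with the tens of thousands mentioned in the paragraph above.

```python
import math

def expected_failures(units: int, test_hours: float, mtbf_hours: float) -> float:
    """Expected failures when each unit runs test_hours at a constant failure
    rate of 1/MTBF (each unit's failure probability assumed small)."""
    return units * test_hours / mtbf_hours

def units_needed(target_failures: float, test_hours: float, mtbf_hours: float) -> int:
    """Smallest number of units expected to yield target_failures in the test."""
    return math.ceil(target_failures * mtbf_hours / test_hours)

MTBF = 1_000_000  # hypothetical: one failure per million unit-hours
HOURS = 500       # roughly three weeks of round-the-clock testing
print(expected_failures(100, HOURS, MTBF))  # prints 0.05
print(units_needed(10, HOURS, MTBF))        # prints 20000
```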
Accelerated life testing and environmental stress screening have become increasingly accepted as methods of assessing product reliability before shipment. They are now recognised by major multinationals operating in Europe and around the world as legitimate product reliability test methods. The key reasons are that they give a level of confidence that a product will not develop faults after delivery or in use, and provide a process to identify any design defects, component problems or production-related issues.
Accelerated life testing is based on using real-life operational data and trying to accelerate fault conditions by applying key operational failure-causing stresses at levels above those that the product would experience in its application environment.
This accelerated ageing approach yields a distribution of failure times, albeit under more stressful conditions than ordinary operation. That distribution must then be related to the distribution of failure times that would be anticipated under operational conditions. This calls for an accelerated life model, which is typically characterised by a linear relationship between failure times at different sets of conditions.
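Under a linear accelerated life model, each failure time observed at elevated stress maps to an operational failure time via t_op = AF × t_stress, where AF is the acceleration factor. The sketch below assumes that model; the failure times and the AF of 40 are hypothetical, and in practice AF would come from a physical model or prior data.

```python
from statistics import mean
from typing import List

def to_operational_hours(stress_failure_hours: List[float],
                         acceleration_factor: float) -> List[float]:
    """Map failure times observed at elevated stress onto operating conditions
    using a linear acceleration model: t_op = AF * t_stress."""
    if acceleration_factor <= 0:
        raise ValueError("acceleration factor must be positive")
    return [acceleration_factor * t for t in stress_failure_hours]

# Hypothetical failure times (hours) from an accelerated test, with an assumed AF of 40.
observed = [120.0, 210.0, 260.0, 410.0]
operational = to_operational_hours(observed, 40.0)
print(operational)        # prints [4800.0, 8400.0, 10400.0, 16400.0]
print(mean(operational))  # prints 10000.0
```

Because the mapping is a pure rescaling of time, the shape of the failure-time distribution is preserved; only its time scale changes.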
The key operational failure-causing stresses that contribute most commonly to the impairment of a product's reliability are thermal cycling, vibration and fatigue, and power cycling.
Temperature cycling induces stresses within a product due to differential expansion of components and materials. Extending the temperatures (both high and low) to which a product is exposed accelerates creep due to coefficient of thermal expansion (CTE) mismatches within the product. The more extreme the temperature cycle, the higher the acceleration factor.
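The text does not name a model, but one widely used way to express how a larger temperature swing raises the acceleration factor for thermal-cycling fatigue is the basic Coffin-Manson relation, AF = (ΔT_test / ΔT_field)^m. The sketch below assumes that model. The exponent m is material-dependent; the value of 2, the temperature swings and the one-cycle-per-day field profile are all hypothetical.

```python
def coffin_manson_af(delta_t_test: float, delta_t_field: float, exponent: float) -> float:
    """Acceleration factor for thermal-cycling fatigue under the basic
    Coffin-Manson model: AF = (dT_test / dT_field) ** exponent."""
    if delta_t_test <= 0 or delta_t_field <= 0:
        raise ValueError("temperature swings must be positive")
    return (delta_t_test / delta_t_field) ** exponent

# Hypothetical: 20 K daily swing in the field; chamber cycles from -40 C to +100 C (140 K).
af = coffin_manson_af(140.0, 20.0, 2.0)
field_cycles = 5 * 365            # assumed one thermal cycle per day for five years
print(af, field_cycles / af)      # AF, then equivalent number of chamber cycles
```

With these assumed values the AF is 49, so roughly 37 chamber cycles stand in for five years of daily field cycles.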
Vibration promotes mechanical failures due to cyclic stressing. The deterioration of material strength due to cyclic stressing is known as fatigue. If a product's operational vibration environment is known, then testing against it may be accelerated using Miner's rule (the Palmgren-Miner cumulative damage rule), for example.
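Miner's rule sums fatigue damage linearly, D = Σ n_i / N_i, where n_i is the number of cycles applied at stress level i and N_i is the number of cycles to failure at that level; failure is predicted when D reaches 1. To turn this into a test compression, the sketch further assumes a Basquin-type S-N curve, N = C·S^(-b), so that equal damage gives n_test = n_field × (S_field / S_test)^b. The exponent b, the stress amplitudes and the cycle counts are hypothetical.

```python
from typing import Sequence

def miner_damage(cycles_applied: Sequence[float], cycles_to_failure: Sequence[float]) -> float:
    """Palmgren-Miner cumulative damage D = sum(n_i / N_i); failure is
    predicted when D reaches 1."""
    if len(cycles_applied) != len(cycles_to_failure):
        raise ValueError("need one cycles-to-failure value per stress level")
    return sum(n / n_fail for n, n_fail in zip(cycles_applied, cycles_to_failure))

def equivalent_test_cycles(field_cycles: float, s_field: float, s_test: float, b: float) -> float:
    """Cycles at stress s_test giving the same Miner damage as field_cycles at
    s_field, assuming an S-N curve of the form N = C * S**(-b)."""
    return field_cycles * (s_field / s_test) ** b

# Hypothetical two-level field history: damage 0.2 + 0.5.
print(f"{miner_damage([2e5, 5e4], [1e6, 1e5]):.2f}")     # prints 0.70
# Hypothetical: 1e7 field cycles at 50 MPa replayed at 100 MPa with b = 6.
print(equivalent_test_cycles(1e7, 50.0, 100.0, 6.0))    # prints 156250.0
```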
Putting a product through thermal and vibrational stresses in combination with power cycling will accelerate the discovery of its failure modes. The test programme must be devised and implemented so that it does not damage the product.
The benefit of accelerated life testing is principally that it helps detect the design flaws which are most likely to give rise to a product's 'infant mortalities'. The disadvantage is that this method may precipitate some unrepresentative failures.
Highly accelerated life testing
First developed in the USA during the 1980s, highly accelerated life testing (HALT) takes a practical rather than a predictive approach (Table 3).
The method is an extension of the accelerated life testing discussed earlier, but the test levels used are not based on operational data. Thermal and mechanical stimuli are applied separately and then together in order to determine the operating and destruct limits of the item under test. This testing methodology is particularly suited to products in the development or prototype stage. When coupled with power cycling and product-specific stresses, this test method has been shown to expose design flaws within hours, whereas conventional test methods have traditionally taken many days or weeks.
A key difference between HALT and traditional accelerated life testing is that stress factors, such as high temperatures, are applied directly to the component or sub-assembly under test and not to the system as a whole. This can make a great difference in accelerating failure rates.
Defect analysis is a key stage in the HALT process and is conducted once the operating and destruct limits (if possible) are known. The operating limit is defined as the point at which the unit remains operational but any further increase in stress causes a recoverable failure. The destruct limit is the level at which the product stops functioning and remains inoperable. At this stage, all major flaws in the design should be exposed. Most may require only a simple fix and some may require major modification; alternatively, it may be decided that the design is sufficiently rugged and that no further action is required.
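The two limits can be pictured as the outcome of a step-stress search: raise the stress in fixed steps, check function at each step, and after a malfunction return to nominal stress to see whether the unit recovers. The sketch below illustrates that logic on a single stress axis using a hypothetical simulated unit (SimulatedUnit and step_stress are illustrative names, and the limits and step sizes are invented). A real HALT programme applies thermal and vibration steps with dwell times and full functional tests on actual hardware.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class SimulatedUnit:
    """Hypothetical unit: malfunctions above soft_limit but recovers when the
    stress is removed; is permanently damaged by any stress above hard_limit."""
    soft_limit: float
    hard_limit: float
    damaged: bool = False

    def works_at(self, stress: float) -> bool:
        """Return True if the unit functions at this stress level."""
        if stress > self.hard_limit:
            self.damaged = True  # permanent damage
        return not self.damaged and stress <= self.soft_limit

def step_stress(unit: SimulatedUnit, start: float, step: float, nominal: float,
                max_stress: float) -> Tuple[Optional[float], Optional[float]]:
    """Return (operating_limit, destruct_limit) found by stepping the stress.
    Either value is None if it was not reached within max_stress."""
    operating_limit: Optional[float] = None
    last_good: Optional[float] = None
    i = 0
    while start + i * step <= max_stress:
        level = start + i * step
        if unit.works_at(level):
            last_good = level
        elif unit.works_at(nominal):
            # Recoverable failure: the unit works again once stress is reduced.
            if operating_limit is None:
                operating_limit = last_good
        else:
            # Still inoperable back at nominal stress: destruct limit reached.
            if operating_limit is None:
                operating_limit = last_good
            return operating_limit, level
        i += 1
    return operating_limit, None  # destruct limit not reached (not always possible)

unit = SimulatedUnit(soft_limit=85, hard_limit=125)  # hypothetical limits, deg C
print(step_stress(unit, start=20, step=10, nominal=25, max_stress=200))  # prints (80, 130)
```

The step size sets the resolution: here the true soft limit of 85 lies between the reported operating limit of 80 and the first failing step at 90.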
A common misconception among engineers is that HALT tends to lead to 'over-engineered' products. This is not the case. In fact, a HALT appraisal allows designers to establish the limitations of their product designs.
HALT provides a number of major benefits for designers and manufacturers, including: real tests earlier in the design/development process; the ability to characterise and identify flaws in a design rapidly; shorter time-to-market with a 'mature' product; cheaper, more compressed product testing; and fewer field returns.
TUV Product Service has many years' experience and has carried out numerous HALT evaluations for manufacturers of equipment in the telecommunications, defence, aerospace and consumer industries. At the company's Fareham facility (Fig. 1), a QualMark OVS (omniaxial vibration system) combined stress test chamber is used to implement HALT and HASS (highly accelerated stress screening) test methods. These give designers and manufacturers an extremely rapid method (typically three to five days for HALT) of revealing potential product weaknesses that might otherwise take much longer to expose by conventional methods.
Ralph Harris is with TUV Product Service Limited, UK. www.tuvps.co.uk
* AT&T Reliability Manual, edited by David J. Klinger, Yoshinao Nakada and Maria A. Menendez.