A lifetime of confusion – part two

Monday, November 23, 2015

A colleague of mine read my previous blog post on product reliability. And, while there’s nothing wrong with what I wrote, he did send me lots of material on more modern approaches to determining product robustness, rather than simply product reliability. This material makes for exciting reading and does indeed provide a very different and more true-to-the-application view on reliability.

Robustness validation methodology

In essence, the modern approach is a systematic analysis of the actual usage conditions and their impact on the installed products. In recent years, several electronics manufacturers have adopted the robustness validation methodology, which is described in more detail in How to measure lifetime for robustness validation – step by step by ZVEI.

This means that you first have to collect a lot of data on what the stressors are for the product. When looking at an AC drive (or a variable frequency drive as they are often known), stressors will typically be things like supply voltage, temperature, humidity and vibration. The normal conditions for these stressors then need to be determined for the installation, and regularly appearing outliers need to be evaluated.

Defining the application

It’s already clear that, for a pumping application, there are several different installations to evaluate, as the behaviors of the stressors are very different in, for example, mining sites and commercial buildings. Even for commercial buildings, the differences are huge depending on the geographical area in which the drives are installed.

To accommodate the analysis, standardized environments (mission profiles) are defined based on the main segments where an AC drive manufacturer’s products are installed. This means that, to determine the life expectancy of a variable frequency drive, the evaluation must cover not only the environmental conditions but also the actual application of the drive.

Accelerated test model

To test the reliability of the drive, it is necessary to know both the duration of each stress exposure and the intervals between exposures. The response of the AC drive to the stressor is also a very important factor, as the test needs to trigger the same failure mode as in the field. Some failure modes are linear: increasing the voltage by 10% also increases the stress by 10%. Others have a logarithmic or exponential relationship, meaning that a minute change in the stressor can have anything from almost no effect to a seriously detrimental effect on the drive.

Based on this information, accelerated testing can be defined. Stress levels can be increased and cycle times reduced so that we can test the failure modes in conditions as close to real life as possible, but in a much shorter time. Testing needs to be completed before we launch a new AC drive or modify an existing one, because we need to know that the product will have the expected reliability when it is sold, to avoid failures in the field. Care must be taken, however, not to introduce failure modes through unrealistically high stress levels that would never appear in the field.
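To make the difference between linear and exponential stress responses concrete, here is a minimal sketch of how an acceleration factor could be estimated. It is not taken from the ZVEI document: the Arrhenius model and the power-law model are common textbook choices that I am assuming here, and the activation energy of 0.7 eV, the temperatures and the function names are hypothetical illustration values.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(use_temp_c, test_temp_c, activation_energy_ev):
    """Acceleration factor for a thermally activated (exponential) failure mode.

    Assumes the Arrhenius model; the activation energy is specific to each
    failure mode and must come from data, not from this sketch.
    """
    use_k = use_temp_c + 273.15
    test_k = test_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / use_k - 1.0 / test_k)
    return math.exp(exponent)


def power_law_acceleration(use_stress, test_stress, exponent):
    """Acceleration factor for a power-law failure mode.

    exponent = 1 is the linear case: 10% more stress, 10% more acceleration.
    """
    return (test_stress / use_stress) ** exponent


def required_test_hours(field_hours, acceleration_factor):
    """Test duration that represents the given number of field hours."""
    return field_hours / acceleration_factor


if __name__ == "__main__":
    ten_years_h = 10 * 365 * 24
    af = arrhenius_acceleration(40.0, 85.0, 0.7)  # hypothetical values
    print(f"thermal acceleration factor: {af:.1f}")
    print(f"test hours for ten field years: {required_test_hours(ten_years_h, af):.0f}")
    print(f"linear, +10% voltage: {power_law_acceleration(400.0, 440.0, 1.0):.2f}")
```

The sketch only holds as long as the raised stress level still triggers the same failure mode as in the field; once a different mechanism takes over, the computed factor says nothing about real-life lifetime.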

When specifying drives, manufacturers will frequently require an actual specification of the environment before making compliance statements about a product’s lifetime. This requirement is quite logical, but it puts a lot more emphasis on knowledge of the end use of the product.

As an example, an engineer is specifying a booster set for a hospital and wants a 10-year lifetime. In most cases we would know the geographical location, since we know where the project is being planned. What is not specified, however, is the installation location.

If the drives operating the booster set are installed in an air-conditioned motor control center, then operating conditions will be optimum and life expectancy can be more easily estimated. If, however, the drives are installed on the pumps, and the pump room is not air conditioned, then suddenly the geographical location becomes very important. We might then experience temperatures exceeding 50°C/122°F and vibration from the pump. This immediately increases the stress on the drive and potentially reduces its life expectancy, making the accurate assessment of product reliability more complex.

A reduction in life expectancy of a drive is not necessarily a problem. If a drive has been designed for the operating conditions and the design has gone through a thorough robustness validation, it could still easily meet the required product lifetime. If conditions were more ideal, it would far exceed the product lifetime, which would likely mean that it, to some degree, is over specified for the application and cost has been added to increase robustness beyond the customer’s specification.

Deciding what to design for

Design is always a careful balance between the most extreme requirements and the market price levels for the average applications. This is also why many AC drive manufacturers have several different product series targeting specific segments and applications. While a product may have the same physical appearance, changes may have been made to optimize the cost to meet market pricing levels. This could mean using less sturdy capacitors for lower-tier applications, or adding coating and ruggedization for harsh environments with corrosive agents in the air and high vibration levels.

There is an excellent example of a step-by-step failure mode analysis: it starts with failure mode identification, continues with testing of multiple devices to account for part-to-part variation, and ends with an analysis of the three stages of failure:

early life extrinsic failure (typically manufacturing defects)
useful life random failures (tolerances and unforeseen events cause a moderate failure level)
wear out intrinsic failures (end of component life accelerates failure rate)

These three stages create a ‘bathtub’ curve which defines the lifetime of a product. For products requiring a long lifetime, the flat bottom of the bathtub, between stage 1 and stage 3, should be as long as possible.
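The three stages above can be sketched as a sum of three failure-rate terms. This is only an illustration under my own assumptions: I model each stage with a Weibull hazard (shape below 1 for early life, exactly 1 for random failures, above 1 for wear out), and every shape and scale value is made up rather than taken from a real drive.

```python
def weibull_hazard(t_hours, shape, scale_hours):
    """Instantaneous failure rate of a Weibull distribution (requires t_hours > 0)."""
    return (shape / scale_hours) * (t_hours / scale_hours) ** (shape - 1.0)


def bathtub_hazard(t_hours):
    """Total failure rate as the sum of the three stages (hypothetical parameters)."""
    early_life = weibull_hazard(t_hours, shape=0.5, scale_hours=200_000.0)  # falling rate
    useful_life = weibull_hazard(t_hours, shape=1.0, scale_hours=500_000.0)  # constant rate
    wear_out = weibull_hazard(t_hours, shape=5.0, scale_hours=100_000.0)  # rising rate
    return early_life + useful_life + wear_out


if __name__ == "__main__":
    for hours in (100.0, 30_000.0, 120_000.0):
        print(f"{hours:>9.0f} h: {bathtub_hazard(hours):.2e} failures per hour")
```

With these values the rate is high shortly after start-up, lowest in the middle of the lifetime, and high again late in life, which is the bathtub shape.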


Some would say that a long lifetime is always desirable, but that is not the case. A hospital, for example, has mostly good environmental conditions and needs products with a long lifetime. However, the environment in a cement factory is such that machinery is typically replaced every 3–5 years. Here, the product reliability of the drive is defined by very high uptime during a much shorter period.

Failures accumulate over the product’s lifetime

The graphic below shows the integral of the bathtub curve, which is called the Mean Cumulative Function (MCF) curve. The advantage of this curve is that it can be used as a design target, which can then be compared with the failures observed in the field.

[Figure: Mean Cumulative Function (MCF) curve]

Everyone knows that, once a product fails, it doesn’t just drop out; it remains in the statistics as a failure and contributes to the customer’s perception of product quality. This curve provides a simplified view of ‘lack of customer satisfaction’ as a function of the cumulative failures of a product.

It is therefore important to understand how failures accumulate over the lifetime of a product, and what influences the total curve of cumulative failures, in order to influence customer satisfaction with the product.

If you look at the curve of the total mean cumulative failures (the purple line in the graph below), it can be split into four distinct curves, each contributing to the bathtub curve described above.

0-time failures (the red line) are typically failures occurring in transport, installation or due to poor instructions. These types of failures are often due to lack of consideration for the practical handling of a product before it is put into service.

[Figure: total mean cumulative failures (purple) split into four contributing curves]

Early failures (the green line) are typically caused by manufacturing or design issues with the product. The failure rate is highest early in the product’s lifetime and then flattens, because any weakness in design or manufacturing that is incompatible with the application is triggered early and cannot be reintroduced in the application.

Lack of design robustness (yellow) contributes failures at a constant rate. This often happens because a product does not fulfill the actual requirements of the application: either the mission profile was not clear from the specification, or a product was selected to reduce cost even though it did not meet the specification. The cumulative number of these failures is low initially, as the stress in the application has not yet affected the product, but it grows at a constant rate, simply because that is the nature of the application when that specific product is applied.

Degradation and end of life for components (blue) is the intrinsic failure mechanism, which is determined by specific components within the AC drive degrading to a level where the drive no longer meets specification and fails, or the component itself reaches end of life and fails.
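The four contributions described above can be added up in a small sketch to show how a design-target MCF could be built and compared with field data. The curve shapes are my own assumptions (a step at time zero, a saturating exponential for early failures, a straight line for lack of robustness, and a power law for wear out), and all numbers are hypothetical failures per installed unit, not data from any real product.

```python
import math

# Hypothetical illustration values, in cumulative failures per installed unit.
ZERO_TIME_LEVEL = 0.005          # step at t = 0 (transport, installation)
EARLY_PLATEAU = 0.02             # level at which early failures flatten out
EARLY_TIME_CONSTANT_YEARS = 0.5  # how quickly early failures flatten
ROBUSTNESS_RATE_PER_YEAR = 0.004 # constant rate from lack of design robustness
WEAR_SCALE_YEARS = 15.0          # characteristic wear-out time
WEAR_SHAPE = 4.0                 # larger than 1, so wear out accelerates


def mcf_components(t_years):
    """Cumulative failures per unit at t_years, split by contribution."""
    return {
        "zero_time": ZERO_TIME_LEVEL,
        "early": EARLY_PLATEAU * (1.0 - math.exp(-t_years / EARLY_TIME_CONSTANT_YEARS)),
        "lack_of_robustness": ROBUSTNESS_RATE_PER_YEAR * t_years,
        "wear_out": (t_years / WEAR_SCALE_YEARS) ** WEAR_SHAPE,
    }


def total_mcf(t_years):
    """Design-target MCF: the sum of the four contributions."""
    return sum(mcf_components(t_years).values())


def years_over_target(field_mcf_by_year):
    """Years in which the observed cumulative failures exceed the design target."""
    return [year for year, observed in sorted(field_mcf_by_year.items())
            if observed > total_mcf(year)]


if __name__ == "__main__":
    for year in (0, 1, 5, 10):
        print(year, round(total_mcf(year), 4), mcf_components(year))
    # Made-up field observations: cumulative failures per unit after 1 and 5 years.
    print(years_over_target({1: 0.02, 5: 0.08}))
```

Splitting the total this way makes it easier to see which contribution dominates at a given age, and therefore where a deviation from the design target is likely to come from.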

Specifying the true need

As a specifier, it is important to understand all the environmental factors in the installation and make sure these are defined in the specification. I had a case some years ago with a large plant where drives were corroding within six months of installation. According to the end user, the motor control center was well-ventilated and filters were installed and maintained to protect the drive against all dust in the facility, so environmental factors should have been under control. The specification had called for compliance with IEC 60721-3-3 class 3C2, so moderate coating had been applied. However, when air inside the panels was measured, high concentrations of sulfur and high humidity were found to be causing sulfuric acid corrosion on the drive terminals. Levels were significantly higher than the specification had stated. As a result, drives with higher protection (complying with IEC 60721-3-3 class 3C3) were installed and the problem was resolved.

There’s always a margin to consider

As a manufacturer, you need to understand the customer requirement specification and take component degradation into account in your design. Traditionally, designs have been made to just meet the customer's specification. This means that, after a certain period of operation, the product's performance has degraded and it no longer meets that specification. In practical terms, the product met all requirements at the time of purchase, but because degradation was not taken into account, its actual lifetime is shorter than specified: some time after installation, the application's demands degrade the performance and thus accelerate failures.
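A rough way to picture the margin is to ask how much extra capability a product needs on day one to still meet the specification at the end of its required lifetime. The sketch below assumes, purely for illustration, that a capability (for example a rated capacity) shrinks by a fixed fraction every year; real degradation mechanisms are rarely that simple, and the 1% per year and 10-year figures are hypothetical.

```python
import math


def capability_after(initial, annual_loss_fraction, years):
    """Remaining capability after the given number of years of degradation."""
    return initial * (1.0 - annual_loss_fraction) ** years


def required_initial_capability(requirement, annual_loss_fraction, years):
    """Day-one capability needed to still meet the requirement after `years`."""
    return requirement / (1.0 - annual_loss_fraction) ** years


def years_until_below(initial, requirement, annual_loss_fraction):
    """Years until the capability drops below the requirement (0.0 if no margin)."""
    if initial <= requirement:
        return 0.0
    return math.log(requirement / initial) / math.log(1.0 - annual_loss_fraction)


if __name__ == "__main__":
    requirement = 100.0  # customer specification, arbitrary units
    loss = 0.01          # hypothetical 1% capability loss per year
    print(years_until_below(100.0, requirement, loss))   # designed exactly to spec
    print(years_until_below(110.0, requirement, loss))   # designed with 10% margin
    print(required_initial_capability(requirement, loss, 10))
```

A product designed exactly to the specification falls below it as soon as degradation starts, while a modest margin buys years of compliant operation.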

My colleague, together with some of his peers, also published a paper in the IEEE Journal of Emerging and Selected Topics in Power Electronics. It has a much more elaborate explanation of this requirement for designing with a robustness margin to meet customers’ long-term expectations. Both the ZVEI document and the IEEE paper point to further documentation and are highly recommended for those wanting a deeper understanding of the subject.

Author: Frank Taaning-Grundholm, Director, Global Strategic Customers, Fluid Handling, Danfoss Drives