Methods and Tools for Process Validation

Dr. Wayne A. Taylor


There are many statistical tools that can be used as part of validation. Control charts, capability studies, designed experiments, tolerance analysis, robust design methods, failure modes and effects analysis, sampling plans, and mistake proofing are but a few. Each of these tools will be summarized and their application in validation described.


Validation requires documented evidence that a process consistently conforms to requirements. It requires that you first obtain a process that can consistently conform to requirements and then that you run studies demonstrating that this is the case. Statistical tools can aid in both tasks.


This section describes the many contributions that statistical tools can make to validation. Each tool appearing in bold is further described in Section 4.

One tool that is particularly useful in organizing the overall validation effort is a failure modes and effects analysis (FMEA) or a closely related fault tree analysis (FTA). An FMEA involves listing out the potential problems or failure modes and evaluating their risk in terms of their severity, likelihood of occurring and ease of detection. Where potential risks exist, the FMEA can be used to document which failure modes have been addressed and which still need to be addressed. As each failure mode is addressed, the controls established are documented. The end result is a control plan. Addressing the individual failure modes will require the use of many different statistical tools.

Failures or nonconformities occur because of errors made and because of excessive variation. Obtaining a process that consistently conforms to requirements requires a balanced approach using both mistake proofing and variation reduction tools. When a nonconformance occurs because of an error, mistake proofing methods should be used. Mistake proofing attempts to make it impossible for the error to occur or at least to go undetected.

However, many nonconformities are not the result of errors. Instead, they are the result of excessive variation and off-target processes. Reducing variation and properly targeting a process requires identifying the key input variables and establishing controls on these inputs to ensure that the outputs conform to requirements. Strategies and tools for reducing variation and optimizing the process average are described in Section 3.

The end result is a control plan. The final phase of validation requires demonstrating that this control plan works, i.e., that it results in a process that can consistently conform to requirements. One key tool here is a capability study. A capability study measures the ability of the process to consistently meet the specifications. It is appropriate for measurable characteristics where nonconformities are due to variation and off-target conditions. Testing should be performed not only at nominal, but also under worst-case conditions. When pass/fail data is involved, acceptance sampling plans can be used to demonstrate conformance to specifications. Finally, in the event of potential errors, challenge tests should be performed to demonstrate that mistake proofing methods designed to detect or prevent such errors are working.

Depending on circumstances, not all tools need be used, other tools could be used instead, and the application of the tools can vary.


Each unit of product differs to some small degree from all other units of product. These differences, no matter how small, are referred to as variation. Variation can be characterized by measuring a sample of the product and drawing a histogram. For example, one operation involves cutting wire into 100 cm lengths. The tolerance is 100 ± 5 cm. A sample of 12 wires is selected at random and the following results obtained:

98.7    99.3     100.4      97.6    101.4     102.0

100.2    96.4    103.4     102.0      98.0    100.5

A histogram of this data follows. The width of the histogram represents the variation.

Of special interest is whether the histogram is properly centered and whether the histogram is narrow enough to easily fit within the specification limits. The center of the histogram is estimated by calculating the average of the 12 readings. The average is 99.99 cm. The width of the histogram is estimated by calculating either the range or standard deviation. The range of the above readings is 7.0 cm. The standard deviation is 2.06 cm. The standard deviation represents the typical distance a unit is from the average. Roughly two-thirds of the units are within ± 1 standard deviation of the average and about one-third of the units are more than one standard deviation away from the average. On the other hand, the range represents an interval containing all the units in the sample. The range is typically 3 to 6 times the standard deviation, depending on the sample size.
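The three summary statistics quoted above can be reproduced with a few lines of Python. This is only a sketch: it assumes the standard deviation is the usual sample standard deviation (n - 1 divisor), and the variable names are illustrative.

```python
import statistics

# The 12 wire lengths (cm) from the sample above.
lengths = [98.7, 99.3, 100.4, 97.6, 101.4, 102.0,
           100.2, 96.4, 103.4, 102.0, 98.0, 100.5]

average = statistics.mean(lengths)
data_range = max(lengths) - min(lengths)
std_dev = statistics.stdev(lengths)  # sample standard deviation (n - 1 divisor)

print(f"average            = {average:.2f} cm")
print(f"range              = {data_range:.1f} cm")
print(f"standard deviation = {std_dev:.2f} cm")
```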

Frequently, histograms take on a bell-shaped appearance that is referred to as the normal curve as shown below. For the normal curve, 99.73% of the units fall within ± 3 standard deviations of the average.
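The 99.73% figure follows directly from the normal distribution and can be checked with the error function in Python's standard library. The helper name `coverage_within` is made up for this illustration.

```python
import math

def coverage_within(k):
    """Fraction of a normal distribution within +/- k standard deviations of the average."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/- {k} standard deviations: {100 * coverage_within(k):.2f}%")
```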

For measurable characteristics like wire length, fill volume, and seal strength, the goal is to optimize the average and reduce the variation. Optimization of the average may mean to center the process as in the case of fill volumes, to maximize the average as is the case with seal strengths, or to minimize the average as is the case with harmful emissions. In all cases, variation reduction is also required to ensure all units are within specifications. Reducing variation requires the achievement of stable and capable processes. The figure below shows an unstable process. The process is constantly changing. The average shifts up and down. The variation increases and decreases. The total variation increases due to the shifting.

Instead, stable processes are desired as shown below. Stable processes produce a consistent level of performance. The total variation is reduced. The process is more predictable.

However, stability is not the only thing required. Once a consistent performance has been achieved, the remaining variation must be made to safely fit within the specification limits. Such a process is said to be stable and capable. Such a process can be relied on to consistently produce good product.

A capability study is used to determine whether a process is stable and capable. It involves collecting samples over a period of time. The average and standard deviation of each time period are estimated and these estimates plotted in the form of a control chart. These control charts are used to determine if the process is stable. If it is, the data can be combined into a single histogram to determine its capability. To help determine if the process is capable, several capability indices are used to measure how well the histogram fits within the specification limits. One index, called Cp, is used to evaluate the variation. Another index, Cpk, is used to also evaluate the centering of the process. Together these two indices are used to decide whether the process passes. The values required to pass depend on the severity of the defect (major, minor, critical).
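The text does not spell out the index formulas. The sketch below assumes the conventional definitions, Cp = (USL - LSL) / 6s and Cpk = min(USL - average, average - LSL) / 3s, and applies them to the 12 wire lengths with the 100 ± 5 cm tolerance. It is only an illustration: a real capability study uses many more units collected in subgroups over time, and the single overall standard deviation used here is a simplification.

```python
import statistics

def capability_indices(data, lsl, usl):
    """Return (Cp, Cpk) using the overall sample standard deviation."""
    average = statistics.mean(data)
    sd = statistics.stdev(data)
    cp = (usl - lsl) / (6 * sd)                          # spread only
    cpk = min(usl - average, average - lsl) / (3 * sd)   # spread and centering
    return cp, cpk

# The 12 wire lengths (cm); specification limits are 95 and 105 cm.
lengths = [98.7, 99.3, 100.4, 97.6, 101.4, 102.0,
           100.2, 96.4, 103.4, 102.0, 98.0, 100.5]

cp, cpk = capability_indices(lengths, lsl=95.0, usl=105.0)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

Because the average is almost exactly on target, Cp and Cpk are nearly equal here. An off-center process would show a Cpk noticeably below its Cp.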

While capability studies evaluate the ability of a process to consistently produce good product, they do little to help achieve such processes. Reducing variation and achieving stable processes requires the use of numerous variation reduction tools. Variation of the output is caused by variation of the inputs. Consider a pump. An output is flow rate. Suppose the pump uses a piston to draw solution into a chamber through one opening and then pushes it back out another opening. Valves are used to keep the solution moving in the right direction. The flow rate will be affected by piston radius, stroke length, motor speed and valve backflow, to name a few. Flow rate varies because piston radius, stroke length, etc. vary. Variation of the inputs is transmitted to the output as shown below.
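The transmission of variation in the pump example can be sketched with a small simulation. Everything numeric here is an assumption made for illustration: the flow equation (swept volume per stroke times speed, less a backflow fraction), the nominal values, and the standard deviations are hypothetical and are not taken from a real pump.

```python
import math
import random
import statistics

# Hypothetical (nominal, standard deviation) for each input.
INPUTS = {
    "piston_radius_cm": (1.0, 0.005),
    "stroke_length_cm": (2.0, 0.02),
    "strokes_per_min": (60.0, 0.5),
    "backflow_fraction": (0.02, 0.005),
}

def flow_rate(piston_radius_cm, stroke_length_cm, strokes_per_min, backflow_fraction):
    """Assumed model: swept volume per stroke times speed, less valve backflow (mL/min)."""
    swept_volume = math.pi * piston_radius_cm ** 2 * stroke_length_cm
    return swept_volume * strokes_per_min * (1.0 - backflow_fraction)

def simulate(varying, trials=20000, seed=1):
    """Simulate flow rates when only the inputs named in `varying` vary."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        values = {
            name: rng.gauss(nominal, sd) if name in varying else nominal
            for name, (nominal, sd) in INPUTS.items()
        }
        results.append(flow_rate(**values))
    return results

all_flows = simulate(varying=set(INPUTS))
print(f"all inputs varying: average = {statistics.mean(all_flows):.1f}, "
      f"SD = {statistics.stdev(all_flows):.2f} mL/min")

# Vary one input at a time to see how much each contributes.
for name in INPUTS:
    sd_alone = statistics.stdev(simulate(varying={name}))
    print(f"only {name} varying: SD = {sd_alone:.2f} mL/min")
```

Varying one input at a time gives a rough picture of which inputs transmit the most variation to the flow rate under these assumed values.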

Reducing variation requires identifying the key input variables affecting the outputs and then establishing controls on these inputs to ensure that the outputs conform to their established specifications. In general, one must identify the key input variables, understand the effect of these inputs on the output, understand how the inputs behave and finally, use this information to establish targets (nominals) and tolerances (windows) for the inputs. One type of designed experiment called a screening experiment can be used to identify the key inputs. Another type of designed experiment called a response surface study can be used to obtain a detailed understanding of the effects of the key inputs on the outputs. Capability studies can be used to understand the behavior of the key inputs. Armed with this knowledge, robust design methods can be used to identify optimal targets for the inputs and tolerance analysis can be used to establish operating windows or control schemes that ensure the output consistently conforms to requirements.
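As a rough illustration of a screening experiment, the sketch below builds a 16-trial two-level fractional factorial design for 8 inputs and estimates each input's main effect. The generators used (E=BCD, F=ACD, G=ABC, H=ABD) are one common textbook choice, not something prescribed by this text, and the response values are made up so that only the first and fourth inputs matter.

```python
from itertools import product

def fractional_factorial_8_in_16():
    """16 trials for 8 two-level inputs: full factorial in A-D, the rest generated."""
    trials = []
    for a, b, c, d in product((-1, 1), repeat=4):
        trials.append((a, b, c, d, b * c * d, a * c * d, a * b * c, a * b * d))
    return trials

def main_effects(design, responses):
    """Average response at the high setting minus average at the low setting, per input."""
    half = len(design) / 2
    n_inputs = len(design[0])
    return [
        sum(trial[j] * y for trial, y in zip(design, responses)) / half
        for j in range(n_inputs)
    ]

design = fractional_factorial_8_in_16()

# Hypothetical noise-free responses: only inputs A (index 0) and D (index 3) matter.
responses = [50 + 4 * trial[0] - 3 * trial[3] for trial in design]

effects = main_effects(design, responses)
for label, effect in zip("ABCDEFGH", effects):
    print(f"input {label}: effect = {effect:+.1f}")
```

With real data the responses contain noise, so small effects have to be judged against that noise before an input is dropped from further study.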

The obvious approach to reducing variation is to tighten tolerances on the inputs. This improves quality but generally drives up costs. The robust design methods provide an alternative. Robust design works by selecting targets for the inputs that make the outputs less sensitive (more robust) to the variation of the inputs as shown below. The result is less variation and higher quality but without the added costs. Several approaches to robust design exist including Taguchi methods, dual response approach and robust tolerance analysis.
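The idea can be illustrated with a toy simulation. The input-output relationship below is entirely hypothetical, chosen only because it is nonlinear. The input varies by the same amount at every target, yet the variation transmitted to the output depends strongly on which target is chosen.

```python
import random
import statistics

def output(x):
    """Hypothetical nonlinear input-output relationship."""
    return 10.0 + 8.0 * x - x ** 2

def transmitted_sd(target, input_sd=0.1, trials=20000, seed=1):
    """Standard deviation of the output when the input varies around `target`."""
    rng = random.Random(seed)
    outputs = [output(rng.gauss(target, input_sd)) for _ in range(trials)]
    return statistics.stdev(outputs)

for target in (2.0, 3.0, 4.0):
    print(f"input target {target}: output SD = {transmitted_sd(target):.3f}")
```

The output varies least where the curve is flattest (a target of 4.0 in this made-up example). Changing the target also moves the output average, so in practice a second input is usually needed to bring the average back on target.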

Another important tool is a control chart. A control chart can be used to help determine whether any key inputs have been missed and, if so, to help identify them. Many other tools also exist for identifying key inputs and sources of variation including component swapping studies, multi-vari charts, analysis of means (ANOM), variance components analysis, and analysis of variance (ANOVA).
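A minimal sketch of how control limits for an average-and-range chart might be computed is shown below. It assumes subgroups of 5 units and the standard tabulated constants for that subgroup size (A2 = 0.577, D3 = 0, D4 = 2.114). The four subgroups of wire lengths are invented for illustration, and a real chart would be based on many more subgroups.

```python
# Standard control chart constants for subgroups of size 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """Return (lower limit, center line, upper limit) for the averages and the ranges."""
    averages = [sum(s) / len(s) for s in subgroups]
    ranges = [max(s) - min(s) for s in subgroups]
    grand_average = sum(averages) / len(averages)
    average_range = sum(ranges) / len(ranges)
    return {
        "average": (grand_average - A2 * average_range,
                    grand_average,
                    grand_average + A2 * average_range),
        "range": (D3 * average_range, average_range, D4 * average_range),
    }

def out_of_control(subgroup, limits):
    """True if the subgroup's average or range falls outside the control limits."""
    average = sum(subgroup) / len(subgroup)
    spread = max(subgroup) - min(subgroup)
    low_a, _, high_a = limits["average"]
    low_r, _, high_r = limits["range"]
    return not (low_a <= average <= high_a and low_r <= spread <= high_r)

# Hypothetical subgroups of 5 wire lengths (cm) taken at four points in time.
subgroups = [
    [99.8, 100.4, 100.1, 99.5, 100.2],
    [100.3, 99.7, 100.0, 100.6, 99.9],
    [99.6, 100.2, 99.9, 100.1, 100.2],
    [100.5, 99.9, 100.3, 99.8, 100.0],
]

limits = xbar_r_limits(subgroups)
print("average chart limits:", [round(v, 3) for v in limits["average"]])
print("range chart limits:  ", [round(v, 3) for v in limits["range"]])

# A later subgroup whose average has shifted upward is flagged for investigation.
shifted = [101.2, 100.8, 101.1, 100.9, 101.0]
print("shifted subgroup out of control:", out_of_control(shifted, limits))
```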

When studying variation, good measurements are required. Many times an evaluation of the measurement system should be performed using a Gauge R&R or similar study.


A brief description of each of the cited tools follows:

  1. Acceptance Sampling Plan – An acceptance sampling plan takes a sample of product and uses this sample to make an accept or reject decision. Acceptance sampling plans are commonly used in manufacturing to decide whether to accept (release) or to reject (hold) lots of product. However, they can also be used during validation to accept (pass) or to reject (fail) the process. Following the acceptance by a sampling plan, one can make a confidence statement such as: “With 95% confidence, the defect rate is below 1%.”
  2. Analysis of Means (ANOM) – Statistical study for determining if significant differences exist between cavities, instruments, etc. It has many uses including determining if a measurement device is reproducible with respect to operators and determining if differences exist between fill heads, etc. Simpler and more graphical alternative to Analysis of Variance (ANOVA).
  3. Analysis of Variance (ANOVA) – Statistical study for determining if significant differences exist between cavities, instruments, etc. Alternative to Analysis of Means (ANOM).
  4. Capability Study – Capability studies are performed to evaluate the ability of a process to consistently meet a specification. A capability study is performed by selecting a small number of units periodically over time. Each period of time is called a subgroup. For each subgroup, the average and range are calculated. The averages and ranges are plotted over time using a control chart to determine if the process is stable or consistent over time. If so, the samples are then combined to determine whether the process is adequately centered and the variation is sufficiently small. This is accomplished by calculating capability indices. The most commonly used capability indices are Cp and Cpk. If acceptable values are obtained, the process consistently produces product that meets the specification limits. Capability studies are frequently performed toward the end of the validation to demonstrate that the outputs consistently meet the specifications. However, they can also be used to study the behavior of the inputs in order to perform a tolerance analysis.
  5. Challenge Test – A challenge test is a test or check performed to demonstrate that a feature or function is working. For example, to demonstrate that the power backup is functioning, power could be cut to the process. To demonstrate that a sensor designed to detect bubbles in a line works, bubbles could be purposely introduced.
  6. Component Swapping Study – Study to isolate the cause of a difference between two units of product or two pieces of equipment. Requires the ability to disassemble units and swap components in order to determine if the difference remains with original units or goes with the swapped components.
  7. Control Chart – Control charts are used to detect changes in the process. A sample, typically consisting of 5 units, is selected periodically. The average and range of each sample are calculated and plotted. The plot of the averages is used to determine if the process average changes. The plot of the ranges is used to determine if the process variation changes. To aid in determining if a change has occurred, control limits are calculated and added to the plots. The control limits represent the maximum amount that the average or range should vary if the process does not change. A point outside the control limits indicates that the process has changed. When a change is identified by the control chart, an investigation should be made as to the cause of the change. Control charts help to identify key input variables causing the process to shift and aid in the reduction of the variation. Control charts are also used as part of a capability study to demonstrate that the process is stable or consistent.
  8. Designed Experiment – The term designed experiment is a general term that encompasses screening experiments, response surface studies, and analysis of variance. In general, a designed experiment involves purposely changing one or more inputs and measuring the resulting effect on one or more outputs.
  9. Dual Response Approach to Robust Design – One of three approaches to robust design. Involves running response surface studies to model the average and variation of the outputs separately. The results are then used to select targets for the inputs that minimize the variation while centering the average on the target. Requires that the variation during the study be representative of long-term manufacturing. Alternatives are Taguchi methods and robust tolerance analysis.
  10. Failure Modes and Effects Analysis (FMEA) – An FMEA is a systematic analysis of the potential failure modes. It includes the identification of possible failure modes, determination of the potential causes and consequences and an analysis of the associated risk. It also includes a record of corrective actions or controls implemented resulting in a detailed control plan. FMEAs can be performed on both the product and the process. Typically an FMEA is performed at the component level, starting with potential failures and then tracing up to the consequences. This is a bottom-up approach. A variation is a Fault Tree Analysis, which starts with possible consequences and traces down to the potential causes. This is the top-down approach. An FMEA tends to be more detailed and better at identifying potential problems. However, a fault tree analysis can be performed earlier in the design process before the design has been resolved down to individual components.
  11. Fault Tree Analysis (FTA) – A variation of an FMEA. See FMEA for a comparison.
  12. Gauge R&R Study – Study for evaluating the precision and accuracy of a measurement device and the reproducibility of the device with respect to operators. Alternatives are to perform capability studies and analysis of means on the measurement device.
  13. Mistake Proofing Methods – Mistake proofing refers to the broad array of methods used to either make the occurrence of a defect impossible or to ensure that the defect does not pass undetected. The Japanese refer to mistake proofing as Poka-Yoke. The general strategy is to first attempt to make it impossible for the defect to occur. For example, to make it impossible for a part to be assembled backward, make the ends of the part different sizes or shapes so that the part only fits one way. If this is not possible, attempt to ensure the defect is detected. This might involve mounting a bar above a chute that will stop any parts that are too high from continuing down the line. Other possibilities include mitigating the effect of a defect (seatbelts in cars) and lessening the chance of human errors by implementing self-checks.
  14. Multi-Vari Chart – Graphical procedure for isolating the largest source of variation so that further efforts concentrate on that source.
  15. Response Surface Study – A response surface study is a special type of designed experiment whose purpose is to model the relationship between the key input variables and the outputs. Performing a response surface study involves running the process at different settings for the inputs, called trials, and measuring the resulting outputs. An equation can then be fit to the data to model the effects of the inputs on the outputs. This equation can then be used to find optimal targets using robust design methods and to establish targets or operating windows using a tolerance analysis. The number of trials required by a response surface study increases exponentially with the number of inputs. It is desirable to keep the number of inputs studied to a minimum. However, failure to include a key input can compromise the results. To ensure that only the key input variables are included in the study, a screening experiment is frequently performed first.
  16. Robust Design Methods – Robust design methods refer collectively to the different methods of selecting optimal targets for the inputs. Generally, when one thinks of reducing variation, tightening tolerances comes to mind. However, as demonstrated by Taguchi, variation can also be reduced by the careful selection of targets. When nonlinear relationships exist between the inputs and the outputs, one can select targets for the inputs that make the outputs less sensitive to the inputs. The result is that while the inputs continue to vary, less of this variation is transmitted to the output, causing the output to vary less. Reducing variation by adjusting targets is called robust design. In robust design, the objective is to select targets for the inputs that result in on-target performance with minimum variation. Several methods of obtaining robust designs exist including robust tolerance analysis, dual response approach and Taguchi methods.
  17. Robust Tolerance Analysis – One of three approaches to robust design. Involves running a designed experiment to model the output’s average and then using the statistical approach to tolerance analysis to predict the output’s variation. Requires estimates of the amounts that the inputs will vary during long-term manufacturing. Alternatives are Taguchi methods and the dual response approach.
  18. Screening Experiment – A screening experiment is a special type of designed experiment whose primary purpose is to identify the key input variables. Screening experiments are also referred to as fractional factorial experiments or Taguchi L-arrays. Performing a screening experiment involves running the process at different settings for the inputs, called trials, and measuring the resulting outputs. From this, it can be determined which inputs affect the outputs. Screening experiments typically require twice as many trials as input variables. For example, 8 variables can be studied in 16 trials. This makes it possible to study a large number of inputs in a reasonable amount of time. Starting with a larger number of variables reduces the chances of missing an important variable. Frequently a response surface study is performed following a screening experiment to gain further understanding of the effects of the key input variables on the outputs.
  19. Taguchi Methods – One of three approaches to robust design. Involves running a designed experiment to get a rough understanding of the effects of the input targets on the average and variation. The results are then used to select targets for the inputs that minimize the variation while centering the average on the target. Similar to the dual response approach except that while the study is being performed, the inputs are purposely adjusted by small amounts to mimic long-term manufacturing variation. Alternatives are the dual response approach and robust tolerance analysis.
  20. Tolerance Analysis – Using tolerance analysis, operating windows can be set for the inputs that ensure the outputs will conform to requirements. Performing a tolerance analysis requires an equation describing the effects of the inputs on the output. If such an equation is not available, a response surface study can be performed to obtain one. To help ensure manufacturability, tolerances for the inputs should initially be based on the plant's and supplier's ability to control them. Capability studies can be used to estimate the ranges that the inputs currently vary over. If this does not result in an acceptable range for the output, the tolerance of at least one input must be tightened. However, tightening a tolerance beyond the current capability of the plant or supplier requires that improvements be made or that a new plant or supplier be selected. Before tightening any tolerances, robust design methods should be considered.
  21. Variance Components Analysis – Statistical study used to estimate the relative contributions of several sources of variation. For example, variation on a multi-head filler could be the result of shifting of the process average over time, filling head differences and short-term variation within a fill head. A variance components analysis can be used to estimate the amount of variation contributed by each source.
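The confidence statement quoted under Acceptance Sampling Plan can be tied to a sample size with a short calculation. The sketch below assumes a simple binomial model (each unit independently nonconforming with probability p) and an attribute plan that accepts when at most `a` nonconforming units are found in a sample of `n`. The function names are illustrative only.

```python
import math

def accept_probability(n, a, p):
    """Chance of finding at most `a` nonconforming units in a sample of n (binomial model)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(a + 1))

def zero_failure_sample_size(confidence, max_defect_rate):
    """Smallest n such that passing with 0 nonconforming units supports the statement."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_defect_rate))

def upper_confidence_bound(n, a, confidence=0.95):
    """Defect rate that would pass the plan only (1 - confidence) of the time, by bisection."""
    low, high = 0.0, 1.0
    for _ in range(60):
        mid = (low + high) / 2
        if accept_probability(n, a, mid) > 1 - confidence:
            low = mid
        else:
            high = mid
    return high

n = zero_failure_sample_size(confidence=0.95, max_defect_rate=0.01)
bound = upper_confidence_bound(n, a=0)
print(f"n = {n} units with 0 nonconforming")
print(f"95% upper confidence bound on the defect rate: {100 * bound:.3f}%")
```

Under these assumptions, passing a plan of about 300 units with no nonconforming units is what supports the statement "with 95% confidence, the defect rate is below 1%". Plans that allow one or more nonconforming units need larger samples to support the same statement.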

Written for
Global Harmonization Task Force (GHTF) Study Group #3
Quality Management Systems – Process Validation Guidance – Edition 2 document.
Appears as Annex A of the document.

Copyright © 1998 Taylor Enterprises, Inc.

7 thoughts on “Methods and Tools for Process Validation”

  1. Hello Sir,
    We conducted process validation with a sample size selected based on 95% confidence and 95% reliability, which comes out to be 60 samples with 0 failures. Now we have seen 2 failures in 60 samples, so the team is recommending that we go to a lot acceptance sampling plan for attributes with Normal inspection, Level II. We produced a lot of 3700, so they are recommending that we test 200 samples and accept the lot if we have 7 or fewer failures (AQL 1.5). Can this be done and how will this be justified?

    Thanks in Advance!!

    1. Since the protocol had the plan n=60, a=0 and it failed, the validation failed and the reason for the nonconformances should be investigated and improvements made. Following that, the validation can be repeated.

      STAT-12, Verification/Validation Sampling Plans for Proportion Nonconforming, has a table of 95%/95% plans that includes single sampling plans with nonzero accept numbers as well as double sampling plans. I generally use double sampling plans, as they keep the initial sample size close to 60, but allow a second larger sample if a nonconforming unit is found. All plans are 95%/95% plans, so offer the same protection against a bad process. The alternate plans reduce the chance of false rejections.

      95%/95% means there is 95% confidence the proportion conforming exceeds 95% or that the proportion nonconforming is below 5%. This is because there is a 95% chance of rejecting a 5% nonconforming process. 95% conforming is an unacceptable level, almost certain to fail. That means the RQL (rejection quality level) of the plan is RQL0.05 = 5%. Selecting validation sampling plans is based on confidence statements and RQLs.

      Selecting a sampling plan for validation based on AQLs is inappropriate. That does not mean we ignore the AQL, as it helps us to determine the chance of false rejection, but the focus is the RQL. The plan n=200, a=7 has an RQL0.05 = 6.5%. It is a 95%/93.5% plan and not the necessary 95%/95% plan.

      1. Hello Taylor,
        Could you explain more about the last paragraph? As you mentioned, we should focus on the RQL, not the AQL, when choosing the validation sampling plan, as the AQL helps to determine the chance of false rejection. My understanding is that the α risk determines the chance of false rejection and the β risk is associated with false acceptance.

        Besides, if the sampling plan is used during the incoming process, we should consider the AQL, RQL, α risk and β risk. Is that correct, and why is it different from process validation?

        Thank you

  2. I'm trying to get my head around the relationship between the Ppk for data and the ability to make a confidence and reliability statement when using a certain number of samples to perform your testing. As an example, if you wanted to make a 95%/99% C/R statement and you observed a Pre-PQ Ppk of 1.09, then per a table this allows you to choose 80 samples, and as long as you get a Ppk ≥ 0.92 you can then make the 95/99 claim. The table gives several examples using a Pre-PQ observed value, so you pick the one in the table which is closest to your observed value and lower (1.05 is close to and lower than 1.09).

    I've been scouring the web to find an explanation of this but have had no luck. Can any of you point me in the right direction?

    Pre-PQ observed
    (95% confidence lower bound)   Sample Size   Acceptance Criteria       LTPD0.05
    Ppk ≥ 1.13                     n=50          Ppk ≥ 0.96, Pp ≥ 1.02     ≤ 1%
    Ppk ≥ 1.11                     n=60          Ppk ≥ 0.95, Pp ≥ 1.01     ≤ 1%
    Ppk ≥ 1.05                     n=80          Ppk ≥ 0.92, Pp ≥ 0.99     ≤ 1%
    Ppk ≥ 1.02                     n=100         Ppk ≥ 0.90, Pp ≥ 0.97     ≤ 1%

    1. STAT-12, Verification/Validation Sampling Plans for Proportion Nonconforming, of my book Statistical Procedures for the Medical Device Industry contains the following table for selecting variables sampling plans for making a 95%/99% statement. Passing any of these sampling plans allows one to state with 95% confidence more than 99% of units are conforming. All of these sampling plans have the same chance of a false acceptance, namely a 5% chance of falsely passing a 99% conforming product/process. Since a 99% conformance is almost certain to fail, passing allows one to state the product/process is better than 99% conforming. False acceptance is the primary customer/regulatory concern.

      Table of 95/99 Variables Sampling Plans for a 2-Sided Specification

      The 95%/99% plans have different chances of false rejections. This is mainly an internal concern, although regulatory agencies may be concerned if they think you are likely to fail. Since we know 99% conforming is expected to fail, it raises the question of how much better do you need to be to pass. The AQL column describes a quality level with a 95% chance of passing. For example, the AQL of the n=30 plan is 0.0061% nonconforming or 99.9939% conforming. This corresponds to a Ppk of 1.28. Compare historical data to this column to help decide which plan to select. We want the lowest possible sample size so long as we are expected to pass.

      Estimates have associated errors. One may want to go down one or two rows for a ballpark estimate to allow for this error. When there is no historical data for your product, do the best you can. Use a similar product or engineering judgment. If you have no idea, pick a plan from the bottom half of the table. The book also has double variables plans that start with 15, 20 or 30 samples that can be used to determine a second sample size.

      Passing allows you to make the desired 95%/99% statement regardless of the plan.

  3. Hello, I have a question on sample sizes related to OQ validation for a molded part. If the table shows sample size = 29 for attributes and in the OQ, we want the supplier to provide samples at High settings and Low settings, should the sample size be 29 at High settings and 29 at Low settings or something like 15 samples at High and 14 at Low settings? Thanks.

    1. The practice is generally to do 29 at each condition. For a given lot of production, in the worst case one would end up running at just one of the worst-case conditions. For variables data the two OQ runs cannot be pooled because you end up with a mixture of two distributions.

      It is different for PQ because the three lots represent the same conditions and can be pooled together.
