STAT-04: Statistical Techniques for Design Verification

This is part of a series of articles covering the procedures in the book Statistical Procedures for the Medical Device Industry.


Design verification studies are confirmatory studies to ensure the product design performs as intended.  They make pass/fail decisions as to whether the product’s design outputs (specifications, drawings) ensure each design input requirement (requirements definition document) is met or not.  The procedure determines which design requirements require statistical methods to verify them and then helps to select the best method.

Note:  The goal of process validation is to show the process is capable of meeting the specifications for the design outputs.  The goal of design verification is to show that the entire specification range of the design outputs results in product that works (meets the design inputs).

Design Output


Statistical Methods Are Sometimes Not Required – When There Is No Variation

  • Rules for deciding if testing 1 unit is sufficient.  This is for requirements where there is no variation in the test result.  One example is logic testing of software, where each test is performed once because the same result will occur each time the test is performed.  In software validation, the focus is on identifying all the logic paths and testing each one once as part of a test script.  This approach also applies to other features such as color and presence of a feature.

There Are Several Alternatives to Sampling Plans – When There Is Variation

  • Demonstration by Analysis.  Examples of Demonstration by Analysis include tolerance analysis, finite element analysis, and models developed by designed experiments.  Testing is sometimes, but not always, required to demonstrate the model is predictive, for example, a confirmation run at the optimal settings following a designed experiment.  The model is then used to demonstrate that as the design outputs vary over the design space, the product performance meets the design inputs.
  • Worst-case testing allowing 1-5 units to be tested at each worst-case setting.  Worst-case conditions are the settings for the design outputs that cause the worst-case performance of the design inputs. When worst-case conditions can be identified and units can be precisely built at or modified to these worst-case conditions, a single unit may be tested at each of the worst-case conditions. This ensures the design functions over the entire specification range.  This approach is generally preferable to testing a larger number of units toward the middle of the specification range.  When units cannot be precisely built at or modified to these worst-case conditions, multiple units may have to be built and tested.

Instructions for Selecting Sampling Plans

  • Sampling plans are selected based on the confidence statement that can be made if they pass.  For example, 95% confidence that more than 99% of the units meet the requirement (denoted 95%/99%).  When a sampling plan is required, instructions are provided for linking the confidence statement to risk using a table like the one below.
| Product Risk/Harm Level | Design Verification | Reduced Level (Stress Tests) |
|---|---|---|
| High | 95% / 99% | 95% / 95% |
| Moderate | 95% / 97% | 95% / 90% |
| Low | 95% / 90% | 95% / 80% |
  • For each confidence statement, numerous sampling plans exist.  For example, the following table from STAT-12 contains several attribute single and double sampling plans that all make the same 95%/99% confidence statement.

| Type | Parameters | AQL |
|---|---|---|
| Single | n = 299, a = 0 | 0.017% |
| Double | n1 = 320, a1 = 0, r1 = 2, n2 = 256, a2 = 1 | 0.069% |
| Single | n = 473, a = 1 | 0.075% |
| Double | n1 = 327, a1 = 0, r1 = 2, n2 = 385, a2 = 2 | 0.094% |
| Single | n = 628, a = 2 | 0.13% |
| Double | n1 = 327, a1 = 0, r1 = 3, n2 = 582, a2 = 3 | 0.15% |
| Single | n = 773, a = 3 | 0.18% |
| Double | n1 = 330, a1 = 0, r1 = 3, n2 = 719, a2 = 4 | 0.18% |
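
The single sampling plans in the table can be checked with a short binomial calculation. The sketch below is illustrative only and is not taken from STAT-12: the function names are hypothetical, and it assumes that a plan "makes" a confidence statement when the chance of passing is at most 1 minus the confidence for product sitting exactly at the reliability limit, and that the AQL column is the defect rate accepted 95% of the time.

```python
from math import comb


def accept_prob(n, a, p):
    """Chance that n units show at most a failures when each fails with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a + 1))


def min_sample_size(confidence, reliability, a=0):
    """Smallest n for a single attribute plan (accept on <= a failures).

    Assumption: the plan supports the statement when product with a defect
    rate of exactly 1 - reliability passes with probability <= 1 - confidence.
    """
    p = 1 - reliability
    n = a + 1
    while accept_prob(n, a, p) > 1 - confidence:
        n += 1
    return n


def aql(n, a, accept_level=0.95):
    """Defect rate accepted with probability accept_level, found by bisection.

    Assumption: this is the definition behind the AQL column of the table.
    """
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if accept_prob(n, a, mid) > accept_level:
            lo = mid  # still accepted too often, so the AQL is larger
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(min_sample_size(0.95, 0.99, a=0))  # first single plan in the table
    print(min_sample_size(0.95, 0.99, a=1))  # second single plan in the table
    print(f"{aql(299, 0):.3%}")
```

Under these assumptions the calculation gives n = 299 for a = 0 and n = 473 for a = 1, matching the table. Double sampling plans need a two-stage calculation that is not covered here.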

Strategies for Reducing the Sample Size – For when a sampling plan is used 

  • Variables data – having a measurement instead of an attribute pass/fail result allows variables sampling plans to be used.  They require as few as 15 samples in contrast to the minimum of 299 above.
  • Stress testing – testing a small number of units using a method that induces more failures than expected in the field.  Note the stress test column in the table above.   This can include design margin as described in Appendix A of STAT-03.
  • Multiple tests on the same unit.  A sample size of 30 means 30 tests are required.  Under certain circumstances it may be possible to test 3 units 10 times each.
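
As a rough illustration of the stress-testing strategy, the sketch below computes the zero-failure sample size when the stress condition is assumed to multiply the failure rate by a known factor. The function names and the `stress_factor` parameter are hypothetical, and the multiplier itself has to be justified separately; this is not the procedure from STAT-03 or STAT-04.

```python
from math import ceil, log


def zero_failure_n(confidence, max_fail_rate):
    """Units to test with zero failures to show the failure rate is below max_fail_rate."""
    return ceil(log(1 - confidence) / log(1 - max_fail_rate))


def stressed_n(confidence, reliability, stress_factor):
    """Zero-failure sample size under stress conditions.

    Assumption: the stress test multiplies the normal-use failure rate
    (1 - reliability) by stress_factor.
    """
    stressed_rate = stress_factor * (1 - reliability)
    if not 0 < stressed_rate < 1:
        raise ValueError("stressed failure rate must be between 0 and 1")
    return zero_failure_n(confidence, stressed_rate)


if __name__ == "__main__":
    print(zero_failure_n(0.95, 0.05))   # 95%/95% under normal usage
    print(stressed_n(0.95, 0.95, 10))   # same statement, 10x stress factor
    print(stressed_n(0.95, 0.99, 5))    # 95%/99% with a 5x stress factor
```

With a factor of 10, a 95%/95% statement needs 5 stressed units instead of 59. With a factor of 5, a 95%/99% requirement reduces to the 95%/95% plan of 59 units, which is consistent with the High row of the risk table.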

15 thoughts on “STAT-04: Statistical Techniques for Design Verification”

  1. Hello,
    Thanks for this clear sample plan.
    I have a question about these sentences “Multiple tests on the same unit. A sample size is 30 means 30 tests are required. Under certain circumstance if may be possible to test 3 units 10 times each.”
    Can you detail these circumstances please?

    Have a great day.
    Best regards,

    1. One example is hardware like a glucose meter where 3 units can be tested 10 times each. It assumes the difference between the meters is small compared to the overall variation in the data. The key is repeated measures on the same unit vary as much as measurements on different units. An example of where it cannot be done is when the 10 measurements on one unit are close together but very different than the 10 measurements on the second and third units.

      1. In order to justify using a small number of units with multiple runs to achieve the desired sample size, would I first have to show that the variance between the units is negligible using an F-test? I am proposing the following to save on unit cost for verification testing.
        Based on risk, I want to show 95%/95% confidence and reliability. I need at least n=59 a=0 for my sampling plan to achieve this. I don’t want to test 59 individual units because the units are costly. If I run 3 units with 10 runs each and do F-tests to show that the units have equal variance then is that enough justification to allow me to pool the 60 samples among the 3 units for reliability and confidence acceptance criteria?

        Thank you for your expertise and input.

      2. What would be the best way of justifying this assumption using n=3 units with 10 runs each: “It assumes the difference between the meters is small compared to the overall variation in the data”?

        Would you do an F-test on the 3 units to show that there is equal variability among the units to then justify pooling the multiple runs (n=10) on each unit?

        1. By performing a variance components analysis and demonstrating the between-lot standard deviation is less than 30% of the total standard deviation. Variance components analysis is covered in Appendix C of STAT-08. An example is given in STAT-03, Appendix C.
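
        A minimal sketch of that check, assuming a balanced design (the same number of runs on every unit) and the standard one-way ANOVA method-of-moments estimates. The function name is hypothetical, and the method in STAT-08 may differ in detail. The reply's "between-lot" component is treated here as the between-unit component.

```python
from statistics import mean


def between_unit_sd_fraction(groups):
    """Between-unit SD as a fraction of total SD (one-way random-effects ANOVA).

    groups: one list of measurements per unit, all the same length.
    """
    k = len(groups)       # number of units
    n = len(groups[0])    # runs per unit
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 units with equal run counts >= 2")

    unit_means = [mean(g) for g in groups]
    grand_mean = mean(unit_means)

    # Mean squares between units and within units.
    msb = n * sum((m - grand_mean) ** 2 for m in unit_means) / (k - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, unit_means) for x in g) / (k * (n - 1))

    # Method-of-moments variance components; a negative estimate is set to zero.
    var_between = max(0.0, (msb - msw) / n)
    var_total = var_between + msw
    return (var_between / var_total) ** 0.5 if var_total > 0 else 0.0


if __name__ == "__main__":
    # Made-up readings for illustration only: 3 units, 3 runs each.
    similar_units = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
    shifted_units = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
    print(between_unit_sd_fraction(similar_units) < 0.30)
    print(between_unit_sd_fraction(shifted_units) < 0.30)
```

        The first data set passes the 30% criterion because the units are indistinguishable. The second fails because nearly all the variation is between units, which is the situation where pooling runs is not justified.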

  2. Where did 15 come from in the following paragraph:
    “Variables data – having a measurement instead of attribute pass/fail results, allows variable sampling plans to be used. They require as few as 15 samples in contrast to the minimum of 299 above.”

    1. My tables of variables sampling plans range from 15-100 samples. The 15 is due to the assumption of normality. Fewer than 15 samples do not provide enough data for verifying the normality assumption. While 15 is the minimum, I would generally recommend 50 samples for a normality test. 15 is less than I would like but still tolerable. It is sufficient to detect larger departures from normality, which have the largest effect on the variables sampling plan.

      1. I’m new to testing for normality… am I understanding this correctly:
        – Quantity 50 is suggested minimum per best practice but it might go as low as 15; but neither of these numbers are a hard rule or are published in a whitepaper/journal etc?
        – I’m seeing people on forums claiming normality with a sample size of 3 using the Shapiro-Wilk test… which makes no sense to me because normality tests don’t actually confirm normality. All these people really did was prove that the data was not too grossly non-normal to fail the test?

        1. 50 is my suggested number for testing for normality relative to acceptance sampling. Too large a number can lead to rejecting the normal distribution for small departures which have little effect on the sampling plan. I suggest 15-100, ideally 50.

          You are correct that normality tests do not prove normality. Instead, they detect sizeable departures from normality. The size of the departure detected depends on the sample size. Less than 15 is too few samples. More than 100 is too many samples. With 3 samples, the normality test will almost always pass even for very nonnormal data.

  3. What is your view on a lot-to-lot variation requirement for design verification? As you said, the intent of design verification is to show the design outputs meet the design inputs, so I don’t think drawing the calculated sample size from different lots is required for design verification.

    1. Different lots are not required but a representative sample of the design space is. The samples should cover the majority of the design space or some other strategy like worst-case testing or demonstration by analysis should be used. All are explained in STAT-04. If samples cover a small portion of the design outputs, while the resulting units may perform acceptably, production may use a larger portion of the design space and not function as well.

  4. Is the product risk/harm level mentioned in your design verification sampling table pre-mitigated or post-mitigated? If this is post-mitigated, isn’t design verification the mitigation of the risk? So how can we test for a post-mitigated risk level when we have not completed design verification yet?

    1. In my book for Design Verification, I mention that the confidence statement should be set based on Risk/Harm. On pages 2-3, I suggest basing the confidence statement on Severity and possibly P2 as the best approach. The confidence statement proves the Occurrence (O) or P1 components of risk are at an acceptable level for the associated Severity and P2. This means the confidence statement is based on neither the pre-mitigated nor the post-mitigated risk, just Severity and P2.

      I recognize some companies use RPN instead. In this case, I recommend that Severities 4 and 5 be assigned to the highest level regardless of RPN. RPN is then only for lower severity items.

  5. Thanks for all this valuable information. I’d like to make sure I understand how to use stress tests to reduce the number of samples. If I have to demonstrate 95/95 with a test by attribute, does a margin of 10 on the test conditions allow me to use 5 samples instead of the initial 59? Thanks for your expertise.

    1. If the stress test causes 10 times as many units to fail as during normal usage, then 5 samples can be tested under the stress conditions rather than 59 under normal usage conditions. 5% failures under normal usage correspond to 50% failures under stress conditions.
