Transformation
A transformation is a function applied to the data points before analyzing them. The most common transformation is the log transformation. If the original data is from the lognormal distribution, taking the log of the values will cause them to fit the normal distribution.
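As a rough illustration of the log transformation, the sketch below simulates lognormal data and compares the skewness before and after taking logs. The parameter values, the seed, and the `skewness` helper are assumptions made for this example only; they are not part of any particular software package.

```python
import math
import random
import statistics


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(1)  # fixed seed so the run is repeatable

# Simulated lognormal data; mu=0.0 and sigma=0.75 are arbitrary example values.
raw = [rng.lognormvariate(0.0, 0.75) for _ in range(2000)]

# The log transformation: apply log to every data point before analysis.
logged = [math.log(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)
print(f"skewness before log: {raw_skew:.2f}")
print(f"skewness after log:  {log_skew:.2f}")
```

The raw values should show a clear right skew, while the logged values should have skewness close to zero, as expected for normal data.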
For every continuous distribution, there is a transformation that makes data from that distribution fit the normal distribution. Identifying the distribution that fits the data is therefore equivalent to identifying the transformation that makes the data fit the normal distribution.
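One way to see why such a transformation exists is to pass each value through the cumulative distribution function of its own distribution and then through the inverse of the standard normal cumulative distribution function. The sketch below does this for exponential data. The choice of the exponential distribution, the known `scale` value, and the function name `exponential_to_normal` are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def exponential_to_normal(x, scale):
    """Map a value from an exponential(scale) distribution to a normal score."""
    p = 1.0 - math.exp(-x / scale)        # exponential CDF evaluated at x
    p = min(max(p, 1e-12), 1.0 - 1e-12)   # keep p strictly inside (0, 1)
    return STD_NORMAL.inv_cdf(p)          # inverse standard normal CDF


rng = random.Random(7)  # fixed seed so the run is repeatable
scale = 2.0             # assumed known here; in practice it is estimated

data = [rng.expovariate(1.0 / scale) for _ in range(2000)]
z = [exponential_to_normal(x, scale) for x in data]

print(f"mean of transformed data:    {fmean(z):.3f}")
print(f"std dev of transformed data: {pstdev(z):.3f}")
```

If the assumed distribution is the right one, the transformed values should have a mean near 0 and a standard deviation near 1. Choosing the wrong distribution gives a transformation that leaves the data nonnormal, which is why the two identification problems amount to the same thing.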
Transforming data has the potential to be abused. Before deciding to transform the data, consider the following items:
1. Did a shift occur in the middle of the data? This would indicate the process is unstable. Generally, the cause of the shift should be identified and eliminated.
2. Are there multiple sources that are different? For example, there might be consistent differences between the different cavities of an injection molding process. If a significant difference is detected between the cavities, consider testing each cavity separately.
3. Is the data truncated? For example, the supplier might 100% inspect the components before shipping them. If the product is 100% inspected and that inspection removes significant numbers of out-of-spec units, then the data must be handled as attribute data rather than variables data. An attribute sampling plan can then be used to demonstrate any claims about units being in spec.
4. Is there poor measurement resolution? This will be evidenced by frequent ties in the data. The Anderson-Darling and Shapiro-Wilk tests should not be used in this case. Use the skewness-kurtosis based tests instead.
5. Are there outliers? If it can be demonstrated that outlier values are measurement related, they can be eliminated or replaced. When testing is nondestructive, one way of doing this is to retest the unit multiple times and demonstrate that the retest results are consistently different from the first test result. In that case, the original result can be replaced by the average of the retests. Outliers are easily confused with the long tails of nonnormal distributions, so don't be afraid to try transforming the data after reviewing the potential outliers.
6. Is there too much data? Nothing is truly normal. With enough data, very small departures from normality can be detected that are of no practical concern. If there are more than several hundred data points, it is better to use the estimates of the skewness and kurtosis to judge whether the data is sufficiently normal.
If none of these issues are identified and it appears the underlying reason the data failed the normality tests is that it comes from some distribution other than the normal distribution, then a transformation is appropriate.
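For the large-sample case in item 6, the skewness and kurtosis estimates can be computed directly from the data. The following is a minimal sketch using simple moment estimators. The function name `skewness_and_excess_kurtosis` and the simulated sample are assumptions for illustration, and statistical packages may apply small-sample corrections that give slightly different values.

```python
import random
import statistics


def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) using moment estimators.

    Both are 0 for a normal distribution.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    return skew, excess_kurt


rng = random.Random(3)  # fixed seed so the run is repeatable

# A large simulated sample; mean 10 and std dev 2 are arbitrary example values.
sample = [rng.gauss(10.0, 2.0) for _ in range(5000)]

skew, excess_kurt = skewness_and_excess_kurtosis(sample)
print(f"skewness:        {skew:.3f}")
print(f"excess kurtosis: {excess_kurt:.3f}")
```

How close to zero these estimates need to be is a judgment about practical concern, so no cutoff is built into the sketch.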