# What is Cpk?

By Gladys on 12 May 2008 at 9:18 am

In industry, many people have heard the term “Cpk study” at least once. But when it comes to defining it, few can give a simple and clear definition, which is why we decided to look into it ourselves.

The function of the Cpk study can be explained through this image, which was used on the Six Sigma forum: let us say that the data points in the Cpk study are darts. If your shots land in the same spot and form a tight group, then you have a high Cp; when your aim is adjusted so that this tight group lands on the bullseye, then you have a high Cpk.

In the video above, Keith Bower explains what Cpk does and the key points for using this index in a relevant way. **You will find below the written transcript of what is said in this video.**

“In this video I am going to talk about Cpk. First and foremost, Cpk is a capability index. We use it to find out how good a process is, given certain specification limits. We always use it in conjunction with Cp. If you remember the formula for Cp, you look at the difference between the upper spec limit and the lower spec limit, and then you divide it by six standard deviations. That is what Cp is.
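An aside to the transcript, not from the video: the formula described here is usually written as follows, where USL and LSL are the upper and lower specification limits and σ is the process standard deviation (in practice, an estimate of it).

$$
C_p = \frac{\mathrm{USL} - \mathrm{LSL}}{6\sigma}
$$

As a purely hypothetical example, with LSL = 9, USL = 11 and σ = 0.25, this gives Cp = 2 / 1.5 ≈ 1.33.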

Why do we use Cpk? Well, if you think about the formula for Cp, it does not actually care where the distribution itself is centered; it could be anywhere. Cpk, on the other hand, does take into consideration where the distribution is centered. So, if you look at Cp and Cpk and the two values are identical or, let us say, pretty darn close to each other, we know that the distribution is centered midway between the specification limits, and for many systems that is exactly what we are looking for.
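An aside to the transcript, not from the video: the difference between the two indexes can be sketched in a few lines of Python. The measurements and specification limits below are made up for illustration, and the sketch assumes the overall sample standard deviation as the estimate of sigma, whereas many statistical packages use a within-subgroup estimate for Cp and Cpk.

```python
from statistics import mean, stdev


def cp(data, lsl, usl):
    """Cp: specification width divided by six sample standard deviations."""
    return (usl - lsl) / (6 * stdev(data))


def cpk(data, lsl, usl):
    """Cpk: distance from the mean to the nearer limit, in units of 3 sigma."""
    m, s = mean(data), stdev(data)
    return min(usl - m, m - lsl) / (3 * s)


# Hypothetical measurements and limits, for illustration only.
LSL, USL = 9.0, 11.0
centered = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
shifted = [x + 0.5 for x in centered]  # same spread, mean moved toward USL

for name, data in (("centered", centered), ("shifted", shifted)):
    print(f"{name}: Cp={cp(data, LSL, USL):.2f}  Cpk={cpk(data, LSL, USL):.2f}")
```

For the centered sample the two values coincide. For the shifted sample Cp is unchanged because the spread is the same, while Cpk drops to half of Cp because the mean now sits a quarter of the specification width away from the upper limit.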

Keep in mind that Cpk does not include a target value; there are other capability studies that could be used for that, for example, I am thinking along the lines of Cpm. All that Cpk is doing is saying whether the process is centered in relation to these specification limits. Now, these capability indexes are controversial; not many people like them, because you try to pack a heck of a lot of information into just one number.

That being said, let us talk about the important things we need to consider. First things first, the process itself has got to be in statistical control; otherwise, how do you know that the estimates of the mean and standard deviation that are used in the formula are in any shape or form useful? They could be erroneous.

You know, if the process is not stable, then we really cannot put our hand on our heart and conclude whether the process is capable of meeting these specification limits. Another problem that I see quite frequently is that people use statistical software packages to compute Cp and Cpk when the assumption of normality simply is not valid.

I actually saw it at a conference once, where somebody had seriously skewed data. They were actually looking at cycle time data, so you have got a natural truncation at zero, but they were still laying a normal distribution over the top of this histogram, and it did not fit well at all. Keep in mind that all models are wrong, some models are useful, and the normal distribution certainly was not a useful model for that set of data.

So, when we are looking at capability indexes, when statistical software prints them out, we have to worry about the model that is being used. Frequently, pretty much always, we assume a normal distribution. If a normal distribution is not an adequate fit, we have got to do something else: maybe transform the data, maybe use a different distribution, something that will fit the data better, assuming that this distribution itself is stable over time. Otherwise you are just going to fit a model to completely erroneous data, and there is no point in doing that. So, it has got to be a useful model, the process has got to be stable over time, and there is something else to consider, which people quite frequently do not worry too much about, but they should. Here is a re-enactment of what happened at a conference once.
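An aside to the transcript, not from the video: a rough sketch of the "transform the data" option, using simulated cycle times rather than real ones. The lognormal parameters, the sample size and the `skewness` helper are all assumptions made for the illustration, and a skewness close to zero is only a crude indication of symmetry, not a test of normality.

```python
import math
import random
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: mean cubed z-score (0 for a symmetric sample)."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


# Simulated cycle times (hypothetical): lognormal, so positive and right-skewed.
rng = random.Random(0)
cycle_times = [rng.lognormvariate(1.0, 0.6) for _ in range(500)]
logged = [math.log(t) for t in cycle_times]

print(f"skewness of raw data:  {skewness(cycle_times):.2f}")
print(f"skewness of log(data): {skewness(logged):.2f}")
```

If capability indexes are then computed on the transformed values, the specification limits have to be transformed in the same way.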

These are two people, here I have got Zeppy, and here I have got George:

*Keith: So, you do capability analysis?*

- Zeppy: Yes, I do.

*K: So the process you use for your capability analysis is stable?*

- Z: Oh yes, it is stable.

*K: Since you are in statistical control, is it well modeled by a normal distribution?*

- Z: Yes, it is normal.

*K: What is your Cpk value?*

- Z: 1.33.

*K: Oh really. And what about you, George? I am assuming that your distribution is normal, and stable, and all that stuff. What is your Cpk value?*

- George: 1.35.

*K: So you have got 1.33 and you have got 1.35, ok. What amount of data, how many data points did you actually use? You, Zeppy, how many did you use?*

- Z: Oh, I have got 500 data points.

*K: 500 data points? Ok, and stable over time, ok, very good. And you, George, how many did you have?*

- G: I have got 5.

So clearly, if you do not have many data points, then you have to consider the confidence interval surrounding the point estimate of Cpk. In George's case it was 1.35, but it could be as low as 0.8 or as high as 2, I do not know. In the case of Zeppy it is going to be a much tighter confidence interval, because he had 500 data points.
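An aside to the transcript, not from the video: one way to put rough numbers on this is Bissell's normal approximation for a confidence interval around an estimated Cpk. The function name below is hypothetical, the approximation assumes a stable, normally distributed process, and it is a large-sample result, so for a sample as small as George's it should be read only as an indication of how wide the interval gets.

```python
import math
from statistics import NormalDist


def cpk_confidence_interval(cpk_hat, n, confidence=0.95):
    """Approximate two-sided interval for Cpk (Bissell's normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(1 / (9 * n) + cpk_hat**2 / (2 * (n - 1)))
    return cpk_hat - half_width, cpk_hat + half_width


# Point estimates and sample sizes taken from the dialogue above.
for name, cpk_hat, n in (("Zeppy", 1.33, 500), ("George", 1.35, 5)):
    low, high = cpk_confidence_interval(cpk_hat, n)
    print(f"{name}: Cpk={cpk_hat}, n={n}, 95% CI approx. ({low:.2f}, {high:.2f})")
```

Under this approximation Zeppy's interval runs from about 1.24 to 1.42, while George's runs from about 0.4 to 2.3, so the two point estimates cannot be meaningfully ranked against each other.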

So, when people come and say to you “Oh, I have got a Cpk of such and such a value”, you should always ask them the questions:

- Is the model good?
- Is the process stable?
- How many data points are being used to come up with that estimate?

Moreover, I think it is more appropriate nowadays (it has always been more appropriate), now that many software programs have it built in, to look at 95% confidence intervals for these capability indexes as well. It is going to give you a, let us say, less disingenuous interpretation of what the actual capability of the process itself is going to be.

So, I hope this is useful. Cp and Cpk matter a great deal in process interpretation.”
