Out and about in the world of IT, I see a great deal of variety as I meet with different organizations and individuals. One thing that a lot of people (especially those without a technical background) tend to be unclear about is the basic reason statistics exists. Improvement initiatives like Six Sigma rely heavily on statistics, and anyone weak in this area would do well to learn a bit more about it. The purpose of this week’s post is to get folks started with a high-level summary of the topic; those who are interested can research it further online.

Statistics exists mainly because you cannot measure everything. Let me illustrate this with an example. Let us assume that I own a paper clip manufacturing company that produces one million paper clips a day on four different machines. Can I measure and test each of the million paper clips produced every day? I would require a staff of at least 10,000 to do that, which would drive me into a loss-making state very quickly. So what do I do? I take a “sample” of the 1,000,000 clips being produced (the full 1,000,000 is known as the “population”). The sample could be drawn in many ways. As there are four machines, perhaps a sample of 1,000 clips could be taken from each machine on the hour, every hour of an eight-hour shift, for a total of 32,000 clips to be tested for defects. This way, if a particular machine is malfunctioning, it will be quickly and easily spotted. Of course, this is only one of many ways of drawing sample units from the population.
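For readers who like to see the arithmetic, here is a minimal Python sketch of that sampling plan. It rests on two assumptions of mine that the example does not spell out: an eight-hour production day (which is what the 32,000 total implies with four machines and 1,000 clips per draw), and production spread evenly across machines and hours. The function name `build_sampling_plan` is hypothetical, purely for illustration.

```python
import random

MACHINES = 4
HOURS = 8  # assumption: an eight-hour production day, implied by the 32,000 total
CLIPS_PER_DAY = 1_000_000
SAMPLE_PER_MACHINE_PER_HOUR = 1_000


def build_sampling_plan(seed=0):
    """Pick which clips to test from every machine in every hour.

    Assumes output is spread evenly, so each machine makes
    1,000,000 / (4 * 8) = 31,250 clips per hour.
    """
    rng = random.Random(seed)
    clips_per_machine_hour = CLIPS_PER_DAY // (MACHINES * HOURS)
    plan = {}
    for machine in range(1, MACHINES + 1):
        for hour in range(1, HOURS + 1):
            # Choose 1,000 distinct clip positions at random within this hour's output.
            plan[(machine, hour)] = rng.sample(
                range(clips_per_machine_hour), SAMPLE_PER_MACHINE_PER_HOUR
            )
    return plan


plan = build_sampling_plan()
total_sampled = sum(len(picks) for picks in plan.values())
print(total_sampled)                  # 32000
print(total_sampled / CLIPS_PER_DAY)  # 0.032, i.e. 3.2% of the day's output
```

Drawing from every machine in every hour is what lets a single misbehaving machine show up quickly: its defects land in its own slice of the sample rather than being averaged away.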

Astute readers will have noticed one problem with all of this: we produced 1,000,000 paper clips and tested only 32,000. How do we know that this sample accurately represents the population? What if we happened to test the 32,000 that were good and the remaining 968,000 are bad? This is where statistics helps us. Not only can we calculate useful summaries such as the mean, median and standard deviation of our sample, we can also use statistical techniques to tell us how closely the sample’s results reflect the population itself. So, in our example, we can say that the sample of 32,000 turned out to be 98% defect free and that we are 90% sure that the rest of the population is also about 98% defect free. This ability to predict the quality levels of units that were never tested is the chief strength of statistics and its various techniques. Of course there are other applications of statistics, but this is the primary one.
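To make the “90% sure” statement a little more concrete, here is a rough Python sketch of one common way to produce such a figure: a normal-approximation confidence interval for a proportion. This is my choice of technique for illustration, not the only one available, and it assumes the 32,000 clips were drawn at random. The function name `proportion_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.90):
    """Normal-approximation interval for a population proportion.

    Assumes a random sample that is large enough for the
    approximation to hold (comfortably true for 32,000 clips).
    """
    p_hat = successes / n
    # Two-sided critical value, about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)


sample_size = 32_000
defect_free = 31_360  # 98% of the sample passed inspection

p_hat, low, high = proportion_confidence_interval(defect_free, sample_size)
print(f"Sample: {p_hat:.1%} defect free")
print(f"90% confidence interval for the population: {low:.2%} to {high:.2%}")
```

With these numbers the interval works out to roughly 97.9% to 98.1%, which is the formal version of saying we are 90% confident that the untested clips are also about 98% defect free. It also shows why the “what if we only tested the good ones” worry depends on how the sample was drawn: the calculation is only trustworthy if the clips were picked at random rather than cherry-picked.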

I speak of statistics this week because it is about time that IT organizations started using all the tools this discipline makes available to them to improve their efficiency. There are organizations that use function points and advanced statistical techniques in a big way, and they are at levels of efficiency that are going to be very hard to beat. It’s time for the others to get going.