Excel is a useful tool for statistical calculations. It is important to realise however that the interpretation and understanding of the results can be very important. Excel will display results but will not explain their meaning or significance.
Would you like to learn more?
Statistics typically summarise a collection of data and may be able to forecast other values based on a limited sample. The simplest tools are the three types of averages.
|2||Journeys A||Journeys B||Type||A||B|
The Mean is the common average - the sum of the values divided by the number (count) of values. The Median is the middle value if the numbers are sorted from lowest to highest. The Mode is the most commonly occurring value within the range.
Both of the sets of journeys have the same Mean and Median and yet a glance at the data immediately reveals that they are different. There is much more variation in the values of Journeys A. A measure called a Standard Deviation can describe how tightly the values are clustered around the Mean in each set of data. The higher the Standard Deviation, the greater the amount of scatter.
|16||One Std Deviation|
|17||22.86 ± 3.72 (i.e. 19.14 to 26.58)|
|18||22.86 ± 0.69 (i.e. 22.17 to 23.55)|
These statistical techniques assume that the data is normally distributed - i.e. a single Mean value with most values close to either side of the Mean and the number of values tailing away the further you get away from the Mean - often described as a bell shaped distribution. In a Normal Distribution 68% of randomly selected values will be within 1 Standard Deviation from the Mean (i.e. 34% on either side of the mean). Approximately 95% of the values will be within 2 Standard Deviations.
In the example above, we would expect that 68% of journeys in group A will be within the range of 19.1 to 26.6 miles.
Note that Excel contains two functions for calculating the Standard Deviation. There is STDEV() and STDEVP(). The first one is for the more common situation in which there is a sample of data, while the second is for when you have all data for the entire population. In many circumstances the whole population is not known and a sample is taken in order to estimate the population characteristics (e.g. the likely number of journeys on any given day).
Having calculated the mean journey length it is reasonable to assume that this value is a good indicator of the average of all related journeys - i.e. the whole population. It is very unlikely however that the result would be exactly 22.86 again. We can use a confidence interval to define the range of values within which the real mean value is likely to be found.
|21||Description||Journeys A||Journeys B||Excel formula|
|22||Level of confidence||0.95||0.95||[H22] 0.95|
|23||Standard deviation||3.71||0.69||[H23] =STDEV(A3:A9)|
|24||Sample size||7||7||[H24] =COUNT(A3:A9)|
|25||Confidence interval||2.75||0.51||[H25] =CONFIDENCE(1-H22, H23, H24)|
The function takes the form CONFIDENCE(significance
level, standard deviation,
sample size). Note that the significance level is
A significance level of 5% means that 19 times out of 20, the mean value will be within the specified range. If relating this to the bell curve diagram (above) on 1 occasion in 20 the value will be in either of the two tails outside the central shaded area, each tail representing 2.5% of values.
|Get this information as a document
accompanied by Excel worksheets
|Click here for details about
file. It has been rewritten for Excel 2010.
|file: excel-statistics1.htm||Page last updated: Dec11||© MeadInKent.co.uk 2013||CMIDX S1 P7 Y|