

How to Calculate Standard Deviation?

Written by Prerit Jain

Updated on: 25 May 2023

Standard deviation is a measure of how dispersed the data is in relation to the mean.

Standard deviation is important because it summarizes, in a single number, how spread out a set of measurements is. The more spread out the data, the greater the standard deviation.

A low standard deviation means data are clustered around the mean, and a high standard deviation indicates data are more spread out.

For a given data set, the standard deviation is a single, well-defined value. This makes it useful in analysis, for example when studying performance trends, because the level of dispersion indicates how much the values typically deviate from the mean.

How to calculate standard deviation?

Let us go through the calculation of the standard deviation step by step:

  • Step 1: Find the mean of the data.
    • The mean is calculated by adding all the data points and dividing the total by the number of data points.
    • For grouped data, the mean is calculated by multiplying each data point by its frequency, adding up these products, and dividing the sum by the total frequency (the total number of data points).
  • Step 2: For each data point, find the square of its difference from the mean.
  • Step 3: Add up all the squared differences obtained in Step 2.
  • Step 4: Divide this sum by the number of data points. The result is the variance.
  • Step 5: Take the square root of the value obtained in Step 4 to get the standard deviation of the data set.
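The five steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `mean` and `population_std_dev` are our own, and the sketch divides by the number of data points N exactly as Step 4 does, which gives the population standard deviation. Python's standard `statistics` module provides `pstdev` for the same calculation, while `statistics.stdev` divides by n − 1 instead and gives the sample standard deviation.

```python
import math
import statistics


def mean(data):
    """Step 1: add all the data points and divide by the number of data points."""
    return sum(data) / len(data)


def population_std_dev(data):
    """Steps 1-5: population standard deviation (the sum is divided by N)."""
    mu = mean(data)                                # Step 1: the mean
    squared_diffs = [(x - mu) ** 2 for x in data]  # Step 2: squared differences from the mean
    total = sum(squared_diffs)                     # Step 3: add them up
    variance = total / len(data)                   # Step 4: divide by the number of data points
    return math.sqrt(variance)                     # Step 5: take the square root


scores = [4, 7, 10, 8, 6]
print(population_std_dev(scores))   # 2.0
print(statistics.pstdev(scores))    # 2.0 (standard-library cross-check)
```

The list `scores` holds the five test scores used in the solved examples later in this article; both calls print 2.0.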

The formula for the standard deviation is:

    \[\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}\]

Variance is a closely related statistic: the variance of a data set equals the square of its standard deviation, σ².

The variables in the above equation are as follows:

(1) σ is the standard deviation.

(2) ∑ is the summation of the squared terms.

(3) x is the data point.

(4) µ is the mean of the data set.

(5) N is the number of data points in the set.
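To connect each symbol with a calculation, here is a short Python sketch. The data values are hypothetical, chosen only so that the numbers come out whole. The sketch also checks that the variance equals σ², and that the results agree with the population functions `statistics.pvariance` and `statistics.pstdev` from Python's standard library.

```python
import math
import statistics

# Hypothetical illustrative data (not from this article)
data = [2, 4, 4, 4, 5, 5, 7, 9]

N = len(data)                                  # N: number of data points
mu = sum(data) / N                             # µ: mean of the data set
squared_terms = [(x - mu) ** 2 for x in data]  # (x − µ)² for each data point x
variance = sum(squared_terms) / N              # ∑(x − µ)² / N
sigma = math.sqrt(variance)                    # σ: standard deviation

# Variance is the square of the standard deviation.
assert math.isclose(variance, sigma ** 2)
# The standard library's population versions agree with the formula.
assert math.isclose(variance, statistics.pvariance(data))
assert math.isclose(sigma, statistics.pstdev(data))

print(f"N = {N}, mean = {mu}, variance = {variance}, sigma = {sigma}")
# prints: N = 8, mean = 5.0, variance = 4.0, sigma = 2.0
```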


Examples of standard deviation in action

Calculating standard deviation for a small data set

Example 1: Find the standard deviation for the given data set.
4 18 45 9 30 14 50 37 23 30

 Solution 1:

The mean of the data set is

    \[\frac{{4 + 18 + 45 + 9 + 30 + 14 + 50 + 37 + 23 + 30}}{{10}} = \frac{{260}}{{10}} = 26\]

Following the steps, we now find the sum of the squared differences between each data point and the mean:

    \[{\left( {26 - 4} \right)^2} + {\left( {26 - 18} \right)^2} + {\left( {26 - 45} \right)^2} + {\left( {26 - 9} \right)^2} + {\left( {26 - 30} \right)^2} + {\left( {26 - 14} \right)^2} + {\left( {26 - 50} \right)^2} + {\left( {26 - 37} \right)^2} + {\left( {26 - 23} \right)^2} + {\left( {26 - 30} \right)^2}\]

Calculating the above we get  

    \[484 + 64 + 361 + 289 + 16 + 144 + 576 + 121 + 9 + 16\]

Which is equal to 2080.

Next, we divide this sum by the number of data points:

    \[\frac{{2080}}{{10}}\]

Which gives us 208.

Finally, we take the square root of the value obtained:

    \[\sqrt {208}  \approx 14.422\]

So the standard deviation is approximately 14.42.
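Each intermediate value in Example 1 can be reproduced with a few lines of Python. This is only a checking sketch: it mirrors the worked solution by computing (26 − x)² for every data point rather than calling a library function.

```python
import math

data = [4, 18, 45, 9, 30, 14, 50, 37, 23, 30]   # Example 1 data set

mean = sum(data) / len(data)                    # 260 / 10
squares = [(mean - x) ** 2 for x in data]       # (26 − x)² for each data point
sum_of_squares = sum(squares)
variance = sum_of_squares / len(data)           # divide by the number of data points
std_dev = math.sqrt(variance)

print(mean)               # 26.0
print(sum_of_squares)     # 2080.0
print(variance)           # 208.0
print(round(std_dev, 2))  # 14.42
```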

Using standard deviation to compare data sets

Example 2: Use standard deviation to compare the given sets of data.
Set 1: 45 68 17 34 16
Set 2: 23 20 47 73 25

Solution 2:

Following the steps, we first find the mean of each set.

For set 1:

    \[\frac{{45 + 68 + 17 + 34 + 16}}{5}\]

The mean for set 1 is 36.

For set 2:

    \[\frac{{23 + 20 + 47 + 73 + 25}}{5}\]

The mean for set 2 is 37.6.

Next, we find the sum of the squared differences between the data points and the mean of each set.

For set 1:

             

    \[{\left( {36 - 45} \right)^2} + {\left( {36 - 68} \right)^2} + {\left( {36 - 17} \right)^2} + {\left( {36 - 34} \right)^2} + {\left( {36 - 16} \right)^2} = 1870\]

For set 2:

    \[{\left( {37.6 - 23} \right)^2} + {\left( {37.6 - 20} \right)^2} + {\left( {37.6 - 47} \right)^2} + {\left( {37.6 - 73} \right)^2} + {\left( {37.6 - 25} \right)^2} = 2023.2\]

Next, we divide each sum of squares by the number of data points in that set.

We get 

For set 1: 

    \[\frac{{1870}}{5} = 374\]

For set 2:

    \[\frac{{2023.2}}{5} = 404.64\]

Now to obtain the standard deviation we need to take the square root of the above-calculated values 

For set 1:

    \[\sqrt {374}  \approx 19.34\]

For set 2:

    \[\sqrt {404.64}  \approx 20.12\]

Hence the standard deviation is approximately 19.34 for set 1 and 20.12 for set 2.

We can observe that the standard deviation of set 2 is larger than that of set 1.

This indicates that the values in set 2 are slightly more spread out than those in set 1.
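The comparison in Example 2 can be checked with Python's standard `statistics` module. This sketch uses `statistics.pstdev`, the population standard deviation, because the worked solution divides by the number of data points; the loop and labels are just one convenient way to lay it out.

```python
import statistics

set_1 = [45, 68, 17, 34, 16]
set_2 = [23, 20, 47, 73, 25]

for name, values in (("Set 1", set_1), ("Set 2", set_2)):
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD: divides by N, as in the worked solution
    print(f"{name}: mean = {mean:.1f}, standard deviation = {sd:.2f}")

# The set with the larger standard deviation is the more spread out one.
more_spread = "Set 1" if statistics.pstdev(set_1) > statistics.pstdev(set_2) else "Set 2"
print(more_spread, "is more spread out")
```

This prints a mean of 36.0 and a standard deviation of 19.34 for set 1, a mean of 37.6 and a standard deviation of 20.12 for set 2, and reports that set 2 is more spread out.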

Advantages and disadvantages of standard deviation

The advantages of standard deviation

  1. For a given data set, the standard deviation is a single, rigidly defined value.
  2. It is very sensitive to changes in the data.
  3. It is based on all the data points in the set.
  4. It is less affected by sampling fluctuations.
  5. Because it is rigidly defined, it can be used to analyse large and varied data sets.

The disadvantages of standard deviation

  1. Squaring the differences makes large deviations even larger, so outliers add a disproportionately large amount to the numerator. The standard deviation therefore gives extreme values greater weight, which makes it susceptible to the impact of outliers.
  2. Common interpretations of the standard deviation assume that the data are roughly normally distributed, so it can be a misleading summary for skewed or otherwise non-normal data.
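The effect of an outlier described in the first disadvantage is easy to see numerically. In this sketch both lists are hypothetical values chosen for illustration. Replacing a single value with an extreme one inflates the standard deviation roughly tenfold, and the printed share shows how much of the sum of squared differences comes from the largest deviation alone.

```python
import statistics

# Hypothetical data for illustration only
typical = [4, 7, 10, 8, 6]
with_outlier = [4, 7, 10, 8, 60]  # last value replaced by an extreme one

for label, data in (("typical", typical), ("with outlier", with_outlier)):
    mu = statistics.fmean(data)
    squared = [(x - mu) ** 2 for x in data]
    # Fraction of the sum of squared differences contributed by the largest term
    share = max(squared) / sum(squared)
    sd = statistics.pstdev(data)
    print(f"{label}: SD = {sd:.2f}, largest squared difference = {share:.0%} of the sum")
```

With these values the standard deviation jumps from 2.00 to about 21.19, and the single outlier accounts for roughly 79% of the sum of squared differences.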

Solved examples

Q 1. A test is conducted for a class of 5 students, and the scores out of 10 are as follows.

Student 1 2 3 4 5

Score   4 7 10 8 6

Find the standard deviation of the test result.

Solution 1:

First, we find the mean of the data set.

    \[\frac{{4 + 7 + 10 + 8 + 6}}{5} = 7\]

Now we find the sum of the square of the difference of each data point from the mean.

    \[{\left( {7 - 4} \right)^2} + {\left( {7 - 7} \right)^2} + {\left( {7 - 10} \right)^2} + {\left( {7 - 8} \right)^2} + {\left( {7 - 6} \right)^2} = 20\]

The next step is to divide the obtained sum by the total number of data points.

    \[\frac{{20}}{5} = 4\]

Finally, we take the square root: √4 = 2, so the standard deviation is 2.

2. The table below gives the mean salary and the variance of salaries for people working in four different fields. Calculate the standard deviation for each field, then interpret what it means for that field.

Field         Marketing     Education    Banking       Technology
Mean salary   60,000        45,000       75,000        15,000
Variance      900,000,000   25,000,000   100,000,000   16,000,000

Solution 2:

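To find the standard deviation, we take the square root of the variance. The standard deviation for each field is therefore:

Field                Marketing   Education   Banking   Technology
Standard deviation   30,000      5,000       10,000    4,000

Marketing salaries are the most spread out: a typical marketing salary lies about 30,000 away from the mean of 60,000. Technology salaries are the most tightly clustered, typically about 4,000 from their mean of 15,000, with education (5,000) and banking (10,000) in between.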
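Converting each variance to a standard deviation is a single square root per field. The sketch below uses the figures from the question's table and also sorts the fields from most to least spread out to support the interpretation; storing the data in a dictionary is simply a convenient choice for illustration.

```python
import math

# Variances from the question's table
variances = {
    "Marketing": 900_000_000,
    "Education": 25_000_000,
    "Banking": 100_000_000,
    "Technology": 16_000_000,
}

# Standard deviation is the square root of the variance.
std_devs = {field: math.sqrt(v) for field, v in variances.items()}

# List fields from the most to the least spread-out salaries.
for field, sd in sorted(std_devs.items(), key=lambda item: item[1], reverse=True):
    print(f"{field}: {sd:,.0f}")
```

The output lists Marketing (30,000), Banking (10,000), Education (5,000) and Technology (4,000).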

3. Find the mean deviation about the mean for the following data points and their frequencies.

Xi 10 30 50 70 90

Fi 4 24 28 16 8

Solution 3:

First, we find the product XiFi for each data point:

XiFi 40 720 1400 1120 720

The sum of Fi is 80, and the sum of XiFi is 4000.

Now we find the mean of the data, weighting each data point by its frequency:

    \[\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{4000}{80} = 50\]

The next step is to find the absolute difference of each data point from the mean.

|Xi-50| 40 20 0 20 40

Now we multiply each absolute difference by its frequency.

Fi|Xi-50| 160 480 0 320 320

The sum of Fi|Xi-50| is 1280.

Mean, where N = ΣFi = 80:

    \[\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i = \frac{1}{80} \times 4000 = 50\]

Mean deviation about the mean:

    \[\frac{1}{N}\sum_{i=1}^{n} f_i \left| x_i - \bar{x} \right| = \frac{1}{80} \times 1280 = 16\]
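The grouped-data calculation above can be checked with a short Python sketch. The variable names are our own: `N` is the total frequency ΣFi, and the mean is the frequency-weighted mean ΣFiXi / N.

```python
x = [10, 30, 50, 70, 90]  # data points Xi
f = [4, 24, 28, 16, 8]    # frequencies Fi

N = sum(f)                                                        # total frequency
mean = sum(fi * xi for fi, xi in zip(f, x)) / N                   # weighted mean
total_abs_dev = sum(fi * abs(xi - mean) for fi, xi in zip(f, x))  # sum of Fi|Xi - mean|
mean_deviation = total_abs_dev / N

print(N, mean, total_abs_dev, mean_deviation)  # 80 50.0 1280.0 16.0
```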

4. Find the standard deviation of the following data, rounded to two decimal places.

x 1 2 3 4 5

f 3 11 4 9 2

Solution 4:

First, we compute x², fx and fx² for each data point:

x     1   2   3   4   5
x²    1   4   9  16  25
fx    3  22  12  36  10
fx²   3  44  36 144  50

Summing the fx² row:

    \[\sum f{x^2} = 3 + 44 + 36 + 144 + 50 = 277\]

Summing the frequencies:

    \[\sum f = 3 + 11 + 4 + 9 + 2 = 29\]


Now we find the mean of the data set:

    \[\mu  = \frac{{\sum {fx} }}{{\sum f }}\]

Which is 

    \[\frac{{3 + 22 + 12 + 36 + 10}}{{3 + 11 + 4 + 9 + 2}} = \frac{{83}}{{29}} = 2.86\]

To calculate the standard deviation, we use the following equation:

    \[\sigma  = \sqrt {\frac{{\sum {f{x^2}} }}{{\sum f }} - {\mu ^2}} \]

We get 

    \[\sigma  = \sqrt {\frac{{277}}{{29}} - {{2.86}^2}} \]

Which is equal to 1.17

Hence the standard deviation of the set is 1.17.
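The shortcut formula used in this solution can be checked in Python. This sketch computes σ = √(Σfx²/Σf − μ²) with the unrounded mean, then cross-checks it by expanding the frequency table into individual values and calling the standard library's `statistics.pstdev`. With the unrounded mean, σ ≈ 1.166, which still rounds to 1.17.

```python
import math
import statistics

x = [1, 2, 3, 4, 5]
f = [3, 11, 4, 9, 2]

sum_f = sum(f)                                        # Σf
sum_fx = sum(fi * xi for fi, xi in zip(f, x))         # Σfx
sum_fx2 = sum(fi * xi ** 2 for fi, xi in zip(f, x))   # Σfx²

mu = sum_fx / sum_f                                   # unrounded mean
sigma = math.sqrt(sum_fx2 / sum_f - mu ** 2)          # σ = √(Σfx²/Σf − µ²)

# Cross-check: expand the frequency table into raw values and use the population SD.
expanded = [xi for xi, fi in zip(x, f) for _ in range(fi)]
assert math.isclose(sigma, statistics.pstdev(expanded))

print(f"mean = {mu:.2f}, standard deviation = {sigma:.2f}")  # mean = 2.86, standard deviation = 1.17
```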

Conclusion

Standard deviation is a powerful tool for studying the spread of a data set, although it has some drawbacks, particularly in more advanced settings such as machine learning and regression analysis.

Standard deviation is a sensitive and well-established way to understand the behavior of data; however, it is strongly affected by outliers, which are common in real-world data. Filtering out such outliers, which can be an arduous task, is therefore essential for applying standard deviation in a conceptually sound manner.

Frequently asked questions (FAQs)

What is a standard error?

The standard error (of the mean) indicates how close a sample mean is likely to be to the true population mean. As the standard error increases, sample means are more widely spread around the population mean, so any single sample mean is more likely to be a poor estimate of it.
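As a rough illustration, the standard error of the mean is commonly estimated as the sample standard deviation divided by the square root of the sample size, s/√n. That formula is standard but is not stated in this article, and the function name `standard_error_of_mean` is our own. The sketch uses `statistics.stdev`, which divides by n − 1, and treats the Example 1 data as a sample purely for illustration.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Estimated standard error of the mean: sample SD divided by sqrt(n)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return s / math.sqrt(n)


sample = [4, 18, 45, 9, 30, 14, 50, 37, 23, 30]  # Example 1 data, treated as a sample
print(round(standard_error_of_mean(sample), 2))  # 4.81
```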

What is the difference between variance and standard deviation?

Variance is the average of the squared deviations from the mean, and standard deviation is the square root of the variance. Both measure the variability of a distribution, but their units differ: variance is expressed in squared units of the data, whereas standard deviation is in the same units as the data.

What does standard deviation indicate in normal distribution?

In a normal distribution, a higher standard deviation means that values tend to lie farther from the mean, while a lower standard deviation means that they are clustered closely around the mean.

What is the best measure of dispersion?

Standard deviation is generally considered the best measure of dispersion, followed by variance.

Why is standard deviation the best measure of dispersion?

It uses information from the entire data set because it depends on every value. As a result, even a small change in a single value changes the standard deviation.
