Algorithms for calculating variance
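The population variance of n values x_1, ..., x_n with mean $\bar{x}$ is the mean of their squared deviations from the mean, $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$. The table works through an example with mean 40/5 = 8: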
| i | x_i | x_i - mean | (x_i - mean)^2 |
| (index) | (datum) | (deviation) | (squared deviation) |
| 1 | 5 | -3 | 9 |
| 2 | 7 | -1 | 1 |
| 3 | 8 | 0 | 0 |
| 4 | 10 | 2 | 4 |
| 5 | 10 | 2 | 4 |
| n=5 | sum=40 | sum=0 | sum=18 |
Note: Details of the variance calculation:
338 = 5^2 + 7^2 + 8^2 + 10^2 + 10^2
40 = 5 + 7 + 8 + 10 + 10
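As a check on the table, the totals above can be turned into a variance in more than one way. The following is a worked sketch; the symbols $\sigma^2$ (population variance, dividing by n) and $s^2$ (sample variance, dividing by n - 1) are labels chosen for this check.

```latex
\begin{align*}
\bar{x} &= \frac{40}{5} = 8, \\
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{18}{5} = 3.6, \\
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \bar{x}^2 = \frac{338}{5} - 8^2 = 67.6 - 64 = 3.6, \\
s^2 &= \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)}
     = \frac{5\cdot 338 - 40^2}{5\cdot 4} = \frac{90}{20} = 4.5.
\end{align*}
```

The second and third lines agree, which is why the sum 40 and the sum of squares 338 are enough to compute the variance without storing the deviations.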
Algorithm I
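Expanding the square in the definition shows that the variance depends on the data only through the sum of the values and the sum of their squares, so a simple one-pass algorithm is the one below. It divides by n*(n-1) and therefore returns the sample variance; dividing by n*n instead gives the population variance: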
double sum = 0; double sum_sqr = 0; double variance;
long n = data.length; // the number of elements in the data array (the actual syntax is language-specific)
for i = 0 to n - 1
    sum += data[i];
    sum_sqr += ( data[i] * data[i] );
end for
variance = ((n * sum_sqr) - (sum * sum)) / (n * (n - 1)); // sample variance; requires n > 1
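For readers who want something runnable, the pseudocode above might be rendered in Python roughly as follows. This is a minimal sketch, not a reference implementation: the function name `sum_of_squares_variance` is hypothetical, the accumulators are explicitly started at zero, and inputs with fewer than two values are rejected because the divisor n*(n-1) would be zero.

```python
from typing import Sequence


def sum_of_squares_variance(data: Sequence[float]) -> float:
    """Sample variance from the running sums of x and x*x (one pass)."""
    n = len(data)
    if n < 2:
        # n * (n - 1) would be zero, so the sample variance is undefined.
        raise ValueError("need at least two values for the sample variance")
    total = 0.0     # running sum of the values (sum in the pseudocode)
    total_sq = 0.0  # running sum of the squared values (sum_sqr)
    for x in data:
        total += x
        total_sq += x * x
    # Same expression as the pseudocode: (n*sum_sqr - sum^2) / (n*(n-1)).
    return (n * total_sq - total * total) / (n * (n - 1))


print(sum_of_squares_variance([5, 7, 8, 10, 10]))  # 4.5
```

For the example data the sums are 40 and 338, so the result is (5*338 - 40*40) / 20 = 4.5.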
Algorithm II
Another algorithm avoids accumulating the large values of sum_sqr by updating a running mean and a running variance as each element is read:
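double avg = 0; double var = 0; long n = data.length; // number of elements
for i = 0 to n - 1
    double x = data[i];
    avg = (avg*i + x) / (i + 1); // mean of the first i + 1 elements
    // sample variance of the first i + 1 elements; the factor (i + 1)/i corrects for using the updated mean
    if (i > 0) var = (var * (i - 1) + (x - avg)*(x - avg) * (i + 1) / i) / i;
end for
return var; // resulting (sample) variance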
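A running update of this kind can be sketched in Python as shown below. The function name `running_variance` is hypothetical, and the sketch makes two assumptions worth stating: the variance is assigned (not added to) at each step, and the squared deviation from the updated mean is scaled by (i + 1)/i, which is what makes the recurrence reproduce the sample variance exactly.

```python
from typing import Iterable


def running_variance(data: Iterable[float]) -> float:
    """Sample variance via a running mean and a running variance."""
    avg = 0.0  # mean of the values seen so far
    var = 0.0  # sample variance of the values seen so far
    count = 0
    for i, x in enumerate(data):
        avg = (avg * i + x) / (i + 1)  # mean of the first i + 1 values
        if i > 0:
            # (i - 1) * old variance is the old sum of squared deviations;
            # the (i + 1) / i factor accounts for using the updated mean.
            var = (var * (i - 1) + (x - avg) ** 2 * (i + 1) / i) / i
        count = i + 1
    if count < 2:
        raise ValueError("need at least two values for the sample variance")
    return var


print(running_variance([5, 7, 8, 10, 10]))  # approximately 4.5
```

On the example data the intermediate variances are 2, 7/3, 13/3 and finally 4.5, matching the sum-of-squares formula. Because the intermediate quantities stay on the scale of the deviations, this form should be less sensitive than the sum of squares when the data sit far from zero.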