Have your keyword rankings really increased? How to track volatility
If you’re tracking keywords, you’ll be monitoring the impact of your activity on the rankings to make sure they are, at the very least, not falling. More likely, you’ll be tracking them to make sure they’re increasing. But how do you know that the rankings of your keywords have truly increased?
Please note this post will contain nerdy stats! Read on if you’re brave enough…
Keyword rankings always have some natural variability, and it’s this variability you must take into account when assessing your performance. Your keyword set might show a reasonably large mean ranking increase, yet the natural variability of the set can be large enough that the increase is not significant. The table below shows an average increase in rankings that is not a statistically significant ranking increase. We’ll explain why later, along with an example of a smaller ranking increase that is statistically significant.
To provide some statistical evidence that there has been a significant increase (or decrease), you can perform a t-test on the rankings. In general, a t-test tests whether there are significant differences between two datasets. In the case of keyword rankings, we would perform a paired t-test, so that you are testing the average of the differences between the rankings of your keywords, rather than just the overall average ranking. Luckily, this can be done quite easily in Excel. You don’t even have to look at any statistical tables!
So how do we do this?


1. Calculate your ranking differences.
a) Make sure you calculate the old ranking minus the new ranking so that a ranking increase is positive!
2. Plot a histogram to ensure the dataset looks like a bell curve or normal distribution^{1,2}.
a) If you’re not familiar with the normal distribution and want to know more, try here.
3. Count the number of keywords in your keyword set, using the COUNT function on the Difference column.
4. Calculate the mean of the difference in rankings, using the AVERAGE function.
5. Calculate the standard deviation of the differences, using the STDEV.S function^{3}.
6. Calculate the standard error^{4}, by dividing the standard deviation by the square root of the sample size.
7. Calculate your degrees of freedom^{5}, which for a paired t-test is the sample size minus one.
a) If you’re not familiar with degrees of freedom and want to know more, try here.
8. Calculate the value of T, by dividing the Mean by the Standard Error.
9. And finally, calculate your p-value^{6}, using the T.DIST.RT function.



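If you’d rather see the whole procedure in one place, here is a rough, stdlib-only Python translation of the steps above. The keyword rankings are made-up illustrative numbers, and the tail probability is approximated by numerically integrating the t-distribution density rather than calling Excel’s T.DIST.RT:

```python
import math
import statistics

def t_dist_rt(t, df, steps=50_000, upper=60.0):
    """Right-tail probability of Student's t-distribution, approximated by
    trapezoidal integration of the density from t up to a large cutoff.
    This plays the role of Excel's T.DIST.RT."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: coef * (1 + x * x / df) ** (-(df + 1) / 2)
    h = (upper - t) / steps
    total = 0.5 * (pdf(t) + pdf(upper))
    for i in range(1, steps):
        total += pdf(t + i * h)
    return total * h

def paired_t_test(old_ranks, new_ranks):
    """Steps 1 and 3-9: returns (T, degrees of freedom, one-tailed p-value)."""
    # Step 1: old minus new, so a ranking improvement is positive
    diffs = [o - n for o, n in zip(old_ranks, new_ranks)]
    n = len(diffs)                 # step 3: COUNT
    mean = statistics.mean(diffs)  # step 4: AVERAGE
    sd = statistics.stdev(diffs)   # step 5: STDEV.S (sample standard deviation)
    se = sd / math.sqrt(n)         # step 6: standard error
    df = n - 1                     # step 7: degrees of freedom
    t = mean / se                  # step 8: the T statistic
    return t, df, t_dist_rt(t, df) # step 9: the p-value

# Hypothetical rankings for ten keywords, before and after some activity
old = [12, 8, 15, 20, 7, 18, 9, 14, 11, 16]
new = [10, 7, 12, 18, 6, 15, 9, 12, 10, 14]
t, df, p = paired_t_test(old, new)
```

Step 2 (the histogram) is left out here; eyeball it in Excel or with any plotting tool. For this made-up data the p-value comes out well below 0.05, so the increase would count as significant.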
And that’s done. Now for interpretation.
Depending on what you are testing, you will have to interpret your p-value differently. The table below illustrates how you should interpret your p-value.
Figure 1: Final Solution
And that’s all there is to it. In the example above, with a p-value less than 0.05, we can clearly see there is a significant increase in the rankings. Below, we reconsider the example with the larger mean. We can see that the p-value is approximately 0.09, so it is not a statistically significant increase despite having a larger mean!
And there you go. With a little bit of statistical rigour, you’ll now be able to say with at least 95% confidence that there is a ranking difference!
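On the confidence point (and note 7 below): with the same ingredients, the mean difference, standard error and degrees of freedom, you can also compute a confidence interval for the true ranking difference. Here is a hedged, stdlib-only Python sketch, where the critical value is found by bisecting a numerically integrated t tail (the role Excel’s T.INV.2T plays):

```python
import math

def t_tail(t, df, steps=20_000, upper=60.0):
    """Right-tail probability of Student's t-distribution via trapezoidal
    integration (a stand-in for Excel's T.DIST.RT)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: coef * (1 + x * x / df) ** (-(df + 1) / 2)
    h = (upper - t) / steps
    total = 0.5 * (pdf(t) + pdf(upper))
    for i in range(1, steps):
        total += pdf(t + i * h)
    return total * h

def t_critical(df, tail=0.025):
    """Find the t value whose right-tail probability equals `tail`,
    by bisection (the tail area shrinks as t grows)."""
    lo, hi = 0.0, 60.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_tail(mid, df) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_interval(mean_diff, std_err, df):
    """95% confidence interval for the true mean ranking difference
    (two-tailed 95% leaves 0.025 in each tail)."""
    crit = t_critical(df)
    return mean_diff - crit * std_err, mean_diff + crit * std_err
```

With a hypothetical mean difference of 1.7, standard error of 0.3 and 9 degrees of freedom, `confidence_interval(1.7, 0.3, 9)` gives roughly (1.02, 2.38): the whole interval sits above zero, which is another way of seeing a significant increase.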
TL;DR




 You can have an average ranking increase that is not statistically significant.
 It’s not just rankings that matter. The volatility of your rankings matters too.
 Statistics are more useful than you think!



Notes:


 If your dataset is not normally distributed, then you’ll have to transform your data to be normally distributed or choose a different test, but this is beyond the scope of this post.
 With a large enough sample size, the t-test can be used on non-normal data.
 We use the STDEV.S function as we require the sample standard deviation.
 The standard error accounts for sample size: a larger sample size results in a smaller standard error.
 Degrees of freedom is the number of values in a dataset that are free to vary; for a paired t-test, it is the sample size minus one.
 A p-value is the probability of seeing a ranking difference at least as large as the one observed if there were no true underlying change.
 You could go further and calculate the confidence interval for the true ranking difference, but for testing whether there is a significant increase in rankings, this is probably not necessary.
