Key statistical definitions for AB Testing

Sean Longthorpe

AB Split Testing investigates the impact of changing (typically) one aspect of your site, to discover how much uplift such a change could have on your conversions or click-throughs. There’s one word in that previous sentence that has more importance than you might initially think, and that is “could”. There’s no guarantee that you will ever see that uplift.

To measure how likely a potential uplift is, we can use a statistical technique called hypothesis testing. But, as always with statistics, there is terminology that needs to be understood before you can properly interpret and act on the results.

Hypothesis testing

This is a statistical technique for assessing whether there is a difference between two samples of data. In an AB Test, we are interested in whether our variation is better than the control; in other words, whether the conversion rate for the variation is better than for the control. The most difficult concept to grasp here is that the test starts from a null hypothesis of no difference, and can only ever gather evidence against that null hypothesis; it can never prove that there is no difference.
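As a sketch of how this is set up in practice, here is a standard two-proportion z-test using hypothetical numbers (10,000 visitors per group, 420 vs 480 conversions; these figures are illustrative, not from any real test):

```python
from math import sqrt

# Hypothetical AB Test data (illustrative numbers only):
# control: 10,000 visitors, 420 conversions
# variation: 10,000 visitors, 480 conversions
n_a, conv_a = 10_000, 420
n_b, conv_b = 10_000, 480

p_a = conv_a / n_a  # control conversion rate
p_b = conv_b / n_b  # variation conversion rate

# Under the null hypothesis (no difference), both groups share one
# underlying rate, so we pool the conversions to estimate it.
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

# The z statistic counts how many standard errors the observed
# difference in conversion rates is away from zero.
z = (p_b - p_a) / se
```

A large z value is evidence against the null hypothesis of no difference; a z near zero is simply a failure to find such evidence, not proof the rates are equal.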

Significance

This is the threshold at which we consider a result significant, and it is typically set at 0.1, 0.05 or 0.01. The significance level determines how extreme a result has to be before we reject the null hypothesis: if you choose a smaller significance level, a much more extreme result is needed for your test to come out significant.
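One way to see this is to look at the critical z value each significance level implies, i.e. how extreme the test statistic must be before you declare significance (a one-sided sketch using the standard normal distribution):

```python
from statistics import NormalDist

# Critical z thresholds (one-sided) for the common significance levels.
# A smaller significance level demands a more extreme result before the
# test is declared significant.
critical_z = {alpha: NormalDist().inv_cdf(1 - alpha)
              for alpha in (0.1, 0.05, 0.01)}
# roughly: 0.1 -> 1.28, 0.05 -> 1.64, 0.01 -> 2.33
```

Moving from a 0.1 to a 0.01 significance level nearly doubles how far out in the tail your result has to fall.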

Confidence

Confidence is more commonly associated with confidence intervals, but it is directly related to significance: the confidence level is one minus the significance level. If you want to be 90% confident, you would set your significance level at 0.1. Intuitively, this makes sense: if you want to be more confident in a significant result, you leave a smaller margin for extreme results to arise by chance. So the confidence and significance levels scale together.
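To make the link concrete, here is a sketch of a 95% confidence interval for the difference in conversion rates (again using hypothetical numbers: 10,000 visitors per group, 420 vs 480 conversions), where the 95% comes directly from a 0.05 significance level:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical AB Test data (illustrative numbers only).
n_a, conv_a = 10_000, 420
n_b, conv_b = 10_000, 480
p_a, p_b = conv_a / n_a, conv_b / n_b

alpha = 0.05                                  # significance level
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for 95% confidence

# Standard error of the difference (unpooled form, as used for intervals).
se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
diff = p_b - p_a
ci = (diff - z_crit * se, diff + z_crit * se)  # 95% confidence interval
```

If the interval excludes zero, that agrees with a significant result at the matching significance level, which is exactly the scaling the text describes.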

P-values

As we said earlier, the significance level is a threshold, and it’s the p-value that is measured against it. Statistically, the p-value is the probability of observing a result at least as extreme as yours, assuming the null hypothesis is true. In normal AB Testing speak, this is the probability that you would see a difference this large purely by chance if there were really no difference between the variant and the control.
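Putting the pieces together, a minimal sketch of the full calculation (hypothetical numbers again: 10,000 visitors per group, 420 vs 480 conversions) converts the z statistic into a one-sided p-value and compares it to the significance threshold:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical AB Test data (illustrative numbers only).
n_a, conv_a = 10_000, 420
n_b, conv_b = 10_000, 480
p_a, p_b = conv_a / n_a, conv_b / n_b

# Pooled standard error and z statistic under the null hypothesis.
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se

# One-sided p-value: the probability of a z at least this large
# if the null hypothesis (no difference) were true.
p_value = 1 - NormalDist().cdf(z)

alpha = 0.05
significant = p_value < alpha  # you, not the test, pick this threshold
```

Note that the decision at the end is yours: the same p-value can be significant at 0.05 and not at 0.01, which is the point the article makes about controlling what counts as significant.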

Each one of these plays a part in a basic AB Test, from constructing your hypothesis, to conducting your test and analysing the results. It’s important to understand that a hypothesis test will never prove there is no difference; it can only give you evidence against the null hypothesis. And even then, you control the significance level that determines whether a test counts as significant or not.

[Figure: p-value interpretation]

Summary:

  • Hypothesis tests define what you are testing: evidence against a null hypothesis of no difference
  • Confidence and significance are linked, but not identical, terms
  • The p-value is the probability of a result at least as extreme as yours, assuming there is no difference between the variants
