A/B tests: Statistical Significance and Confidence Level

Whether you are comparing advertising material, landing pages or website layouts, an A/B test shows which version performs better. This calculator tells you which variant wins and at what confidence level the difference is significant. You will also find background information, the formulas for evaluating A/B tests and a worked example.

Evaluate your A/B test with this tool

  • If the confidence level is below 95%, the difference between the original variant and the test variant is not statistically significant
  • If the confidence level is greater than 95%, the difference is statistically significant
  • If the confidence level is greater than 99%, the difference is statistically highly significant
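The three thresholds above can be written down as a small lookup. This is only a sketch: the function name `classify` is hypothetical, and treating a confidence level of exactly 95% as "not significant" is an assumption, since the list above only covers values below and above that threshold.

```python
def classify(confidence_level: float) -> str:
    """Map a confidence level between 0 and 1 to the labels used above."""
    if confidence_level > 0.99:
        return "highly significant"
    if confidence_level > 0.95:
        return "significant"
    # Assumption: exactly 0.95 is treated like values below 0.95.
    return "not significant"


for level in (0.90, 0.97, 0.995):
    print(f"{level:.1%}: {classify(level)}")
```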

What is an A/B test?

An A/B test is a test method in which two variants (variant A and variant B) of a website, a design element or advertising material such as a banner are tested against each other with respect to a goal. Over a certain period of time, each visitor to the website is randomly shown one of the two variants, and the conversion rate of each variant is measured. The variant with the higher conversion rate is then selected and implemented. Conversions are usually measured not with a data warehouse or CRM but with an analytics program such as Google Analytics, etracker, Adobe or Piwik. For this purpose, e-commerce tracking is set up or conversions are tracked as events. Google Analytics offers the possibility to record events automatically.

A/B tests are not about guesswork or lucky hits but about establishing statistical significance. Confidence level and significance indicate whether one variant really performs better than the other.

Formula for A/B tests: Calculate Statistical Significance

Statistical significance is calculated with the chi-square test. The formula is:

\text{Chi}^2=\\ \frac{(\text{o}-\text{co}-\frac{\text{o}\times\text{nf}}{\text{n}})^2}{\frac{\text{o}\times\text{nf}}{\text{n}}} + \frac{(\text{v}-\text{cv}-\frac{\text{v}\times\text{nf}}{\text{n}})^2}{\frac{\text{v}\times\text{nf}}{\text{n}}} + \frac{(\text{co}-\frac{\text{o}\times\text{nc}}{\text{n}})^2}{\frac{\text{o}\times\text{nc}}{\text{n}}} + \frac{(\text{cv}-\frac{\text{v}\times\text{nc}}{\text{n}})^2}{\frac{\text{v}\times\text{nc}}{\text{n}}}

The variables are as follows:

  • o: Visitors/Impressions of the original version
  • v: Visitors/Impressions of the comparison variant
  • co: Conversions or Clicks of the original variant
  • cv: Conversions or Clicks of the comparison variant
  • n: Total number of Visitors or Impressions
  • nf: Total number of Visitors or Impressions without conversion
  • nc: Total number of Visitors or Impressions with conversion
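As a rough illustration, the formula can be transcribed almost literally into Python. The function name `chi_square` is hypothetical. The sketch also assumes that n, nc and nf are derived from the four inputs (n = o + v, nc = co + cv, nf = n - nc), which follows from the variable definitions above. The sample figures are those of the worked example in this article.

```python
def chi_square(o: int, v: int, co: int, cv: int) -> float:
    """Chi-square value for an A/B test, using the variable names of the formula.

    o, v   : visitors of the original / the comparison variant
    co, cv : conversions of the original / the comparison variant
    """
    n = o + v        # total visitors
    nc = co + cv     # total visitors with conversion
    nf = n - nc      # total visitors without conversion

    # One (observed, expected) pair per summand of the formula.
    cells = [
        (o - co, o * nf / n),  # original, without conversion
        (v - cv, v * nf / n),  # comparison variant, without conversion
        (co, o * nc / n),      # original, with conversion
        (cv, v * nc / n),      # comparison variant, with conversion
    ]
    return sum((observed - expected) ** 2 / expected
               for observed, expected in cells)


print(round(chi_square(o=1000, v=1200, co=40, cv=80), 2))  # 7.52
```

The sketch does no input validation. If there are no conversions at all, or if every visitor converts, two of the expected frequencies are zero and the division fails.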

The Chi-square test simply explained

Each of the four summands in the chi-square formula corresponds to one of the following four groups of visitors:

  • A: Visitors of the original without conversion
  • B: Visitors of the comparison variant without conversion
  • C: Visitors of the original with conversion
  • D: Visitors of the comparison variant with conversion

To simplify matters, we will only talk about visitors and conversions in the following explanation. The measured frequencies are entered in a cross-table (contingency table). In the cross-table, the two variants (original and comparison variant) are crossed with the two characteristics (visitors with conversion and visitors without conversion), which produces the four values mentioned above:

| | Original | Comparison variant | Sum |
|---|---|---|---|
| Visitors without Conversion | A: 960 | B: 1120 | 2080 |
| Visitors with Conversion | C: 40 | D: 80 | 120 |
| Sum | 1000 | 1200 | 2200 |

The expected frequencies are then calculated: for each field, the number of visitors of the respective variant is multiplied by the total number of visitors with the respective characteristic (with conversion or without conversion) and divided by the total number of visitors of both variants. The expected frequencies assume that both variants convert at the same rate, so that any difference between them would be due to chance alone. To illustrate this, we calculate the expected frequency of field A: visitors of the original without conversion.

  \text{expected frequency}=\\\frac{\text{Visitors of the original}\times\text{Total Visitors without Conversion}}{\text{Total Visitors of both Variants}}=\frac{\text{1000}\times\text{2080}}{\text{2200}}=\text{945.45}

The remaining three expected frequencies are calculated in the same way:

| | Original | Comparison variant | Sum |
|---|---|---|---|
| Visitors without Conversion | A: 945.45 | B: 1134.55 | 2080 |
| Visitors with Conversion | C: 54.55 | D: 65.45 | 120 |
| Sum | 1000 | 1200 | 2200 |
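The four expected frequencies can be reproduced from the row and column sums alone. The following is a minimal sketch using the totals of the example. The dictionary layout and the names `visitors`, `outcomes` and `expected` are illustrative choices.

```python
# Column sums (visitors per variant) and row sums (visitors per characteristic).
visitors = {"original": 1000, "comparison variant": 1200}
outcomes = {"without conversion": 2080, "with conversion": 120}
total = sum(visitors.values())  # 2200

# expected frequency = column sum * row sum / total
expected = {
    (variant, outcome): n_variant * n_outcome / total
    for variant, n_variant in visitors.items()
    for outcome, n_outcome in outcomes.items()
}

for (variant, outcome), value in expected.items():
    print(f"{variant}, {outcome}: {value:.2f}")
```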

For each of the four fields, the difference is formed from the measured frequency and the expected frequency, then squared and divided by the expected frequency:

| | Original | Comparison variant |
|---|---|---|
| Visitors without Conversion | A: 0.22 | B: 0.19 |
| Visitors with Conversion | C: 3.88 | D: 3.23 |

Finally, all four fields are added together to get the chi-square value:

  \text{Chi}^2=\text{0.22}+\text{0.19}+\text{3.88}+\text{3.23}=\text{7.52}
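The per-field values and their sum can be checked with a few lines of Python. This sketch uses the measured and expected frequencies of the example. The list names are illustrative.

```python
# Measured frequencies of the fields A, B, C, D.
observed = [960, 1120, 40, 80]

# Expected frequencies: variant total * characteristic total / grand total.
expected = [
    1000 * 2080 / 2200,  # A
    1200 * 2080 / 2200,  # B
    1000 * 120 / 2200,   # C
    1200 * 120 / 2200,   # D
]

# (measured - expected)^2 / expected for each field.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)

print([round(c, 2) for c in contributions])
print(round(chi2, 2))
```

Almost all of the chi-square value comes from the two conversion fields C and D. Their expected frequencies are small, so the same absolute deviation weighs far more there than in A and B.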

Now the calculated chi-square value only has to be compared with the critical values Chi²0.95(1) and Chi²0.99(1). Here 0.95 and 0.99 are the confidence levels. A confidence level of more than 0.95 is generally regarded as ‘statistically significant’; above 0.99, the term ‘statistically highly significant’ is used. The (1) stands for the number of degrees of freedom: a four-field table (two rows by two columns) always has (2 − 1) × (2 − 1) = 1 degree of freedom.

  • Chi²0.95(1) = 3.84
  • Chi²0.99(1) = 6.63

The difference in the example is therefore highly significant, since the calculated chi-square value (7.52) is greater than 6.63.
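Instead of only comparing against the two critical values, the confidence level itself can be computed. With one degree of freedom, the chi-square distribution function reduces to erf(sqrt(Chi² / 2)), so Python's standard library is sufficient. The following is a sketch under that assumption of one degree of freedom. The function name `confidence_level` is hypothetical, and the article does not state how the calculator derives its confidence level.

```python
import math

CRITICAL_95 = 3.84  # Chi²0.95(1)
CRITICAL_99 = 6.63  # Chi²0.99(1)


def confidence_level(chi2: float) -> float:
    """Confidence level for a chi-square value with 1 degree of freedom."""
    return math.erf(math.sqrt(chi2 / 2))


chi2 = 7.52  # value from the example
print(f"confidence level: {confidence_level(chi2):.2%}")
print("significant (95%):", chi2 > CRITICAL_95)
print("highly significant (99%):", chi2 > CRITICAL_99)
```

Inserting the critical values themselves returns approximately 0.95 and 0.99, which is a quick consistency check for the function.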