Excellent Analytics Tip#1: Statistical Significance | Occam's Razor by Avinash Kaushik

17 May 2006 12:23 am

Excellent Analytics Tip#1: Statistical Significance

We all wish that our key internal partners, business decision makers, would use Web Analytics data a lot more to make effective decisions. How do we make recommendations / decisions with confidence? How can we drive action rather than pushing data? The challenge is how to separate Signal from Noise and make it easy to communicate that distinction.

This is where Excellent Analytics Tip #1, a recurring series, comes in. Leverage the power of Statistics.

Consider this scenario (A):

Offer One Responses: 5,300. Orders: 46. Hence Conversion Rate: 0.87%
Offer Two Responses: 5,200. Orders: 55. Hence Conversion Rate: 1.06%

Is Offer Two better than Offer One? It does have a "better" conversion rate, by 0.19 percentage points. Can you decide which one of the two is better with just 46 and 55 orders? We got 9 more orders from 100 fewer visitors.

Applying statistics tells us that the results, the two conversion rates, are just 0.995 standard deviations apart and not statistically significant. This would mean that it is quite likely that it is noise causing the difference in conversion rates.

Consider this scenario (B):

Offer One Responses: 5,300. Orders: 46. Hence Conversion Rate: 0.87%
Offer Two Responses: 5,200. Orders: 63. Hence Conversion Rate: 1.21%

Applying statistics will now tell us that the two numbers are 1.74 standard deviations apart and that the results are statistically significant at the 95% level (on a one-tailed test). 95% significance is a very strong signal. Based on this, with a sample of only about 5k responses and sixty-odd orders, we can confidently predict success.
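For readers curious where the standard-deviation figures come from, here is a minimal Python sketch. It is not the StatCalc.xls formula (the post does not show it); it assumes a two-proportion z-test with an unpooled standard error, which reproduces the figures above to within rounding. The function names are made up for illustration.

```python
import math

def z_score_two_proportions(orders_a, responses_a, orders_b, responses_b):
    """Standard errors separating two conversion rates (unpooled; an assumption)."""
    rate_a = orders_a / responses_a
    rate_b = orders_b / responses_b
    # Variance of each observed rate under a simple binomial model
    var_a = rate_a * (1 - rate_a) / responses_a
    var_b = rate_b * (1 - rate_b) / responses_b
    std_error = math.sqrt(var_a + var_b)
    return (rate_b - rate_a) / std_error

def one_tailed_confidence(z):
    """Standard normal CDF at z: confidence that the second offer is better."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z_a = z_score_two_proportions(46, 5300, 55, 5200)  # scenario (A)
z_b = z_score_two_proportions(46, 5300, 63, 5200)  # scenario (B)

print(f"Scenario A: z = {z_a:.3f}, confidence = {one_tailed_confidence(z_a):.1%}")
print(f"Scenario B: z = {z_b:.3f}, confidence = {one_tailed_confidence(z_b):.1%}")
```

Under these assumptions Scenario A comes out at roughly 0.995 standard deviations and Scenario B at roughly 1.73, which clears the one-tailed 95% bar of about 1.65.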

Is this really hard to do? No! Simply use this spreadsheet: StatCalc.xls. (While we have an "enhanced" version of this spreadsheet, this is the original file we found on the web, and the file contains credit to the original author, Brian Teasley.)

All you do is punch in your numbers in the blue highlighted cells and you are on your way. This methodology can easily be applied to all facets of your insights analysis, including:

Search Engine Marketing Campaigns
Various Direct Marketing Campaigns and Offers
Any kind of % metric (% of traffic that reaches a goal from Entry Point 1 or Entry Point 2)
Differences between results for your A/B or Multivariate tests

You can easily adapt the spreadsheet, as we have, to compute statistical difference between absolute numbers (say you want to know if the difference in Page Views Per Visitor or Average Time on Site between segment One and Two is Significant).
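The post does not show how the spreadsheet was adapted, so the following is only one plausible way to do it: a z-test on two averages, which needs each segment's mean, standard deviation and visitor count. The function name and all numbers below are hypothetical.

```python
import math

def z_score_two_means(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Standard errors separating two segment averages (large-sample z-test)."""
    # Standard error of the difference between two independent means
    std_error = math.sqrt(std_a ** 2 / n_a + std_b ** 2 / n_b)
    return (mean_b - mean_a) / std_error

# Hypothetical Page Views Per Visitor for two segments
z = z_score_two_means(mean_a=3.20, std_a=2.9, n_a=5300,
                      mean_b=3.35, std_b=3.1, n_b=5200)
print(f"Segments are {z:.2f} standard errors apart")
```

Unlike a conversion rate, an average does not determine its own spread, so the standard deviation has to be pulled from the visitor-level data rather than derived from the totals.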

Powerful benefits to presenting Statistical Significance rather than simply Conversion Rate:

You are taking yourself out of the equation; it is awesome to say "according to the Gods of Statistics here are the results…"
Focusing on quality of Signal means that we appear smarter than people give us Analysts credit for.
You take the thinking and questions out of the equation. Either something is Statistically Significant, and we take action, or we say it is not Significant and let's try something else. No reporting, just actionable insights.

Here is one more great resource for tools / spreadsheets that I would like to point out if you want to get deeper into this way of thinking:

http://www.analyticalgroup.com/sigtest.html

Two small tips:

As a best practice, aim for 95% or higher Confidence. That is not always required but it is recommended.
"Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital." –Aaron Levenstein

Agree? Disagree? Not really an Excellent Analytics Tip? Please share your feedback via comments.

42 User Comments

Mark McLaren Says:
May 17th, 2006 at 05:41
Thank you for starting your excellent blog. I found you via Robbin Steif's LunaMetrics Blog.

Regarding the use of standard deviation as a means of interpreting order results. In general, I completely agree about the importance of removing bias from test results as much as possible.

What else do we need to know about the groups involved in the test?

Are they essentially the same group or are they two completely different groups? (I'm assuming you would want to send offers to as many people as possible; hence, they are the same group – less 100 people in the second case.)

How were members of the group(s) selected? Do you need a random sample in order to apply principles of standard deviation? 5,000+ is a good size group from which to draw conclusions, but I take it group members were not chosen randomly.
Jeff Leong Says:
May 17th, 2006 at 09:17
Dear Avinash,

Thank you for sharing your blog. This has been long awaited and definitely worth the time to read – subscribe to.

I'm wondering now, based on this article, how this model plays in with UX decisions when applying A/B tests. We often find minimal differences and often base our decision on the winner between the two.

Is there a way or method of filtering noise when the difference is minimal? Assuming all multivariate elements are correctly in place?

Congratulations on the great blog, the industry will soon be catching on!

Jeff
June Li Says:
May 17th, 2006 at 10:40
Hi Avinash,
I had the pleasure of hearing you speak at the 2005 eMetrics. I'm very happy that you've decided to blog. I too found your blog through Robbin Steif's.

It's excellent that you are giving us real examples of how statistics can be used, and providing tool references. I look forward to additional case studies and discussions.

Will you also be posting about monitoring and managing outside influences? Sometimes the Noise dampens the signal or deflects the signal.

Thanks,
Web: http://www.clickinsight.ca
Blog: clickinsight.blogspot.com
Avinash Kaushik Says:
May 17th, 2006 at 22:32
Mark McLaren: Thanks for your kind words about the post, I am glad you found it helpful.

What else do we need to know about the groups involved in the test?

Are they essentially the same group or are they two completely different groups? (I’m assuming you would want to send offers to as many people as possible; hence, they are the same group – less 100 people in the second case.)

In the specific example I used, and the spreadsheet, you usually control for one thing. You can have as many groups as you want. For example you can send one offer to people who live in CA and NY and FL and OR and OH and plug that into the spreadsheet against a control and know which works best.

Alternatively you could try 5, 6, 10 whatever number of different offers to a bunch of folks and see which one converts best.

The problem comes when you want to test different offers to different groups (or many different pieces of content in different locations on the same page). Now you are in the world of multivariate and need to apply advanced statistics (think Taguchi).

Doing multivariate is awesomely powerful and yields great results, but beyond my humble spreadsheet.

Do you need a random sample in order to apply principles of standard deviation? 5,000+ is a good size group from which to draw conclusions,

The beauty of using statistics is that the standard deviations required, and amount of Statistical Significance (my suggestion of 95% or higher), will drive how big a sample you need. There is no fixed number (like 5k).

Hope this is the kind of information you were looking for.
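To make the point about sample size concrete, here is a rough sketch of how the required sample follows from the size of the lift you want to detect. It assumes a one-tailed test at 95% confidence, the textbook normal-approximation formula for two proportions, and 80% power; the power figure and the function name are assumptions that do not come from the post or the spreadsheet.

```python
import math
from statistics import NormalDist

def sample_size_per_group(base_rate, target_rate, confidence=0.95, power=0.80):
    """Approximate responses needed per offer to detect base_rate vs target_rate."""
    z_alpha = NormalDist().inv_cdf(confidence)  # one-tailed critical value
    z_beta = NormalDist().inv_cdf(power)        # power is an assumption here
    variance = base_rate * (1 - base_rate) + target_rate * (1 - target_rate)
    n = (z_alpha + z_beta) ** 2 * variance / (target_rate - base_rate) ** 2
    return math.ceil(n)

# The two lifts from the post: 0.87% -> 1.06% and 0.87% -> 1.21%
n_small_gap = sample_size_per_group(0.0087, 0.0106)
n_large_gap = sample_size_per_group(0.0087, 0.0121)
print(n_small_gap, n_large_gap)
```

The smaller the difference you are trying to detect, the more responses each offer needs, which is why no single number like 5k is right for every test.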
Avinash Kaushik Says:
May 17th, 2006 at 22:45
Jeff: Glad to see your post…

I’m wondering now, based on this article, how this model plays in with UX decisions when applying A/B tests. We often find minimal differences and often base our decision on the winner between the two.

If we are doing a/b testing (assuming the Success Goal is clearly articulated and measurable and that it is not "impact on brand") then it would be a sin not to use the spreadsheet in the post above to separate Signal from Noise. Simply looking at the difference in Conversion Rate (or a similar metric) is very dangerous because of exactly what you say: how much is enough to be confident?

The great news is that most current a/b testing solutions (at least the ones that do "page testing") already include statistical computations to help us make better decisions.

If you don't see at least 90% plus statistical confidence, take the results with a grain of salt.
Jaimie Scott Says:
May 18th, 2006 at 09:34
Hi Avinash,

I too am very happy to see your blog. I found it through Clint Ivy's blog and I am enjoying reading your posts very much. I find them to be quite informative.

You say above:
"You can easily adapt the spreadsheet, as we have, to compute statistical difference between absolute numbers (say you want to know if the difference Page Views Per Visitor or Average Time on Site between segment One and Two is Significant)"

It's not obvious to me how to do this. Can you elaborate?

Thanks.
Aurélie Says:
May 21st, 2006 at 13:37
Hi Avinash,

Good to find you blogging, sharing thoughts and experiences. It's quite some interesting stuff and I hope you enjoy the experience.

I read your different posts on Saturday morning and your thoughts stayed with me for the entire week-end. Thank you.

Yes, statistical significance. I totally join you in the idea and would only add that tests that do not render truly significant results should not be communicated. I remember in my first job having warned of the non-significance of a test only to find it had heavily influenced a commercial strategy. I vowed never again!

Another pavlovian reaction was to consider that any number of responses under 200 should not be taken into account, as it holds a high probability of not being representative. I usually follow this first rule and adapt the variables in order to remain loyal to the statistical representativeness of a sample. Quite pavlovian, I agree.

And the last thing is that I'll bear statistical significance in mind but would like to suggest another possible subject: correlation between conversion rates.

Siegert suggested this formulation for a client yesterday:
“Is a visitor engaging into A but not engaging into B, converted easier into a lead, than someone engaged into C and B?”
In other words, you've got kind of low level conversion events that influence or not higher goals.
I'm having difficulty formulating this, sorry.
Hope it made sense, keep up the good work, cheers from expensive Brussels ;-)
Aurélie
Avinash Kaushik Says:
May 21st, 2006 at 23:06
Aurélie: Thanks for the thoughtful comment. I am sorry to have spoilt your entire weekend with my posts; after all, there are so many more beautiful things in life. :)

I completely agree with the care around communicating anything that is not of significance; there is always a danger that in spite of your warning they will jump into the lake.

Another pavlovian reaction was to consider that any number of responses under 200 should not be taken into account, as it holds a high probability of not being representative.

(For our readers here is something on pavlovian reaction.)

In the world of Multivariate we can detect a strong signal even with small samples. We use something like This Page to calculate the sample size.

“Is a visitor engaging into A but not engaging into B, converted easier into a lead, than someone engaged into C and B?”

Correlations are important, very, and of course my simple little spreadsheet won't account for that. Especially for complex web interactions it is important to understand how lower-level conversion events might influence higher-level (ultimate) goals.
Kerry Kim Says:
May 30th, 2006 at 17:37
Hi Avinash, my thanks also to you for sharing. Any additional insights you might have about the key drivers of adoption you've experienced would be greatly appreciated.

Regarding statistical significance, it appears that the reference in your post used a one tailed z test for testing whether there is a significant difference between two sample proportions. Wouldn't it have been more precise to use a two tailed test? If not, why?
Avinash Kaushik Says:
May 31st, 2006 at 00:03
Kerry: The example used was quite a simple one to show that we can accomplish much applying statistics to our standard KPI's with very little stress.

Wouldn’t it have been more precise to use a two tailed test? If not, why?

You are right, one can get quite sophisticated and get ever better results. The emphasis of the article was how to detect statistical significance in a simple case. I hope to blog more about how we can apply advanced methodologies in testing (to build on my experimentation and testing post).

Thanks for taking the time to post a comment.
Vicky Brock Says:
June 1st, 2006 at 08:51
Hi Avinash,

I so much agree on the importance of taking into account statistical significance – an essential part of the "so what" factor!

This is a neat chi square tool to test for statistical significance:

http://www.georgetown.edu/faculty/ballc/webtools/web_chi.html

I do love your blog, bye, Vicky
Hakim Aly Says:
April 23rd, 2007 at 21:27
Although a 95% CL seems to be common (other than in a medical/pharmaceutical context), in a marketing context a lower CL may be quite appropriate. As you know, the choice of significance level (or its complement, Confidence Level) depends on the cost of being wrong.

A 5% significance level (95% CL) means there is a 5% probability of being wrong when there is no real difference. This is Type I error, i.e., concluding that one RR% is higher than another (statistically significant) when in fact it is not. Acting on this wrong conclusion may result in incurring costs that do not yield revenue or profit to offset the costs.

Type II error is when one does not reject the null hypothesis when in fact it is false. In this case, the cost associated with the decision to not roll out a marketing tactic is the foregone revenue/profit that would otherwise have been generated.

In many situations, the cost of Type II error exceeds the cost of Type I error. Clearly, a trade-off is involved, but a lower CL of 90% or even 80% may not be out of line.

Ultimately, each business needs to decide for itself what an appropriate CL is for purposes of assessing test results.

Would be interested in your thoughts. Hakim
Hakim Aly Says:
April 23rd, 2007 at 21:35
Regarding the question of 1-tail vs. 2-tail test, the former is appropriate when one wants to determine whether one RR% is statistically HIGHER (or LOWER) than another. The latter is appropriate if one wants to know if a RR% is DIFFERENT FROM another.

I would suggest that in most marketing situations, we are more interested in the former (higher than) than the latter (different from).

In a few cases, we may want to know whether a proposed course of action may harm the response rate, in which case a 2-tail test would be appropriate.
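As a quick illustration of the one-tail versus two-tail question, the sketch below converts a z-score into both kinds of p-value using Python's standard library. It takes the 1.74 figure from scenario (B) of the post as input; the function name is made up for this example.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one-tailed, two-tailed) p-values for a z-score."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return upper_tail, 2 * upper_tail

one_tailed, two_tailed = p_values(1.74)
print(f"one-tailed p = {one_tailed:.3f}, two-tailed p = {two_tailed:.3f}")

# Critical z-scores for 95% confidence under each kind of test
print(f"one-tailed cutoff: {NormalDist().inv_cdf(0.95):.3f}")
print(f"two-tailed cutoff: {NormalDist().inv_cdf(0.975):.3f}")
```

The same result clears 95% on the "higher than" question but not on the "different from" question, so it is worth deciding which question you are asking before looking at the numbers.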
Curtis Says:
August 16th, 2007 at 23:44
Thanks for the great insight. Do you have more details on exactly how the statistical significance is calculated? I'm curious how you derived the std deviations in the scenarios above with just the sample size and order counts.
pabitra chatterjee Says:
November 4th, 2007 at 23:11
this is to continue where mr hakim aly left. you may find this little piece at my blog, http://directindia.blogspot.com/2007/10/no-beta-yes-risk.html, interesting.

i've also given links for templates in the piece.

pac
Web analytics en statistische significantie – Onetomarket Blog Says:
January 28th, 2008 at 08:18
[...] You can use statistics for this, and statistical significance in particular. You hear about it now and then in commercials, or you see it in research reported in the newspaper. In web analytics you run into the term as well, for example in the book “Web analytics an hour a day” by Avinash Kaushik. The piece on statistical significance is also on his blog. What does statistically significant actually mean, and how do the tools he mentions work? [...]
Philip Says:
February 18th, 2008 at 10:38
The Analytical Group link for a free spreadsheet download appears to have changed. It's now http://www.analyticalgroup.com/sigtest.html (with html instead of htm).

Thanks for all your great articles, Avinash!

Note : Thanks very much for the correction Philip! -Avinash.
Jbuser Says:
February 20th, 2008 at 13:56
Avinash,

As a stats guy, I am a little concerned with the assumptions behind the model. I downloaded it and the first thing I noticed was that it makes some pretty large assumptions with confidence levels (anything with z-score (I am assuming) of 1.65 and 2.33 = 95%). I understand that this is probably there to make things "easy" but I think it can be misleading. Also of note: IF there is a z-score assumption (which I am unsure of), there are some other assumptions underneath the covers (which I couldn't get to), and z-scores are only for known pop. means and std. dev. Do you know what Brian is using? Is it possible to get this information?

Finally, one concern with the "plug and chug" nature of the spreadsheet is what you always must be wary of, and that is making statistical significance a badge of honor. All it tells you is given the values you have, is there a difference between the two. What you must do, more than anything else, is make sure your testing methods are solid BEFORE the test. Otherwise, you are going to be putting in values after values and getting significance or not and making some very important decisions when the whole test could be wrong. Practical vs. Statistical.
michael choe Says:
March 27th, 2008 at 14:05
all -

i share similar concerns as jbuser…

for example, 1.74 standard deviations is not 95% significance. 1.96 standard deviations is 95% confidence.

also, for computing standard deviation (s) of 2 or more proportions (in this case, conversion rate), i think it's a good practice to assume the largest margin of error. in this case, margin of error = 2 * sqrt(0.5^2/n), where n is the size of your smallest sample. this is what pollsters such as zogby do when communicating poll results about hillary/obama, etc.
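The worst-case margin of error described in this comment is easy to sketch in Python. The function below uses the commenter's formula, with the multiplier of 2 and p = 0.5 as defaults, and also evaluates it at an observed rate for comparison; the function name and the choice of n = 5,200 (the smaller sample in the post) are assumptions for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=2.0):
    """Margin of error for a proportion; p=0.5 gives the widest (worst) case."""
    return z * math.sqrt(p * (1 - p) / n)

worst_case = margin_of_error(5200)                  # 2 * sqrt(0.5^2 / n)
at_observed_rate = margin_of_error(5200, p=0.0106)  # using the 1.06% rate

print(f"worst case: +/- {worst_case:.2%}")
print(f"at a 1.06% rate: +/- {at_observed_rate:.2%}")
```

For conversion rates around 1% the worst-case figure is several times wider than the margin computed at the observed rate, so this convention is very conservative for typical web data.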
Tyranny of numbers — checking for statistical significance | Ubermarketer Says:
March 30th, 2008 at 11:21
[...] RKG on finding statistical significance in two Adwords tests Interesting commentary on the value (or lack-of-value) in copy testing in PPC Avinash Kaushik on separating signal from noise with statistical significance [...]
Barbara Says:
April 13th, 2008 at 06:15
Hi Avinash,

Thanks a million for your great posts. I just got into web analytics and have found both this blog and your book extremely helpful. You mentioned utilizing the statcalc.xls spreadsheet to measure significance between pageviews. Can you kindly advise on how to do this? Your response is eagerly awaited. :)
Andrew Blank Says:
July 16th, 2008 at 07:45
The link for more advanced stats – http://www.mwrms.com/wwwRMS/DirectMarketing/MarketingCalc2.asp does not seem to have anything to do with stats anymore. Unfortunately I couldn't find a live replacement.
Web analytics en statistische significantie – Onetomarket Says:
December 9th, 2008 at 08:06
[...] You can use statistics for this, and statistical significance in particular. You hear about it now and then in commercials, or you see it in research reported in the newspaper. In web analytics you run into the term as well, for example in the book “Web analytics an hour a day” by Avinash Kaushik. The piece on statistical significance is also on his blog. What does statistically significant actually mean, and how do the tools he mentions work? [...]
Lea SP Says:
February 18th, 2009 at 13:17
What an insightful post Avinash.

For someone who is just approaching web analytics from a statistical perspective, I'm constantly asked whether a particular report figure is "statistically significant" in terms of sample size. I readily understand the statcalc worksheet purpose of seeing whether the difference between two metrics is stat. significant, but how do you know if the metrics have enough sample size to gauge effectively?

For example, I have two conversion rates: 0.7% and 8.8%. The sheet shows that there is a 99% confidence that these are statistically different which is a good start. But my rates are based off of A(4,155 clicks & 6 conversions) and B(80 clicks and 7 conversions). Are the conversion samples too small to be effective indicators in this case? What is an acceptable threshold here, and how do you find it on a case-by-case basis?

Thank you!
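One way to approach a small-sample case like this is an exact test, which does not lean on the normal approximation the way a z-score does. The sketch below is a one-sided Fisher exact test written with only the standard library; it is a swapped-in technique, not what the StatCalc spreadsheet does, and the function name is made up. It also prints the number of conversions group B would be expected to show if both groups shared one rate, since a common rule of thumb treats expected counts under about 5 as a warning sign for the normal approximation.

```python
from math import comb

def fisher_exact_one_sided(conv_a, n_a, conv_b, n_b):
    """P(group B shows >= conv_b conversions) if conversions fell at random."""
    total = n_a + n_b
    total_conv = conv_a + conv_b
    denom = comb(total, n_b)  # ways to pick which clicks belong to group B
    p = 0.0
    for k in range(conv_b, min(total_conv, n_b) + 1):
        # k of B's clicks are converters, the remaining n_b - k are not
        p += comb(total_conv, k) * comb(total - total_conv, n_b - k) / denom
    return p

# Figures from the comment: A = 4,155 clicks / 6 conversions, B = 80 / 7
p_value = fisher_exact_one_sided(6, 4155, 7, 80)
expected_b = 80 * (6 + 7) / (4155 + 80)

print(f"expected conversions in B under a shared rate: {expected_b:.2f}")
print(f"one-sided exact p-value: {p_value:.2e}")
```

With these figures the expected count in B is well under 5, so a z-based spreadsheet is on shaky ground, yet the exact p-value is still far below 0.05. Whether 80 clicks are representative of future traffic is a separate question that no significance test answers.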
Garry Przyklenk Says:
May 6th, 2009 at 05:17
Avinash, this post is giving me nightmares and heart palpitations.

My problem is similar to Lea SP's, but on steroids. I have two sample populations:

Pop A: 200,000 participants, 800,000 conversions
Pop B: 1,000 participants, 8,000 conversions

Unfortunately, conversions in this case don't generate direct cash, or else I'd be buying an island somewhere, and not worry about calculating significance!

But obviously, there should be something to be said of Pop B having way lower sample size and not being representative or even warranting comparison to Pop A, right?

Regards,
Garry
Joshua Daniel Egan Says:
June 1st, 2009 at 06:07
Since I am new to SEO, I want to implement statistics in SEO. I thought that statistical techniques could not be applied in SEO or to the data given by Google Analytics. Can you suggest any statistical technique, with an example, to my mail ID? It would be useful for me. I would be grateful to you.
Optimize Ad Texts | datumSense Says:
June 23rd, 2009 at 05:45
[...] has an excellent post regarding statistical significance. Reading this post will help you understand what is correct way to choose winners and test Ad text [...]
How to Optimize PPC Tip #1 : Optimize Ad Texts | datumSense Says:
September 22nd, 2009 at 05:19
[...] has an excellent post regarding the use of applying statistics to tell us when an ad’s outcome is statistical significant to confidently predict success. This post will help you understand the correct way to pick the best [...]
Ultimate Web Analytics Training Guide: From Click to Close Says:
October 10th, 2009 at 15:22
[...] 8. Performing statistical analysis to eliminate noise. [...]
Adrian Palacios Says:
November 29th, 2009 at 21:33
I am on my way to becoming an analysis ninja. My two roadblocks thus far are JavaScript and statistics (KPI's? check. Segmentation? double check.)

I am wondering if you could recommend any good entry-level books that could help teach me to do statistical analysis? Anything that uses web analytics scenarios is a *huge* plus.

Thanks!
Avinash Kaushik Says:
November 29th, 2009 at 23:34
Adrian: I am afraid I don't know any book on Statistics that covers web analytics scenarios. Though if you understood statistics I don't think anything would stand in your way in terms of applying it to challenges you face in your web analytics job.

Taking a Statistics 101 or 201 course at your local university might just do the trick.

This might seem odd but one of the best books I have read on Statistics is this one:

Cartoon Guide to Statistics by Larry Gonick, Woollcott Smith

Simple and effective.

Not necessarily just about statistics but this is an awesome book if you want to be a great analyst:

Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets
by Nassim Nicholas Taleb

Hope this helps a bit.

Avinash.
gestion, analyses et créativité | gestion analytique | gestion créative | site Web Says:
February 14th, 2010 at 21:12
[...]
What fascinates me most about the box-plot concept, or the use of statistics in general to monitor a website's performance, is that it lets us be creative when we need to be creative and do something else when all is well. When the control limits are not exceeded, there is no point wasting time taking action. Wow. I am an intuitive person and I don't yet understand the full scope of this measurement tool, but I can see some really interesting possibilities.
[...]
Análisis “alternativo” del éxito de campañas CPC en B2B « Blog de Un Analista Web Says:
March 26th, 2010 at 03:04
[...]
Without being a statistician, I believe it is essential for a professional web analyst to present results on a statistical basis. Without going into detail (for fear of getting lost and confused!) and based on a post by Avinash about significant differences, I have been using the tool called “The Teasley Statistical Calculator”:
[...]
Making decisions with confidence – free statistical significance tool | Value Propositions Says:
April 30th, 2010 at 22:23
[...]
Here’s a drop-dead simple statistical significance calculator tool, an excel spreadsheet by Brian Teasley, which I think you will find useful. I know I do.

For more on this, read
Excellent Analytics Tip#1: Statistical Significance by Avinash Kaushik
[...]
Dave Rekuc Says:
November 1st, 2010 at 08:18
Thank you for the easy to understand description above. I've found it very useful, however, I work for an ecommerce site that has a price range of anywhere from $3 an item to $299 an item. So, I feel like in some situations only looking at conversion rate is looking at 1 piece of the puzzle.

I've often used sales/session or tried to factor in AOV when looking at conversion, but I've had a lot of trouble coming up with a statistical method to ensure my tests' relevance. I can check to see if both conversion and AOV pass a null hypothesis test, but in the case that they both do, I'm back at square one.

Can anyone recommend a statistical method for this scenario?

Thanks Avinash for the article, love your blog (and books)!
Rags Srinivasan Says:
November 1st, 2010 at 20:55
Dave,

You are correct in stating that looking at conversion rate alone is looking at one part of the puzzle.

When you have items that vary in price, like you said from $3 to $299, your test for statistical significance of difference between conversion rates assumes an implicit hypothesis that is treated as given.

A1: The difference in conversion rates does not differ across price ranges.

and your null hypothesis (same, just added for completeness)

H0: Any difference between the conversion rates is due to randomness

When your data tells you that H0 cannot be rejected, it is conditioned on the implicit assumption A1 being true.

But what if A1 is false? Either you explicitly test this assumption first or, as a simpler option, segment your data and test each segment for statistical significance. Since you have a range of price points I recommend you test over 4-5 price ranges.

This is the same as the case when you are A/B testing simple conversion rates and treat the population as the same (no male/female difference, no Geo specific difference etc).

Hope this helps.

-rags
Dave Rekuc Says:
November 2nd, 2010 at 06:01
Thank you Rags, very helpful. I'll use the segmentation method in my next test. Unfortunately, this means waiting for a larger sample size than non-segmented data. However, I suppose it's worth it. Thanks!
Web Analytics TV #14 – Just Wow | Google Analytics Blog Says:
November 19th, 2010 at 19:25
[...]
Avinash describes how to calculate statistical significance with analytics data
Integrating a PayPal shopping cart with Google Analytics
[...]
The Hidden Hypotheses We Take For Granted « Iterative Path Says:
February 13th, 2011 at 10:42
[...]
Hidden in this hypothesis testing are many implicit hypotheses that we take as truth. If any one of them proves not to be true then our conclusion from the A/B testing will be wrong.
Dave Rekuc, who runs an eCommerce site, posed a question in Avinash Kaushik’s blog post on test for statistical significance and A/B testing. Dave’s question surfaces the very issue of one such hidden hypothesis
[...]
Avinash responde a mi pregunta | boluda.com Says:
February 15th, 2011 at 00:38
[...]
As for more important decisions, such as deciding which campaign to keep based on its conversion rate, we should always rely on statistically significant data. To check whether our data meets that criterion, he recommends reading his article “Excellent Analytics Tip#1: Statistical Significance“, in which he explains how to calculate that factor, and where we can download an Excel file that calculates it all for us. Thanks, Avinash! :)
[...]
» Uncertainty in web optimization Crunching the Web Says:
March 31st, 2011 at 22:47
[...]
You have probably at some point used a spreadsheet like the one explained in a post by Avinash Kaushik to test for differences between two conversion rates. This spreadsheet uses the same method as I described in Multivariate testing part II: Associations where we assume that the conversion rates are independently normally distributed.
[...]
Analysing A/B Tests Beyond Visitors and Conversions | Nathan Jackson Says:
April 20th, 2011 at 03:31
[...]
Statistical Significance
Avinash Kaushik has a great article on applying statistical significance in A/B testing. I’d advise you to read it!
[...]