Two truths:

1] Turns out the readers of Occam's Razor are exceptionally gifted: they understand the challenges of web data.

2] They are deeply motivated to do something about it, just not totally sure what.

This is a special unplanned post just for you, to help with issue #2.

My last post, Web Data Quality: A Six Step Process To Evolve Your Mental Model, unleashed an exceptional set of comments from you all (sweet!). Today I want to share a cogent set of "next steps" ideas.

Practical strategies to deal with the problems you highlighted, nuances you can exploit, things you need to give up on, things you might consider doing more, bosses you need to ditch (!!).

Here's my core premise:

Uno. You understand that data collection is imperfect (even as we collect more and better data than any other channel on the planet, bar none).

Dos. You accept (or soon plan to accept) the Six Step Mental Model on data quality.

Tres. Your gallant efforts to make progress with the data you have are stymied by the Overlords (or, as we lovingly refer to them: the HiPPO's).

You there? Then let's go rock this thing.

The Big Ten

My recommendations, mostly in the order of importance:

#1: Give up. Pick a different boss.

#2: Educate them about the "perfect" source they love.

#3: Distract your HiPPO's from data quality by giving them actionable insights.

#4: Dirty Little Secret One: "Head" data can be actionable in the first week / month.

#5: Dirty Little Secret Two: Data precision actually goes up lower in the "funnel".

#6: Realize the solution to your problem is not to implement one more tool!

#7: Pattern your brain to notice when you've reached Diminishing Marginal Returns.

#8: If you have a small site, you have bigger problems than data quality.

#9: Be Aware of two upsetting distractions: Illogical customer behavior. Inaccuracy benchmarks.

#10: Remember you can fail faster on the web.


The rubber meets the road now . . .

#1: Give up. Pick a different boss.

Did you think I was kidding?

There is an entire generation of leaders in place today who don't get it. Many of them, sadly, will never get it. I don't blame them. They have seen the world one way and they can't change now.

We simply have to wait that generation out; wait for them to get promoted / take on other life challenges.

When I have found myself in situations where there is just no chance of movement in the HiPPO's mental model, I try to switch bosses.


Life is too short. There is too much money to be made. There are too many customers to be satisfied. Why waste your time?

If you can, move on.

Find someone who is open to accepting the new data quality mental model. Someone who will take actionable recommendations and act on them (even if they agree to try just one or two things first, that's perfectly ok).

Once I have that small opening I work really, really hard to make my new boss a hero. When I have done my job well the impact of that is huge. For me, for the boss, and in turn for the company.

I realize that if you work at a small company this is a non-choice: you have one boss and she's all your company's got. In that case see if any of the things recommended below work.

Meanwhile remember to polish up your resume so you can find a better place of employment in case it simply does not work out. [The economic climate is bad right now, but it won't always be that way.]

#2: Educate them about the "perfect" source they love.

[Important: I am NOT saying that it is ever a good idea to say: "look, I am better because your favorite child is not perfect either!".]

More than once I have gotten a more open mind after I detailed to my executives the (irrational) faith they put in other sources of (what they don't know are) imperfect data.

Take TV as an example (a favorite of medium-to-large companies; sorry, not small ones).

Nielsen uses a panel of between 18k and 30k people to measure the viewing habits of 200 million plus Americans. I am sorry, but in this world of fragmented consumption (tons of choice) it does not matter how much sophisticated math you put on that data to account for anomalies; you are left with high-grade non-representative "data".

Consider this: Even Big 3 network CEO's have been forced to bring back shows that they canceled because of "low" Nielsen ratings, after being astonished by massive fan rebellions or huge DVD sales.

Just imagine what happens to an 18k to 30k dataset's capacity to measure the non-major networks or the really long tail. CurrentTV anyone? : )
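To put a rough number on that intuition, here is a back-of-the-envelope sketch. It uses the plain random-sampling margin of error, which real ratings panels do not exactly follow (they are weighted and stratified), so treat the numbers as illustrative only; the 25,000 panel size is an assumption picked from inside the 18k – 30k range above:

```python
import math

def panel_margin(share, n=25_000, z=1.96):
    """95% margin of error for an audience share measured with an
    n-person panel, under idealized simple random sampling."""
    return z * math.sqrt(share * (1 - share) / n)

# A mainstream show at a 10% share vs. a long-tail network at 0.1%:
for share in (0.10, 0.001):
    moe = panel_margin(share)
    print(f"share {share:.1%}: +/- {moe:.2%} ({moe / share:.0%} of the share itself)")
```

The absolute error shrinks for the tiny share, but relative to the share itself the uncertainty balloons (about 4% relative for the big show versus about 39% for the long-tail one), which is exactly the problem with measuring the non-major networks.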

Yet those ratings and GRP's are taken as God's own word.


Even with 30% inaccuracy and Omniture's suboptimal third-party cookie, your web analytics data is better than that.

Or here's another one. Try to really understand the impact of a 180k panel data set from ComScore that monitors a couple hundred million Americans (in the web world, with an even longer tail and more fragmentation than TV). Contrast that with data that comes from HitWise (15 mil panel). Or that is in the Google AdPlanner. Both are substantially better (for similar data).

Yet the former is accepted as the truth. The latter are not. Because your HiPPO does not know any better.

Start a revolution:
1) Solve the major problem: Educate yourself. This is often the key flaw.

2) Present a dispassionate and non-personal education of each data source and its value.

3) Highlight how Web Data is less imperfect (if that is what you find) and how it provides more information (missing in other sources).

4) Ask for implementation of actionable insights (small at first) from web data.

[Big PS: Here's what I am not saying: I am not saying Nielsen (or ComScore) is not trying hard enough. I am not saying they are not applying the best mathematical algorithms humanity has created. The problem is not either one of those issues. It is the core data they collect, and how much of it. No amount of pretty math can compensate for the new world order of content consumption on TV with their old-world data set.]

#3: Distract your HiPPO's from data quality by giving them actionable insights.

Dazzle them with your intelligence!

Like you distract a baby by jingling your key chain.

This is what I am talking about:

Change the focus from silly unactionable aggregated numbers like Visits or Avg Page Views Per Visitor etc. Instead you can find key sources of traffic. You can run controlled experiments to measure offline impact. You can figure out how to get existing website customers to buy more or more frequently or abandon carts less.


Because your Senior Management does not know what the heck to do with total Visitors and the caveats associated with Unique Visitors they send you back to the data quality torture chamber. If you can distract them by giving them interesting insights they'll focus on the value.

[I have the privilege and the good luck to speak to lots of C-level folks at conferences or 1:1 meetings. I want you to know that I never lose an opportunity to educate them about the data quality issue and why they MUST look past it and focus on taking action. I am doing this every day. Every week. Every month. My tiny contribution to the Cause.]

#4: Dirty Little Secret One: "Head" data can be actionable in the first week/month.

I don't know why many people wait for 18 months to implement Omniture completely. Or WebTrends. Or NedStat. Ok ok ok, or even Google Analytics! : )

Yes the implementation has to be "complete" (translation: never going to happen). But there are things that are "big enough" (head) in the first week and getting complete data for them is irrelevant because it won't change your decision / insight.

Some of your data is good enough very quickly (dare I say even if not all your pages are tagged or you are still using third party cookies or have minor implementation issues).


Your job during Week One is to look for the "head data": places with big numbers / happenings.

Say your imperfect data shows that 60% of your traffic comes from Google and the keywords "Avinash rocks", "Michelle is awesome" and "HiPPO's stink" account for 40% of that traffic.

You can start taking SEO / PPC action right away because marginal improvements in those big numbers won't really change what you do.
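One way to spot that "head" mechanically: sort your traffic sources descending and stop once you've covered most of the total. A minimal Python sketch, with made-up keyword numbers (nothing here is real data):

```python
# Hypothetical keyword -> visits data; numbers are illustrative only.
keywords = {
    "avinash rocks": 40000,
    "michelle is awesome": 25000,
    "hippos stink": 15000,
    "web analytics": 4000,
    "bounce rate": 1000,
}

def head_keywords(visits_by_keyword, share=0.8):
    """Return the 'head': the top keywords covering `share` of all visits."""
    total = sum(visits_by_keyword.values())
    head, running = [], 0
    for kw, visits in sorted(visits_by_keyword.items(),
                             key=lambda kv: kv[1], reverse=True):
        head.append(kw)
        running += visits
        if running / total >= share:
            break  # the rest is "tail data" -- it can wait for the audit
    return head

print(head_keywords(keywords))
```

With these numbers the three big keywords come back and the small ones don't; that short list is where Week One action lives, imperfect tracking or not.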

Or say you find, surprisingly, that a referrer is sending huge traffic to a part of your website that is related to porn (what!). You can start moving on that now.

Or the bounce rate on your home page is 65% (kill me now!).

Some things you don't want to know with full confidence before you start moving.

I recommend a more nuanced approach to your web analytics.

Tell your boss: "We have to start moving on these things because the numbers are large enough and they indicate we need to monetize opportunity x / we need to fix problem y. But as to how many people look at your bio on our website, I am afraid we might have to wait a little while on that 'tail data' until after we complete our audit."

: )

#5: Dirty Little Secret Two: Data precision actually goes up lower in the "funnel".

What funnel you say?

Here's the one I am thinking about:

All site visitors ->
the # that see category (main cluster) pages ->
the # that see product pages ->
the # that add to cart ->
the # that start checkout ->
the # that abandon ->
the # that make it through ->
revenue, leads, average order size, etc.

As you go deeper into the "funnel" you are dealing with fewer and fewer people / visitors / sources / keywords / pages / vagaries of nature.
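The funnel above can be sketched as a list of (stage, count) pairs; the counts here are invented purely to show how the dataset shrinks at each step:

```python
def step_rates(funnel):
    """Conversion rate of each stage relative to the one above it."""
    return [(stage, count / prev)
            for (stage, count), (_, prev) in zip(funnel[1:], funnel)]

# Illustrative counts only -- not from any real site.
funnel = [
    ("all visitors",   100_000),
    ("category pages",  40_000),
    ("product pages",   15_000),
    ("add to cart",      3_000),
    ("start checkout",   1_500),
    ("complete order",     900),
]

for stage, rate in step_rates(funnel):
    print(f"{stage}: {rate:.0%} of the previous step")
```

By the bottom you are reconciling 900 orders, not 100,000 visits: a far smaller dataset with far fewer ways to go wrong.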

The implication of having done all the normal things (tagged your site completely, are using first-party cookies, and have the right ecommerce tag on your thankyou.html page) is that there are few things that could mess up the data at the end of your "funnel". The dataset is smaller and affected by fewer vagaries of nature.

So when you start your web analytics journey start at the bottom of the funnel and not the top. You won't find yourself mired in quicksand on day one. And it is easier to reconcile data at the bottom of the funnel.

Compare your orders in IndexTools with your ERP system. Compare your leads in Google Analytics with Salesforce. They won't match, but it will be a million times easier to discover why (when compared to reconciling sources of data or average page views per visitor).

Here is the other psychological beauty: You know my utter devotion to measuring Outcomes. Start at the bottom of the funnel and you are starting by measuring Outcomes (increase revenue, reduce cost, increase loyalty). Guess what? All HiPPO's LOVE Outcomes.

By the time you get to the top of the funnel: 1. You'll actually be smarter, and 2. Your management will be significantly more evolved in their thinking.

#6: Realize the solution to your problem is not to implement one more tool!

Talk about compounding your problem.

I know bigamy, on the surface, sounds really attractive. It is not. Monogamy rules.

I know. I know you prefer the former. : )

You believe data collected by WebTrends is of bad quality and so you implement Omniture (believe it or not I ran into two companies that have done exactly this!). Or you think Omniture is not working right so you implement Google Analytics as well.

You have just compounded your problem.

It is hard enough to follow the Six Step Decision Making Mental Model with one tool. It takes a lot of effort to understand one tool, get it right, move on to making decisions (remember your job is not to collect 100% accurate data, it is to find actionable insights!).


Two tools means reconciling a lot more. It means understanding the sub-nuances of two tools, chasing two vendors, more confusion, minor hell.

Remember there is nothing particularly magnificent about how Omniture collects data. Google Analytics does not have patent pending exclusive CIA techniques in its tags. WebTrends does not have any secret sauce.

Just use tags. Have 'em on all the pages. Use first party cookies. After this all tools are pretty close in data collection.

It is ok to date many, it is even ok to get engaged to a couple of 'em (hopefully at different times), marry one, then try to make that person perfect!!

I am going to get killed for that last one aren't I? :)

[PS: If you can, please don't use multiple paid tools. A. You are wasting money. B. These tools come with so many eVars and props and variables and massive customizations in implementation that reconciling data between them will make finding life on Mars look like a cakewalk.]

#7: Pattern your brain to notice when you've reached Diminishing Marginal Returns.

I have come to love and adore this classic principle.

You should work to improve data quality (especially if you find problems :)). But realize that after a certain point it is simply not worth it.


You can improve quality by another 3%, but is the return worth the effort you'll put in?

The fact that you'll feel good does not count.

Data quality seems to be such a holy crusade that it is hard to consciously walk away. The wise know when to walk away.

Remember your job is not to collect perfect data. Your job is to: Increase Revenue. Reduce Cost. Improve Customer Satisfaction/Loyalty.

To me the principle of Diminishing Marginal Returns is lovely because it says you should work really, really hard to do the best you can, but realize that beyond a certain point it is simply not worth the effort.

Be rigorous about realizing you have reached that point. Then move on!

#8: If you have a small site, you have bigger problems than data quality.

You are a part-time analyst, or a GAAC, hired to do Omniture analysis at a company, and you find that even a 3 – 5% error turns out to be a big deal (because of small overall numbers).

Yes, true. But realize that if you are a small company with a small number of people on your site then you have bigger problems than data quality.

For one, perhaps focus on doing SEO to get more free traffic? Perhaps mine your existing customer data to find new product ideas or customer sources? Maybe, as an Analyst, spend three weeks doing Marketing?


My point is: Is the best use of your time chasing the 5% error, or getting an additional 150 people to your site (data be damned!)?

Sometimes in life data does not become a problem until it becomes a priority.

My advice: If you are a small site focus on recommendation #4 above.

#9: Be Aware of two upsetting distractions: Illogical customer behavior. Inaccuracy benchmarks.

This will drive you bonkers, but a lot of data accuracy challenges stem from the clash of logical tools with illogical customer behavior.

Web Analytics tools expect and work on the basis of a set of logical rules.

The internet is fundamentally illogical. Because we, the inter-dweebs, exhibit illogical behavior.

Now like all mostly rational beings we only behave illogically x% of the time (quickly bouncing between sites, changing our minds constantly, never seeing obvious buttons, missing relevant results etc etc).

I have never seen a case where, with enough work and experimentation, I could not explain even the most illogical behavior. In most of those cases, at the end, all I had was regret that I did not focus on doing better things!


Second, if the data is not perfect why aren't there benchmarks for how "bad" the data is?

In asking for benchmarks you are asking for what Donald Rumsfeld famously called the Unknown Unknown. The impossible.

The web is such a complex, ever-evolving beast that getting ranges for "inaccuracy" is just not possible right now. The huge differences in how sites are built, how experiences are created, the technologies at play, and the needs of each tool on each site do not make life easier.

You know a lot of known knowns in web analytics. Take action on that. Try to identify the known unknowns (do audits using tools like Maxamine or ObservePoint or WASP), try to fix them. Then take action.

Benchmarks can become crutches / excuses. I am kinda sorta against that.

#10: Remember you can fail faster on the web.

The greatest gift the web gives you is the ability to fail faster. At low cost.


This translates into an insanely awesome ability to take higher risk. It also means you can move fast with less than 100% confidence, and in the worst case, if you are 1000% wrong, you can control the amount of damage.

This is not a privilege that exists in the offline world.

If I have only 80% confidence in the data I can send a small, 1,000-recipient email blast and test the waters to see what will happen. I can send 3 different offers to different geo's to validate my hypothesis.

I can try 5 versions of the home page and see which works, because I am not designing the "you can only try once" cover of the catalog or newspaper ad.

If you had 100% confidence in the data you would commit to spending $500k on affiliate marketing. But if you only have 98% confidence you can commit to a four week pilot program with a budget of $50k. Lower risk, still the possibility of high reward, and a near 100% possibility of making a more confident decision about the remaining $450k.
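The arithmetic in that example is worth making explicit. A tiny sketch (the 10% pilot fraction simply mirrors the $50k-of-$500k split above; it is not a real budgeting rule):

```python
def derisk(full_budget, pilot_fraction=0.1):
    """Split a planned spend into a pilot (your capped worst-case loss)
    and the remainder you decide on later, with better data in hand."""
    pilot = full_budget * pilot_fraction
    return pilot, full_budget - pilot

pilot, remaining = derisk(500_000)
# Worst case you lose only the pilot; the rest of the budget waits
# until the pilot has turned 98% confidence into something stronger.
print(f"pilot: ${pilot:,.0f}, decided later: ${remaining:,.0f}")
```

The point is not the formula; it is that imperfect data plus a capped-risk pilot beats waiting for perfect data.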

Don't wait. Just go.

Ok, now it's your turn.

What techniques have you used to improve data quality, or simply to get around the nagging problem of data quality? What was your most successful "let's all get over this and move on" tactic? If you have come close to web metrics data perfection, what did you do?

Which of the above ten strategies is your favorite? Which one do you think is simply baloney? It's ok. Be honest. I can take it. :)

Thanks much.

