The cult of data has gone too far
If you ask someone for a number, you will eventually get a number.
(Like this article? Read more Wednesday Wisdom! No time to read? No worries! This article will also become available as a podcast on Thursday)
I love a good cult. Arguably, way back when I accepted employment at Google, I did in fact join a cult of sorts. It was a nice cult though, with free sushi, pinball machines, and, depending on the office, a climbing wall, a bowling alley, or a swimming pool in the basement. To stress the cult-like aspect of our company I used to ask new employees (a.k.a. Nooglers) if they had any friends or family. If they did, I would tell them to forget those because: “We are your friends and family now!” This all went fine and was even somewhat funny, until Google started laying off people en masse, which is something that most cults do not do and which drove home the fact that your colleagues are in fact not your friends or family, and that the company is neither Mother nor Father, even though they might cook for you and give you presents at Christmas.
By joining Google, I also joined the Cult of Data. I cannot quite trace the cult of data through history, but I have a feeling it originated somewhere within the consulting and tech industries. As far as I know, Amazon and Google were among the first big popular proponents of the cult, and these companies relentlessly propagated the use of data in making all sorts of business and technical decisions. Using your experience and gut feeling was out and numbers were in: Cold, hard, objective numbers that don’t lie. Data-driven decision making was a genuine improvement over what came before, and so in a remarkably short time you could hear the motto of the cult everywhere: “In God we trust, all others bring data.”
Now don’t get me wrong, I love a good number and I have been known to thoroughly confuse mobile phone sales associates by quickly drawing up the equations for two competing cell phone plans and then determining where these lines intersected, to find out at how many minutes of calling time one plan ended up becoming cheaper than the other one. Modern industry took this attitude to its extreme however, and “show me the data” became the new “where’s the beef?”
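For the curious, that calculation is just intersecting two lines. A minimal sketch (the base fees and per-minute rates here are made up for illustration):

```python
# Each plan is a linear cost function: monthly base fee + per-minute rate.
# Setting cost_a(m) == cost_b(m) and solving for m gives the break-even point.
def break_even_minutes(base_a, rate_a, base_b, rate_b):
    """Minutes of calling at which two linear plan costs intersect."""
    if rate_a == rate_b:
        raise ValueError("same per-minute rate: the lines never intersect")
    return (base_b - base_a) / (rate_a - rate_b)

# Hypothetical plans: $20 + $0.10/min versus $35 + $0.05/min.
minutes = break_even_minutes(20.0, 0.10, 35.0, 0.05)
print(f"plans cost the same at about {minutes:.0f} minutes")  # ≈ 300 minutes
```

Below roughly 300 minutes the cheap-base-fee plan wins; above it, the cheap-per-minute plan does.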
Using data works exceptionally well for simple decisions such as which cell phone plan to get. It even works well for more complicated affairs such as figuring out if it is worthwhile to spend a month of coding time in order to improve performance by 5%, or if, all things considered, you should run your own GFS cell (because that gives more reliable performance than using a shared cell). But by now we have entered an era where you need data for everything, even for things you don’t know or understand, and even when you have no hope in hell of getting decent numbers that you can trust.
The effective use of data to drive decisions hinges on two critical factors: First, you need to be able to create value functions that capture all of the costs and benefits of a potential decision. Second, you need to be able to accurately determine the weights and measure the values of the variables involved. If you can do that, making the decision is easy: Just run the numbers and choose the outcome with the highest or lowest score (depending on how you created the value function).
In the 1990s, we used this approach frequently for making software selections. We built extensive matrices with required features, gave each feature a weight, and scored each software package against each feature. Some multiplications and additions later we would know which solution would best meet our requirements. Unfortunately, this process frequently yielded the “wrong” answer and we ended up having to fudge the weights, adapt the scores, and add variables until the right answer won out. This was very much an in-demand skill and a friend of mine made decent money for a while as a consultant by advising government IT departments on how to draw up the equations so that their preferred solution won without immediately running afoul of European Union tendering rules.
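The mechanics of such a selection matrix are a weighted sum. A minimal sketch, with made-up features, weights, and scores, that also shows how easily a little “fudging” of the weights flips the winner:

```python
# Hypothetical feature weights (importance) and per-package scores (1-5).
# All numbers are invented for illustration.
weights = {"security": 5, "usability": 3, "price": 2}

scores = {
    "PackageA": {"security": 4, "usability": 2, "price": 5},
    "PackageB": {"security": 3, "usability": 5, "price": 4},
}

def total(package_scores):
    """Weighted sum: each feature score times its weight, added up."""
    return sum(weights[f] * s for f, s in package_scores.items())

winner = max(scores, key=lambda name: total(scores[name]))
print("winner:", winner)  # PackageB (36 vs. 38 points)

# The "fudge": decide usability matters less and the other package wins.
weights["usability"] = 1
print("after fudging:", max(scores, key=lambda name: total(scores[name])))
# PackageA (32 vs. 28 points)
```

One small, defensible-sounding change to a single weight and the “objective” process produces the opposite answer.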
During one of my consulting gigs, we were pitched by well-known crystal ball gazers Gartner about a product they sold where they had already done all the product investigations and scoring for certain categories of banking software. The only thing we needed to do was to provide the weights for each feature and then their product would immediately give us the winning application, saving us a lot of spreadsheet wrangling and talking to vendors. During the pitch Q&A, I asked the Gartner sales associates if their product also had a reverse option: Could we for instance select the solution we wanted to win and would their product then tell us the weights we needed to assign to make that so. It was not a popular question.
This is one of the problems of the cult of data: Data can be used rationally, but it can also be used to rationalize, and it takes quite some insight to figure out which of these two modes we are in at any given time.
Even if we are not trying to subvert the decision making process, it is sometimes impossible or impractical to determine the value function with any level of accuracy. This is of course the case with very complex problems, but also with more mundane problems that we just don’t understand well enough yet. Figuring out the right shape for the function is not uncommonly a project in itself; a project, however, that many do not have the stomach for because it takes too long, costs too much, or both. Even if you do have a decent grasp on the overall shape of the function, the right values of the weights and/or variables might be too hard to determine accurately, or might be subject to opinions and debate. This leads to guessing and other less than scientific methods of determining these numbers. To the uninitiated, data might look cold and objective, but there is often way more subjectivity to these numbers than the casual observer might suspect.
None of this is particularly problematic, as long as we are honest about it and that is typically where it goes wrong. The use of data creates a semblance of certainty that is not warranted by the process through which the numbers came to be or by the nature of the problem we are trying to solve. Scientists know it is useless to talk about a number without also discussing the confidence interval around it, but too often that confidence interval disappears in the debate and the numbers are taken as gospel, regardless of how inaccurate they are.
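To make the confidence-interval point concrete, here is a minimal sketch using only the Python standard library; the latency measurements are made up, and the 1.96 multiplier is the usual normal-approximation factor for a ~95% interval:

```python
import math
import statistics

# Hypothetical latency measurements in milliseconds (invented data).
samples = [112, 98, 105, 120, 99, 131, 102, 108, 95, 117]

mean = statistics.mean(samples)
# Standard error of the mean; 1.96 standard errors on either side gives
# an approximate 95% confidence interval for the true mean.
sem = statistics.stdev(samples) / math.sqrt(len(samples))
low, high = mean - 1.96 * sem, mean + 1.96 * sem

print(f"{mean:.1f} ms (95% CI: {low:.1f}-{high:.1f} ms)")
```

Reporting “109 ms” sounds precise; reporting “109 ms, give or take about 7” tells you how much weight the number can actually bear, and that second part is exactly what tends to vanish in the meeting.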
I have sat through many meetings where people asked for numbers that were basically impossible to come up with, because either we didn’t understand the problem deeply enough yet or we just lacked the required 20/20 vision into the future. But if you ask someone for a number long enough, you will eventually get a number. It might be a bad number. It might be a number with a huge (but undisclosed) confidence interval around it. It might be pulled out of a very tight arsehole under huge pressure. But it is a number, and for the fanatics of the cult of data, that is all that matters because, well, hips and numbers don’t lie. I mean, agile is all fine and dandy, but you have to tell me now how many developers you need in October of next year, exactly what you are going to do with them, and by how much that will improve the click-through rate.
Another problem is that you often cannot measure the number you are interested in. So instead of measuring that hard-to-measure number, you measure something else and hope that it can function as a stand-in. This comes with uncertainty, and with the problem that you might not fully understand the underlying correlations; consequently, you need to be careful when interpreting these numbers. Again, this is not a problem as long as you understand this and are honest about it. Unfortunately, both corporations and societies seem to be getting worse at dealing with uncertainty. Commonly, the politics of the situation either demand certainty, or the number (simple, straightforward, clear) eclipses the nuanced explanation of how it should be used.
Numbers are often also used for setting goals. The fantastic book “Measuring and Managing Performance in Organizations” explains in quite some detail what happens when a complex real-world process is captured in some metrics and target values for these metrics are then assigned as goals for an organization. Every model of reality is a simplification, and there is a lot of room for divergence between the actual performance of the real-world process and the goal values of the model’s metrics. There are many spectacular examples of teams that completely let go of the real-world process and only chase their numbers. For individual contributors that often makes perfect sense, as not meeting your numbers typically means you will be sacrificed on the altar of the God of the Holy Number, regardless of the state of the real-world process and regardless of your perfectly lucid explanation of what is going on.
Don’t get me wrong, using data intelligently is incredibly useful and has allowed many companies to make the right decisions. But the key word here is “intelligently”. Numbers should obviously not be used by people who don’t understand numbers, nor by people who demand certainty where only a limited amount of it is to be had. Sadly, they often are. And that is exactly what is wrong with the cult of data: Slavish trust in numbers by people who don’t understand numbers. Unfortunately, there are a large number of those…
I very much agree.
I think somebody famous once said “there are lies, damn lies, and statistics”.
When someone asks me for a number, I ask them what they plan to do with it/how will it affect decisions being made/etc.
If they don't know, I'd argue that getting them the number is a waste of everyone's time.