Analytics needs to walk around some holes in the
sidewalk.
A wonderful book of poems by Portia Nelson, There's
a Hole in My Sidewalk: The Romance of Self-Discovery, addresses the struggle
to stop falling into the same psychological or behavioral hole: how to walk around
it, “go down another street”, and grow as a person.
The field of analytics has fallen into a few big holes lately
that represent both its promise and its peril.
These holes pertain to privacy, policy, and predictions.
Privacy. $1B.
Target, the retailer, was the poster child for using big data for customer
analytics to pump up sales. It unabashedly collected lots of data on its
customers from a variety of sources, integrated it, and used it for predictive
modeling to identify segments experiencing “moments that matter”, when
habits can be influenced and customers nudged to buy new products.
Target
touted that “we’ll be sending you coupons for things you want before you
even know you want them.” For example,
it developed algorithms to estimate the probability of pregnancy and the delivery
date in order to sell specific products that women buy at different times during their
pregnancy. It identified the women, sent
them coupons, and opened its cash registers to amazing profits. However, as we have learned, it also opened
its cash registers, credit card machines, and databases to cybercriminals, who
stole the personal data of tens of millions of customers. It is estimated that this breach will cost
Target over $1B in fraud claims. Its
stock price has fallen over 25% since the incident.
The “hole” is a comfortable one for analytics: the habit
of uncorking technology before its time.
For example, the NSA used
the technology to tap telephone calls and scrape people’s metadata into a
database before confronting the likelihood that world leaders and the public
at large would condemn the practice, and that the agency could not defend it in terms of averting terrorism. Similarly,
there was a lot of talk about the “creepiness” of retailers collecting personal
data on customers by whatever means possible.
The big appetite for data to improve sales may have
blinded companies to the consequences and led them to “forget” the
basic responsibility to protect it. In the
Target case, known credit card safeguards, including the
use of a security microchip, were ignored. Additionally, there must be encryption
protocols and firewalls to decouple data so that cybercriminals cannot find
personally identifying information. The simple lesson is that just because the
technology exists does not mean that it should be used. Perhaps one route around the “hole” is to “count
to ten” before technology genies are let out of the bottle.
Predictions. 43-8.
The great hope for demonstrating the value of analytics is (advanced)
prediction. It uses all the breadth
and depth of big data to go beyond reporting on the past to predicting the future. So how could the predictions about the 2014
Super Bowl game between the Seahawks and the Broncos be so far off? The point spread was 3 points, but the actual
margin was more than 10 times that, as the Seahawks routed the Broncos and
Peyton Manning from the first (mis)play of the game. Perhaps there is a tribe of analytics
“sharps” who are making it big in sports wagering, but the fact is that the
best of them win only about 53% of the time.
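To put that 53% figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a bettor risks 110 to win 100 on each point-spread bet; that pricing is my assumption for illustration and does not come from any sportsbook data discussed here. The function names are hypothetical too.

```python
def break_even_win_rate(risk: float, win: float) -> float:
    """Win rate at which expected profit is zero when risking `risk` to win `win`."""
    return risk / (risk + win)


def expected_profit_per_bet(win_rate: float, risk: float, win: float) -> float:
    """Average profit per bet: gain `win` with probability `win_rate`, else lose `risk`."""
    return win_rate * win - (1.0 - win_rate) * risk


# Assumed pricing (not from the post): risk 110 to win 100 on every bet.
RISK, WIN = 110.0, 100.0

break_even = break_even_win_rate(RISK, WIN)
edge = expected_profit_per_bet(0.53, RISK, WIN)

print(f"Break-even win rate: {break_even:.2%}")
print(f"Expected profit per bet at a 53% win rate: {edge:.2f}")
```

Under that assumed pricing, a bettor has to win a little more than half the time just to break even, so a 53% win rate leaves only a thin margin on each bet.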
The irony, perhaps, is that football, like baseball and
basketball, is a fully digitized industry, unlike most others, including healthcare,
which still struggles to use electronic medical records to capture its key
transaction information. In sports, every
play on the field is captured, recorded, and discussed, resulting in a
rich performance database of players in almost every conceivable context, e.g.
how a baseball hitter performs against a specific pitcher, on a specific playing field, in a regular
or post-season game, and so forth.
But it is clear from the big-miss prediction of the Super
Bowl game that some important data that would improve the precision of the model are missing. The “squares”, who rely on softer data (intuition),
think they know this reality of the shortcomings of quant data, although their
win rate is no better than that of the sharps.
My personal insight on this comes from when I was 16 years old and worked as a dog
handler at a greyhound racing park. I
took a dog from its pen to the viewing stand and into the starting gate, and
picked it up at the conclusion of the race.
I knew when a dog was nervous, sick, or hyped up. And I knew that when a dog hit its head going
into the gate, it would not recover to win the race.
The “hole” here is the reliance on the big data that is under
the lamppost. In this case, it is the
big sports data, most of which is collected…because it can be…without a model
in mind and mostly for its entertainment value.
The big data presumption is that if you build it (the database), the
predictions will come. That ain’t
necessarily so, even if one runs zillions of simulations on
yottabytes of big data. The data have to be right
for the model to work. In the case of
sports, there are lots of untapped (“soft”) personal data, such as health, resilience,
and response to certain threats (and more), that may be important factors in big-game
performance. Getting carried away with the technologies of the yottabytes while
avoiding a full understanding of the phenomena under study
short-circuits predictive modeling.
Policy. 2.2/7.
The biggest analytics project in recent history is the $6 billion
federal investment in the health exchanges.
The goals of the health exchanges are to enroll people in the health
insurance plans of their choice, determine insurance subsidies for individuals,
and inform insurance companies so that they can issue policies and
bills. The project touches on all the
requisites of analytics, including big data collection, multiple sources,
integration, embedded algorithms, real-time reporting, and state-of-the-art
software and hardware. As everyone
knows, the implementation was a terrible failure. The CBO’s conservative estimate was that 7
million individuals would enroll in the exchanges. Only 2.2 million had done so by the end of
2013. (This does not include Medicaid
enrollment, which had its own projections.)
The big federal vendor, CGI, is being blamed for the mess. Note that CGI was also the vendor for the
Commonwealth of Massachusetts, which had the worst performance of all states in
meeting enrollment numbers despite its long head start as the Romney reform
state and its groundbreaking exchange called the Connector. New analytics
vendors, including Accenture and Optum, have been brought in to the
rescue.
Was it really a result of bad software, hardware, and
coding? Was it that the design to enroll and determine
subsidies had “complexity built-in” because of the legislation that cobbled
together existing cumbersome systems, e.g. private health insurance
systems? Was it because of the incessant politics
of repeal that distracted policy implementation? Yes, all of the above.
The big “hole”, in my view, was the lack of communication
between the policy makers (the business) and the technology people. The technologists complained that the business
could not make decisions and provide clear guidance. The business expected the technology
companies to know all about the complicated analytics and get the job done on
time. The ensuing rift, in which each group did not
know how to talk with the other, is recognized as a critical failure point. In fact, those who are stepping into the
rescue role have emphasized that there will be management status checks daily
“at 9 AM and 5 PM” to bring people together, know the plan, manage the project,
stay focused, and solve problems. Walking
around the hole will require a better understanding of why the business and
the technology folks do not communicate well, and a recognition that soft people
skills can avert hard technical catastrophes.
In summary, these
three holes in the sidewalk of analytics are recurrent themes and threats to
fulfilling the promise of analytics.
First, the technology cannot zoom ahead of the sociology. The need for business results cannot justify
the creepy use of personal data to increase sales without full
respect for the need to protect privacy and to honor customers. Second, big data is not the answer if it is
not the right data. The full potential
of predictive modeling requires more thinking and less data processing. And lastly, the big failures in analytics
have less to do with bad machines and buggy software and much more to do with
people on either side of the business and technology fence just not talking
with one another.