The data gold rush in healthcare
is on. It's the wild, wild West. We are told that the data produced in U.S. healthcare will soon be
counted in yottabytes, or a million trillion megabytes, or
1,000,000,000,000,000,000,000,000 bytes. McKinsey & Company asserts that
creative and thoughtful extraction of all the healthcare big data is worth at
least $300 billion a year. Where are all these data coming from, what value is
being extracted from them, and where are the untapped opportunities?
Where are the present data coming from?
- Transactions. The traditional sources of usable (structured) healthcare data come mostly from billing (claims) data.
- Electronic medical records. EMRs produce useful clinical data, mostly in unstructured and semi-structured form.
- Machine-emitted. Most of the yottabytes come from this source, which includes readings from medical sensors and “scrapings” from Web and social media sources including clickstream and social interaction data.
- Biometric devices. These are the findings from medical measurements, such as blood pressure readings and x-rays, and from other monitors of everything from steps taken (e.g., Fitbit) to places visited (GPS).
- Research. Data on individuals from clinical trials, registries, and other sources.
- DNA sequencing. Genomic data to support personalized medicine are not widely available now but are on the verge of becoming accessible and reasonably priced.
Is it producing big value today?
It is very hard to know because much of
the analytics activity in healthcare is about digitizing the business and building
data warehouses, not about using the data to add value to the business or to clinical
outcomes. It is also difficult to know
how much is being spent on analytics. So,
let’s get back to gold mining to infer some answers.
There are 2,500 metric tons of gold
produced annually. At the current price of $1,300 per ounce this amounts to
revenues of about $100 billion. It takes, on average, 30 tons of rock to
produce one ounce of gold. Hence, the final product amounts to about 0.000001042 of
the rock that must be worked through to harvest it. This extensive mining also has
by-products, including the cyanide used to extract the gold and the huge
open pits and large mounds of waste rock left across the countryside where it is
produced. The mining processes include huge investments in monster shovels and
trucks to extract and transport rock to the plant and warehouse for processing
and storing. Gold mining is hypothesis-driven, that is, mine rock in a specific
place and in a specific way and you get gold. This is quite different from a
hypothesis-free approach, which is to take all the rock and do a lot of tests
on it to see whether there is anything in it of value.
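The gold arithmetic above can be checked with a short sketch. The unit choices are assumptions, since the text does not name them: output is taken in metric tons, the price per troy ounce, and the rock-to-gold ratio in short tons (2,000 pounds) and ordinary 16-to-the-pound ounces, which is the combination that reproduces the 0.000001042 figure. The function names are hypothetical.

```python
# Assumed unit conventions (not stated in the text):
GRAMS_PER_METRIC_TON = 1_000_000
GRAMS_PER_TROY_OUNCE = 31.1034768   # gold is priced per troy ounce
POUNDS_PER_SHORT_TON = 2_000
OUNCES_PER_POUND = 16               # avoirdupois ounces


def annual_gold_revenue(metric_tons, price_per_troy_ounce):
    """Revenue in dollars for a given annual output and price."""
    troy_ounces = metric_tons * GRAMS_PER_METRIC_TON / GRAMS_PER_TROY_OUNCE
    return troy_ounces * price_per_troy_ounce


def gold_fraction_of_rock(short_tons_of_rock_per_ounce):
    """Share of the mined rock, by weight, that ends up as gold."""
    ounces_of_rock = (
        short_tons_of_rock_per_ounce * POUNDS_PER_SHORT_TON * OUNCES_PER_POUND
    )
    return 1 / ounces_of_rock


revenue = annual_gold_revenue(2_500, 1_300)
fraction = gold_fraction_of_rock(30)

print(f"Annual revenue: ${revenue / 1e9:.1f} billion")
print(f"Gold as a fraction of rock: {fraction:.9f}")
```

Under these assumptions the revenue comes out slightly above $100 billion, and the fraction is one part in 960,000.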
Yotta-driven analytics in healthcare is
mostly hypothesis-free, akin to analyzing the whole mountain and looking to
discover “similarities” that may provide new understandings about the delivery
of healthcare. The monster computing technology available today, at relatively low
costs, can enable seemingly limitless simulations to do this. So, the question is, how much rock will it
take to find the gold in healthcare? Will the conversion rate be around
0.000001042?
Some of the gold from the healthcare data
rush is palpable and it is “small.” The integration of genomic data with
clinical data could lead to answers to important questions, such as whether a
certain chromosomal variation is related to a disease, which could then fuel
individually tailored treatments. For example, Tamoxifen has been an effective
drug for the treatment of breast cancer. On average, about 80% of patients
benefit from it. The potential with personalized treatment is to become 100%
effective in 80% of patients because genetic markers can improve the knowledge
of who does and does not benefit from the treatment. There are many instances
of “small” hypothesis-driven data that can have a precise impact on business
and health outcomes. Other rock in the yotta may not be as clearly useful. For
example, much of the yotta consists of data emitted from machines, and much
more research needs to be conducted to home in on likely ways it can contribute.
Are there other mines that healthcare is missing?
There are two types of
data missing from the previous list. These data do not necessarily add a lot to
the yotta stats. They are “small” and have specific and targeted purposes. These
include extra-industry personal data and people-generated data.
Extra-Industry Personal Data
The world is full of
relevant data and a lot of it resides outside of health care. External data can
address specific health care issues, for example, to change people’s behavior,
ranging from marketing to early detection of diseases. These data come from
privately aggregated and publicly available databases on a wide range of
personal attributes that can define microsegments that can be precisely
targeted with specific interventions to improve health. For example, data on
height and weight are available from external sources (and not easily collected
or extracted from usual health care data) and can be used to calculate the body
mass index (BMI) to determine premorbid obesity. Additionally, when personal
data are integrated with medical data and combined with the right channel,
especially mobile, they can produce a much better identification of high-risk
patients, with more effective interventions mapped to their specific needs and
closer monitoring over time.
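As a minimal sketch of the BMI calculation mentioned above: BMI is weight in kilograms divided by the square of height in meters. The function names are hypothetical, and the cutoff of 30 is the commonly used adult obesity threshold, not a figure taken from this article.

```python
# Assumption: BMI >= 30 is the commonly used adult obesity cutoff.
OBESITY_BMI_THRESHOLD = 30.0

KG_PER_POUND = 0.45359237
METERS_PER_INCH = 0.0254


def body_mass_index(weight_kg, height_m):
    """BMI = weight (kg) / height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_from_us_units(weight_lb, height_in):
    """Convert pounds and inches to metric, then compute BMI."""
    return body_mass_index(weight_lb * KG_PER_POUND, height_in * METERS_PER_INCH)


def is_obese(bmi):
    """True when the BMI meets the assumed obesity threshold."""
    return bmi >= OBESITY_BMI_THRESHOLD


# Illustrative values only, not real patient data.
example_bmi = body_mass_index(95, 1.75)
print(f"BMI: {example_bmi:.1f}, obese: {is_obese(example_bmi)}")
```

A flag like this could serve as one attribute among many when defining the microsegments described above.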
People-Generated Data
Another source of
untapped data is people. This is another type of “small data” with big
potential benefits. Most of the data sources listed previously do not involve
the active participation of people. The real potential lies in gathering much
more relevant data from individuals with their consent and engendering their
partnership to engage in data-sharing activities that help them improve their
life. After all, people know more about their own health and illnesses and can
monitor them better than any doctor could possibly hope to do. There is much more
to be learned from a person’s head than from their data streams. There are
indications that this is happening without, and perhaps in spite of, the active
strategies of traditional health care. For example, networks of patients with
the same condition are sharing data and creating large databases that are
beginning to approximate crowd-sourced clinical outcomes research. As
of the end of 2011, PatientsLikeMe had more than 120,000 patients in 500
different condition groups; ACOR (Association of Cancer Online Resources) had more
than 100,000 patients in 127 cancer support groups; and 23andMe had more than
100,000 members in its genomic database. People also engage in their own data
sharing through mobile and social media. And people have been responsive to
surveys when the purpose is big (like polling in a presidential campaign) and
when the rewards for participation are adequate.
Conclusion
A mountain of data is
available for analytics in healthcare. Some
of it is really big and has unknown uses but is intriguing, and the technology may
be able to find the gold although the conversion rate may be infinitesimally
small. Some of it is small and can have immediate applications to produce
value. And some that is potentially very valuable and comes directly from
people is not included in the count and is not collected. Certainly, healthcare
lags other industries in its use of big data because of the challenges with
complex and unstructured data, the reluctance to use external data, data
integration issues, and concerns about patient confidentiality. And IT folks
say there is enough unused healthcare industry data to keep them busy for a
very long time. Threading the needle for the most productive use of data, whether
big or small, hypothesis-driven or hypothesis-free, depends on analytics making the case
that it is worth the investment and an innovation worth adopting.
This blog is an extract from Dwight’s new (edited) book, Analytics in Healthcare and the Life Sciences.