Thursday, November 14, 2013

What can gold mining tell us about healthcare yottabytes?

The data gold rush in healthcare is on. It's the wild, wild West. We are told that the data produced in U.S. healthcare will soon be counted in yottabytes. One yottabyte is a million trillion megabytes, or 1,000,000,000,000,000,000,000,000 bytes. McKinsey & Company asserts that creative and thoughtful extraction of value from all the healthcare big data is worth at least $300 billion a year. Where are all these data coming from, what value is being extracted from them, and where are the untapped opportunities?
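As a quick sanity check on the units quoted above, here is a minimal Python sketch. It assumes decimal (SI) prefixes and the short-scale (U.S.) meaning of "trillion"; the constant names are illustrative and not from the original post.

```python
# Assumption: decimal (SI) prefixes, so 1 MB = 10**6 bytes and 1 YB = 10**24 bytes.
MEGABYTE = 10**6
YOTTABYTE = 10**24

# Assumption: short-scale (U.S.) "trillion", i.e. 10**12.
MILLION = 10**6
TRILLION = 10**12

# How many megabytes fit in one yottabyte?
megabytes_per_yottabyte = YOTTABYTE // MEGABYTE

# "A million trillion megabytes" should be the same quantity.
is_million_trillion = megabytes_per_yottabyte == MILLION * TRILLION

# Byte count with thousands separators, for comparison with the text.
yottabyte_in_bytes = f"{YOTTABYTE:,}"

print(megabytes_per_yottabyte, is_million_trillion, yottabyte_in_bytes)
```

Under these assumptions, the two descriptions in the text are the same quantity expressed in different units.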

Where are the present data coming from?
  • Transactions.  The traditional sources of usable (structured) healthcare data come mostly from billing (claims) data.
  • Electronic medical records.  EMRs produce useful clinical data, mostly in unstructured and semi-structured form.
  • Machine-emitted.  Most of the yottabytes come from this source, which includes readings from medical sensors and “scrapings” from Web and social media sources including clickstream and social interaction data.
  • Biometric devices.  These are all the findings from medical measurements, such as blood pressure readings and X-rays, and from other monitors of everything from steps taken (e.g., Fitbit) to places visited (GPS).
  • Research.  Data on individuals from clinical trials, registries, and other sources.
  • DNA sequencing.  Genomic data to support personalized medicine are not widely available now but are on the verge of becoming accessible and reasonably priced.

Is it producing big value today?

It is very hard to know, because much of the analytics activity in healthcare is about digitizing the business and building data warehouses rather than using the data to add value to the business or to clinical outcomes. It is also difficult to know how much is being spent on analytics. So, let’s get back to gold mining to infer some answers.
   
Annual gold production is 2,500 metric tons. At the current price of $1,300 per ounce, this amounts to revenues of about $100 billion. It takes, on average, 30 tons of rock to produce one ounce of gold. Hence, the final product amounts to 0.000001042 of the rock that needs to be worked through to harvest it, or about one part in 960,000. This extensive mining also has side effects, including the use of cyanide to extract the gold and the huge open pits and large mounds of waste rock left across the countryside where it is produced. The mining processes require huge investments in monster shovels and trucks to extract the rock and transport it to the plant and warehouse for processing and storing. Gold mining is hypothesis-driven: mine rock in a specific place and in a specific way, and you get gold. This is quite different from a hypothesis-free approach, which is to take all the rock and run a lot of tests on it to see whether there is anything of value in it.
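The arithmetic behind these figures can be reproduced with a short Python sketch. The unit choices are assumptions, since the text does not state them: the 30 tons of rock are taken as U.S. short tons (2,000 lb) with 16 ounces to the pound, and the gold price is taken as quoted per troy ounce (31.1034768 g). The function names are illustrative only.

```python
# Assumption: "tons" of rock are U.S. short tons, with avoirdupois ounces.
OUNCES_PER_POUND = 16
POUNDS_PER_SHORT_TON = 2_000

# Assumption: the gold price is per troy ounce (31.1034768 grams).
GRAMS_PER_TROY_OUNCE = 31.1034768
GRAMS_PER_METRIC_TON = 1_000_000


def gold_fraction(tons_of_rock_per_ounce: float) -> float:
    """Fraction of mined rock (by weight) that ends up as gold."""
    rock_ounces = tons_of_rock_per_ounce * POUNDS_PER_SHORT_TON * OUNCES_PER_POUND
    return 1 / rock_ounces


def annual_revenue_usd(metric_tons: float, price_per_troy_ounce: float) -> float:
    """Revenue from a year's gold production at a given price."""
    troy_ounces = metric_tons * GRAMS_PER_METRIC_TON / GRAMS_PER_TROY_OUNCE
    return troy_ounces * price_per_troy_ounce


fraction = gold_fraction(30)                # 1 ounce out of 960,000 ounces of rock
revenue = annual_revenue_usd(2_500, 1_300)  # figures from the text

print(f"{fraction:.9f}")                    # prints 0.000001042
print(f"${revenue / 1e9:.1f} billion")
```

With these assumptions the conversion rate matches the 0.000001042 quoted in the text, and the revenue lands in the neighborhood of the "about $100 billion" figure.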

Yotta-driven analytics in healthcare is mostly hypothesis-free, akin to analyzing the whole mountain in the hope of discovering “similarities” that may provide new understandings about the delivery of healthcare. The monster computing technology available today, at relatively low cost, can enable seemingly limitless simulations to do this. So, the question is, how much rock will it take to find the gold in healthcare? Will the conversion rate be in the neighborhood of 0.000001042?

Some of the gold from the healthcare data rush is palpable, and it is “small.” The integration of genomic data with clinical data could lead to answers to important questions, such as whether a certain chromosomal variation is related to a disease, which could then fuel individually tailored treatments. For example, Tamoxifen has been an effective drug for the treatment of breast cancer. On average, about 80% of patients benefit from it. The potential with personalized treatment is to be 100% effective in the 80% of patients selected for it, because genetic markers can improve the knowledge of who does and does not benefit from the treatment. There are many instances of “small” hypothesis-driven data that can have a precise impact on business and health outcomes. Other rock in the yotta may not be as clearly useful. For example, much of the yotta is composed of data emitted from machines, and much more research needs to be conducted to home in on likely ways it can contribute.

Are there other mines that healthcare is missing?

Two types of data are missing from the previous list. These data do not necessarily add a lot to the yotta stats. They are “small” and have specific and targeted purposes. They are extra-industry personal data and people-generated data.

Extra-Industry Personal Data

The world is full of relevant data, and a lot of it resides outside of healthcare. External data can address specific healthcare issues, for example, changing people’s behavior, in applications ranging from marketing to early detection of diseases. These data come from privately aggregated and publicly available databases on a wide range of personal attributes. The attributes can define microsegments that can be precisely targeted with specific interventions to improve health. For example, data on height and weight are available from external sources (and are not easily collected or extracted from the usual healthcare data) and can be used to calculate the body mass index (BMI) to determine premorbid obesity. Additionally, when personal data are integrated with medical data and combined with the right channel, especially mobile, they can produce a much better identification of high-risk patients, with more effective interventions mapped to their specific needs and closer monitoring over time.
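The BMI calculation mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a clinical tool: it uses the standard formula (weight in kilograms divided by height in meters squared) and assumes the commonly cited WHO adult cut-offs, which the original text does not specify. The function names and sample values are hypothetical.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Standard BMI formula: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m**2


def bmi_category(bmi: float) -> str:
    """Assumption: commonly cited WHO adult cut-offs (18.5, 25, 30)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


# Hypothetical externally sourced records: (person id, weight in kg, height in m).
records = [("A", 70.0, 1.75), ("B", 95.0, 1.75)]

for person_id, weight_kg, height_m in records:
    bmi = body_mass_index(weight_kg, height_m)
    print(person_id, round(bmi, 1), bmi_category(bmi))
```

In a setting like the one described, the flagged category would be one attribute among many used to define a microsegment, rather than a diagnosis in itself.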

People-Generated Data

Another source of untapped data is people. This is another type of “small data” with big potential benefits. Most of the data sources listed previously do not involve the active participation of people. The real potential lies in gathering much more relevant data from individuals, with their consent, and engendering their partnership in data-sharing activities that help them improve their lives. After all, people know more about their own health and illnesses, and can monitor them better, than any doctor could possibly hope to do. There is much more to be learned from a person’s head than from their data streams. There are indications that this is happening without, and perhaps in spite of, the active strategies of traditional healthcare. For example, networks of patients with the same condition are sharing data and creating large databases that are beginning to approximate crowd-sourced clinical outcomes research. As of the end of 2011, PatientsLikeMe had more than 120,000 patients in 500 different condition groups; ACOR (Association of Cancer Online Resources) had more than 100,000 patients in 127 cancer support groups; and 23andMe had more than 100,000 members in its genomic database. People also engage in their own data sharing through mobile and social media. And people have been responsive to surveys when the purpose is big (like polling in a presidential campaign) and when the rewards for participation are adequate.

Conclusion

A mountain of data is available for analytics in healthcare. Some of it is really big and has unknown uses but is intriguing, and the technology may be able to find the gold, although the conversion rate may be infinitesimally small. Some of it is small and can have immediate applications to produce value. And some that is potentially very valuable and comes directly from people is not included in the count and is not collected. Certainly, healthcare lags other industries in its use of big data because of the challenges with complex and unstructured data, the reluctance to use external data, data integration issues, and concerns about patient confidentiality. And IT folks say there is enough unused healthcare industry data to keep them busy for a very long time. Threading the needle for the most productive use of data, whether big or small, hypothesis-driven or hypothesis-free, depends on analytics making the case that it is worth the investment and is an innovation worth adopting.

This blog post is an extract from Dwight’s new (edited) book, Analytics in Healthcare and the Life Sciences.