dwight n mcneill: November 2013

It’s that time of year again, when statheads’ thoughts turn to sugar plums and new gadgets that make their jobs more interesting and productive. It’s amazing how technology keeps getting better and cheaper. Recall that Moore’s law predicted that the number of transistors on a chip would double roughly every two years, with proportionate gains in processing speed, memory capacity, sensors, and even the pixels in digital cameras. For example, compare the IBM PC released in August 1981 with the Apple iPhone 4 released in June 2010. The CPU clock speed was 4.77 MHz for the PC and 1 GHz for the iPhone. The processor instruction size was 16 bits for the PC and 128 bits for the iPhone. The storage capacity was 160 KB for the PC and 16 GB for the iPhone (base model). The installed memory (RAM) was 64 KB for the PC and 512 MB for the iPhone.⁹ Additionally, the PC’s list price on release was $3,000 (about $7,500 adjusted for inflation), while the iPhone’s was $199, roughly 2.7% of the PC’s inflation-adjusted cost. This exponential growth in computing performance has carried digital devices, from computers to household appliances, into every segment of the world economy.
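The comparison can be reproduced with a few lines of Python. This rough sketch takes the figures quoted above at face value and rests on three assumptions of mine, not the post's:

- decimal units (1 GB = 1,000,000 KB);
- a two-year doubling period for Moore’s law;
- illustrative dictionary keys and function names.

```python
from datetime import date

# Figures as quoted in the post; decimal units are an assumption.
IBM_PC = {"clock_hz": 4.77e6, "storage_bytes": 160e3, "ram_bytes": 64e3,
          "price_usd_adjusted": 7_500}  # $3,000 list price, inflation-adjusted
IPHONE_4 = {"clock_hz": 1e9, "storage_bytes": 16e9, "ram_bytes": 512e6,
            "price_usd_adjusted": 199}

def improvement_ratios(old, new):
    """How many times larger each capacity metric is on the newer device."""
    return {k: new[k] / old[k] for k in ("clock_hz", "storage_bytes", "ram_bytes")}

def moores_law_factor(start, end, doubling_years=2.0):
    """Growth factor implied by one doubling every `doubling_years` years."""
    years = (end - start).days / 365.25
    return 2 ** (years / doubling_years)

ratios = improvement_ratios(IBM_PC, IPHONE_4)
price_share = IPHONE_4["price_usd_adjusted"] / IBM_PC["price_usd_adjusted"]
predicted = moores_law_factor(date(1981, 8, 1), date(2010, 6, 1))

for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:,.0f}x")
print(f"iPhone price as a share of the adjusted PC price: {price_share:.1%}")
print(f"Two-year doubling over the same period: {predicted:,.0f}x")
```

The measured gains range from about 210x (clock speed) to 100,000x (storage). Two-year doubling over the same 29 years implies a factor of roughly 22,000.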

So, what’s in the Holiday Gift Catalogues for Analytics this year? Here is a sampler of my five favorites.

NoSQL (Not Only SQL) databases are an alternative to traditional relational databases and are especially suited to unstructured big data, Web 2.0, and mobile applications. They typically run on open source software that supports distributed processing. They scale “out” by adding low-cost commodity servers, often in the cloud, rather than “up” by moving to bigger, more powerful servers. They impose fewer data model restrictions than relational database management systems, which allows more agile changes and less need for database administrators. The bottom line is that they can be faster and much cheaper. Examples of popular NoSQL technologies include Cassandra, Hadoop, and BigTable, and companies that use them include Facebook, Netflix, LinkedIn, and Twitter. For more information see the NoSQL website, which touts itself as “your ultimate guide to the non-relational universe.”
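The toy Python sketch below makes “scaling out” and the looser data model concrete. It is an illustration under stated assumptions, not how any particular product works. Records are plain dictionaries with hypothetical fields, and keys are assigned to nodes by hashing modulo the node count. Real systems refine this: Cassandra, for instance, uses consistent hashing so that adding a node moves only a fraction of the keys.

```python
import hashlib
from collections import defaultdict

def node_for_key(key: str, num_nodes: int) -> int:
    """Pick a node for a record key with a stable hash (simple modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def shard(records, num_nodes):
    """Spread records across nodes; each node holds only its own share."""
    nodes = defaultdict(list)
    for record in records:
        nodes[node_for_key(record["id"], num_nodes)].append(record)
    return nodes

# Schema-free records (hypothetical): each dict may carry different fields,
# unlike rows in a relational table that must all match one declared schema.
records = [
    {"id": "user-1", "name": "Ann", "devices": ["phone", "tracker"]},
    {"id": "user-2", "name": "Bo", "steps_today": 8412},
    {"id": "user-3", "name": "Cy"},
    {"id": "user-4", "name": "Di", "zip": "02139"},
]

cluster = shard(records, num_nodes=3)
for node, held in sorted(cluster.items()):
    print(node, [r["id"] for r in held])
```

Adding capacity here means raising `num_nodes` (more cheap machines), not buying a bigger one.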

High performance computing (HPC) allows users to solve complex science, engineering, and business problems with applications that require high-bandwidth, low-latency networking and very high compute capabilities. This is the computing capability needed for mining mountains of data. It can be provided by dedicated computer clusters or by cloud clusters. Dedicated, custom-built supercomputer infrastructure requires significant capital investment, long procurement times, long queues, and extensive database management. Buying HPC services from the cloud offers clear cost advantages, short lead times, access to the scale a given project requires, and on-demand capacity. One example of such an offering is Amazon Web Services’ Cluster Compute Instances. In healthcare, the biopharma sector uses HPC for genome analysis; other industries, including oil and gas, financial services, and manufacturing, use it for modeling.

The idea that machines could replace humans for certain functions has been around for a long time, and it has certainly become commonplace in industries such as automotive, with robots on the assembly line. But can machines actually “learn” and improve their functioning on their own, beyond what they are explicitly programmed to do? There are good examples in Google Search, Amazon purchasing recommendations, and voice and facial recognition applications. In healthcare, IBM demonstrated a compelling use of machine learning (along with natural language processing and predictive analytics) when its Watson technology beat two grand champions on the Jeopardy! TV quiz show. IBM is now working on healthcare solutions. It has partnered with Memorial Sloan-Kettering Cancer Center to have the technology gather and assimilate information from two sources: the research literature, and the Center’s clinical experience as documented in its medical records and other files. The aim is to “bring up-to-date knowledge to the bedside of every cancer patient.” Watson might be able to do this through its capabilities to read and understand language, interact with humans, remember everything, and answer questions in real time. How the information will be delivered to the physician, how it might transform the practice of medicine, and whether physicians will embrace the technology are all important, open questions.

The Internet has transformed the way businesses communicate with customers, market to them, do commerce with them, and collect data about them. In retail, clicks are challenging the bricks. What could be more indicative of shifting paradigms than the collapse of the structures in which people do business (stores)? One example is the capability to run virtually instantaneous randomized trials of alternative Web site features, e.g., to find which page design brings in the most contributions during a political campaign. Another is Web page “scraping,” in which all types of data about people’s Web wanderings are turned into ratings of their suitability for a job, a loan, or a date.
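The randomized Web trials mentioned above usually come down to comparing two conversion rates. The sketch below assumes a 50/50 split assigned by hashing a visitor ID, and a standard two-proportion z-test. The function names and the donation counts are hypothetical, not taken from any real campaign.

```python
import hashlib
import math

def assign_variant(visitor_id: str) -> str:
    """Deterministically randomize a visitor into page variant 'A' or 'B' (50/50)."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical results: 10,000 visitors saw each donation-page variant.
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(assign_variant("visitor-42"))
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts (2.0% vs. 2.6% giving), z is about 2.8 and the p-value is below 0.01. Variant B’s lift would therefore be unlikely to be chance alone.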

More than half of the adult population in the United States has a smartphone. Facebook has more than 1 billion monthly users. The hot combination of clicks and mobile produces a platform for easy, convenient, and quick communications that also enables e-commerce, uber-targeted marketing, location monitoring, and much more. An opportunity going forward in healthcare is to create closer relationships with people to help them get healthier. That means tapping into data that are freely exchanged and supporting the continual, fast evolution of new health applications.

Technology has been awesome at increasing computing capacity, both with hardware (speed, memory, storage, access, etc.) and with software (to manage all the data and make sense of it). But if the technology is so great, why is the uptake of healthcare analytics so low compared with its potential and with the performance of other industries? The answer is complicated, but one compelling reason is that the technology of making change happen, of getting from a good idea to its being embedded in operations, is unappreciated and untapped. And analysts, enamored with computing technology, may take their eyes off the prize: making behavioral changes that improve clinical and business outcomes.

This holiday gift idea is cheaper than all the others and may be more consequential. A guide is available in my book, A Framework for Applying Analytics in Healthcare: What Can be Learned from the Best Practices in Retail, Banking, Politics and Sports.

The data gold rush in healthcare is on. It's the wild, wild West. We are told that the data produced in U.S. healthcare will soon be counted in yottabytes, or a million trillion megabytes, or 1,000,000,000,000,000,000,000,000 bytes. McKinsey & Company asserts that creative and thoughtful extraction of all the healthcare big data is worth at least $300 billion a year. Where are all these data coming from, what value is being extracted from them, and where are the untapped opportunities?
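The unit arithmetic above can be checked with a few lines of Python. This assumes decimal (SI) prefixes, where each step is a factor of 1,000. Under binary prefixes the comparable unit, the yobibyte, would instead be 2**80 bytes.

```python
# Decimal (SI) prefixes: each step up is a factor of 1,000.
SI_EXPONENTS = {"kilo": 1, "mega": 2, "giga": 3, "tera": 4,
                "peta": 5, "exa": 6, "zetta": 7, "yotta": 8}

def bytes_per(prefix: str) -> int:
    """Number of bytes in one <prefix>byte, using decimal prefixes."""
    return 1000 ** SI_EXPONENTS[prefix]

yottabyte = bytes_per("yotta")
megabytes_per_yottabyte = yottabyte // bytes_per("mega")

print(f"1 YB = {yottabyte:,} bytes")
print(f"1 YB = {megabytes_per_yottabyte:,} MB")
assert megabytes_per_yottabyte == 10**6 * 10**12  # a million times a trillion
```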

Where are the data coming from today?

Transactions. The traditional source of usable (structured) healthcare data is mostly billing (claims) data.
Electronic medical records. EMRs produce useful clinical data, mostly in unstructured and semi-structured form.
Machine-emitted. Most of the yottabytes come from this source, which includes readings from medical sensors and “scrapings” from Web and social media sources including clickstream and social interaction data.
Biometric devices. These include findings from medical measurements, such as blood pressure readings and x-rays, and from monitors of everything from steps taken (e.g., Fitbit) to places visited (GPS).
Research. Data on individuals from clinical trials, registries, and other sources.
DNA sequencing. Genomic data to support personalized medicine are not widely available now but are on the verge of becoming accessible and reasonably priced.

Is it producing big value today?

It is very hard to know. Much of the analytics activity in healthcare is about digitizing the business and building data warehouses, rather than using them to add value to the business or to clinical outcomes. It is also difficult to know how much is being spent on analytics. So, let’s get back to gold mining to infer some answers.

About 2,500 metric tons of gold are produced annually. At the current price of $1,300 per ounce, this amounts to revenues of about $100 billion. It takes, on average, 30 tons of rock to produce one ounce of gold. Hence, the final product amounts to 0.000001042 (about one millionth) of the rock that must be worked through to harvest it. This extensive mining also has by-products: the cyanide used to extract the gold, and the huge open pits and large mounds of waste rock left across the countryside where it is produced. The mining process requires huge investments in monster shovels and trucks to extract rock and transport it to the plant and warehouse for processing and storage. Gold mining is hypothesis-driven: mine rock in a specific place and in a specific way, and you get gold. This is quite different from a hypothesis-free approach, which is to take all the rock and run a lot of tests on it to see whether there is anything of value in it.
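The gold figures can be checked the same way. Two unit assumptions are needed, and they are mine rather than the post's:

- the $1,300 price is per troy ounce (31.1 g);
- the 0.000001042 ratio reads “30 tons” as short tons (2,000 lb) and “one ounce” as an avoirdupois ounce, since 1 / (30 × 32,000) rounds to that figure.

```python
TROY_OUNCE_G = 31.1034768          # grams per troy ounce (gold pricing unit)
OUNCES_PER_SHORT_TON = 2_000 * 16  # avoirdupois ounces in a 2,000 lb ton (assumed)

annual_output_tonnes = 2_500  # metric tons per year (from the post)
price_per_troy_oz = 1_300     # USD per ounce (from the post)

troy_ounces = annual_output_tonnes * 1_000_000 / TROY_OUNCE_G
revenue = troy_ounces * price_per_troy_oz
print(f"{troy_ounces / 1e6:.1f} million troy oz, about ${revenue / 1e9:.0f} billion")

# One ounce of gold per 30 (short) tons of rock.
yield_fraction = 1 / (30 * OUNCES_PER_SHORT_TON)
print(f"gold / rock = {yield_fraction:.10f}")
```

Under these assumptions, 2,500 metric tons is roughly 80 million troy ounces, or about $104 billion. That is consistent with the post’s “about $100 billion.”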

Yotta-driven analytics in healthcare is mostly hypothesis-free. It is akin to analyzing the whole mountain, looking to discover “similarities” that may provide new understanding of how healthcare is delivered. The monster computing technology available today, at relatively low cost, can enable seemingly limitless simulations to do this. So, the question is, how much rock will it take to find the gold in healthcare? Will the conversion rate be anywhere near 0.000001042?

Some of the gold from the healthcare data rush is palpable, and it is “small.” Integrating genomic data with clinical data could answer important questions, such as whether a certain chromosomal variation is related to a disease. Such answers could then fuel individually tailored treatments. For example, Tamoxifen has been an effective drug for the treatment of breast cancer; on average, about 80% of patients benefit from it. The potential of personalized treatment is for it to be 100% effective in the 80% of patients who benefit, because genetic markers can improve our knowledge of who does and does not benefit from the treatment. There are many instances of “small,” hypothesis-driven data that can have a precise impact on business and health outcomes. Other rock in the yotta may not be as clearly useful. For example, much of the yotta consists of data emitted from machines, and much more research is needed to home in on the ways it is likely to contribute.
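The arithmetic behind “100% effective in 80% of patients” can be sketched by treating a genetic marker as a screening test. Assume the marker is characterized by a sensitivity and a specificity. These are standard screening-test terms, not figures from the post. The share of treated patients who benefit is then the marker’s positive predictive value. The 0.9/0.8 marker below is purely hypothetical.

```python
def response_rate_among_treated(base_rate, sensitivity, specificity):
    """Fraction of treated patients who benefit when a marker selects who is treated.

    base_rate:   share of all patients who would benefit from the drug
    sensitivity: share of would-benefit patients the marker selects
    specificity: share of would-not-benefit patients the marker screens out
    """
    benefit_treated = sensitivity * base_rate
    no_benefit_treated = (1 - specificity) * (1 - base_rate)
    return benefit_treated / (benefit_treated + no_benefit_treated)

BASE_RATE = 0.80  # "about 80% of patients benefit" (from the post)

scenarios = {
    "no marker (treat everyone)": (1.0, 0.0),
    "perfect marker": (1.0, 1.0),
    "hypothetical imperfect marker": (0.9, 0.8),
}
for label, (sens, spec) in scenarios.items():
    rate = response_rate_among_treated(BASE_RATE, sens, spec)
    print(f"{label}: {rate:.1%} of treated patients benefit")
```

With the hypothetical marker, about 95% of treated patients benefit. However, 10% of the patients who would have benefited are not selected for treatment, so the marker’s error rates matter as much as the headline percentage.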

Are there other mines that healthcare is missing?

There are two types of data missing from the previous list. They do not necessarily add much to the yotta stats; they are “small” and have specific, targeted purposes. They are extra-industry personal data and people-generated data.

Extra-Industry Personal Data

The world is full of relevant data, and a lot of it resides outside of health care. External data can address specific health care issues, from marketing that changes people’s behavior to early detection of disease. These data come from privately aggregated and publicly available databases on a wide range of personal attributes. They can define microsegments that can be precisely targeted with specific interventions to improve health. For example, data on height and weight are available from external sources (and are not easily collected or extracted from usual health care data). They can be used to calculate the body mass index (BMI) to determine premorbid obesity. Additionally, personal data can be integrated with medical data and combined with the right channel, especially mobile. The result can be much better identification of high-risk patients, more effective interventions mapped to their specific needs, and closer monitoring over time.
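The BMI calculation mentioned above is simple once height and weight are in hand. The sketch below uses the standard formula (kg/m², or 703 × lb/in²) and the usual adult WHO/CDC cut-points. The function names are hypothetical, and which category an analyst treats as the trigger for an intervention is a design choice not specified in the post.

```python
def bmi_metric(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2

def bmi_imperial(weight_lb: float, height_in: float) -> float:
    """BMI from pounds and inches, using the standard 703 conversion factor."""
    return 703 * weight_lb / height_in ** 2

def bmi_category(bmi: float) -> str:
    """Adult BMI categories using the common WHO/CDC cut-points."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "healthy weight"
    if bmi < 30:
        return "overweight"
    if bmi < 40:
        return "obese"
    return "severe obesity (class 3)"

# Hypothetical externally sourced records: (weight_kg, height_m).
for weight, height in [(70, 1.75), (130, 1.75)]:
    bmi = bmi_metric(weight, height)
    print(f"{bmi:.1f} -> {bmi_category(bmi)}")
```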

People-Generated Data

Another source of untapped data is people. This is another type of “small data” with big potential benefits. Most of the data sources listed previously do not involve the active participation of people. The real potential lies in gathering much more relevant data from individuals, with their consent, and engaging them as partners in data-sharing activities that help them improve their lives. After all, people know more about their own health and illnesses, and can monitor them better, than any doctor could possibly hope to do. There is much more to be learned from a person’s head than from their data streams. There are indications that this is happening without, and perhaps in spite of, the active strategies of traditional health care. For example, networks of patients with the same condition are sharing data and creating large databases that are beginning to approximate crowd-sourced clinical outcomes research. As of the end of 2011:
- PatientsLikeMe had more than 120,000 patients in 500 different condition groups;
- ACOR (Association of Cancer Online Resources) had more than 100,000 patients in 127 cancer support groups;
- 23andMe had more than 100,000 members in its genomic database.

People also engage in their own data sharing through mobile and social media. And people have been responsive to surveys when the purpose is big (like polling in a presidential campaign) and when the rewards for participation are adequate.

Conclusion

A mountain of data is available for analytics in healthcare. Some of it is really big, with unknown uses, but it is intriguing, and the technology may be able to find the gold, although the conversion rate may be infinitesimally small. Some of it is small and can have immediate applications that produce value. And some data that are potentially very valuable and come directly from people are neither included in the count nor collected. Certainly, healthcare lags other industries in its use of big data for several reasons: complex and unstructured data, reluctance to use external data, data integration issues, and concerns about patient confidentiality. And IT folks say there is enough unused healthcare industry data to keep them busy for a very long time. Threading the needle for the most productive use of data, whether big or small, hypothesis-driven or hypothesis-free, depends on analytics making the case that it is worth the investment and is an innovation worth adopting.

This blog is an extract from Dwight’s new (edited) book, Analytics in Healthcare and the Life Sciences.

Wednesday, November 20, 2013

What’s on Santa’s List for Analytics?

Thursday, November 14, 2013

What gold mining can tell us about healthcare yottabytes?