“Big Data” has become as prevalent a term as “EMR” and “HIE” in the healthcare industry. Is it the new technology that holds the promise to revolutionize healthcare? Should a CIO consider adopting it? What are the real benefits? What are some of the key factors to consider when reviewing the multitude of options available?
While carrying out the Monitoring and Diagnostics mission of the Microsoft Autopilot team, we had to process petabytes of data on a daily basis. Big Data was not science fiction; it was a fact of our daily lives. Our customers, and our own infrastructure team, demanded fast and efficient processing of huge amounts of data to facilitate operational, business, and engineering decisions critical to their success. Many of the insights gleaned from this data relied on Cosmos, Microsoft’s internal big data store.
In healthcare, as I outlined in my earlier blog post and whitepaper, we are witnessing an explosion of digital data collected from cell phones, voice recordings, images, notes, EMRs, HIEs, and even social media, coupled with an ever-increasing number of medical devices that generate large amounts of test and diagnostic data. Hospitals of all sizes are finding themselves overwhelmed by this growing data asset. As a result, the data may go unused or, worse, be purged periodically because of a lack of perceived value or the assumed complexity and cost of archival storage!
One way to tackle this data explosion is big data. So what is big data? Big data refers to the collection and management of very large data sets, along with the infrastructure that stores and processes them. Big data solutions are designed for data that is high in:
• Volume – the large amounts of data,
• Velocity – the speed of growth of data, and
• Variety – the mix of structured and unstructured data.
Some organizations also add a fourth V, Veracity (the quality and trustworthiness of data), because more often than not, data requires cleansing before it can be used. Some of the key advantages of adopting big data stores in your IT infrastructure include:
• Cost & Reliability – Low-cost archival storage of historical data that can be retrieved as new applications are developed. Low cost does not mean low availability or difficult management: most big data stores keep three redundant copies of the data and offer a host of management and monitoring tools to track it.
• Scalability & Elasticity – Big data is not just about storage; it also entails efficient data processing that can scale up or down based on your needs, without requiring an upfront investment in a dedicated SAN (storage area network) or data processing servers in your data centers.
• Performance – Processing large amounts of data, whether structured, semi-structured, or unstructured, requires massive storage bandwidth and considerable processing power. Most big data engines combine distributed, redundant storage with a distributed job execution engine based on MapReduce or a similar model, delivering high throughput that can scale up or down with an organization’s changing needs.
• Flexibility – Multiple offerings exist today, each emphasizing one or more areas such as performance, atomicity, read vs. write optimization, and query flexibility, so an organization can choose based on its specific needs. These solutions range from freely available open source libraries to commercial SaaS (software as a service) offerings.
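The MapReduce model mentioned under Performance can be hard to picture in the abstract. The following is a minimal, single-machine Python sketch of its three phases (map, shuffle, reduce), counting how often each diagnosis code appears across partitioned records. The record layout, the field names, and the sample ICD-10 category codes are hypothetical; a real engine such as Hadoop would run the map tasks in parallel on the nodes that hold each block of data and perform the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator

# Hypothetical record layout: one dict per patient encounter.
Record = dict[str, str]


def map_phase(partition: list[Record]) -> Iterator[tuple[str, int]]:
    """Emit a (diagnosis_code, 1) pair for every record in one partition."""
    for record in partition:
        yield record["diagnosis_code"], 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Combine all values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def map_reduce(partitions: list[list[Record]]) -> dict[str, int]:
    # A real engine maps each partition in parallel on the node that stores it;
    # this sketch simply processes the partitions one after another.
    mapped = chain.from_iterable(map_phase(p) for p in partitions)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Hypothetical sample data split into two partitions, standing in for
    # blocks stored on two different nodes.
    partitions = [
        [{"patient_id": "p1", "diagnosis_code": "E11"},
         {"patient_id": "p2", "diagnosis_code": "I10"}],
        [{"patient_id": "p3", "diagnosis_code": "E11"},
         {"patient_id": "p4", "diagnosis_code": "J45"}],
    ]
    print(map_reduce(partitions))  # {'E11': 2, 'I10': 1, 'J45': 1}
```

Because each map call sees only its own partition and each reduce step sees only one key's values, both phases can be spread across as many machines as the data requires, which is what allows these engines to scale up or down.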
Caradigm recently leveraged Microsoft’s HDInsight, a service that deploys and provisions Apache Hadoop clusters in the cloud, to establish a software framework designed to manage, analyze, and report on big data. See the illustrative diagram below. With the power of HDInsight and other complementary big data solutions, Caradigm’s population health and analytics solutions can now be configured for archival storage, queries over unstructured data using your choice of NLP (natural language processing), and data analytics that allow care managers to glean actionable insights from large volumes of data.
Make no mistake: big data alone will not meet all of the data storage and data management needs of healthcare organizations. Caradigm uses a hybrid model that augments our near-real-time transactional and analytics data store with a big data store, delivering the most value to healthcare IT, providers, and patients.