Identify the algorithms applied by Big Data in healthcare and evaluate the impact of these on noncommunicable disease prediction strategies

01 Sep 2019

Noncommunicable diseases (NCDs), or chronic diseases, such as heart disease, stroke, and cancer, are the main causes of death in the world. In 2005, about 60 percent (35 million) of the 58 million global deaths were caused by NCDs (World Health Organization, 2005). Ten years later, the share had increased to 71 percent (41 million), while the total number of global deaths had decreased to 57 million (World Health Organization, 2018). Moreover, the treatment of NCDs puts tremendous economic pressure on patients and on society as a whole. For example, the United States’ budget for this treatment in 2011 was about 38 billion USD (Groves, Kayyali, Knott, Kuiken, 2013). Therefore, effective risk assessment and prediction algorithms for NCDs are necessary. This essay will argue that the Convolutional Neural Network (CNN)-based algorithm is the most effective one for most healthcare communities to predict NCDs. It first describes some characteristics of Big Data in the healthcare community, such as its definition and applications. It then describes three prediction algorithms in detail, including their structure, advantages and disadvantages. Finally, it draws a conclusion.

As an abstract term, Big Data refers to ‘high-volume, high-variety, and high-velocity information’ (Barrett et al., 2013). In the past few years, the volume of medical and healthcare data collected from healthcare communities has exploded (Galetsi, Katsaliaki, 2019). This is due not only to the development of public medical facilities that keep detailed patient case records (Stephan, 2010, pp. 211-223), but also to mobile and ‘Internet-of-Things’ (IoT) technologies (Atzori, Iera, Morabito, 2010) that generate more individual data every day. A ‘Mobile health’ system (Kumar, Nilsen, et al., 2012) can provide many individual services rapidly by monitoring users’ daily status. Such a system has on-body sensors that connect to data collection applications on users’ smartphones. Through it, a large amount of data, including heart rate, blood pressure, geographical position, weather conditions and even personal feelings, can be collected quickly and completely. With these individual medical data, doctors can diagnose diseases more accurately and apply the right treatment strategies to specific patients. To optimize data collection, some researchers prefer more convenient and smarter healthcare systems based on IoT. Chen et al. (Chen, Ma, et al., 2017) designed a ‘Smart clothing’ system, which collects more detailed data without placing much burden on users by embedding miniature sensors in their clothes. The individual data collected by this system are sent to cloud servers, where they are logged and analyzed by online algorithms. Then, combined with the experience of doctors or experts, accurate disease risk assessments and predictions can be made, and healthcare communities can formulate suitable treatment plans for patients. Big Data from the healthcare community can thus promote the development of new medical technologies and healthcare systems, which makes personalized medicine and disease prediction possible.

Many researchers have designed disease prediction algorithms based on Big Data. Palaniappan and Awang (2008) developed the Intelligent Heart Disease Prediction System (IHDPS) to predict the likelihood that a patient has heart disease. The system used Decision Trees, Naive Bayes and Neural Networks to model heart disease prediction from medical data. The data consisted of patients’ gender, age, blood pressure and other structured attributes. With the mining model, IHDPS could output a value representing this likelihood. In their experiments, the prediction accuracy of the system was 86.12%. IHDPS could provide effective support to doctors in making medical decisions and choosing suitable treatments. In addition, the algorithm behind the system is simple and feasible, so it can be built and put into service quickly. However, the system only used discrete data to predict heart disease, while continuous data and unstructured data such as disease history are also necessary for disease prediction (Obermeyer, Emanuel, 2016).
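To make the idea of predicting from discrete, structured attributes more concrete, the following is a minimal sketch of a Naive Bayes classifier, one of the three techniques IHDPS is described as using. It is not the IHDPS implementation: the function names, the three attributes, the six toy records and the use of Laplace smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(records, labels):
    """Count class frequencies and per-class frequencies of each attribute value."""
    class_counts = Counter(labels)
    # value_counts[label][attribute_index][value] -> number of occurrences
    value_counts = defaultdict(lambda: defaultdict(Counter))
    attribute_values = defaultdict(set)
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[label][i][value] += 1
            attribute_values[i].add(value)
    return class_counts, value_counts, attribute_values


def predict_proba(model, record):
    """Return the posterior probability of each class for one record."""
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, count in class_counts.items():
        score = math.log(count / total)  # class prior
        for i, value in enumerate(record):
            seen = value_counts[label][i][value]
            # Laplace smoothing (an assumption) avoids zero probabilities.
            score += math.log((seen + 1) / (count + len(attribute_values[i])))
        log_scores[label] = score
    # Normalise the log scores into probabilities that sum to 1.
    peak = max(log_scores.values())
    unnormalised = {l: math.exp(s - peak) for l, s in log_scores.items()}
    norm = sum(unnormalised.values())
    return {l: v / norm for l, v in unnormalised.items()}


# Hypothetical toy records: (gender, age band, blood pressure band).
records = [
    ("male", "60+", "high"),
    ("female", "60+", "high"),
    ("male", "40-59", "normal"),
    ("female", "<40", "normal"),
    ("male", "60+", "normal"),
    ("female", "40-59", "high"),
]
labels = [1, 1, 0, 0, 1, 0]  # 1 = heart disease, 0 = no heart disease

model = train_naive_bayes(records, labels)
probabilities = predict_proba(model, ("male", "60+", "high"))
print(round(probabilities[1], 3))
```

The sketch also shows the limitation noted above: every attribute has to be a discrete category, so a continuous measurement would need to be binned first and free-text history cannot be used at all.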

In order to assess the risk of chronic diseases, Chen et al. (Chen, Hao, et al., 2017) designed a CNN-based algorithm that analyzes hospital medical data. The data they used to model disease prediction consisted of 20,320,848 medical records of 31,919 hospitalized patients from 2013 to 2015. Another feature of the data set is that it also includes unstructured text data. To deal with the unstructured data, they used Word Embedding (Levy, Goldberg, 2014), which could accelerate the modeling process of the algorithm. The algorithm has a convolution layer with 100 convolution filters, in which the word-embedded text is convolved. This operation extracts text feature vectors that are passed on to the next stage of the calculation. Max pooling and a fully connected layer are also used in the algorithm to reduce the model complexity and obtain an accurate outcome. The output of the algorithm is a value that indicates whether or not a patient is at high risk of NCDs. To evaluate the validity of this algorithm, Chen et al. put the Naive Bayesian (Wang et al., 2007), K-nearest Neighbor (García-Laencina et al., 2015), and Decision Tree (Sharma et al., 2013) algorithms in the control group and tested all of them on the same data from one hospital. The experimental results showed that the CNN-based algorithm had the best accuracy and recall, which were 94.80% and 99.9923% respectively. This algorithm can extract useful information from the medical data and deal with the text data automatically. It can also reduce the workload and mistakes of doctors in disease prediction, and it performs better than data mining algorithms such as Naive Bayesian and Decision Tree in disease prediction. Moreover, the prediction performance of this algorithm can be improved by expanding the volume of training data. However, this algorithm requires a lot of computing resources for model training, which might prevent small healthcare communities from using it. The complexity of the algorithm also makes it harder to extend to different kinds of data and adds work in data preprocessing.
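The pipeline described above (word embedding, convolution, max pooling, full connection) can be sketched as a single forward pass. This is a toy illustration only, not the model from the cited study: the weights are random and untrained, and the vocabulary, the embedding size, the window of three words, the tanh activation and the sigmoid output are all assumptions. Only three filters are used here, whereas the source reports 100.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained toy weights are reproducible

EMBED_DIM = 4    # assumed toy embedding size
NUM_FILTERS = 3  # the source reports 100 convolution filters
WINDOW = 3       # assumed number of words covered by each filter

# Hypothetical vocabulary with random (untrained) word embeddings.
vocab = ["patient", "reports", "chest", "pain", "and", "dizziness"]
embedding = {w: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in vocab}
filters = [
    [random.uniform(-1, 1) for _ in range(WINDOW * EMBED_DIM)]
    for _ in range(NUM_FILTERS)
]
fc_weights = [random.uniform(-1, 1) for _ in range(NUM_FILTERS)]
fc_bias = 0.0


def conv_features(tokens):
    """Slide each filter over consecutive word windows of the embedded text."""
    vectors = [embedding[t] for t in tokens]
    feature_maps = []
    for f in filters:
        row = []
        for start in range(len(vectors) - WINDOW + 1):
            # Concatenate the embeddings of the words in this window.
            window = [x for vec in vectors[start:start + WINDOW] for x in vec]
            row.append(math.tanh(sum(w * x for w, x in zip(f, window))))
        feature_maps.append(row)
    return feature_maps


def max_pool(feature_maps):
    """Keep only the strongest response of each filter over the whole text."""
    return [max(row) for row in feature_maps]


def risk_score(tokens):
    """Fully connected layer plus sigmoid: a value in (0, 1) read as risk."""
    pooled = max_pool(conv_features(tokens))
    z = sum(w * p for w, p in zip(fc_weights, pooled)) + fc_bias
    return 1.0 / (1.0 + math.exp(-z))


note = ["patient", "reports", "chest", "pain", "and", "dizziness"]
print(len(conv_features(note)), len(conv_features(note)[0]))
print(0.0 < risk_score(note) < 1.0)
```

Max pooling reduces each filter's output to one number regardless of text length, which is how this step keeps the model small. Training the filters and weights on millions of records is where the computing cost mentioned above arises.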

Unlike the researchers above, some researchers pay more attention to personalized medicine, which has also benefited from the development of Big Data. Chawla and Davis (2013) developed the Collaborative Assessment and Recommendation Engine (CARE) based on the patient-centric model (Stanton, 2002). The engine uses the ‘collaborative filtering methodology’ to analyze the data (Breese, Heckerman, Kadie, 1998). It has a data set that consists of all patients’ medical histories, so it can compare an individual’s medical history with the records in the data set and use collaborative filtering to process them. In their experiments, the engine could predict about 50% of the NCDs that a person might have in the future. The engine can use the medical data to predict multiple diseases, whereas the data mining and machine learning algorithms tend to predict one specific disease. However, the number of diseases in the data set may increase the complexity of the engine, and the comparison between an individual’s medical history and the data set creates a bottleneck in the engine’s performance. That is a serious problem if the engine is used to provide medical services in large-scale healthcare communities.
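The comparison step described above can be illustrated with a minimal sketch of collaborative filtering over medical histories. It is not the CARE implementation: the Jaccard similarity, the scoring rule, the function names and the four toy histories are assumptions chosen to keep the example short.

```python
def jaccard(a, b):
    """Similarity of two diagnosis sets: shared diagnoses over all diagnoses."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_future_diseases(target, histories):
    """Rank diseases the target patient does not yet have.

    Each candidate disease is scored by the similarity-weighted share of
    similar patients whose history contains it.
    """
    scores = {}
    total_similarity = 0.0
    for other in histories:  # one comparison per stored history
        similarity = jaccard(target, other)
        if similarity == 0.0:
            continue
        total_similarity += similarity
        for disease in other - target:
            scores[disease] = scores.get(disease, 0.0) + similarity
    if total_similarity == 0.0:
        return []
    ranked = [(d, s / total_similarity) for d, s in scores.items()]
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy data set of patient histories.
histories = [
    {"hypertension", "diabetes", "stroke"},
    {"hypertension", "diabetes", "kidney disease"},
    {"asthma"},
    {"hypertension", "stroke"},
]
target = {"hypertension", "diabetes"}

for disease, score in rank_future_diseases(target, histories):
    print(disease, round(score, 2))
```

Because the loop visits every stored history for every query, the cost grows with the size of the data set, which is the performance bottleneck discussed above.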

To sum up, this essay has described NCD prediction with Big Data from the healthcare community. It has shown some characteristics of medical Big Data, including its sources, structure and content. Big Data in healthcare can promote the development of new medical technology and change our view of healthcare. Moreover, the essay has presented three different NCD prediction algorithms: the data mining algorithm, the CNN-based algorithm and CARE. These algorithms are all based on medical Big Data and can predict NCDs accurately by modeling patients’ health. However, the experiments on the CNN-based algorithm have shown that it is better than the data mining algorithms in NCD prediction. The CNN-based algorithm is also better suited than CARE to healthcare communities because it has fewer performance bottlenecks, although CARE can predict more diseases than the CNN-based algorithm. The CNN-based algorithm can adapt to growing data volumes, while CARE may stall. Therefore, the CNN-based algorithm is the best choice for most healthcare communities, and it can incorporate Big Data into their NCD prediction strategies.

References

Atzori, L., Iera, A. and Morabito, G., (2010). ‘The internet of things: A survey’, Computer networks, 54(15), pp.2787-2805.
Barrett, M.A., Humblet, O., Hiatt, R.A. and Adler, N.E., (2013). ‘Big data and disease prevention: from quantified self to quantified communities’, Big data, 1(3), pp.168-175.
Breese, J.S., Heckerman, D. and Kadie, C., (1998). ‘Empirical analysis of predictive algorithms for collaborative filtering’, In Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence (pp. 43-52). Morgan Kaufmann Publishers Inc.
Chawla, N.V. and Davis, D.A., (2013). ‘Bringing big data to personalized healthcare: a patient-centered framework’, Journal of general internal medicine, 28(3), pp.660-665.
Chen, M., Ma, Y., Li, Y., Wu, D., Zhang, Y. and Youn, C.H., (2017). ‘Wearable 2.0: Enabling human-cloud integration in next generation healthcare systems’, IEEE Communications Magazine, 55(1), pp.54-61.
Chen, M., Hao, Y., Hwang, K., Wang, L. and Wang, L., (2017). ‘Disease prediction by machine learning over big data from healthcare communities’, IEEE Access, 5, pp.8869-8879.
Galetsi, P. and Katsaliaki, K., (2019). ‘A review of the literature on big data analytics in healthcare’, Journal of the Operational Research Society, pp.1-19.
García-Laencina, P.J., Abreu, P.H., Abreu, M.H. and Afonso, N., (2015). ‘Missing data imputation on the 5-year survival prediction of breast cancer patients with unknown discrete values’, Computers in biology and medicine, 59, pp.125-133.
Groves, P., Kayyali, B., Knott, D. and Kuiken, S.V., (2013). The ‘big data’ revolution in healthcare: Accelerating value and innovation.
Kumar, S., Nilsen, W., Pavel, M. and Srivastava, M., (2012). ‘Mobile health: Revolutionizing healthcare through transdisciplinary research’, Computer, 46(1), pp.28-35.
Levy, O. and Goldberg, Y., (2014). ‘Neural word embedding as implicit matrix factorization’, In Advances in neural information processing systems (pp. 2177-2185).
Obermeyer, Z. and Emanuel, E.J., (2016). ‘Predicting the future—big data, machine learning, and clinical medicine’, The New England journal of medicine, 375(13), p.1216.
Palaniappan, S. and Awang, R., (2008). ‘Intelligent heart disease prediction system using data mining techniques’, In 2008 IEEE/ACS international conference on computer systems and applications (pp. 108-115). IEEE.
Sharma, N. and Om, H., (2013). ‘Data mining models for predicting oral cancer survivability’, Network Modeling Analysis in Health Informatics and Bioinformatics, 2(4), pp.285-295.
Stanton, M.W., (2002). Expanding patient-centered care to empower patients and assist providers.
Stephan P. K., (2010). ‘Data mining in health care’, Healthcare Informatics: Improving Efficiency and Productivity, pp. 211-223.
Wang, Q., Garrity, G.M., Tiedje, J.M. and Cole, J.R., (2007). ‘Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy’, Appl. Environ. Microbiol., 73(16), pp.5261-5267.
World Health Organization (2005). Preventing chronic diseases: a vital investment. World Health Organization.
World Health Organization (2018). Noncommunicable diseases country profiles 2018. World Health Organization.
Zolbanin, H.M., Delen, D. and Zadeh, A.H., (2015). ‘Predicting overall survivability in comorbidity of cancers: A data mining approach’, Decision Support Systems, 74, pp.150-161.
