The following are some resources to check out for developing a full-fledged population health monitoring program. They are under review for application to the surveillance program being developed.
______________________________
Part 1 – Techniques: Machine Learning
Introduction
Machine learning, as used here, means applying software tools to your Big Data/EMR to develop routine, repeatable methods for analyzing results. Most recently, I produced a method for evaluating the top 20 ICDs (by groups I defined) for each of the ethnic groups in the EMR. These lists were combined in a final table of the ‘hot diagnoses’, with the ethnic groups listed side by side, each in descending order, followed by rank, n and percent. The SAS program took about 1,000 lines of code and a minimum of 53 processes (according to the information SAS displays during a run).
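The ranking step described above can be sketched in Python. This is a minimal illustration, not the author's SAS program. It assumes each input record is a hypothetical `(ethnic_group, dx_group)` pair, with one record per diagnosis occurrence. Percent is computed against all diagnosis records within the same ethnic group. The real denominator (patients versus diagnoses) would depend on how the study defines n.

```python
from collections import Counter, defaultdict


def top_n_by_group(records, n=20):
    """Rank diagnosis groups within each ethnic group.

    records: iterable of (ethnic_group, dx_group) pairs, one per diagnosis
    occurrence (a hypothetical input layout).
    Returns {ethnic_group: [(rank, dx_group, count, percent), ...]} sorted by
    descending count; percent is relative to all records in that ethnic group
    (an assumed denominator).
    """
    counts = defaultdict(Counter)
    for ethnic_group, dx_group in records:
        counts[ethnic_group][dx_group] += 1

    ranked = {}
    for ethnic_group, counter in counts.items():
        total = sum(counter.values())
        # Descending count, then alphabetical, so ties are broken reproducibly.
        top = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        ranked[ethnic_group] = [
            (rank, dx, count, round(100.0 * count / total, 1))
            for rank, (dx, count) in enumerate(top, start=1)
        ]
    return ranked


def side_by_side(ranked, groups):
    """Yield one row per rank, with each group's (dx, n, %) placed side by side."""
    depth = max((len(ranked.get(g, [])) for g in groups), default=0)
    for i in range(depth):
        row = [i + 1]
        for g in groups:
            entries = ranked.get(g, [])
            if i < len(entries):
                _, dx, count, pct = entries[i]
                row.extend([dx, count, pct])
            else:
                row.extend(["", "", ""])  # this group has fewer than i + 1 entries
        yield row


if __name__ == "__main__":
    demo = [("A", "Diabetes"), ("A", "Diabetes"), ("A", "Asthma"),
            ("B", "Asthma"), ("B", "Asthma"), ("B", "Asthma"), ("B", "Obesity")]
    for row in side_by_side(top_n_by_group(demo, n=2), ["A", "B"]):
        print(row)
```

With the demo data, the first printed row is `[1, 'Diabetes', 2, 66.7, 'Asthma', 3, 75.0]`. Each group's ranking sits in its own set of three columns, which mirrors the side-by-side table layout described above.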
Machine learning uses two methods: a supervised classification process and an unsupervised classification process. In supervised classification, you, the researcher, define the groupings (classes) in advance. For example, my ICD lists and sets are defined based on my personal impressions of which ICDs need to stand out more than they do in the ICD groups predefined by CMS. My first group of ICDs numbers 135, my second 303. These groups are to some degree subjective, being based on clinical observations regarding priority healthcare issues.
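At its simplest, the grouping step behind researcher-defined classes amounts to mapping each ICD code onto a predefined group. The sketch below shows one way to do that: a rule-based lookup by code prefix. The group names and ICD-10-CM prefixes in `CUSTOM_GROUPS` are hypothetical examples, not the author's actual groupings. A trained supervised classifier would use groups like these as its target labels.

```python
# Hypothetical researcher-defined groups keyed by ICD-10-CM code prefixes.
# Replace with the study's own clinically motivated definitions.
CUSTOM_GROUPS = {
    "Diabetes": ("E10", "E11"),
    "Hypertension": ("I10",),
    "Asthma": ("J45",),
}


def normalize(icd_code):
    """Remove dots and surrounding whitespace and upper-case, e.g. 'e11.9' -> 'E119'."""
    return icd_code.replace(".", "").strip().upper()


def assign_group(icd_code, groups=CUSTOM_GROUPS):
    """Return the name of the group whose prefix matches the code, or 'Ungrouped'.

    When several prefixes match, the longest one wins, so a narrow group
    (e.g. prefix 'E11.6') can be carved out of a broader one (e.g. 'E11').
    """
    code = normalize(icd_code)
    best_name, best_len = "Ungrouped", 0
    for name, prefixes in groups.items():
        for prefix in prefixes:
            p = normalize(prefix)
            if code.startswith(p) and len(p) > best_len:
                best_name, best_len = name, len(p)
    return best_name


if __name__ == "__main__":
    for code in ["E11.9", "i10", "J45.909", "Z00.00"]:
        print(code, "->", assign_group(code))
```

Longest-prefix matching keeps nested definitions unambiguous. Codes that match nothing are labelled `Ungrouped`, so they can be reviewed rather than silently dropped.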
In unsupervised classification, the software (here, SAS) analyzes the data and determines where clusters appear, using the clinical variables (parametric and non-parametric) to define these clusters. Although such clusters are fairly easy to produce, they are not always clinically logical: the algorithm may link an outlier ICD from Group A to the cluster in Group B, making the resulting outcomes questionable. Hundreds of classifications remain to be tested in terms of ICD and ICD-comorbidity relationships. The bibliography below gives a few examples of this work.
Most of these are available in full text on the internet (sorry, no downloadable PDF copies for the moment, due to copyright concerns).
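The outlier problem described in the unsupervised classification paragraph can be illustrated with a minimal k-means (Lloyd's algorithm) sketch in plain Python. k-means is only one of many unsupervised methods; in SAS, PROC FASTCLUS performs a comparable centroid-based disjoint clustering. The two clinical variables and the data points below are invented for illustration. Because k-means assigns every point to its nearest centroid, an outlier is always absorbed into some cluster rather than flagged.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means on 2-D points, starting from the given centroids.

    Returns (labels, centroids): labels[i] is the cluster index of points[i].
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: every point, outliers included, joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its current members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: centroids unchanged, so assignments will not change
        centroids = updated
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical standardized scores on two clinical variables.
    patients = [(0, 0), (0, 1), (1, 0),        # tight group A
                (10, 10), (10, 11), (11, 10),  # tight group B
                (4, 8)]                        # outlier
    labels, centers = kmeans(patients, centroids=[(0, 0), (10, 10)])
    print(labels)   # the outlier (last point) is absorbed into group B
    print(centers)
```

The outlier ends up labelled with group B and pulls B's centroid toward itself, from (10, 10) to (8.75, 9.75). This is the kind of link that calls for clinical review of cluster membership.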
Bibliography
Afzal, Z., Engelkes, M., Verhamme, K., Janssens, H. M., Sturkenboom, M. C., Kors, J. A., & Schuemie, M. J. (2013). Automatic generation of case‐detection algorithms to identify children with asthma from large electronic health record databases. Pharmacoepidemiology and Drug Safety, 22(8), 826-833. doi:10.1002/pds.3438
Afzal, Z., Schuemie, M. J., van Blijderveen, J. C., Sen, E. F., Sturkenboom, M. C., & Kors, J. A. (2013). Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records. BMC Medical Informatics and Decision Making, 13(1), 30. doi:10.1186/1472-6947-13-30
Boland, M. R., Tatonetti, N. P., & Hripcsak, G. (2014). CAESAR: a Classification Approach for Extracting Severity Automatically from Electronic Health Records.
Boxwala, A. A., Kim, J., Grillo, J. M., & Ohno-Machado, L. (2011). Using statistical and machine learning to help institutions detect suspicious access to electronic health records. Journal of the American Medical Informatics Association, 18(4), 498-505. doi:10.1136/amiajnl-2011-000217
Caballero Barajas, K. L., & Akella, R. (2015, August). Dynamically Modeling Patient’s Health State from Electronic Medical Records: A Time Series Approach. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 69-78). ACM. doi:10.1145/2783258.2783289
Dua, S., Acharya, U. R., & Dua, P. (2014). Machine learning in healthcare informatics. Springer Berlin Heidelberg. [Fuzzy logic, supervised and unsupervised classifications, rule learning, black box, predictions, longitudinal data, fraud, imagery.]
FitzHenry, F., Murff, H. J., Matheny, M. E., Gentry, N., Fielstein, E. M., Brown, S. H., … & Speroff, T. (2013). Exploring the Frontier of Electronic Health Record Surveillance: The Case of Post-Operative Complications. Medical Care, 51(6), 509. doi:10.1097/MLR.0b013e31828d1210
Gupta, S., Tran, T., Luo, W., Phung, D., Kennedy, R. L., Broad, A., … & Matheson, L. (2014). Machine-learning prediction of cancer survival: a retrospective study using electronic administrative records and a cancer registry. BMJ Open, 4(3), e004007. doi:10.1136/bmjopen-2013-004007
Hoogendoorn, M., Moons, L. M., Numans, M. E., & Sips, R. J. (2014). Utilizing Data Mining for Predictive Modeling of Colorectal Cancer Using Electronic Medical Records. In Dominik Ślȩzak, Ah-Hwee Tan, James F. Peters, Lars Schwabe (Eds.), Brain Informatics and Health (pp. 132-141). Springer International Publishing. doi:10.1007/978-3-319-09891-3_13
Jensen, P. B., Jensen, L. J., & Brunak, S. (2012). Mining electronic health records: towards better research applications and clinical care. Nature Reviews Genetics, 13(6), 395-405. doi:10.1038/nrg3208
Liu, H., Bielinski, S. J., Sohn, S., Murphy, S., Wagholikar, K. B., Jonnalagadda, S. R., … Chute, C. G. (2013). An Information Extraction Framework for Cohort Identification Using Electronic Health Records. AMIA Summits on Translational Science Proceedings, 2013, 149–153.
Mo, H., Thompson, W. K., Rasmussen, L. V., Pacheco, J. A., Jiang, G., Kiefer, R., … & Lingren, T. (2015). Desiderata for computable representations of electronic health records-driven phenotype algorithms. Journal of the American Medical Informatics Association, 22(6), 1220-1230. doi:10.1093/jamia/ocv112
Kononenko, I. (2001). Machine learning for medical diagnosis: history, state of the art and perspective. Artificial Intelligence in Medicine, 23(1), 89-109.
Koutsojannis, C., Nabil, E., Tsimara, M., & Hatzilygeroudis, I. (2009, November). Using machine learning techniques to improve the behaviour of a medical decision support system for prostate diseases. In Intelligent Systems Design and Applications, 2009. ISDA’09. Ninth International Conference on (pp. 341-346). IEEE. doi:10.1109/ISDA.2009.110
Kreuzthaler, M., Schulz, S., & Berghold, A. (2015). Secondary use of electronic health records for building cohort studies through top-down information extraction. Journal of Biomedical Informatics, 53, 188-195. doi:10.1016/j.jbi.2014.10.010 [COHORTS]
Lin, C., Karlson, E. W., Canhao, H., Miller, T. A., Dligach, D., Chen, P. J., et al. (2013). Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records. PLoS ONE, 8(8), e69932. doi:10.1371/journal.pone.0069932
Martin-Sanchez, F., Iakovidis, I., Nørager, S., Maojo, V., de Groen, P., Van der Lei, J., … & Baud, R. (2004). Synergy between medical informatics and bioinformatics: facilitating genomic medicine for future health care. Journal of Biomedical Informatics, 37(1), 30-42.
Meystre, S. M., Friedlin, F. J., South, B. R., Shen, S., & Samore, M. H. (2010). Automatic de-identification of textual documents in the electronic health record: a review of recent research. BMC Medical Research Methodology, 10(1), 70. doi:10.1186/1471-2288-10-70
Patel, V. L., Shortliffe, E. H., Stefanelli, M., Szolovits, P., Berthold, M. R., Bellazzi, R., & Abu-Hanna, A. (2009). The coming of age of artificial intelligence in medicine. Artificial Intelligence in Medicine, 46(1), 5-17. doi:10.1016/j.artmed.2008.07.017
Pathak, J., Bailey, K. R., Beebe, C. E., Bethard, S., Carrell, D. S., Chen, P. J., … & Huff, S. M. (2013). Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium. Journal of the American Medical Informatics Association, 20(e2), e341-e348. doi:10.1136/amiajnl-2013-001939 [MU]
Pineda, A. L., Ye, Y., Visweswaran, S., Cooper, G. F., Wagner, M. M., & Tsui, F. R. (2015). Comparison of machine learning classifiers for influenza detection from emergency department free-text reports. Journal of Biomedical Informatics, 58, 60-69. doi:10.1016/j.jbi.2015.08.019
Prather, J. C., Lobach, D. F., Goodwin, L. K., Hales, J. W., Hage, M. L., & Hammond, W. E. (1996, December). Medical data mining: knowledge discovery in a clinical data warehouse. In Proceedings: a conference of the American Medical Informatics Association/… AMIA Annual Fall Symposium. AMIA Fall Symposium (pp. 101-105).
Sada, Y., Hou, J., Richardson, P., El-Serag, H., & Davila, J. (2013). Validation of case finding algorithms for hepatocellular cancer from administrative data and electronic health records using natural language processing. Medical Care, 54(2), e9–e14. doi:10.1097/MLR.0b013e3182a30373
Skeppstedt, M., Kvist, M., Nilsson, G. H., & Dalianis, H. (2014). Automatic recognition of disorders, findings, pharmaceuticals and body structures from clinical text: An annotation and machine learning study. Journal of Biomedical Informatics, 49, 148-158. doi:10.1016/j.jbi.2014.01.012
Szarvas, G., Farkas, R., & Busa-Fekete, R. (2007). State-of-the-art anonymization of medical records using an iterative machine learning framework. Journal of the American Medical Informatics Association, 14(5), 574-580. doi:10.1197/j.jamia.M2441
Wang, Z., Shah, A. D., Tate, A. R., Denaxas, S., Shawe-Taylor, J., & Hemingway, H. (2012). Extracting diagnoses and investigation results from unstructured text in electronic health records by semi-supervised machine learning. PLoS One, 7(1), e30412. doi:10.1371/journal.pone.0030412
Wiens, J., Campbell, W. N., Franklin, E. S., Guttag, J. V., & Horvitz, E. (2014, September). Learning Data-Driven Patient Risk Stratification Models for Clostridium difficile. In Open Forum Infectious Diseases (Vol. 1, No. 2, p. ofu045). Oxford University Press. doi:10.1093/ofid/ofu045
Weiss, J. C., Natarajan, S., Peissig, P. L., McCarty, C. A., & Page, D. (2012). Machine learning for personalized medicine: predicting primary myocardial infarction from electronic health records. AI Magazine, 33(4), 33. doi:10.1609/aimag.v33i4.2438
Wolfson, J., Bandyopadhyay, S., Elidrisi, M., Vazquez-Benitez, G., Musgrove, D., Adomavicius, G., … & O’Connor, P. (2013). A Naive Bayes machine learning approach to risk prediction using censored, time-to-event electronic health record data. [Draft of presentation/publication; not completed.]
Wu, J., Roy, J., & Stewart, W. F. (2010). Prediction modeling using EHR data: challenges, strategies, and a comparison of machine learning approaches. Medical Care, 48(6), S106-S113. doi:10.1097/MLR.0b013e3181de9e17
Observational Studies = Data Mining
Introduction
I most often incorporate GIS into my work by using the raw data provided by the EMR, reclassifying it as needed, and adding longitude-latitude data whenever possible. This use of GIS may be considered an extension of the increasingly popular “observational studies” terminology and techniques now found in the literature. A GIS study of the raw, or freshly mined and slightly modified, data may also be labelled an “ecological study.”
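Below is a minimal sketch of the aggregation step that turns geocoded, patient-level EMR records into area-level counts for an ecological analysis. The square latitude/longitude grid and the 0.1-degree cell size are assumptions. They stand in for real areal units such as census tracts or ZIP code areas. The percentage uses EMR patients as its denominator; a population-based rate would need census population counts instead.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.1):
    """Map a coordinate to the integer (row, col) index of a cell_deg-sized grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def cell_summary(patients, cell_deg=0.1):
    """Aggregate (lat, lon, has_condition) records into per-cell counts.

    Returns {cell: (cases, patients, percent)}; percent is cases / patients * 100
    among EMR patients in the cell, not a population-based rate.
    """
    totals, cases = Counter(), Counter()
    for lat, lon, has_condition in patients:
        cell = grid_cell(lat, lon, cell_deg)
        totals[cell] += 1
        if has_condition:
            cases[cell] += 1
    return {
        cell: (cases[cell], n, round(100.0 * cases[cell] / n, 1))
        for cell, n in totals.items()
    }


if __name__ == "__main__":
    # Hypothetical geocoded records: (latitude, longitude, has the condition?)
    records = [(45.52, -122.68, True), (45.53, -122.67, False), (45.71, -122.45, True)]
    for cell, summary in sorted(cell_summary(records).items()):
        print(cell, summary)
```

Degree-based cells are not equal-area, since they narrow away from the equator. This is one reason real analyses usually aggregate to administrative units.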
Grimes, D. A., & Schulz, K. F. (2002). Bias and causal associations in observational research. The Lancet, 359(9302), 248-252. doi:10.1016/S0140-6736(02)07451-2
Hansen, R. A., Gray, M. D., Fox, B. I., Hollingsworth, J. C., Gao, J., & Zeng, P. (2013). How well do various health outcome definitions identify appropriate cases in observational studies? Drug Safety, 36(1), 27-32. doi:10.1007/s40264-013-0104-0
Madigan, D., Stang, P. E., Berlin, J. A., Schuemie, M., Overhage, J. M., Suchard, M. A., … & Ryan, P. B. (2014). A systematic statistical approach to evaluating evidence from observational studies. Annual Review of Statistics and Its Application, 1, 11-39. doi:10.1146/annurev-statistics-022513-115645
Nagisetty, N., Huang, E. Y., Wade, G., & Viangteeravat, T. (2014). Building a knowledge base to assist clinical decision-making using the Pediatric Research Database (PRD) and machine learning: a case study on pediatric asthma patients. BMC Bioinformatics, 15(Suppl 10), P17. doi:10.1186/1471-2105-15-S1-S10
Roche, J. J. W., Wenn, R. T., Sahota, O., & Moran, C. G. (2005). Effect of comorbidities and postoperative complications on mortality after hip fracture in elderly people: prospective observational cohort study. BMJ, 331(7529), 1374. doi:10.1136/bmj.38643.663843.55
Schuemie, M. J., Ryan, P. B., DuMouchel, W., Suchard, M. A., & Madigan, D. (2014). Interpreting observational studies: why empirical calibration is needed to correct p‐values. Statistics in Medicine, 33(2), 209-218. doi:10.1002/sim.5925
Shiomi, H., Nakagawa, Y., Morimoto, T., Furukawa, Y., Nakano, A., Shirai, S., … & Mitsuoka, H. (2012). Association of onset to balloon and door to balloon time with long term clinical outcome in patients with ST elevation acute myocardial infarction having primary percutaneous coronary intervention: observational study. BMJ, 344, e3257. doi:10.1136/bmj.e3257
Tannen, R. L., Weiner, M. G., & Xie, D. (2009). Use of primary care electronic medical record database in drug efficacy research on cardiovascular outcomes: comparison of database and randomised controlled trial findings. BMJ, 338. doi:10.1136/bmj.b81
Twisk, J. W. (1997). Different statistical models to analyze epidemiological observational longitudinal data: an example from the Amsterdam Growth and Health Study. International Journal of Sports Medicine, 18, S216-24.
Yost, N. P., Bloom, S. L., McIntire, D. D., & Leveno, K. J. (2005). A prospective observational study of domestic violence during pregnancy. Obstetrics & Gynecology, 106(1), 61-65. doi:10.1097/01.AOG.0000164468.06070.2a
******************
Comparative Effectiveness Research
In comparative effectiveness research (CER), the treatment programs of several facilities or programs are compared and contrasted statistically. The term refers to database settings where the data come from several places. To retain HIPAA compliance, the data are cleaned of personal identifiers and other restricted data, as specified by program-specific and/or HIPAA guidelines. Researchers follow these guidelines as closely as possible, but full compliance is difficult when the restricted data are essential to the study itself, as with 5-digit ZIP codes or even street and house-number data. CER involves institutions cross-comparing their healthcare results and performance, and these measures are often implemented as part of the Meaningful Use program as well.
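The tension described above can be made concrete with a sketch of a few HIPAA Safe Harbor generalizations:

- Direct identifiers are dropped.
- Dates are reduced to the year.
- Ages over 89 are grouped as 90 or older.
- ZIP codes are truncated to their first three digits, which are replaced by 000 when the three-digit area has 20,000 or fewer residents.

The field names and the `RESTRICTED_ZIP3` set are hypothetical placeholders; the real list must be derived from current Census data. The sketch covers only a handful of the 18 Safe Harbor identifier types.

```python
from datetime import date

# Hypothetical field names treated as direct identifiers and dropped outright.
DIRECT_IDENTIFIERS = {"name", "street_address", "phone", "email", "ssn", "mrn"}

# Illustrative placeholder for 3-digit ZIP prefixes whose combined population is
# 20,000 or fewer; derive the real set from current Census data.
RESTRICTED_ZIP3 = {"036", "059"}


def deidentify(record, restricted_zip3=RESTRICTED_ZIP3):
    """Return a copy of `record` with a few Safe Harbor-style generalizations."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    if "zip" in out:
        # ZIP should be stored as a string so leading zeros survive.
        zip3 = str(out["zip"])[:3]  # keep only the first three digits
        out["zip"] = "000" if zip3 in restricted_zip3 else zip3

    if "age" in out and out["age"] > 89:
        out["age"] = "90+"  # ages over 89 are aggregated into one category

    for key, value in list(out.items()):
        if isinstance(value, date):  # also matches datetime, a date subclass
            out[key] = value.year  # keep the year only

    return out


if __name__ == "__main__":
    visit = {"name": "Jane Doe", "mrn": "12345", "zip": "97214", "age": 93,
             "visit_date": date(2015, 3, 14), "icd": "E11.9"}
    print(deidentify(visit))
```

The ZIP step shows why the text calls full compliance difficult. Any study that needs 5-digit ZIP or street-level geography loses exactly that detail under these rules.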
References:
Hersh, W. R., Weiner, M. G., Embi, P. J., Logan, J. R., Payne, P. R., Bernstam, E. V., … & Saltz, J. H. (2013). Caveats for the use of operational electronic health record data in comparative effectiveness research. Medical Care, 51(8 Suppl 3), S30-7. doi:10.1097/MLR.0b013e31829b1dbd
Holve, E., Segal, C., Lopez, M. H., Rein, A., & Johnson, B. H. (2012). The Electronic Data Methods (EDM) forum for comparative effectiveness research (CER). Medical Care, 50, S7-S10. doi:10.1097/MLR.0b013e318257a66b
Kudyakov, R., Bowen, J., Ewen, E., West, S. L., Daoud, Y., Fleming, N., & Masica, A. (2012). Electronic health record use to classify patients with newly diagnosed versus preexisting type 2 diabetes: infrastructure for comparative effectiveness research and population health management. Population Health Management, 15(1), 3-11. doi:10.1089/pop.2010.0084
Lopez, M. H., Holve, E., Sarkar, I. N., & Segal, C. (2012). Building the informatics infrastructure for comparative effectiveness research (CER): a review of the literature. Medical Care, 50, S38-S48. doi:10.1097/MLR.0b013e318259becd
Masica, A., & Collinsworth, A. (2012). Leveraging Electronic Health Records in Comparative Effectiveness Research. Prescriptions for Excellence in Health Care Newsletter Supplement, 1(14), 6.
Ogunyemi, O. I., Meeker, D., Kim, H. E., Ashish, N., Farzaneh, S., & Boxwala, A. (2013). Identifying appropriate reference data models for comparative effectiveness research (CER) studies based on data from clinical information systems. Medical Care, 51, S45-S52. doi:10.1097/MLR.0b013e31829b1e0b
Toh, S., Platt, R., Steiner, J. F., & Brown, J. S. (2011). Comparative‐Effectiveness Research in Distributed Health Data Networks. Clinical Pharmacology & Therapeutics, 90(6), 883-887. doi:10.1038/clpt.2011.236
Toh, S., & Platt, R. (2013). Is size the next big thing in epidemiology? Epidemiology, 24(3), 349-351. doi:10.1097/EDE.0b013e31828ac65e
Data Sharing (iDASH, a HIPAA-certified cloud)
Ohno-Machado, L., Bafna, V., Boxwala, A. A., Chapman, B. E., Chapman, W. W., Chaudhuri, K., … & Kim, H. (2012). iDASH: integrating data for analysis, anonymization, and sharing. Journal of the American Medical Informatics Association, 19(2), 196-201. doi:10.1136/amiajnl-2011-000538
Reisinger, S. J., Ryan, P. B., O’Hara, D. J., Powell, G. E., Painter, J. L., Pattishall, E. N., & Morris, J. A. (2010). Development and evaluation of a common data model enabling active drug safety surveillance using disparate healthcare databases. Journal of the American Medical Informatics Association, 17(6), 652-662. doi:10.1136/jamia.2009.002477
Data Quality Assessment Model
Kahn, M. G., Raebel, M. A., Glanz, J. M., Riedlinger, K., & Steiner, J. F. (2012). A pragmatic framework for single-site and multisite data quality assessment in electronic health record-based clinical research. Medical Care, 50. doi:10.1097/MLR.0b013e318257dd67. Accessed at http://europepmc.org/articles/pmc3833692
Brown, J., Kahn, M., & Toh, S. (2013). Data quality assessment for comparative effectiveness research in distributed data networks. Medical Care, 51(8 Suppl 3), S22. doi:10.1097/MLR.0b013e31829b1e2c
Dreyer, N. A., Schneeweiss, S., McNeil, B. J., Berger, M. L., Walker, A. M., Ollendorf, D. A., & Gliklich, R. E. (2010). GRACE principles: recognizing high-quality observational studies of comparative effectiveness. The American Journal of Managed Care, 16(6), 467-471.
Data Accuracy
Cipparone, C. W., Withiam-Leitch, M., Kimminau, K. S., Fox, C. H., Singh, R., & Kahn, L. (2015). Inaccuracy of ICD-9 Codes for Chronic Kidney Disease: A Study from Two Practice-based Research Networks (PBRNs). The Journal of the American Board of Family Medicine, 28(5), 678-682. doi:10.3122/jabfm.2015.05.140136
PHI
Malin, B. A., El Emam, K., & O’Keefe, C. M. (2013). Biomedical data privacy: problems, perspectives, and recent advances. Journal of the American Medical Informatics Association, 20(1), 2-6. doi:10.1136/amiajnl-2012-001509
Tran, D. T., Halgrim, S., & Carrell, D. (2014). C3-4: An Algorithm to Combine Machine Learning and Structured Data to Automate De-identification of Clinical Text. Clinical Medicine & Research, 12(1-2), 94-95. doi:10.3121/cmr.2014.1250.c3-4
*************************
Part 2 – Applications, Methods and Skills
These are examples of how to employ population health analysis procedures.
Algorithms
Alghwiri, A., Alghadir, A., & Awad, H. (2014). The Arab Risk (ARABRISK): Translation and Validation. Biomedical Research, 25(2), 271-275.
Carroll, R. J., Thompson, W. K., Eyler, A. E., Mandelin, A. M., Cai, T., Zink, R. M., … & Karlson, E. W. (2012). Portability of an algorithm to identify rheumatoid arthritis in electronic health records. Journal of the American Medical Informatics Association, 19(e1), e162-e169. doi:10.1136/amiajnl-2011-000583
Holroyd-Leduc, J. M., Lorenzetti, D., Straus, S. E., Sykes, L., & Quan, H. (2011). The impact of the electronic medical record on structure, process, and outcomes within primary care: a systematic review of the evidence. Journal of the American Medical Informatics Association, 18(6), 732-737. doi:10.1136/amiajnl-2010-000019
Lin, Y. K., Chen, H., Brown, R., Li, S. H., & Yang, H. J. (2014). Time-to-Event Predictive Modeling for Chronic Conditions using Electronic Health Records. IEEE Intelligent Systems, 29(3), 14-20. doi:10.1109/MIS.2014.18
Sovio, U., Skow, A., Falconer, C., Park, M. H., Viner, R. M., & Kinra, S. (2013). Improving prediction algorithms for cardiometabolic risk in children and adolescents. Journal of Obesity, 2013. doi:10.1155/2013/684782
Phenotyping
Anderson, A. E., Kerr, W. T., Thames, A., Li, T., Xiao, J., & Cohen, M. S. (2015). Electronic health record phenotyping improves detection and screening of type 2 diabetes in the general United States population: A cross-sectional, unselected, retrospective study. arXiv preprint arXiv:1501.02402.
Boland, M. R., Tatonetti, N. P., & Hripcsak, G. (2015). Development and validation of a classification approach for extracting severity automatically from electronic health records. Journal of Biomedical Semantics, 6(1), 14. doi:10.1186/s13326-015-0010-8
Carroll, R. J., Eyler, A. E., & Denny, J. C. (2011). Naïve electronic health record phenotype identification for rheumatoid arthritis. In AMIA Annual Symposium Proceedings (Vol. 2011, p. 189). American Medical Informatics Association.
Chen, Y., Carroll, R. J., Hinz, E. R. M., Shah, A., Eyler, A. E., Denny, J. C., & Xu, H. (2013). Applying active learning to high-throughput phenotyping algorithms for electronic health records data. Journal of the American Medical Informatics Association, 20(e2), e253-e259. doi:10.1136/amiajnl-2013-001945
Pecci, A., Klersy, C., Gresele, P., Lee, K. J., De Rocco, D., Bozzi, V., … & Fabris, F. (2014). MYH9‐Related Disease: A Novel Prognostic Model to Predict the Clinical Evolution of the Disease Based on Genotype–Phenotype Correlations. Human Mutation, 35(2), 236-247. doi:10.1002/humu.22476
Hripcsak, G., & Albers, D. J. (2013). Next-generation phenotyping of electronic health records. Journal of the American Medical Informatics Association, 20(1), 117-121. doi:10.1136/amiajnl-2012-001145
Peissig, P. L. D. (2013). Computational methods for electronic health record-driven phenotyping (Doctoral dissertation, University of Wisconsin-Madison).
Peissig, P. L., Costa, V. S., Caldwell, M. D., Rottscheit, C., Berg, R. L., Mendonca, E. A., & Page, D. (2014). Relational machine learning for electronic health record-driven phenotyping. Journal of Biomedical Informatics, 52, 260-270. doi:10.1016/j.jbi.2014.07.007
Rasmussen, L. V., Thompson, W. K., Pacheco, J. A., Kho, A. N., Carrell, D. S., Pathak, J., … & Starren, J. B. (2014). Design patterns for the development of electronic health record-driven phenotype extraction algorithms. Journal of Biomedical Informatics, 51, 280-286. doi:10.1016/j.jbi.2014.06.007
Shivade, C., Raghavan, P., Fosler-Lussier, E., Embi, P. J., Elhadad, N., Johnson, S. B., & Lai, A. M. (2014). A review of approaches to identifying patient phenotype cohorts using electronic health records. Journal of the American Medical Informatics Association, 21(2), 221-230. doi:10.1136/amiajnl-2013-001935
Wei, W. Q., Teixeira, P. L., Mo, H., Cronin, R. M., Warner, J. L., & Denny, J. C. (2015). Combining billing codes, clinical notes, and medications from electronic health records provides superior phenotyping performance. Journal of the American Medical Informatics Association, ocv130.
Surveillance
Chai, K. E., Anthony, S., Coiera, E., & Magrabi, F. (2013). Using statistical text classification to identify health information technology incidents. Journal of the American Medical Informatics Association, 20(5), 980-985. doi:10.1136/amiajnl-2012-001409
Dai, W., Brisimi, T. S., Adams, W. G., Mela, T., Saligrama, V., & Paschalidis, I. C. (2015). Prediction of hospitalization due to heart diseases by supervised learning methods. International Journal of Medical Informatics, 84(3), 189-197. doi:10.1016/j.ijmedinf.2014.10.002
Pak, T. R., & Kasarskis, A. (2015). How next-generation sequencing and multiscale data analysis will transform infectious disease management. Clinical Infectious Diseases, 61(11), 1695-1702. doi:10.1093/cid/civ670
Ye, Y., Tsui, F., Wagner, M., Espino, J. U., & Li, Q. (2014). Influenza detection from emergency department reports using natural language processing and Bayesian network classifiers. Journal of the American Medical Informatics Association, 21(5), 815-823. doi:10.1136/amiajnl-2013-001934
QOL & Dx post-Tx
Penson, D. F., Feng, Z., Kuniyuki, A., McClerran, D., Albertsen, P. C., Deapen, D., … & Stanford, J. L. (2003). General quality of life 2 years following treatment for prostate cancer: what influences outcomes? Results from the prostate cancer outcomes study. Journal of Clinical Oncology, 21(6), 1147-1154. doi:10.1200/JCO.2003.07.139
Econometrics
Newhouse, J. P., & McClellan, M. (1998). Econometrics in outcomes research: the use of instrumental variables. Annual Review of Public Health, 19(1), 17-34. doi:10.1146/annurev.publhealth.19.1.17
Comorbidity Scores
Austin, S. R., Wong, Y. N., Uzzo, R. G., Beck, J. R., & Egleston, B. L. (2015). Why Summary Comorbidity Measures Such As the Charlson Comorbidity Index and Elixhauser Score Work. Medical Care, 53(9), e65-e72. doi:10.1097/MLR.0b013e318297429c. Accessed at http://www.ncbi.nlm.nih.gov/pubmed/23703645
Bang, J. H., Hwang, S.-H., Lee, E.-J., & Kim, Y. (2013). The predictability of claim-data-based comorbidity-adjusted models could be improved by using medication data. BMC Medical Informatics and Decision Making, 13, 128. doi:10.1186/1472-6947-13-128
Chu, Y.-T., Ng, Y.-Y., & Wu, S.-C. (2010). Comparison of different comorbidity measures for use with administrative data in predicting short- and long-term mortality. BMC Health Services Research, 10, 140. doi:10.1186/1472-6963-10-140
Gutacker, N., Bloor, K., & Cookson, R. (2015). Comparing the performance of the Charlson/Deyo and Elixhauser comorbidity measures across five European countries and three conditions. European Journal of Public Health, 25(Suppl 1), 15-20. doi:10.1093/eurpub/cku221
Johnson, A. E., Kramer, A. A., & Clifford, G. D. (2013). A new severity of illness scale using a subset of acute physiology and chronic health evaluation data elements shows comparable predictive accuracy. Critical Care Medicine, 41(7), 1711-1718. doi:10.1097/CCM.0b013e31828a24fe [Oxford Acute Severity of Illness Score; Particle Swarm Optimization]
Menendez, M. E., et al. (2014). The Elixhauser Comorbidity Method Outperforms the Charlson Index in Predicting Inpatient Death After Orthopaedic Surgery. Clinical Orthopaedics and Related Research, 472(9), 2878-2886. Accessed via PMC, 24 Jan. 2016.
Schneeweiss, S., & Maclure, M. (2000). Use of comorbidity scores for control of confounding in studies using administrative databases. International Journal of Epidemiology, 29(5), 891-898. Accessed at http://www.ncbi.nlm.nih.gov/pubmed/11034974
Stausberg, J., & Hagn, S. (2015). New Morbidity and Comorbidity Scores based on the Structure of the ICD-10. PLoS ONE, 10(12), e0143365. doi:10.1371/journal.pone.0143365. Accessed at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4677989/pdf/pone.0143365.pdf
Yang, M., Mehta, H. B., Bali, V., Gupta, P., Wang, X., Johnson, M. L., & Aparasu, R. R. (2015). Which risk-adjustment index performs better in predicting 30-day mortality? A systematic review and meta-analysis. Journal of Evaluation in Clinical Practice, 21(2), 292-299. doi:10.1111/jep.12307 [Includes several speciality disease scores]
Genomics
Castro, V. M., Clements, C. C., Murphy, S. N., Gainer, V. S., Fava, M., Weilburg, J. B., … & Smoller, J. W. (2013). QT interval and antidepressant use: a cross sectional study of electronic health records. BMJ, 346, f288. doi:10.1136/bmj.f288
Costa, F. F. (2014). Big data in biomedicine. Drug Discovery Today, 19(4), 433-440. doi:10.1016/j.drudis.2013.10.012
Khoury, M. J., Rich, E. C., Randhawa, G., Teutsch, S. M., & Niederhuber, J. (2009). Comparative effectiveness research and genomic medicine: an evolving partnership for 21st century medicine. Genetics in Medicine, 11(10), 707-711. doi:10.1097/GIM.0b013e3181b99b90
Analytics (other)
Schulam, P., Wigley, F., & Saria, S. (2015, February). Clustering Longitudinal Clinical Marker Trajectories from Electronic Health Data: Applications to Phenotyping and Endotype Discovery. In Twenty-Ninth AAAI Conference on Artificial Intelligence.
*********************************
Part 3 – Risk Analysis
Predicting Risk for Diabetes
Eggleston, E. M., & Klompas, M. (2014). Rational use of electronic health records for diabetes population management. Current Diabetes Reports, 14(4), 1-10. doi:10.1007/s11892-014-0479-z
Exalto, L. G., Biessels, G. J., Karter, A. J., Huang, E. S., Katon, W. J., Minkoff, J. R., & Whitmer, R. A. (2013). Risk score for prediction of 10 year dementia risk in individuals with type 2 diabetes: a cohort study. The Lancet Diabetes & Endocrinology, 1(3), 183-190. doi:10.1016/S2213-8587(13)70048-2
Herman, W. H. (2009). Predicting risk for diabetes: choosing (or building) the right model. Annals of Internal Medicine, 150(11), 812-814.
Jin, H., & Benyshek, D. C. (2013). The “metabolic syndrome index”: A novel, comprehensive method for evaluating the efficacy of diabetes prevention programs. doi:10.4236/jdm.2013.32014
Lawrence, J. M., Black, M. H., Zhang, J. L., Slezak, J. M., Takhar, H. S., Koebnick, C., … & Reynolds, K. (2013). Validation of pediatric diabetes case identification approaches for diagnosed cases by using information in the electronic health records of a large integrated managed health care organization. American Journal of Epidemiology, kwt230. doi:10.1093/aje/kwt230
Makam, A. N., Nguyen, O. K., Moore, B., Ma, Y., & Amarasingham, R. (2013). Identifying patients with diabetes and the earliest date of diagnosis in real time: an electronic health record case-finding algorithm. BMC Medical Informatics and Decision Making, 13(1), 81. doi:10.1186/1472-6947-13-81
Onitilo, A. A., Stankowski, R. V., Berg, R. L., Engel, J. M., Williams, G. M., & Doi, S. A. (2014). A novel method for studying the temporal relationship between type 2 diabetes mellitus and cancer using the electronic medical record. BMC Medical Informatics and Decision Making, 14(1), 38. doi:10.1186/1472-6947-14-38
Reed, M., Huang, J., Brand, R., Graetz, I., Neugebauer, R., Fireman, B., … & Hsu, J. (2013). Implementation of an outpatient electronic health record and emergency department visits, hospitalizations, and office visits among patients with diabetes. JAMA, 310(10), 1060-1065. doi:10.1001/jama.2013.276733
Riaz, M., Basit, A., Hydrie, M. Z. I., Shaheen, F., Hussain, A., Hakeem, R., & Shera, A. S. (2012). Risk assessment of Pakistani individuals for diabetes (RAPID). Primary Care Diabetes, 6(4), 297-302. doi:10.1016/j.pcd.2012.04.002
Tankova, T., Chakarova, N., Atanassova, I., & Dakovska, L. (2011). Evaluation of the Finnish Diabetes Risk Score as a screening tool for impaired fasting glucose, impaired glucose tolerance and undetected diabetes. Diabetes Research and Clinical Practice, 92(1), 46-52. doi:10.1016/j.diabres.2010.12.020
Wang, H., Liu, T., Qiu, Q., Karp, E., Ding, P., He, Y. H., & Chen, W. Q. (2015). Development and validation of a simple risk score for prevalent undiagnosed type 2 diabetes in Southern Chinese population. International Journal of Diabetes in Developing Countries, 35(3), 1-9. doi:10.1007/s13410-014-0285-9
Prediction, Risk, in General
Bandyopadhyay, S., Wolfson, J., Vock, D. M., Vazquez-Benitez, G., Adomavicius, G., Elidrisi, M., … & O’Connor, P. J. (2014). Data mining for censored time-to-event data: A Bayesian network model for predicting cardiovascular risk from electronic health record data. Data Mining and Knowledge Discovery, 1-37. doi:10.1007/s10618-014-0386-6
Eggleston, E. M., & Weitzman, E. R. (2014). Innovative uses of electronic health records and social media for public health surveillance. Current Diabetes Reports, 14(3), 1-9. doi:10.1007/s11892-013-0468-7
Fox, K. A., Dabbous, O. H., Goldberg, R. J., Pieper, K. S., Eagle, K. A., Van de Werf, F., … & Granger, C. B. (2006). Prediction of risk of death and myocardial infarction in the six months after presentation with acute coronary syndrome: prospective multinational observational study (GRACE). BMJ, 333(7578), 1091. doi:10.1136/bmj.38985.646481.55
Goldstein, B. A., Chang, T. I., Mitani, A. A., Assimes, T. L., & Winkelmayer, W. C. (2014). Near-term prediction of sudden cardiac death in older hemodialysis patients using electronic health records. Clinical Journal of the American Society of Nephrology, 9(1), 82-91. doi:10.2215/CJN.03050313
Gultepe, E., Green, J. P., Nguyen, H., Adams, J., Albertson, T., & Tagkopoulos, I. (2014). From vital signs to clinical outcomes for patients with sepsis: a machine learning basis for a clinical decision support system. Journal of the American Medical Informatics Association, 21(2), 315-325. doi:10.1136/amiajnl-2013-001815
Himes, B. E., Dai, Y., Kohane, I. S., Weiss, S. T., & Ramoni, M. F. (2009). Prediction of chronic obstructive pulmonary disease (COPD) in asthma patients using electronic medical records. Journal of the American Medical Informatics Association, 16(3), 371-379. doi:10.1197/jamia.M2846
Hubbard, R. (2014). Statistical methods for misclassified outcomes and exposures in data from electronic medical records. [Report]. Accessed at https://www.grouphealthresearch.org/biostat-symposium/HUBBARD_Statistical_Methods_for_Misclassified_Outcomes_and_Exposures_in_Data_from_EMRs_2014.pdf
Li, D., Simon, G., Chute, C. G., & Pathak, J. (2013). Using Association Rule Mining for Phenotype Extraction from Electronic Health Records. AMIA Summits on Translational Science Proceedings, 2013, 142–146. [ARM model building]
Mani, S., Ozdas, A., Aliferis, C., Varol, H. A., Chen, Q., Carnevale, R., … & Weitkamp, J. H. (2014). Medical decision support using machine learning for early detection of late-onset neonatal sepsis. Journal of the American Medical Informatics Association, 21(2), 326-336. doi:10.1136/amiajnl-2013-001854
Melton, L. J., Atkinson, E. J., St Sauver, J. L., Achenbach, S. J., Therneau, T. M., Rocca, W. A., & Amin, S. (2014). Predictors of Excess Mortality After Fracture: A Population‐Based Cohort Study. Journal of Bone and Mineral Research, 29(7), 1681-1690. doi:10.1002/jbmr.2193
Murray, R. E., Ryan, P. B., & Reisinger, S. J. (2011). Design and validation of a data simulation model for longitudinal healthcare data. In AMIA Annual Symposium Proceedings (Vol. 2011, p. 1176). American Medical Informatics Association.
Pearson, J. F., Brownstein, C. A., & Brownstein, J. S. (2011). Potential for electronic health records and online social networking to redefine medical research. Clinical Chemistry, 57(2), 196-204. doi:10.1373/clinchem.2010.148668
Ryan, P. B., Schuemie, M. J., Gruber, S., Zorych, I., & Madigan, D. (2013). Empirical performance of a new user cohort method: lessons for developing a risk identification and analysis system. Drug Safety, 36(1), 59-72. doi:10.1007/s40264-013-0099-6
Ryan, P. B., & Schuemie, M. J. (2013). Evaluating Performance of Risk Identification Methods Through a Large-Scale Simulation of Observational Data. Drug Safety, 36(1), 171-180. doi:10.1007/s40264-013-0110-2
Weiss, J. C., Natarajan, S., Peissig, P. L., McCarty, C. A., & Page, D. (2012, July). Statistical Relational Learning to Predict Primary Myocardial Infarction from Electronic Health Records. In Proceedings of the Twenty-Fourth Innovative Applications of Artificial Intelligence Conference (IAAI), Toronto, Ontario, Canada, July 22–26, 2012.
***************************************
Part 4 – Other Applications, Methodology Information
NLP
Gobbel, G. T., Reeves, R., Jayaramaraja, S., Giuse, D., Speroff, T., Brown, S. H., … & Matheny, M. E. (2014). Development and evaluation of RapTAT: a machine learning system for concept mapping of phrases from medical narratives. Journal of Biomedical Informatics, 48, 54-65. doi:10.1016/j.jbi.2013.11.008
Jonnagaddala, J., Dai, H. J., Ray, P., & Liaw, S. T. (2015). A preliminary study on automatic identification of patient smoking status in unstructured electronic health records. ACL-IJCNLP 2015, 147. Accessed at http://www.aclweb.org/anthology/W15-38#page=159
Poulin, C., Shiner, B., Thompson, P., Vepstas, L., Young-Xu, Y., Goertzel, B., … & McAllister, T. (2014). Predicting the risk of suicide by analyzing the text of clinical notes. PLoS ONE, 9(1). doi:10.1371/journal.pone.0085733
Strauss, J. A., Chao, C. R., Kwan, M. L., Ahmed, S. A., Schottinger, J. E., & Quinn, V. P. (2013). Identifying primary and recurrent cancers using a SAS-based natural language processing algorithm. Journal of the American Medical Informatics Association, 20(2), 349-355. doi:10.1136/amiajnl-2012-000928
Zheng, C., Rashid, N., Wu, Y. L., Koblick, R., Lin, A. T., Levy, G. D., & Cheetham, T. C. (2014). Using natural language processing and machine learning to identify gout flares from electronic clinical notes. Arthritis Care & Research, 66(11), 1740-1748. doi:10.1002/acr.22324
Temporal
Hripcsak, G., Albers, D. J., & Perotte, A. (2015). Parameterizing time in electronic health record studies. Journal of the American Medical Informatics Association, ocu051. doi:10.1093/jamia/ocu051
Software Other
Mowery, D., Wiebe, J., Ross, M., Velupillai, S., Meystre, S., & Chapman, W. W. (2014). Generating Patient Problem Lists from the ShARe Corpus using SNOMED CT/SNOMED CT CORE Problem List. In Proceedings of the 2014 Workshop on Biomedical Natural Language Processing (BioNLP 2014) (pp. 54–58). Baltimore, Maryland, USA, June 26–27, 2014. Accessed at http://www.aclweb.org/old_anthology/W/W14/W14-34.pdf#page=66
Ng, K., Ghoting, A., Steinhubl, S. R., Stewart, W. F., Malin, B., & Sun, J. (2014). PARAMO: A PARAllel predictive MOdeling platform for healthcare analytic research using electronic health records. Journal of Biomedical Informatics, 48, 160-170. doi:10.1016/j.jbi.2013.12.012
Savova, G. K., Masanz, J. J., Ogren, P. V., Zheng, J., Sohn, S., Kipper-Schuler, K. C., & Chute, C. G. (2010). Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. Journal of the American Medical Informatics Association, 17(5), 507-513. doi:10.1136/jamia.2009.001560
Software Sentinel
Behrman, R. E., Benner, J. S., Brown, J. S., McClellan, M., Woodcock, J., & Platt, R. (2011). Developing the Sentinel System—a national resource for evidence development. New England Journal of Medicine, 364(6), 498-499. doi:10.1056/NEJMp1014427
Curtis, L. H., Weiner, M. G., Boudreau, D. M., Cooper, W. O., Daniel, G. W., Nair, V. P., … & Brown, J. S. (2012). Design considerations, architecture, and use of the Mini‐Sentinel distributed data system. Pharmacoepidemiology and Drug Safety, 21(S1), 23-31. doi:10.1002/pds.2336
Madigan, D., & Ryan, P. (2011). Commentary: What Can We Really Learn From Observational Studies?: The Need for Empirical Assessment of Methodology for Active Drug Safety Surveillance and Comparative Effectiveness Research. Epidemiology, 22(5), 629-631. doi:10.1097/EDE.0b013e318228ca1d
Maro, J. C., Platt, R., Holmes, J. H., Strom, B. L., Hennessy, S., Lazarus, R., & Brown, J. S. (2009). Design of a national distributed health data network. Annals of Internal Medicine, 151(5), 341-344. doi:10.7326/0003-4819-151-5-200909010-00139
Overhage, J. M., Ryan, P. B., Reich, C. G., Hartzema, A. G., & Stang, P. E. (2012). Validation of a common data model for active safety surveillance research. Journal of the American Medical Informatics Association, 19(1), 54-60. doi:10.1136/amiajnl-2011-000376 [Values]
Reich, C., Ryan, P. B., Stang, P. E., & Rocca, M. (2012). Evaluation of alternative standardized terminologies for medical conditions within a network of observational healthcare databases. Journal of Biomedical Informatics, 45(4), 689-696. doi:10.1016/j.jbi.2012.05.002
Ryan, P. B., Madigan, D., Stang, P. E., Overhage, J. M., Racoosin, J. A., & Hartzema, A. G. (2012). Empirical assessment of methods for risk identification in healthcare data: results from the experiments of the Observational Medical Outcomes Partnership. Statistics in Medicine, 31(30), 4401-4415. doi:10.1002/sim.5620
Stang, P. E., Ryan, P. B., Racoosin, J. A., Overhage, J. M., Hartzema, A. G., Reich, C., … & Woodcock, J. (2010). Advancing the science for active surveillance: rationale and design for the Observational Medical Outcomes Partnership. Annals of Internal Medicine, 153(9), 600-606. doi:10.7326/0003-4819-153-9-201011020-00010
Stang, P. E., Ryan, P. B., Dusetzina, S. B., Hartzema, A. G., Reich, C., Overhage, J. M., & Racoosin, J. A. (2012). Health outcomes of interest in observational data: issues in identifying definitions in the literature. Health Outcomes Research in Medicine, 3(1), e37-e44. doi:10.1016/j.ehrm.2011.11.003
Coloma, P. M., Trifirò, G., Schuemie, M. J., Gini, R., Herings, R., Hippisley‐Cox, J., … & Lei, J. (2012). Electronic healthcare databases for active drug safety surveillance: is there enough leverage? Pharmacoepidemiology and Drug Safety, 21(6), 611-621. doi:10.1002/pds.3197
********************************************
MISC (some nice reads on this topic)
For more on OMOP, see http://omop.org/CDM
Brooks, R., & Grotz, C. (2010). Implementation of electronic medical records: How healthcare providers are managing the challenges of going digital. Journal of Business & Economics Research (JBER), 8(6). doi:10.19030/jber.v8i6.736
Hansen, M. M., Miron-Shatz, T., Lau, A. Y. S., & Paton, C. (2014). Big Data in Science and Healthcare: A Review of Recent Literature and Perspectives: Contribution of the IMIA Social Media Working Group. Yearbook of Medical Informatics, 9(1), 21-26. doi:10.15265/IY-2014-0004 [Table provides examples: location use, visualization, assess disease spread, evaluate cause, predict, define social and environmental factors, crisis and disaster management planning, tracking, storing and mining population health data, bring together data from different sources, monitor, cost model]
Herland, M., Khoshgoftaar, T. M., & Wald, R. (2013, December). Survey of Clinical Data Mining Applications on Big Data in Health Informatics. In Machine Learning and Applications (ICMLA), 2013 12th International Conference (Vol. 2, pp. 465-472). IEEE. doi:10.1109/ICMLA.2013.163
Kennedy, E. H., Wiitala, W. L., Hayward, R. A., & Sussman, J. B. (2013). Improved cardiovascular risk prediction using nonparametric regression and electronic health record data. Medical Care, 51(3), 251. doi:10.1097/MLR.0b013e31827da594
Kerr, W. T., Lau, E. P., Owens, G. E., & Trefler, A. (2012). The future of medical diagnostics: large digitized databases. The Yale Journal of Biology and Medicine, 85(3), 363.
Kushida, C. A., Nichols, D. A., Jadrnicek, R., Miller, R., Walsh, J. K., & Griffin, K. (2012). Strategies for de-identification and anonymization of electronic health record data for use in multicenter research studies. Medical Care, 50, S82-S101. doi:10.1097/MLR.0b013e3182585355
Mukherjee, B. (2012). EHR readiness and clinical information management: Stakeholder consultation and analysis. McMaster University, Ontario, Canada.
Murdoch, T. B., & Detsky, A. S. (2013). The inevitable application of big data to health care. JAMA, 309(13), 1351-1352.
Pathak, J., Kho, A. N., & Denny, J. C. (2013). Electronic health records-driven phenotyping: challenges, recent advances, and perspectives. Journal of the American Medical Informatics Association, 20(e2), e206-e211. doi:10.1136/amiajnl-2013-002428
Savage, N. (2012). Better medicine through machine learning. Communications of the ACM, 55(1), 17-19. doi:10.1145/2063176.2063182
Schneeweiss, S. (2014). Learning from big health care data. New England Journal of Medicine, 370(23), 2161-2163. doi:10.1056/NEJMp1401111
Scruggs, S. B., Watson, K., Su, A. I., Hermjakob, H., Yates, J. R., Lindsey, M. L., & Ping, P. (2015). Harnessing the Heart of Big Data. Circulation Research, 116(7), 1115-1119. doi:10.1161/CIRCRESAHA.115.306013
Sweeney, L. (2002). k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05), 557-570. doi:10.1142/S0218488502001648