Published Dec 14, 2017



Alexandra Pomares-Quimbaya, PhD

Rafael Andres González, PhD

Oscar Mauricio Muñoz-Vendia, MSc

Ricardo Bohorquez-Rodriguez, Esp

Olga Milena García-Morales, Esp



Objective: Electronic medical records (EMR) typically contain both structured attributes and narrative text. The usefulness of EMR for research and administration is hampered by the difficulty of automatically analyzing their narrative portions. Accordingly, this paper proposes SPIRE, a strategy for prioritizing EMR that combines natural language processing with analysis of structured data in order to identify and rank EMR that match specific queries from clinical researchers and health administrators.

Materials and Methods: The resulting software tool was evaluated technically and validated with three cases (heart failure, pulmonary hypertension, and diabetes mellitus), with its output compared against expert-obtained results.

Results and Discussion: Our preliminary results show high sensitivity (70%, 82%, and 87%, respectively) and specificity (85%, 73.7%, and 87.5%) in the resulting set of records. The AUC ranged from 0.84 to 0.9.

Conclusions: SPIRE was successfully implemented and used in the context of a university hospital information system, enabling clinical researchers to obtain prioritized EMR that meet their information needs through collaborative search templates, with faster and more accurate results than other existing methods.
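For readers unfamiliar with the reported metrics, the following is a minimal sketch of how sensitivity, specificity, and AUC can be computed when a ranked set of records is compared against expert labels. It is not the authors' evaluation code: the function names, the 0.5 cut-off, and the eight toy records are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    expert: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of predictions against expert labels."""
    if len(expert) != len(predicted):
        raise ValueError("label sequences must have the same length")
    tp = sum(e and p for e, p in zip(expert, predicted))          # true positives
    fn = sum(e and not p for e, p in zip(expert, predicted))      # false negatives
    tn = sum(not e and not p for e, p in zip(expert, predicted))  # true negatives
    fp = sum(not e and p for e, p in zip(expert, predicted))      # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative record")
    return tp / (tp + fn), tn / (tn + fp)


def auc(expert: Sequence[bool], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for e, s in zip(expert, scores) if e]
    neg = [s for e, s in zip(expert, scores) if not e]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative record")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: expert judgement and a relevance score per record.
expert_labels = [True, True, True, True, False, False, False, False]
record_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.4, 0.6]

# Assumed cut-off: records scoring at least 0.5 are treated as matches.
predicted_labels = [s >= 0.5 for s in record_scores]

sens, spec = sensitivity_specificity(expert_labels, predicted_labels)
area = auc(expert_labels, record_scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.3f}")
```

Sensitivity and specificity depend on the chosen cut-off, whereas AUC summarizes the quality of the ranking itself, which is why it is a natural complement for a tool that prioritizes records rather than only classifying them.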


electronic medical records, natural language processing, narrative text

How to Cite
Pomares-Quimbaya, A., González, R. A., Muñoz-Vendia, O. M., Bohorquez-Rodriguez, R., & García-Morales, O. M. (2017). A strategy for prioritizing electronic medical records using structured analysis and natural language processing. Ingenieria Y Universidad, 22(1), 7–31.
Industrial and systems engineering
