Methods Inf Med 2021; 60(03/04): 095-103
DOI: 10.1055/s-0041-1733945
Original Article

The Pipeline for Standardizing Russian Unstructured Allergy Anamnesis Using FHIR AllergyIntolerance Resource

Iuliia D. Lenivtceva
1   National Center for Cognitive Research, ITMO University, Saint-Petersburg, Russia
Georgy Kopanitsa
1   National Center for Cognitive Research, ITMO University, Saint-Petersburg, Russia
Funding This work was financially supported by the National Center for Cognitive Research of ITMO University and by the Government of the Russian Federation through the ITMO fellowship and professorship program.

Abstract

Background Most essential medical knowledge is stored as free text, which is difficult to process. Standardization of medical narratives is an important task for data exchange, integration, and semantic interoperability.

Objectives This article aims to develop an end-to-end pipeline for structuring Russian free-text allergy anamnesis using international standards.

Methods The pipeline for free-text data standardization is based on FHIR (Fast Healthcare Interoperability Resources) and SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) to ensure semantic interoperability. The pipeline solves common tasks such as data preprocessing, classification, categorization, entity extraction, and semantic code assignment. Machine learning, rule-based, and dictionary-based approaches were combined to build the pipeline. The pipeline was evaluated on 166 randomly chosen medical records.
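The sequence of steps named in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the English toy dictionaries, the correction patterns, the keyword-based filter (standing in for the machine learning classifier), and the function names are all assumptions made for illustration, and the codes are placeholders rather than real SNOMED CT identifiers. The output follows the general shape of a FHIR R4 AllergyIntolerance resource.

```python
import re

# Hypothetical toy resources; the real pipeline works on Russian text
# with far larger dictionaries and more correction rules.
CORRECTIONS = [
    (re.compile(r"\bpenicilin\b"), "penicillin"),  # example misspelling fix
    (re.compile(r"\s+"), " "),                     # collapse repeated whitespace
]
# term -> (FHIR allergy category, placeholder code, NOT a real SNOMED CT id)
ALLERGENS = {
    "penicillin": ("medication", "PLACEHOLDER-ALLERGEN-1"),
    "peanut": ("food", "PLACEHOLDER-ALLERGEN-2"),
}
REACTIONS = {
    "urticaria": "PLACEHOLDER-REACTION-1",
    "rash": "PLACEHOLDER-REACTION-2",
}
NEGATIONS = ("no known allergies", "denies allergy")


def preprocess(text):
    """Lowercase the text and apply regex-based corrections."""
    text = text.lower().strip()
    for pattern, replacement in CORRECTIONS:
        text = pattern.sub(replacement, text)
    return text


def has_allergy_info(text):
    """Keyword filter standing in for a trained classifier (assumption)."""
    if any(neg in text for neg in NEGATIONS):
        return False
    return any(term in text for term in ALLERGENS)


def coded(code, display):
    """Build a CodeableConcept that points at the SNOMED CT code system."""
    return {"coding": [{"system": "http://snomed.info/sct",
                        "code": code, "display": display}]}


def to_allergy_intolerance(raw_text, patient_ref):
    """Return one AllergyIntolerance-shaped dict per allergen found."""
    text = preprocess(raw_text)
    if not has_allergy_info(text):
        return []
    reactions = [r for r in REACTIONS if r in text]
    resources = []
    for term, (category, code) in ALLERGENS.items():
        if term not in text:
            continue
        resource = {
            "resourceType": "AllergyIntolerance",
            "category": [category],
            "code": coded(code, term),
            "patient": {"reference": patient_ref},
        }
        if reactions:
            resource["reaction"] = [
                {"manifestation": [coded(REACTIONS[r], r) for r in reactions]}
            ]
        resources.append(resource)
    return resources


result = to_allergy_intolerance("Allergy to  Penicilin: urticaria",
                                "Patient/example")
```

In this sketch every reaction found in a record is attached to every allergen found in it, which is a simplification; linking specific reactions to specific allergens would need the rule-based extraction the abstract mentions.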

Results The AllergyIntolerance resource was used to represent allergy anamnesis. The data preprocessing module included a dictionary of over 90,000 words, including specific medication terms, and more than 20 regular expressions for error correction. The classification and categorization modules resulted in four dictionaries of allergy terms (2,675 terms in total), which were mapped to SNOMED CT concepts. F-scores for the individual steps were 0.945 for filtering, 0.90 to 0.96 for allergy categorization, and 0.90 and 0.93 for allergen and reaction extraction, respectively. Allergy terminology coverage exceeded 95%.
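For readers unfamiliar with the metric, the sketch below shows how an F-score is typically derived from evaluation counts. It assumes the balanced F1 variant (the harmonic mean of precision and recall), which the abstract does not state explicitly, and the counts are invented for illustration rather than taken from the study.

```python
def f1_from_counts(true_pos, false_pos, false_neg):
    """Balanced F-score (F1) from confusion counts; returns 0.0 if undefined."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 9 correct extractions, 1 spurious, 1 missed.
example_score = f1_from_counts(true_pos=9, false_pos=1, false_neg=1)
```

With these illustrative counts, precision and recall are both 0.9, so the F1 score is also 0.9.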

Conclusion The proposed pipeline is a step toward semantic interoperability of Russian free-text medical records and could be effective in standardization systems for further data exchange and integration.

Publication History

Received: 24 March 2021

Accepted: 26 June 2021

Article published online: 23 August 2021

© 2021. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

 