An Easily Implemented Method for Abbreviation Expansion for the Medical Domain in Japanese Text

E. Y. Shinohara; E. Aramaki; T. Imai; Y. Miura; M. Tonoike; T. Ohkuma; H. Masuichi; K. Ohe

doi:10.3414/ME12-01-0040

Methods Inf Med 2013; 52(01): 51-61

Original Articles

Schattauer GmbH

An Easily Implemented Method for Abbreviation Expansion for the Medical Domain in Japanese Text

A Preliminary Study

Authors

E. Y. Shinohara¹
E. Aramaki²
T. Imai³
Y. Miura⁴
M. Tonoike⁴
T. Ohkuma⁴
H. Masuichi⁴
K. Ohe¹,⁵

¹Department of Planning, Information and Management, The University of Tokyo Hospital, Tokyo, Japan
²Center for Knowledge Structuring, The University of Tokyo, Tokyo, Japan
³Center for Disease Biology and Integrative Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
⁴Research & Technology Group, Fuji Xerox Co., Ltd, Kanagawa, Japan
⁵Graduate School of Medicine and Faculty of Medicine, The University of Tokyo, Tokyo, Japan

Abstract

Summary

Background: One of the barriers to the effective use of computerized health-care-related text is the ambiguity of abbreviations. To date, abbreviation disambiguation has been treated as a classification task based on surrounding words. Applying this framework to languages that have no word boundaries requires pre-processing to segment each sentence into a sequence of words. Although this segmentation step is often a source of problems, it is unknown whether word information is actually necessary for abbreviation expansion.

Objectives: As a preliminary study, we examined and compared abbreviation expansion methods with and without the incorporation of word information.

Methods: We implemented two abbreviation expansion methods: 1) a morpheme-based method that relied on word information and therefore required pre-processing, and 2) a character-based method that relied on simple character information. We compared the expansion accuracies for these two methods using eight medical abbreviations. Experimental data were automatically built as a pseudo-annotated corpus using the Internet.

Results: Accuracies for the character-based method ranged from 0.890 to 0.942, while accuracies for the morpheme-based method ranged from 0.796 to 0.932. The character-based method significantly outperformed the morpheme-based method for three of the eight abbreviations (p < 0.05). For the remaining five abbreviations, no significant differences were found between the two methods.

Conclusions: In terms of simplicity, character information may be a good alternative to morphological information for expanding English medical abbreviations that appear in Japanese texts on the Internet.

Keywords

Natural language processing - machine learning - abbreviation - information storage and retrieval - algorithms
