Methods of Information in Medicine

Methods Inf Med 2012; 51(03): 229-241
DOI: 10.3414/ME11-01-0048

Original Articles

Schattauer GmbH

A Database De-identification Framework to Enable Direct Queries on Medical Data for Secondary Use

B. S. Erdal¹ ², J. Liu¹, J. Ding¹, J. Chen¹, C. B. Marsh³, J. Kamal¹, B. D. Clymer²

¹Information Warehouse, The Ohio State University Medical Center, Columbus, Ohio, USA

²Electrical and Computer Engineering, The Ohio State University, Columbus, Ohio, USA

³Internal Medicine, The Ohio State University Medical Center, Columbus, Ohio, USA

Publication history

Received: 31 May 2011

Accepted: 08 February 2011

Published online: 20 January 2018


Summary

Objective: For patient clinical records to qualify as non-human-subject data for research purposes, electronic medical record data must be de-identified so that the risk of exposing protected health information is minimal. This study demonstrates a robust framework for structured data de-identification that can be applied to any relational data source that needs to be de-identified.

Methods: Using a real-world clinical data warehouse, we built a pilot implementation covering a limited set of subject areas to demonstrate and evaluate the new de-identification process. Query results and performance were compared between the source and target systems to validate data accuracy and usability.
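The source-versus-target comparison can be illustrated with a small sketch. It assumes a hypothetical `encounter` table of (patient identifier, diagnosis code) rows, replaces each medical record number with a random pseudonym that is reused for repeat occurrences, and runs the same aggregate query against in-memory SQLite copies of both; the table, column names, and sample rows are invented for illustration and are not the authors' warehouse schema.

```python
import secrets
import sqlite3

# Hypothetical (mrn, dx_code) encounter rows standing in for the source system.
SOURCE_ROWS = [
    ("MRN001", "I10"), ("MRN001", "E11"), ("MRN002", "I10"),
    ("MRN003", "E11"), ("MRN003", "E11"),
]

def deidentify(rows):
    """Replace each MRN with a unique random pseudonym, reused for repeat MRNs."""
    mapping, used, out = {}, set(), []
    for mrn, dx in rows:
        if mrn not in mapping:
            p = secrets.token_hex(8)
            while p in used:  # guarantee the mapping stays one-to-one
                p = secrets.token_hex(8)
            used.add(p)
            mapping[mrn] = p
        out.append((mapping[mrn], dx))
    return out

def load(rows):
    """Load rows into a fresh in-memory database with a shared schema."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE encounter (patient_id TEXT, dx_code TEXT)")
    db.executemany("INSERT INTO encounter VALUES (?, ?)", rows)
    return db

# The same query text runs unchanged against both systems.
QUERY = """
    SELECT dx_code, COUNT(DISTINCT patient_id) AS n_patients, COUNT(*) AS n_rows
    FROM encounter GROUP BY dx_code ORDER BY dx_code
"""

def validate(rows):
    """Return (results match, source result) for the aggregate query."""
    source = load(rows)
    target = load(deidentify(rows))
    src_result = source.execute(QUERY).fetchall()
    tgt_result = target.execute(QUERY).fetchall()
    return src_result == tgt_result, src_result

if __name__ == "__main__":
    ok, result = validate(SOURCE_ROWS)
    print(ok, result)  # prints: True [('E11', 2, 3), ('I10', 2, 2)]
```

Because the pseudonym mapping is one-to-one, distinct-patient and row counts agree; queries that filter on a specific identifier value would not, by design.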

Results: The combination of hashing, pseudonyms, and a session-dependent randomizer provides a rigorous de-identification framework that guards against 1) exposure of source identifiers; 2) internal data analysts manually linking records back to source identifiers; and 3) cross-linking of identifiers among different researchers, or among multiple query sessions by the same researcher. In addition, a query rejection option refuses queries that return fewer than preset minimum numbers of subjects and total records, preventing users from accidentally identifying subjects because of low data volume.
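The three layers named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: HMAC-SHA256 stands in for the unspecified hash, pseudonyms are random 63-bit integers held in a broker-only table, and the session-dependent randomizer is modeled as a fresh random key per query session that re-keys the pseudonym. The class and method names are hypothetical.

```python
import hashlib
import hmac
import secrets

class DeidentificationBroker:
    """Hypothetical honest-broker component; names and parameters are illustrative."""

    def __init__(self, hash_key: bytes):
        self._hash_key = hash_key  # secret never shared with analysts
        self._pseudonyms = {}      # hashed source id -> random pseudonym
        self._used = set()

    def _hash(self, source_id: str) -> str:
        # Layer 1: keyed hash, so the source identifier itself is never stored downstream.
        return hmac.new(self._hash_key, source_id.encode(), hashlib.sha256).hexdigest()

    def pseudonym(self, source_id: str) -> int:
        # Layer 2: random pseudonym, linkable to the hash only through this private table.
        h = self._hash(source_id)
        if h not in self._pseudonyms:
            p = secrets.randbits(63)
            while p in self._used:
                p = secrets.randbits(63)
            self._used.add(p)
            self._pseudonyms[h] = p
        return self._pseudonyms[h]

    @staticmethod
    def new_session() -> bytes:
        # Layer 3: a fresh random key for each researcher query session.
        return secrets.token_bytes(32)

    def session_id(self, source_id: str, session_key: bytes) -> str:
        # Identifier released to the researcher: stable within a session only.
        p = self.pseudonym(source_id)
        return hmac.new(session_key, str(p).encode(), hashlib.sha256).hexdigest()[:16]

if __name__ == "__main__":
    broker = DeidentificationBroker(secrets.token_bytes(32))
    s1, s2 = broker.new_session(), broker.new_session()
    print(broker.session_id("MRN001", s1), broker.session_id("MRN001", s2))
```

Within one session the same patient always receives the same identifier, so joins and counts still work; across sessions or researchers the identifiers differ, which is what blocks the cross-linking in point 3.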

This framework does not prevent subject re-identification based on prior knowledge and the sequence of events. It also does not address de-identification of medical free text, although, owing to its modular design, text de-identification using natural language processing could be added.
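The query rejection option from the Results can be sketched as a gate applied to a result set before it is returned. The threshold values and names below are hypothetical, since the summary does not give the preset numbers, and the sketch assumes a query is refused when either count falls below its minimum.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QueryThresholds:
    # Hypothetical preset minimums; the summary does not state the real values.
    min_subjects: int = 10
    min_records: int = 20

class QueryRejected(Exception):
    """Raised when a result set is too small to release safely."""

def release(rows, subject_key, thresholds=QueryThresholds()):
    """Return rows only if both the subject count and the record count meet the minimums."""
    n_records = len(rows)
    n_subjects = len({row[subject_key] for row in rows})
    if n_subjects < thresholds.min_subjects or n_records < thresholds.min_records:
        raise QueryRejected(
            f"query refused: {n_subjects} subjects and {n_records} records "
            f"(minimums {thresholds.min_subjects} and {thresholds.min_records})"
        )
    return rows
```

Rows can be dictionaries or tuples, as long as `subject_key` indexes the de-identified patient identifier; the frozen dataclass keeps the shared default thresholds immutable.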

Conclusion: We demonstrated a framework that produces HIPAA-compliant databases that researchers can query directly. The technique can be extended to facilitate inter-institutional research data sharing through existing middleware such as caGrid.

Keywords

De-identification - data warehouse
