Methods Inf Med 2005; 44(01): 66-71
DOI: 10.1055/s-0038-1633924
Original Article
Schattauer GmbH

The EpiLink Record Linkage Software

Presentation and Results of Linkage Test on Cancer Registry Files
P. Contiero 1, A. Tittarelli 1, G. Tagliabue 1, A. Maghini 1, S. Fabiano 1, P. Crosignani 1, R. Tessandori 2
1   Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy
2   Azienda Sanitaria Locale della Provincia di Sondrio, Sondrio, Italy

Publication History

Publication Date: 06 February 2018 (online)

Summary

Objectives: Record linkage, the process of bringing together separately compiled but related records from different databases, is essential in many areas of biomedical research. We developed a record linkage program, EpiLink, that employs a simple mathematical approach. We describe the program and present the results obtained by testing it on a linkage task.

Methods: EpiLink was designed to be flexible, with user-friendly settings that tailor linkage and operating parameters to specific linkage tasks and that support deterministic, probabilistic or sequential deterministic-probabilistic linkage strategies as required. The user can also standardize data formats, examine linkage results, and accept or discard them. We used EpiLink to link a subset of cases from the Lombardy Cancer Registry (20,724 records) with the Social Security file (1,021,846 records) of the population covered by the registry. The linkage strategy consisted of a deterministic step followed by several probabilistic linkage steps.
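
As an illustration only, a sequential deterministic-probabilistic strategy of the kind named in the Methods might be sketched in Python as follows. This is not EpiLink's code or scoring method: the field names, the weights, the threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical field weights and acceptance threshold (not taken from EpiLink).
WEIGHTS = {"surname": 0.4, "name": 0.3, "birth_date": 0.3}
THRESHOLD = 0.85


def similarity(a, b):
    """String similarity in [0, 1], computed with the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(rec_a, rec_b):
    """Weighted sum of per-field similarities (weights sum to 1)."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


def link(file_a, file_b):
    """Return {id in file_a: (id in file_b, linkage step)}."""
    links = {}

    # Step 1: deterministic pass, exact agreement on every key field.
    index = {tuple(r[f] for f in WEIGHTS): r["id"] for r in file_b}
    for rec in file_a:
        key = tuple(rec[f] for f in WEIGHTS)
        if key in index:
            links[rec["id"]] = (index[key], "deterministic")

    # Step 2: probabilistic pass over the records still unlinked.
    used = {b_id for b_id, _ in links.values()}
    for rec in file_a:
        if rec["id"] in links:
            continue
        candidates = [r for r in file_b if r["id"] not in used]
        if not candidates:
            break
        best = max(candidates, key=lambda r: score(rec, r))
        if score(rec, best) >= THRESHOLD:
            links[rec["id"]] = (best["id"], "probabilistic")
            used.add(best["id"])
    return links
```

The second step compares every unlinked record with every remaining candidate, which would not scale to a file of about one million records. A practical implementation would presumably restrict comparisons to blocks of plausible candidates, and links found by the probabilistic step would still be open to manual review.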

Results: Manual inspection of the results showed that EpiLink achieved 98.8% specificity and 96.5% sensitivity.
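
For readers unfamiliar with the two measures, the sketch below shows how sensitivity and specificity are conventionally computed from counts obtained by manual review. The abstract does not give the underlying counts or the exact definitions used, so the standard definitions are assumed here and the numbers in the usage note are invented for illustration.

```python
def sensitivity(true_pos, false_neg):
    """Share of true matches that the linkage found: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of true non-matches that the linkage left unlinked: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)
```

With invented counts of 90 true matches found and 10 missed, `sensitivity(90, 10)` gives 0.9. With 95 non-matches correctly left unlinked and 5 linked in error, `specificity(95, 5)` gives 0.95.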

Conclusions: EpiLink is a practical and accurate means of linking records from different databases that can be used by non-statisticians and is efficient in terms of human and financial resources.

 