Clin Colon Rectal Surg 2019; 32(01): 061-068
DOI: 10.1055/s-0038-1673355
Review Article
Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.

Surveillance, Epidemiology, and End Results (SEER) and SEER-Medicare Databases: Use in Clinical Research for Improving Colorectal Cancer Outcomes

Meghan C. Daly (1, 2), Ian M. Paquette (1, 2)
1   Department of Surgery, University of Cincinnati School of Medicine, Cincinnati, Ohio
2   Department of Surgery, Cincinnati Research in Outcomes and Safety in Surgery (CROSS), Cincinnati, Ohio

Source of Funding None.

Address for correspondence

Ian M. Paquette, MD
Division of Colon and Rectal Surgery, Department of Surgery
University of Cincinnati College of Medicine
2123 Auburn Avenue # 524, Cincinnati, OH 45219

Publication History

Publication Date: 08 January 2019 (online)

Abstract

The Surveillance, Epidemiology, and End Results (SEER) program is a clinical database, funded by the National Cancer Institute (NCI), which was created to collect cancer incidence, prevalence, and survival data from U.S. cancer registries. By capturing approximately 30% of the U.S. population, it serves as a powerful resource for researchers focused on understanding the natural history of colorectal cancer and improvement in patient care. The linked SEER-Medicare database is a robust database allowing investigators to perform studies focusing on health disparities, quality of care, and cost of treatment in oncologic disease. Since its infancy in the early 1970s, the database has been utilized for thousands of studies resulting in novel publications that have shaped our management of colorectal cancer among other malignancies.


The Surveillance, Epidemiology, and End Results (SEER) program, a clinical database funded by the National Cancer Institute (NCI), collects data on cancer incidence and survival from U.S. cancer registries. Case ascertainment and data collection originally began on January 1, 1973, as a sequel to two earlier NCI programs: the End Results Program and the Third National Cancer Survey.[1] [2] During this initial stage, data were collected from the states of Connecticut, Iowa, New Mexico, Utah, and Hawaii and the metropolitan areas of Detroit and San Francisco-Oakland. Between 1974 and 1975, the metropolitan area of Atlanta and the 13-county Seattle-Puget Sound area were added. These geographic areas are considered the “original nine” SEER registries. In 1978, ten predominantly African American rural counties in Georgia were included. Subsequently in 1980, American Indians residing in Arizona were added to the database. Three additional geographic areas participated in the SEER program prior to 1990: New Orleans, Louisiana (1974–1977, rejoined in 2001); New Jersey (1979–1989, rejoined in 2001); and Puerto Rico (1973–1989).[3]

In 1992, the SEER Program was expanded to increase coverage of minority populations, particularly Hispanics. Los Angeles County and four counties in the San Jose-Monterey area south of San Francisco were added. In 2001, the SEER Program expanded coverage to include Kentucky and the remaining counties in California. Additionally, at this time, New Jersey and Louisiana rejoined the registry. In 2010, the SEER program expanded coverage to include the entire state of Georgia.

Currently, SEER collects and publishes cancer incidence and survival data from 17 population-based cancer registries covering approximately 30% of the U.S. population.[4] The database is broadly representative of the U.S. population. However, because the registries cover limited geographic areas, certain populations are included in SEER at a higher relative proportion than Caucasians and African Americans, as shown in [Table 1].[5] Furthermore, the SEER population tends to have a higher proportion of foreign-born individuals (17.9%) as compared with the general U.S. population (12.8%).[3]

Table 1 Proportions in the overall U.S. population included in SEER Registry

| Population | SEER (%) |
|---|---|
| Native Hawaiian/Pacific Islander | 69.8 |
| Asian | 53.3 |
| American Indian/Alaska Native | 42.2 |
| Hispanic | 40.4 |
| Caucasian | 23.4 |
| African American | 22.7 |

Abbreviation: SEER, Surveillance, Epidemiology, and End Results.


Data Collection

SEER routinely collects and publishes data on patient-specific and tumor-specific characteristics. Information collected for each case includes patient demographics, primary tumor site, tumor morphology, stage at diagnosis, treatment course, follow-up for vital status, and cause of death. A complete list of variables is described in [Table 2]. The SEER registry contains information on 9 million cancer cases with over 470,000 new cases added to the database each year. SEER uses the Population Estimates Program data of the United States Census Bureau and U.S. mortality data, collected and maintained by the National Center for Health Statistics, for population counts.[3]

Table 2 Review of variables included in SEER Database

| Data | Variable | Description |
|---|---|---|
| Registry | Registry ID | Unique identifier |
| Registry | Type of reporting source | Where information came from, including autopsy and death certificate |
| Registry | Location | State and county at diagnosis |
| Patient | Patient ID number | Unique identifier |
| Patient | Demographics | Age, sex, race/ethnicity, Hispanic origin |
| Patient | Year of birth | |
| Patient | Marital status | At the time of diagnosis |
| Tumor | Primary site | ICD-O-3 topography code |
| Tumor | Date of diagnosis | Month, year |
| Tumor | Tumor markers | Specific to malignancy |
| Tumor | Sequence | Specifies if first malignancy and sequence number of reported malignancy |
| Tumor | Biologic characteristics | Histology, behavior, grade, laterality, size |
| Tumor | Extent | Extent of disease at the time of diagnosis, lymph node involvement |
| Tumor | Stage | AJCC T, N, M staging and AJCC stage group |
| Treatment | Surgery | Surgical procedure/site, extent of lymph node dissection |
| Treatment | Lymph nodes | Number of regional nodes examined, number of positive regional nodes |
| Treatment | Radiation therapy | Administration, sequence with surgery, radiation to CNS (yes/no) |
| Outcomes | Mortality | Date of death, cause of death |

Source: Adapted from Dictionary of SEER*Stat Variables 2015. For a more complete review of variables in SEER, please see http://seer.cancer.gov/data/seerstat/nov2015/.


Rigorous quality control measures are in place to ensure integrity of the dataset.[6] Registries are routinely audited for data accuracy and a Data Quality Profile is produced for each SEER registry. The program performs regular education and training sessions in coordination with the National Cancer Registrars Association annual meeting where registrars are tested through Web-based reliability studies.[5] Additionally, audits of high-volume facilities are performed to ensure that cases are recorded in a complete and timely fashion. NCI staff work closely with the North American Association of Central Cancer Registries (NAACCR) to monitor all state registries to ensure accurate recording of data and compatibility. The database is updated annually and available for download after completion of a data user agreement free of charge: https://seer.cancer.gov/data/access.html. Given the enormous, complex structure of the database, SEER provides resources to assist investigators including SEER*Stat, a free statistical software program to ease analysis of SEER data which includes survival analysis capability.[7] Patient subgroups can then be exported for use in the usual biostatistical software packages.
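As an illustration of the kind of survival analysis that can be run on an exported patient subgroup, the following is a minimal sketch of a Kaplan-Meier estimate in Python. The record layout (survival months paired with a death/censoring flag) and the function name are assumptions made for this example; they do not reflect the actual SEER*Stat export format, and the data shown are invented.

```python
from collections import Counter


def kaplan_meier(records):
    """Return [(time, survival_probability)] at each time with a death.

    `records` is an iterable of (survival_months, died) pairs, where
    `died` is True for a death and False for a censored observation.
    This layout is a hypothetical stand-in for an exported subgroup.
    """
    records = list(records)
    deaths = Counter(t for t, died in records if died)
    exits = Counter(t for t, _ in records)  # deaths plus censorings per time

    at_risk = len(records)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone who died or was censored at t leaves the risk set.
        at_risk -= exits[t]
    return curve


# Toy data for illustration only (not SEER data).
toy = [(6, True), (12, False), (12, True), (24, True), (36, False)]
for months, s in kaplan_meier(toy):
    print(f"{months:>3} months: S(t) = {s:.3f}")
```

Patients censored at a given time are counted as still at risk at that time, which is the usual convention for the product-limit estimator.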

Increased detail has been recorded in the SEER database in recent years. As an example, since its inception in 1973, stage at diagnosis has been classified into five categories: in situ, localized, regional, distant, or unstaged. Since 2004, TNM staging data have been recorded based on American Joint Committee on Cancer (AJCC) staging in addition to Collaborative Staging Codes, providing more granular clinical and pathologic information.[5] Furthermore, details regarding the presence of extracapsular extension, classification of “fixed” nodes in head and neck cancers, and estrogen/progesterone/HER2 receptor status for breast cancer have been reported since 2004.


SEER-Medicare

Medicare provides federally funded health insurance for approximately 97% of individuals aged 65 years or older in the United States.[8] It also provides health insurance to persons younger than 65 years who have end-stage renal disease or medical disability. All beneficiaries are entitled to Part A coverage, which includes hospital inpatient care. Upward of 90% of participants pay to subscribe to Part B coverage, which covers physician and outpatient services.

In a collaborative effort of the NCI, SEER registries, and the Centers for Medicare and Medicaid Services (CMS), the SEER database has been linked to claims-based measures of comorbidities, screening and evaluation tests, and detailed treatment and outcomes data.[9] Beginning in 1991, a matching algorithm was employed to link cancer data on individual patients available from the SEER registries to a master Medicare enrollment file. Matching is done via the patient's name, Social Security number, sex, and date of birth. The database has been subsequently updated in 1995, 1999, 2003, 2006, 2009, 2012, and most recently in 2014. For each of the linkages, 93% of individuals aged 65 years and older in the SEER files were matched to a Medicare enrollment file.[10] The SEER-Medicare linkage is slated to be updated biennially.
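To make the idea of linking registry cases to an enrollment file concrete, the sketch below performs a simple deterministic match on the four identifiers named above. It is an assumption-laden illustration only: the actual SEER-Medicare matching algorithm is not described in this article and is likely more elaborate, and the `Person` type, function names, and records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout holding the four linkage identifiers.
    name: str
    ssn: str
    sex: str
    birth_date: str  # ISO format, e.g. "1930-05-17"


def match_key(p):
    """Normalize the four identifiers into a comparable key."""
    name = " ".join(p.name.upper().split())          # collapse case and spacing
    ssn = "".join(ch for ch in p.ssn if ch.isdigit())  # drop dashes and spaces
    return (name, ssn, p.sex.strip().upper(), p.birth_date.strip())


def link(registry_cases, enrollment_file):
    """Pair each registry case with an enrollment record sharing its key.

    Returns (linked_pairs, unmatched_cases).
    """
    index = {match_key(e): e for e in enrollment_file}
    linked, unmatched = [], []
    for case in registry_cases:
        hit = index.get(match_key(case))
        if hit is None:
            unmatched.append(case)
        else:
            linked.append((case, hit))
    return linked, unmatched


# Fictitious records for illustration only.
cases = [
    Person("Jane  Doe", "000-00-0001", "f", "1930-05-17"),
    Person("John Roe", "000-00-0002", "M", "1928-11-02"),
]
enrollment = [Person("JANE DOE", "000000001", "F", "1930-05-17")]

linked, unmatched = link(cases, enrollment)
print(f"linked: {len(linked)}, unmatched: {len(unmatched)}")
```

An exact match on all four fields is the strictest possible rule; a production linkage would presumably tolerate partial agreement (for example, a misspelled name with a matching Social Security number), which this sketch does not attempt.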

The SEER data included as part of the SEER-Medicare files are in a customized file known as the Patient Entitlement and Diagnosis Summary File: https://healthcaredelivery.cancer.gov/seermedicare/. Of note, there is a separate data use agreement and significant per-cancer cost associated with obtaining these data files. This file contains a record for individuals in the SEER database who have been matched with Medicare enrollment records. Basic SEER diagnostic information is available for up to 10 diagnosed cancer cases for each individual. SEER data including cancer incidence, location, stage, initial treatment, and vital status are linked with Medicare claims for hospital stays, physician and laboratory services, hospital outpatient claims, and home health/hospice bills. Census tract and zip code data are available and can be used to draw conclusions regarding patient socioeconomic data.[4]

To allow for comparison studies with a control group, data are provided for two cohorts: patients with cancer and a random 5% sampling of Medicare beneficiaries residing in SEER areas who do not have cancer.[6] The linked SEER-Medicare database allows for a longitudinal perspective in the study of cancer care.


Strengths of SEER and SEER-Medicare

In capturing approximately 30% of the U.S. population, the SEER database is a very powerful research tool. The database is enriched with diverse and immigrant populations. The large sample size and uniform, high-quality data collected allow for accurate estimates of national cancer incidence and survival rates. This vast patient population also allows for specific subset analyses to be performed, based on patient characteristics, tumor stage, and treatment strategies. The SEER program includes long-term follow-up, providing researchers the ability to analyze temporal trends. Due to the population-based nature of the registry, all cancers occurring within a defined geographic region are required to be collected. This serves to minimize potential biases that may be encountered in facility-based databases where patient referral patterns can confound analysis, as patients with more severe disease are commonly referred to highly specialized centers. The quality control program conducted annually by the NCI is a critical component to ensuring quality and completeness of the database.

The SEER-Medicare database provides an opportunity to conduct case–control studies utilizing population-based sampling.[8] Employing this linked database allows one to obtain a near-complete census of all cancers arising in individuals older than 65 years. Furthermore, the SEER-Medicare database offers researchers a means of studying the following: cancer control practices and their effect on the cancer burden; patterns of access to cancer care; impact of comorbidities, race, geographic, socioeconomic, and provider-related factors on access to care; diagnosis and treatment outcomes (i.e., cause-specific survival analysis); and cost-effectiveness of cancer care.[11] [12] [13] The database includes information on multiple disease conditions allowing researchers to adjust for other health conditions and prior care (i.e., multivariate and propensity-score analysis). Inclusion of a control group that does not have cancer is instrumental for performing comparison studies.


Weaknesses of SEER and SEER-Medicare

While the information provided in the SEER database is valuable to the study of oncologic disease, there are several shortcomings. SEER provides detailed information about cancer stage and treatment at the time of diagnosis; however, details regarding completion of therapy and long-term outcomes other than death are not available. The database lacks information regarding recurrence or disease progression as well as chemotherapy use, thus prohibiting researchers from making inferences about these key factors and their impact on oncological outcomes. Furthermore, the SEER database population is predominantly Medicare/Medicaid based and tends to be biased toward older subjects, particularly among older records.

Limitations of the SEER-Medicare database surround the lack of data on cancer patients who do not have Medicare (i.e., those individuals younger than 65 years, privately insured, Medicaid, and the uninsured). It is important to note that Medicare data do not include the following: claims for health maintenance organization (HMO) enrollees, care provided in outside settings (Veterans Administration), care for individuals with Medicare as the secondary payer, out-of-pocket expenditures, and coverage provided by Medigap policies.[14]

Although cancer cases and controls are thought to be generalizable to the entire U.S. elderly population, there are two limitations. First, Medicare eligibility depends on individuals having Social Security benefits or being married to someone with benefits, which depends on documentation of work history. The proportion of elderly individuals who do not qualify is small; however, it is likely that the underprivileged and recent immigrants are overrepresented in this excluded population. Second, SEER areas were purposely selected to include a relatively large proportion of racial and ethnic minorities.[6]

Since Medicare coverage is predominantly restricted to elderly people, the SEER-Medicare data cannot be utilized to evaluate risk factors that arise earlier in life (e.g., Crohn's disease).[8] Moreover, studies of the elderly in this linked database are likely not generalizable to younger populations. Limitations of using the SEER-Medicare registry to conduct case–control studies surround the completeness and accuracy of Medicare claims to evaluate certain risk factors, such as exposure. Only conditions diagnosed and documented by a healthcare provider or related procedure are included in the database. For example, an asymptomatic or undiagnosed medical condition may impact the sensitivity of an analysis.


Use of SEER and SEER-Medicare in the Study of Colorectal Cancer

Since 1974, thousands of scientific publications have been published using the SEER and SEER-Medicare databases, leaving no reservations about the immense impact this registry has on oncologic research. The Annual Report to the Nation on the status of cancer and Racial and Ethnic Patterns of Cancer in the United States are two vital statistical reviews produced by SEER.[3] With easy access to the database and accommodating statistical software, an increasing number of SEER-based publications have been produced over the past decade.

For the study of colorectal cancer specifically, the SEER database distinguishes anatomic subsites into proximal colon, distal colon, and rectum as categorized according to the International Classification of Diseases for Oncology, third edition (ICD-O-3) topography codes (anal cancers are also included, but are beyond the scope of this article). Much of the initial research in colorectal cancer was epidemiologic in nature.
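The subsite grouping described above can be sketched as a small lookup on ICD-O-3 topography codes. The specific grouping used here (cecum through transverse colon as proximal, splenic flexure through sigmoid as distal, rectosigmoid junction and rectum as rectum) is one common convention and is an assumption of this example rather than a definition taken from SEER; the function name is likewise illustrative.

```python
# ICD-O-3 topography codes grouped by anatomic subsite.
# The grouping is an assumed convention; individual studies may differ,
# for example in where the splenic flexure or rectosigmoid junction is placed.
PROXIMAL = {"C18.0", "C18.2", "C18.3", "C18.4"}  # cecum, ascending, hepatic flexure, transverse
DISTAL = {"C18.5", "C18.6", "C18.7"}             # splenic flexure, descending, sigmoid
RECTUM = {"C19.9", "C20.9"}                      # rectosigmoid junction, rectum


def subsite(code):
    """Map an ICD-O-3 topography code to a colorectal subsite label."""
    code = code.strip().upper()
    if len(code) == 4 and "." not in code:
        # Accept the undotted form, e.g. "C180" becomes "C18.0".
        code = code[:3] + "." + code[3]
    if code in PROXIMAL:
        return "proximal colon"
    if code in DISTAL:
        return "distal colon"
    if code in RECTUM:
        return "rectum"
    # Appendix (C18.1), overlapping (C18.8), and unspecified (C18.9)
    # colon codes, plus all non-colorectal sites, fall through here.
    return "other/unspecified"


for example in ("C18.0", "c187", "C20.9", "C18.9"):
    print(example, "->", subsite(example))
```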

Screening

The SEER database has been utilized to evaluate the impact of screening in colorectal cancer. Researchers first employed the database for this purpose in 1990, when they examined the public health impact of mass media coverage of President Reagan's colon cancer episode that aired in 1985.[15] They found an increase in incidence of early-stage colorectal cancers in the months following the President's diagnosis, suggesting a potential screening effect. In 1994, researchers used incidence and survival data from SEER to examine reasons for the significant decline in colorectal cancer mortality rates for both Caucasian males and females that began in 1985.[16] Results of this study demonstrated the important role of screening to detect early-stage cancer for reducing mortality.


Racial Disparities

A more recent study used the SEER database to calculate the age-specific incidence of colorectal cancer in African Americans as compared with Caucasians while controlling for differences in socioeconomic status to evaluate the disagreement regarding the age at which to initiate screening in African Americans.[17] Based on the results of this study, a disparity was seen in the age-specific incidence of colorectal cancer in African Americans as compared with Caucasians beginning at 45 years of age. Findings from this study may help policymakers (e.g., the U.S. Preventive Services Task Force) decide how to focus their efforts on improving screening rates for colorectal cancer and which specific populations should be targeted.

The SEER database has been used to investigate racial disparities in colorectal cancer for several decades. A study by Robbins et al used SEER data from 1985 to 2008 to evaluate stage-specific colorectal cancer mortality rates by race.[18] Several subsequent studies have used the SEER registry to further evaluate this disparity.[19] [20] [21]


Young-Onset Colorectal Cancer

Survival analyses of colorectal cancer patients in the SEER registry demonstrate that young patients with colorectal cancer have a higher cancer-specific survival rate following resection as shown in [Fig. 1], although they present with more unfavorable tumor biology and a greater proportion present at an advanced stage.[22] The database has also been used to evaluate gender disparities in metastatic colorectal cancer survival.[23] A study by Hendifar et al revealed that younger women with metastatic colorectal cancer exhibit a survival advantage as compared with their male counterparts, suggesting that hormonal status may be of prognostic significance.[23]

Fig. 1 Survival curves in colorectal cancer patients according to age status.[22]

Rectal Cancer

A study by Lee et al used the database to compare differences in stage-specific survival between colon and rectal cancer patients.[24] The researchers demonstrated that colon cancer patients had better survival than those with rectal cancer, by a margin of 4 months in stage IIB ([Fig. 2a], [b]). However, in stage IIIC and stage IV, rectal cancer patients had better survival than colon cancer patients by approximately 3 months.

Fig. 2 Survival and cumulative hazard of stage IV colon and rectal cancer patients (1, colon cancer; 2, rectal cancer).

The SEER database has also been used to investigate the impact of rural versus urban setting on the stage at presentation of colorectal cancer.[25] In this retrospective analysis, investigators concluded that residence in an urban setting as compared with a rural environment was associated with later-stage colorectal cancer at presentation.

Analysis of the SEER database has demonstrated that the incidence of rectal cancer is increasing in patients younger than 40 years.[26] Using the detailed histology data recorded in SEER, researchers were able to determine that individuals in this population were 3.6 times more likely to have signet ring cell histology.[27] Current staging guidelines for rectal cancer have been reviewed using information from the SEER database. Gunderson et al helped to validate changes in AJCC staging for rectal cancer by supporting the shift of T1–2N2 lesions from IIIC to IIIA or IIIB and T4bN1 from IIIB to IIIC.[28] This study also supported subdividing T4, N1, and N2 and revised the substaging of stages II and III rectal cancer.


SEER-Medicare

Use of the linked SEER-Medicare database has allowed for a variety of analyses that span the course of colorectal cancer ranging from screening and detection to terminal care and mortality. Much of the research using this database is focused on health disparities, quality of care, and cost of treatment.[12] [29] Investigators have used the SEER-Medicare database to assess the impact of surgeon and hospital procedure volume on rectal cancer outcomes.[30] Results from this study concluded that surgeon-specific volume was associated with 2-year mortality and remained an important predictor of rectal cancer outcome even after adjustment for hospital volume.

The linked database has also been used to estimate the relative impact of changes in demographics, stage at detection, treatment mix, and medical technology on 5-year survival among older colorectal cancer patients.[31] The linked database allows estimates of cancer-related medical costs by site, stage of disease, treatment approach, and gender. In a study by Brown et al, data on Medicare payments were obtained for colorectal cancer patients during 1990–1994 from the SEER-Medicare database.[32] This study demonstrated that valid estimates of cancer-related long-term cost can be obtained from administrative claims data linked to incidence cancer registry data.

Patterns of therapy regimens and their efficacy have also been analyzed using the linked SEER-Medicare database.[33] [34] [35] [36] [37] [38] [39] [40] [41] [42] Haynes et al found that although neoadjuvant chemoradiation followed by tumor resection and postoperative chemotherapy is the standard of care for patients with clinical stage II or III adenocarcinoma of the rectum, significant variation exists in the receipt of postoperative chemotherapy after resection in the elderly population with more than one in three patients failing to receive adjuvant therapy.[43]

The abovementioned studies are just a selection of the numerous publications produced using the SEER and SEER-Medicare databases for the study of colorectal cancer.



Comparison of SEER to Other National Databases

Increasingly, clinical research in colorectal cancer is being performed using national and local databases. Other national registries that are comparable to SEER in terms of size and impact include the National Cancer Database (NCDB) created by the American College of Surgeons and the American Cancer Society, the National (Nationwide) Inpatient Sample (NIS), and the University HealthSystem Consortium (UHC) databases. While clinical databases (NCDB, SEER) tend to be more focused on oncologic disease incidence, treatment, and patient outcomes, the administrative databases (NIS, UHC) include data that focus on cost and hospital/provider characteristics. Administrative databases were not originally designed for clinical research, but instead to track billing for hospitals, providers, and procedures.[4] Administrative data are typically derived from two sources: requests to insurers for healthcare payments and claims for clinical services and therapies. In contrast, clinical databases were developed with specific clinical goals. The geographic catchment areas of the databases also vary. They may be national, state-based, or limited to specific locations. A comparison of the data available in each of these databases is depicted in [Table 3].

Table 3 Comparison of national clinical and administrative databases used in colorectal cancer research

| Characteristic | SEER | NCDB | NIS | UHC |
|---|---|---|---|---|
| Type of data | Clinical | Clinical | Administrative | Administrative |
| Patient population | 30% of U.S. population, 17 population-based cancer registries | 70% of all U.S. cancer cases, COC-approved hospitals only | 20% sampling of all hospital admissions | 90% patients at nonprofit, academic medical centers |
| Cancer staging data | Yes | Yes | No | No |
| Cancer treatment: surgery | Yes | Yes | Yes | Yes |
| Cancer treatment: chemotherapy | No | Yes | No | Yes |
| Cancer treatment: radiation therapy | Yes | Yes | No | Yes |
| 30-d outcomes | No | Yes | No | Yes |
| 5-y mortality | Yes | Yes | No | No |
| Surgeon-specific data | No | No | Yes | Yes |
| Availability | Publicly available | American College of Surgeons members | Available for fee | UHC member institutions |
| Linkable to other databases | Yes | No | Yes | Yes |
| Web site | http://seer.cancer.gov/ | http://www.facs.org/cancer/ncdb/ | https://www.hcup-us.ahrq.gov/db/nation/nis/nisdbdocumentation.jsp | https://www.vizientinc.com/Login.htm |

Abbreviations: NCDB, National Cancer Database; NIS, National (Nationwide) Inpatient Sample; SEER, Surveillance, Epidemiology, and End Results Program; UHC, University HealthSystem Consortium.



Future Work on Colorectal Cancer Using SEER

As more investigators are utilizing SEER and SEER-Medicare databases for outcomes research, there are ways these registries could be more effectively applied to further our understanding of colorectal cancer and improve patient care. Future studies targeted at improved staging and treatment algorithms will allow personalized therapy in the treatment of colorectal cancer such as watchful waiting in rectal cancer. Disparities in care received in different geographical regions and in different patient subsets need to be better identified and understood to promote national efforts for improvements in the quality of care delivered to patients. A greater emphasis on primary prevention and early detection is crucial to counter the effects of our aging and expanding population.[44]


Summary and Conclusion

SEER and SEER-Medicare are valuable databases used to understand the natural history of colorectal cancer and to evaluate the effectiveness of therapies. Data collected in these registries have served to establish and validate staging strategies, evaluate regional treatment variation, and identify disparities in care. Appropriate study design and thoughtful analyses allow investigators to make novel discoveries and answer key clinical questions in oncologic care. Understanding the strengths and limitations of these large databases is essential to perform quality surgical outcomes research.



No conflict of interest has been declared by the author(s).

