Nationally Representative Samples

Data Asset | Description | Years Included | Details

Medicare 5% National Sample (refreshed annually) | Claims for a nationally representative random sample of all Medicare beneficiaries, including Medicare Part D prescription claims for beneficiaries with Part D coverage. The Part D files do not include drugs given during hospitalizations or paid for under other auspices (e.g., hospice). Note that 2006 is the first calendar year for which Medicare Part D claims became available to researchers, and it is the first calendar year of Part D claims available through DPHS. | 1991 - 2022
 Data Asset Details

Medicare Part D Claims (first available to researchers in calendar year 2006):

Medicare 100% Inpatient (refreshed annually) | Inpatient hospitalization claims and Master Beneficiary Summary Files (i.e., enrollment, chronic conditions, cost and utilization). Does not include Medicare Part D prescription claims. | 2000 - 2022
Medicare ACO | Beneficiary claims aligned with providers participating in an Accountable Care Organization (ACO), including Medicare Part D prescription claims. The Part D files do not include drugs given during hospitalizations or paid for under other auspices (e.g., hospice). | 2011 - 2014
 Data Asset Details
Medicare 5% Limited Data Set (LDS) (refreshed annually) | Claims for a nationally representative random sample of all Medicare beneficiaries; does not include prescription drug claims. Less stringent criteria for CMS DUA approval. | 2010 - 2021
 Data Asset Details

Note: CMS Limited Data Set files are not available for Medicare Part D claims.

Medicare 100% Limited Data Set (LDS) (refreshed annually) | Inpatient file claims for 100% of Medicare beneficiaries; does not include prescription drug claims. Less stringent criteria for CMS DUA approval. | 2010 - 2021
 Data Asset Details
  • 2010-2021 100% Inpatient
  • 2010-2015 100% Denominator
  • 2016-2021 100% Master Beneficiary Summary file (MBSF): Base segment

Geographic Samples

Data Asset | Description | Years Included | Details
Medicare 100% NC/SC | Claims for beneficiaries in North and South Carolina | 2013 - 2017
 Data Asset Details

This cohort includes claims files for Medicare beneficiaries who resided in North Carolina or South Carolina from 2013 through 2017. Please note that for beneficiaries who moved into North Carolina or South Carolina in calendar year 2017, we do not have prior-year claims. Beneficiaries who resided in NC or SC at any point between 2013 and 2016 and were alive in 2017 have complete data for all years (2013-2017).

Data Asset Details

Medicare Part D Prescription Claims

Medicare 100% SEDI | Claims for beneficiaries participating in the SEDI project (Durham/Cabarrus NC, Mingo WV, Quitman MS, plus border counties) | 2009 - 2014
Medicare 20% Geographic Sample | Per-state claims based on a 20% random sample of beneficiaries in Florida, New York, Alabama, Tennessee, Illinois, and Louisiana | 2013 - 2016
Duke EHR-SEDI Data Mart | Medicare claims linked to Duke University Health System EHR data for patients with a Durham County, NC address | 2007 - 2014

Disease Cohorts

Data Asset | Description | Years Included
100% Medicare Mitral Valve PX | Claims for beneficiaries who have undergone a mitral valve procedure | 2006 - 2014

Registry Linkages

Data Asset | Description | Years Included | Details
GWTG-HF | Claims for beneficiaries linked to AHA's Get With the Guidelines-Heart Failure registry of patients hospitalized for heart failure | 2003 - 2016
 Data Asset Details

Get With the Guidelines Heart Failure Registry


PROSPER | Claims for beneficiaries linked to AHA's Get With the Guidelines-Stroke registry of patients hospitalized for stroke | 2003 - 2015
 Data Asset Details


 NC Medicaid
Data Asset | Description | Years Included
NC Medicaid | NC Medicaid claims data (limited data set) of payments from the NC DHHS to healthcare providers for services rendered. Additional files include member and provider files. | July 1, 2013 - March 31, 2024

NC Medicaid Data Request Process

NC Medicaid Data Dashboard

Data Asset | Description | Years Included
National Inpatient Sample (NIS) | Created from the largest publicly available all-payer inpatient health care database in the U.S. | 1994 - 2015 (varies by file)
Kids’ Inpatient Database (KID) | All-payer pediatric database of hospital stays. | 2000 - 2012 (varies by file)
State Inpatient Databases (SID) | State-specific databases of inpatient discharge records. | 2000 - 2014 (varies by state and file)
State Ambulatory Surgery and Services Databases (SASD) | State-specific databases of ambulatory surgery data and outpatient services data from hospital-owned facilities. | 2000 - 2014 (varies by state and file)
State Emergency Department Databases (SEDD) | State-specific databases of emergency visits at hospital-affiliated emergency departments that do not result in hospitalization. | 2005 - 2014 (varies by state and file)

HCUP Claims Data Asset Metadata

 American Hospital Association
Data Asset | Description | Years Included
American Hospital Association (AHA) Survey | Annual AHA survey data allow researchers to understand utilization, physician arrangements, organizational structure, and more. See the AHA Annual Survey Website for more information. | 2015, 2018, 2020
 Get With the Guidelines – Heart Failure (GWTG-HF) Registry
Data Asset | Description | Years Included

Get With the Guidelines–Heart Failure (GWTG-HF) Registry

Get With the Guidelines-Heart Failure (GWTG-HF) is a registry from the American Heart Association that includes data on hospital admissions for heart failure from many hospitals throughout the U.S. 

DataShare has a method, originally developed by Brad Hammill, for linking the GWTG-HF data to Medicare 100% inpatient claims using indirect identifiers. See this paper for details.
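To illustrate the general idea of linking on indirect identifiers, the sketch below matches registry records to claims records only when they agree exactly on a set of non-unique fields and the match is unique on both sides. This is not the DataShare method or the method in the referenced paper; the field names (hospital_id, admit_date, discharge_date, birth_date, sex), the record IDs, and the uniqueness rule are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical field names; real registry and claims layouts will differ.
LINK_KEYS = ("hospital_id", "admit_date", "discharge_date", "birth_date", "sex")


def link_on_indirect_identifiers(registry_rows, claims_rows, keys=LINK_KEYS):
    """Return (registry_id, claim_id) pairs that agree on every key.

    A pair is kept only if the key combination is unique in both files;
    ambiguous or incomplete records are left unlinked.
    """
    def index(rows):
        buckets = defaultdict(list)
        for row in rows:
            values = tuple(row.get(k) for k in keys)
            if None not in values:  # skip records missing any identifier
                buckets[values].append(row)
        return buckets

    registry_index = index(registry_rows)
    claims_index = index(claims_rows)

    links = []
    for values, reg_matches in registry_index.items():
        claim_matches = claims_index.get(values, [])
        if len(reg_matches) == 1 and len(claim_matches) == 1:
            links.append((reg_matches[0]["registry_id"],
                          claim_matches[0]["claim_id"]))
    return links


# Made-up example records (not real data).
registry = [
    {"registry_id": "R1", "hospital_id": "H01", "admit_date": "2012-03-01",
     "discharge_date": "2012-03-06", "birth_date": "1940-05-17", "sex": "F"},
    {"registry_id": "R2", "hospital_id": "H01", "admit_date": "2012-04-10",
     "discharge_date": "2012-04-12", "birth_date": "1938-11-02", "sex": "M"},
]
claims = [
    {"claim_id": "C1", "hospital_id": "H01", "admit_date": "2012-03-01",
     "discharge_date": "2012-03-06", "birth_date": "1940-05-17", "sex": "F"},
    # Discharge date differs from R2, so this record does not link.
    {"claim_id": "C2", "hospital_id": "H01", "admit_date": "2012-04-10",
     "discharge_date": "2012-04-13", "birth_date": "1938-11-02", "sex": "M"},
]

links = link_on_indirect_identifiers(registry, claims)
print(links)  # [('R1', 'C1')]
```

Requiring uniqueness on both sides trades match rate for fewer false links; a real linkage would likely relax some keys in later passes, which this sketch does not attempt.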

 Jackson Heart Study (JHS)
Data Asset | Description | Years Included

Jackson Heart Study (JHS)

DataShare has linked Medicare Fee-for-Service claims data for the JHS cohort. Duke is no longer a JHS Vanguard Center (effective 8/12/2024). Researchers must obtain JHS data access from the JHS Coordinating Center (

2014 - 2021
 Reference Library (Reflib)
Data Asset | Description | Years Included

Reference File Library - RefLib

The PopHealth DataShare Reference File Library "RefLib" is a collection of publicly available terminologies/vocabularies and data that can be useful additions to healthcare data analyses. For example, the collection includes ICD9/ICD10 terminologies as well as CPT, NDC, and NPI Taxonomy.

RefLib contents are sourced from multiple entities including the Unified Medical Language System (UMLS) and the Agency for Healthcare Research and Quality (AHRQ).

See the RefLib Documentation (pdf).
 SEER-Medicare Linked Database
Data Asset | Description | Years Included

SEER-Medicare Linked Database

The SEER-Medicare data reflect the linkage of two large population-based sources of data that provide detailed information about Medicare beneficiaries with cancer. The data come from the Surveillance, Epidemiology and End Results (SEER) program of cancer registries that collect clinical, demographic and cause of death information for persons with cancer and the Medicare claims for covered health care services from the time of a person's Medicare eligibility until death.

The linkage of these two data sources results in a unique population-based source of information that can be used for an array of epidemiological and health services research. For example, investigators using this combined dataset have conducted studies on patterns of care for persons with cancer before a cancer diagnosis, over the period of initial diagnosis and treatment, and during long-term follow-up. Investigators have also examined the use of cancer tests and procedures and the costs of cancer treatment.

See SEER Website