Commonly-Linked Data Assets

The following files are often linked to insurance claims during analysis. Identify the files your study will use before requesting NC Medicaid, SEER-Medicare, or CMS data, because you are permitted to link only the assets listed in an approved DUA application.

ResearchDataGov.org aggregates restricted federal data that is accessible through a standard application process. The agencies providing data assets can be found here, and include the Census Bureau, Bureau of Labor Statistics, National Center for Health Statistics, and SAMHSA Center for Behavioral Health Statistics and Quality.  Resources are filterable by agency, topic, linking variable, and availability of a public use file (PUF).  Please work with the appropriate Duke contracting offices, IRBs, and data security teams before employing these data.

 

This government report on socioeconomic status (SES) measures describes several measures that are not listed below but may be pertinent to your study and DUA application. Among other things, it compares SES measures and categorizes area-level SES measures by level of disaggregation (census block, census tract, ZIP code, county, person-level).

 

The table below provides key information about data assets that are frequently utilized in DataShare analyses.

Commonly-Linked File

Cost1

Reason for Use

Justification Language for the DUA

How It Is Linked to the Primary Data

File Location

Data Documentation

American Hospital Association (AHA) Annual Survey Files

Has information regarding hospitals.2

  • Organizational structure

  • Facilities and services

  • Beds and utilization

  • Staffing

  • Expenses

  • Physician arrangements

  • System affiliation

  • Geographic indicators

  • Accreditations and approval codes by credentialing organization

AHA Annual Survey Files will be used to obtain detailed information about hospital systems and services, such as organizational structure, facilities and services, beds and utilization, staffing, expenses, physician arrangements, system affiliation, geographic indicators, and accreditations and approval codes by credentialing organizations.

Linked via hospital provider ID (PROVIDER) in the claims files

 

 

Area Deprivation Index

Free

Geography-based ranking derived from SES variables in the theoretical domains of income, education, employment, and housing quality.

Area Deprivation Index files will be linked via the ZIP+4 and will provide a measure of neighborhood socioeconomic status (SES) disadvantage for evaluation of SES impact on outcomes.

Justification for EDB file: [Year/percent] EDB 9-digit ZIP code files contain a key variable needed for linkage to the detailed geographic measures present in the Area Deprivation Index... (also list other linked geography files, if applicable)

Linked via ZIP code + 4 or census tract

Medicare: Must also request the EDB 9-digit ZIP code files for the same year/percent as your claims files

NC Medicaid: Not usable because most 9-digit ZIP codes in the data end in “0000” and there is not a census tract variable

  • On-site: RefLib schema

  • Create an account to download the files here.

  • Request a different version of the ADI here.

  • ADI map

American Medical Association (AMA) Physician Masterfile

Contains information about physicians, including:

  • Location

  • DOB

  • Sex

  • Type of practice (e.g., solo, group, etc.)

  • Specialty

  • Year graduated from medical school

  • Year completed training

  • Medical school location

Does not contain race or ethnicity.

The AMA Physician Masterfile data will be used to obtain taxonomy (specialty) information and other physician characteristics.

Research ID if linked to SEER-Medicare

NPI for other data sources

No communal files. Must be purchased through the NCI and from MMS, Inc. on a per-DUA basis when linking SEER-Medicare data.

File layout

Dartmouth Atlas of Health Care3

Free

Geographic boundary files for mapping

Basic healthcare utilization rates for different geographic aggregations

The Dartmouth Atlas of Health Care will be used to obtain geographic boundaries for mapping and basic healthcare utilization rates for geographic aggregations.

Linked via patient or provider ZIP code in the claims data files

 

 

Hospital Compare

(CMS PUF)

Free

Has information about hospital performance metrics

Hospital Compare Results will be used to describe hospital performance metrics, such as overall quality of care ratings, patient-reported quality of communication with providers, risk-adjusted 30-day readmission and 30-day mortality rates for specific conditions, complication rates, and outcome metrics.

Linked via hospital provider ID (PROVIDER) in the claims files

Data dictionary

HRSA Area Health Resources Files1

Free

Extensive area-level information for different geographic aggregations, including information on:

  • Health facilities

  • Health professions

  • Measures of resource scarcity

  • Health status

  • Economic activity

  • Health training programs

  • Socioeconomic and environmental characteristics

HRSA Area Health Resources Files will be used to ascertain area-level information for geographic aggregations including, but not limited to, data about health facilities, health professions, measures of resource scarcity, health status, economic activity, health training programs, and socioeconomic and environmental characteristics.

Linked via patient or provider ZIP code, county, or state in the claims data files

Datasets

NC Health Professions Data System (HPDS) (a.k.a. Health Workforce NC data)

Maybe

Descriptive data for selected licensed NC providers from 1979 - present.

Includes provider-level race/ethnicity variables

The HPDS data will be used to obtain taxonomy (specialty) information and other health care provider characteristics.

Linked via provider NPI

Most useful for linkage to NC Medicaid claims.

Free data here and here.

Contact nchealthworkforce@unc.edu to obtain additional data.

 

NC Office of State Budget and Management’s rural/urban classification

Free

NC DHHS’ preferred rural/urban definition for NC Medicaid analysis

Note: As of January 2022, NC DHHS is considering transitioning to another rural/urban classification scheme.

We may link publicly available, area-level data to the claims files using enrollees’ county of residence and providers’ county of practice. Potential public-use data files include the NC Office of State Budget and Management’s rural/urban definition or other sources.

Linked via patient or provider county in claims

Crosswalk (download)

 

NPI/NPPES Registry4

(CMS PUF)

Free

Almost exclusively for provider taxonomy (specialty) information. 

Can also provide physician ZIP code (though PROVZIP on carrier files, or the provider ZIP from the POS file for OP/IP files, is preferred)

Also request UPIN if using pre-2007 CMS data.

The NPI/NPPES Registry will be used to obtain taxonomy (specialty) information and other health care provider characteristics.

Linked to physician NPI IDs (e.g., PRF_NPI, OP_NPI) in the claims data files.

  • On-site: RefLib schema

  • VRDC: NPPES library (refreshed monthly); must submit proof of approved linkage to CCW to gain access7

Code values

Provider of Service (POS) File2

(CMS PUF)

Free

Has extensive information regarding hospitals, including but not limited to:

  • Geography (state, county, city, CBSA, etc.)

  • Ownership (type, date last changed)

  • Affiliated services (number affiliated with the hospital), such as:

    • Ambulatory surgery center

    • ESRD units

    • Home health

    • Hospice

  • Off-site services, such as:

    • Psychiatric hospital

    • Rehabilitation units

    • Urgent care center

  • Bed size

  • Number of certain types of procedure rooms, such as:

    • Cardiac catheterization

    • Endoscopy

  • Affiliation with medical school

The Provider of Service File will be used to obtain information about healthcare facilities including, but not limited to, geography, ownership characteristics, affiliated services, off-site services, bed size, procedure room characteristics, and affiliations with medical schools.

Linked via facility provider ID (PROVIDER) in the claims files

Methodology and data dictionaries are on the download pages.

 

Provider Specific File

(CMS PUF)

Free

Has provider attributes and policy rates set annually by CMS for the Prospective Payment System.  It covers inpatient, SNF, home health agency, hospice, inpatient rehab, long-term care and inpatient psychiatric facility providers.

Includes:

  • NPI

  • Provider location (state, locality code)

  • Facility location (county, census division, MSA, CBSA)

  • Bed size

  • Hospital quality indicator

  • CMS value-based payment incentives

  • Ratio of Medicaid patients served

  • Ratio of interns and residents at teaching hospitals

 

 

RUCA Code files1,5

Free

A more granular classification of urban/rural status than the binary information available in other files (10 classifications). It takes actual commuting patterns into account when defining rurality. For example, it distinguishes between a rural area where most people commute into a large metro area daily and an isolated rural area where few people commute into larger metro areas.

RUCA Code files will be used to classify the urban or rural statuses of geographic areas.

Linked via patient or provider ZIP codes in the claims files

 

 

RUCC files3

Free

Rural/urban classification that primarily derives rurality from population density and geographic proximity to metro areas (9 classifications).

RUCC files will be used to classify the urban or rural statuses of geographic areas.

Linked via patient or provider county

 

File documentation 

AHRQ Social Determinants of Health Database

Free

Database that aggregates variables from 47 data assets to provide social, economic, educational, physical infrastructure, and healthcare characteristics. It includes data from:

  • American Community Survey

  • AHA Annual Survey

  • Area Health Resources Files

  • Minority Health Social Vulnerability Index

  • HRSA Medically Underserved Areas

  • POS Files

  • Social Vulnerability Index

  • RUCC

  • RUCA

The AHRQ Social Determinants of Health Database will be used to describe the social characteristics (e.g., demographics, veteran status, socioeconomic disadvantage), economic status (e.g., income, unemployment rate, poverty), educational characteristics (e.g., attainment, literacy), physical infrastructure (e.g., housing, crime, transportation), and healthcare context (e.g., provider characteristics, measures of resource scarcity, healthcare quality) of geographic areas.

Linked via patient, provider, or facility counties, ZIP codes, and census tracts in the claims files

Note: Census tract-level files are not available in the VRDC library.

UMLS Metathesaurus1,6

Free

Has detailed information for many biomedical vocabularies. We use it to provide code descriptions for code sets such as:

  • ICD-9-CM diagnosis and procedure codes

  • ICD-10-CM diagnosis codes

  • ICD-10-PCS procedure codes

  • CPT/HCPCS procedure codes

  • BETOS codes

The UMLS Metathesaurus will be used to provide code descriptions (e.g., for ICD-9-CM diagnosis and procedure codes, ICD-10-CM diagnosis codes, ICD-10-PCS procedure codes, CPT/HCPCS procedure codes, and BETOS codes) and to map ICD-9 codes to ICD-10 codes.

Descriptions are linked to the diagnosis and procedure codes in the claims data.

 

Release documentation

UPIN Directory2

Free

As with the NPI Registry, this is almost exclusively used for specialty information. Only relevant for pre-2007 data.

The UPIN Directory will be used to obtain taxonomy (specialty) information for physicians associated with claims prior to 2007.

Linked to physician UPIN (e.g., PRF_UPIN, OP_UPIN) in the claims data files.

VRDC: 2003 - 2007 are in the PROVIDER library; CMS must include the following files and EPPE codes on the DUA’s approved file list8:

  • UPIN group file: UPING

  • UPIN member file: UPINM

  • UPIN master file: UPINMA

2003 Data dictionary

US Census data: American Community Survey 

Free

Demographic, economic, and SES information aggregated by geographic level (state, metropolitan area, ZIP code, etc.), drawn from the US Census and the American Community Survey. Data include many averaged metrics, including but not limited to:

  • Poverty and employment rates

  • Family size

  • Housing data

  • EEO occupation codes

  • Income levels

  • Business activity

US Census data will be used to obtain geographic-level aggregated demographic, economic, and socio-economic information.

Linked via patient or provider state or ZIP codes in the claims files using Census ZIP Code Tabulation Areas

Sometimes we first link Medicare data to another source (like GWTG or DEDUCE) that may have more detailed geographic information. In the case of DEDUCE, we have geocoded addresses, which we can link to Census data at various levels of geography, down to the census tract or block.

 

Code lists, definitions, and accuracy

1 Cost to projects that pay the annual DataShare infrastructure fee

2 If only items such as bed size and teaching hospital status are needed from this file, the Provider of Service file may be a suitable substitute. The American Hospital Association file may, however, have more detailed information about hospital systems and services available at affiliated hospitals.

3 Approved by ORC for upload into the VRDC. 

4 Available directly from CMS in the VRDC. Request access in the free-text portion of the specification document and/or request access directly from GDIT following approval.

5 RUCA and RUCC classifications for a patient’s residence at the time of diagnosis are included in SEER-Medicare data.

6 Include this non-CMS file in all CMS DUA applications.  Available on Oracle in PACE as five tables with the REFLIB schema: CPT_HCPCS, ICD10CM, ICD10PCS, ICD9CM_DX, ICD9CM_PX; the GEMS and other files are from other sources.  Michael Stagner can provide the five SAS dataset files for upload into the VRDC.

7 Submit a request to the CCW help desk with an approved DUA Attachment A that includes the file in the linked files list.

8 Request the 100% extracts and needed data-years for the UPIN files on the DUA specification worksheet’s Annual Extract Summary\Miscellaneous\Other section. If necessary, this can be done via correspondence with ResDAC after CMS approves the linkage.
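
Most rows in the table above describe a single-key linkage: a reference file is joined to claims on one identifier (for example, PROVIDER, NPI, ZIP code, or county). The following is a minimal Python sketch of that pattern, not DataShare's actual workflow. It assumes both files have already been loaded as lists of dictionaries and that the reference file has one row per key. PROVIDER is the key named in the table; CLAIM_ID, BED_SIZE, the `left_link` helper, and all values are hypothetical toy data for illustration only.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def left_link(claims: Iterable[Row], reference: Iterable[Row], key: str) -> Tuple[List[Row], float]:
    """Left-join reference attributes onto claim rows by `key`.

    Every claim row is kept. Each output row carries a `_matched` flag, and
    the function also returns the share of claim rows that found a match.
    """
    # Build a lookup from the reference file, refusing duplicate keys so a
    # claim can never silently fan out into several rows.
    lookup: Dict[Any, Row] = {}
    for ref in reference:
        k = ref[key]
        if k in lookup:
            raise ValueError(f"duplicate {key} in reference file: {k}")
        lookup[k] = ref

    linked: List[Row] = []
    n_matched = 0
    for claim in claims:
        ref = lookup.get(claim[key])
        row = dict(claim)  # copy so the input rows are not modified
        if ref is not None:
            row.update({col: val for col, val in ref.items() if col != key})
            n_matched += 1
        row["_matched"] = ref is not None
        linked.append(row)

    rate = n_matched / len(linked) if linked else 0.0
    return linked, rate


# Hypothetical toy data: three claims, two facilities in the reference file.
claims = [
    {"CLAIM_ID": "c1", "PROVIDER": "340001"},
    {"CLAIM_ID": "c2", "PROVIDER": "340002"},
    {"CLAIM_ID": "c3", "PROVIDER": "999999"},  # no reference record
]
reference = [
    {"PROVIDER": "340001", "BED_SIZE": "250"},
    {"PROVIDER": "340002", "BED_SIZE": "90"},
]

linked, rate = left_link(claims, reference, "PROVIDER")
print(f"{rate:.0%} of claims matched a reference record")
```

Keeping unmatched claims and reporting the match rate makes it easier to notice when a linking variable is unsuitable, such as the NC Medicaid 9-digit ZIP codes ending in “0000” noted in the Area Deprivation Index row.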