Medicare Data
Claims data from Medicare Parts A and B, prescription drug data from Part D, beneficiary information, and cost reports.
All Medicare Fee for Service recipients in the US: over 65 million beneficiaries.
Access
There are three levels of Medicare data, and the level of data requested determines the availability and access procedures. Data are available to any class of researcher, though some data requests require review and approval by CMS, a formal Data Use Agreement (DUA), and approval from the CMS Privacy Board.
Non-Identifiable Data Files are public use data and are available to anyone for purchase on the Public Use Files (PUF) or Non-Identifiable Data Files page of the CMS website.
Limited Data Sets (LDS) require a DUA, but are not subject to Privacy Board review.
Research Identifiable Files (RIF) require a DUA and are subject to Privacy Board review. The Research Data Assistance Center (ResDAC) at the University of Minnesota is the intermediary for processing and filing requests for these data.
Researchers have historically accessed CMS data through a Physical Research Data Request process, in which data are mailed to users via external media. Alternatively, researchers can access Medicare data through the CMS Virtual Research Data Center (VRDC), hosted by the Chronic Conditions Warehouse (CCW).
Researchers working in the VRDC have direct access to approved data files and conduct their analysis within the CMS secure environment. Users must go through Remote Identity Proofing (RIDP) prior to obtaining a CCW User ID to access the VRDC.
According to the CCW, the processes for accessing data through the Physical Research Data Request process and through the VRDC are very similar. The Physical Research Data Request process page includes all necessary forms and tips for completing them, as well as the process for accessing records through the VRDC.
To determine which documents a request requires, ResDAC offers a Request Material Tool that generates a customized list of request materials based on the type of requestor, funding source, and request type. To initiate the data request process, send draft versions (without signatures) of the applicable “Required Materials” to [email protected].
Timeline for Access
ResDAC recommends planning for a minimum of 6-8 months between submitting a draft application and receiving the data. Privacy Board approval may take 2 months or more, and data processing may take 3-6 months.
Physical Research Data Request: Data must be destroyed upon reaching the expiration date of the DUA. Researchers must request an extension to continue working with the data beyond that time.
Virtual Research Data Center Request: Project data are stored within the VRDC for a three-year period. Note that this only includes analysis files and does not include the raw data files.
Lag Time
The lag times are similar in length for Physical Research Data Requests and for researchers using the VRDC. Files are updated annually, and are available on approximately a one-year lag, with updates usually available in December. For a list of available file-years and upcoming file availability, see File Availability.
Some files, including Medicaid fee-for-service claims, are updated quarterly and are available on a 5-6 month lag. Quarterly claims are available 4 months after the quarter end and are approximately 93% mature and complete. Annual refreshes are available 14 months after year end and are 100% mature and complete. For example, 2015 Q4 data are available in raw form (93% mature and complete) by April 2016, and 2015 Q4 final form data are available in February 2017. More information about quarterly data availability is published by the CCW.
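The release rules above reduce to month arithmetic: raw quarterly files appear about 4 months after the quarter ends, and the annual refresh about 14 months after the year ends. The Python sketch below is a hypothetical helper, not a CMS or CCW tool; it reproduces the 2015 Q4 example, but actual release dates can shift, so treat its output as an approximation.

```python
def add_months(year: int, month: int, months: int) -> tuple[int, int]:
    """Return the (year, month) that falls `months` calendar months after (year, month)."""
    index = year * 12 + (month - 1) + months  # months counted from year 0, zero-based
    return index // 12, index % 12 + 1


def quarterly_release(year: int, quarter: int) -> tuple[int, int]:
    """Approximate month when raw quarterly claims (~93% complete) become available."""
    quarter_end_month = quarter * 3  # Q1 ends in March, Q4 in December
    return add_months(year, quarter_end_month, 4)


def final_release(year: int) -> tuple[int, int]:
    """Approximate month when the annual refresh (100% complete) becomes available."""
    return add_months(year, 12, 14)


print(quarterly_release(2015, 4))  # (2016, 4): April 2016
print(final_release(2015))         # (2017, 2): February 2017
```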
Cost
For Physical Research Data Requests, CMS generally charges for data by the number of beneficiaries in the requested cohort. For data including up to 1 million beneficiaries for one year, the fee per data set (i.e., per file per year of data) ranges between $1,000 and $5,000 depending on the data set. See Pricing Information for CMS Data Files for complete descriptions of the cost of data by beneficiary count and data set (this document also provides cost information for the VRDC). The ResDAC Assistance Desk can provide a ballpark cost estimate through the CCW Cost Estimate Application Tool, or a formal cost estimate upon completion of the Specifications Worksheet Cost Estimate Request form, submitted to [email protected].
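As a rough check before requesting a formal estimate, the per-data-set fee range above can be multiplied out. This sketch assumes a cohort of at most 1 million beneficiaries and uses only the published $1,000 to $5,000 range; the function name and the example request are hypothetical, and the Pricing Information document remains the authoritative source.

```python
# Published fee range per data set (one file, one year of data) for cohorts of
# up to 1 million beneficiaries. Actual fees depend on the specific data set.
FEE_MIN = 1_000
FEE_MAX = 5_000


def physical_cost_range(n_files: int, n_years: int) -> tuple[int, int]:
    """Rough (low, high) cost in dollars for a Physical Research Data Request."""
    if n_files < 0 or n_years < 0:
        raise ValueError("counts must be non-negative")
    n_datasets = n_files * n_years  # fees are charged per file per year of data
    return n_datasets * FEE_MIN, n_datasets * FEE_MAX


# Hypothetical request: 3 files covering 2 years of data.
low, high = physical_cost_range(3, 2)
print(f"${low:,} to ${high:,}")  # $6,000 to $30,000
```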
Limited Data Sets, PUF/Non-Identifiable Data Files, and Research Identifiable Files may also be available for reuse at a lower cost if a particular file is already available to a researcher within the same research organization. CMS waives the reuse fee for student dissertation or thesis research. Contact CMS at [email protected] or your university department's data manager for more information on which files may be available. NBER affiliates should also contact the NBER.
For VRDC requests, there is an annual access fee for each researcher who will access data through the VRDC: research teams must purchase one “seat” per researcher at $25,000 per seat. Access can be renewed or terminated on a quarterly or annual basis ($25,000 annual renewal fee, $6,250 quarterly renewal fee). Researchers also pay a one-time, data-specific project fee starting at $15,000. Requests for files that exceed 500 GB of space incur an additional $2,000 fee.
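The VRDC fees above can be combined into a first-year estimate. In this sketch the project fee defaults to its $15,000 floor, and the $2,000 large-file fee is assumed (not confirmed by the source) to be charged once; both assumptions are marked in the code. Because quarterly renewal is $6,250 per seat, four quarters equal the $25,000 annual renewal, so one function covers both.

```python
# VRDC fee schedule as described above (US dollars).
SEAT_FEE = 25_000            # annual access fee per researcher ("seat")
QUARTERLY_RENEWAL = 6_250    # renewal fee per seat, per quarter
PROJECT_FEE_FLOOR = 15_000   # one-time project fee is data specific; this is the minimum
LARGE_FILE_FEE = 2_000       # assumption: charged once when files exceed the threshold
LARGE_FILE_THRESHOLD_GB = 500


def vrdc_first_year_cost(seats: int, data_gb: float,
                         project_fee: int = PROJECT_FEE_FLOOR) -> int:
    """Estimate first-year cost: seats, one-time project fee, optional large-file fee."""
    cost = seats * SEAT_FEE + project_fee
    if data_gb > LARGE_FILE_THRESHOLD_GB:  # "exceed 500 GB" read as strictly greater
        cost += LARGE_FILE_FEE
    return cost


def vrdc_renewal_cost(seats: int, quarters: int) -> int:
    """Renewal cost; 4 quarters x $6,250 equals the $25,000 annual renewal per seat."""
    return seats * quarters * QUARTERLY_RENEWAL


# Hypothetical team: 2 researchers, 600 GB of requested files, then a one-year renewal.
print(vrdc_first_year_cost(2, 600))  # 67000
print(vrdc_renewal_cost(2, 4))       # 50000
```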
Linking
Researchers may request data on the full set of Medicare beneficiaries, on a random sample of beneficiaries, or on a cohort they define. There are two options for defining a cohort:
Option One: Researchers may limit the cohort by sex, age, date of death, race, residence, or Medicare status.
Option Two: Researchers may send a “finder file” listing a defined set of individuals. The file is linked to Medicare records using an exact match on one of the identifiers listed below. For more information, see Submission of Medicare Data Finder and Crosswalk Files (in the "Finder File Encryption Policy" PDF).
Identifiers Available for Linking
According to CCW’s Finder File Encryption Policy, finder files must use the following identifier types:
- Health Insurance Claim (HIC) numbers
- Social Security numbers (SSN)
- Medicare Beneficiary Identifier numbers
- RES_ID / State Code, which identifies a resident in the national repository
- Unique Physician Identification Numbers
- National Provider Identifiers
- Employer Identification Numbers/Tax Identification Numbers
- Secondary identifiers: last name*, date of birth, ZIP code, and partial SSN or HIC number
*According to the CCW, last name is an unreliable search criterion, and users should expect a lower match rate when using last name to link records.
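Under the exact-match rule described above, an identifier links only if it is character-for-character identical to the one on file, so stray spaces or hyphens cause misses. The sketch below illustrates this with hypothetical 10-digit identifiers and a hypothetical cleaning rule; it is not CCW's actual matching procedure. Last names are harder still, since spelling variants survive any simple cleaning, which is consistent with the CCW's warning about lower match rates.

```python
import re


def standardize(identifier: str) -> str:
    """Hypothetical cleaning rule: drop whitespace and hyphens, upper-case letters."""
    return re.sub(r"[\s-]", "", identifier).upper()


def exact_match_rate(finder_ids: list[str], source_ids: set[str]) -> float:
    """Share of finder-file identifiers found verbatim among the source identifiers."""
    if not finder_ids:
        return 0.0
    hits = sum(1 for i in finder_ids if i in source_ids)
    return hits / len(finder_ids)


# Hypothetical identifiers; the source side is assumed to be stored in clean form.
source = {"1000000001", "1000000002", "1000000003", "1000000004"}
finder_raw = ["1000000001", "1000-000002", " 1000000003", "1000000009"]

print(exact_match_rate(finder_raw, source))                            # 0.25
print(exact_match_rate([standardize(i) for i in finder_raw], source))  # 0.75
```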
Linking to Outside Data Sources
According to the RIF Data Use Agreement, researchers must request permission prior to matching CMS data with any external data sources, or with CMS files not listed in the initial DUA.
Data Contents
Partial List of Variables
Beneficiary demographic information; claim payment amount; type of claim; claim procedure and diagnosis codes (ICD-9 and ICD-10); location of claim; drug plan characteristics (copay, coinsurance, type of donut-hole coverage); prescription drug event characteristics (drug NDC11, drug cost, drug out-of-pocket cost, drug benefit phase of claim); prescriber NPI; prescription fill location.
J-PAL Randomized Evaluations Using this Data Set
Sacarny, Adam, David Yokum, Amy Finkelstein, and Shantanu Agrawal. 2016. “Medicare Letters To Curb Overprescribing Of Controlled Substances Had No Detectable Effect On Providers.” Health Affairs 35(3):471-9.