High-frequency checks, back-checks, and spot-checks can be used to detect programming errors, surveyor errors, data fabrication, poorly understood questions, and other issues. The results of these checks can also be useful in improving your survey, identifying enumerator effects, and assessing the reliability of your outcome measures. This resource describes use cases and how to implement each type of check, as well as special considerations relating to administrative data.
This section covers how to check the quality of data through three types of checks: high-frequency checks (HFCs), back-checks, and spot-checks.
For each type of check, we cover the underlying logic, the implementation process, and how to use the results. Where available, we reference template forms and do-files to facilitate implementation of these methods.
As the name suggests, high-frequency checks (HFCs) are checks of incoming data conducted on a regular basis (ideally daily). HFCs can be run on survey data or administrative data. Regardless of the source, they should be run on as much of the data as possible.
For survey data, HFCs are used for identifying and correcting data errors, monitoring survey progress, measuring enumerator performance, and detecting data fraud. HFCs play a similar role for administrative data but can also be used to check its coherence (the degree to which the administrative data are comparable to other data sources) and its accuracy (e.g., information on any known sources of errors in the administrative data) (Iwig et al. 2013).
HFCs fall into five broad categories:
There are three main ways to implement HFCs:
Regardless of implementation method, it is best to prepare HFC procedures before enumerators go to the field.
On a daily basis, the Research Assistant should download the new data, run the HFC code on it, flag any issues, and send flagged responses to the PI/Research Manager. This is usually done by creating a spreadsheet with some basic information on the respondent (i.e., their unique ID, location, phone number, and the problematic response) so that field staff can contact them to verify their response. Once field teams have verified the data, a do-file can be used to fix or reconcile any errors. Important: never directly edit or overwrite the raw data; always make edits in a do-file. This do-file can be updated regularly to incorporate new edits as you conduct HFCs on incoming batches of data.
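The daily routine above can be sketched in a few lines. The example below is only an illustration in Python (the templates referenced in this resource are Stata do-files and an R script); the variable names, valid ranges, and the sample data are all hypothetical. It flags duplicate IDs and out-of-range values, and applies verified corrections to a copy of the data so the raw export is never changed.

```python
import csv
import io

# Hypothetical raw export; in practice this would be the day's download.
RAW_CSV = """respondent_id,enumerator_id,age,hh_size
101,E1,34,5
102,E1,212,4
103,E2,45,-1
103,E2,45,3
"""

# Assumed plausible ranges for illustration; set these from your own survey.
VALID_RANGES = {"age": (0, 110), "hh_size": (1, 30)}

# Hypothetical correction log, filled in after field staff verify a response.
CORRECTIONS = [
    {"respondent_id": "102", "variable": "age", "old": "212", "new": "21"},
]


def load_rows(text):
    """Read the raw export into a list of dicts without modifying the source."""
    return list(csv.DictReader(io.StringIO(text)))


def flag_issues(rows):
    """Return one flag per problem: duplicate IDs and out-of-range values."""
    flags = []
    seen = set()
    for row in rows:
        rid = row["respondent_id"]
        if rid in seen:
            flags.append({"respondent_id": rid, "variable": "respondent_id",
                          "value": rid, "issue": "duplicate ID"})
        seen.add(rid)
        for var, (low, high) in VALID_RANGES.items():
            if not low <= float(row[var]) <= high:
                flags.append({"respondent_id": rid, "variable": var,
                              "value": row[var], "issue": "out of range"})
    return flags


def apply_corrections(rows, corrections):
    """Apply logged corrections to a copy; a fix applies only if the old value matches."""
    cleaned = [dict(row) for row in rows]
    for fix in corrections:
        for row in cleaned:
            if (row["respondent_id"] == fix["respondent_id"]
                    and row[fix["variable"]] == fix["old"]):
                row[fix["variable"]] = fix["new"]
    return cleaned


raw_rows = load_rows(RAW_CSV)
flags = flag_issues(raw_rows)            # rows to send to the field team
clean_rows = apply_corrections(raw_rows, CORRECTIONS)
for flag in flags:
    print(flag)
```

Keeping the corrections in a log that is applied by code, and requiring the old value to match, leaves a reviewable record of every change and prevents a correction from silently overwriting a value that has since changed.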
On an ongoing (i.e., weekly or monthly) basis, the RA should maintain the HFC code, making adjustments as needed. Changes to the HFC code should be made if you modify the survey (e.g., adding a response that was commonly given as an “Other- please specify” to the set of options). As more data are collected, you may be able to perform additional tests, such as comparing surveyors in one district to surveyors in another, or comparing the responses given to the same surveyor in different districts. You may want to modify the code to include these as time goes on. Discuss with your PIs how often modifications should be made to the HFC code.
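As one illustration of a comparison across surveyors and districts, the sketch below computes mean interview duration by enumerator and by district and flags enumerators whose mean falls well below the overall median. The data, the choice of interview duration as the indicator, and the 60% threshold are all assumptions made for the example, not recommendations from this resource.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical survey metadata: one record per completed interview.
surveys = [
    {"enumerator_id": "E1", "district": "North", "duration_min": 42},
    {"enumerator_id": "E1", "district": "North", "duration_min": 38},
    {"enumerator_id": "E1", "district": "North", "duration_min": 45},
    {"enumerator_id": "E2", "district": "South", "duration_min": 40},
    {"enumerator_id": "E2", "district": "South", "duration_min": 44},
    {"enumerator_id": "E2", "district": "South", "duration_min": 41},
    {"enumerator_id": "E3", "district": "South", "duration_min": 15},
    {"enumerator_id": "E3", "district": "South", "duration_min": 18},
    {"enumerator_id": "E3", "district": "South", "duration_min": 20},
]


def mean_by(records, key, value):
    """Average `value` within each group defined by `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {group: mean(values) for group, values in groups.items()}


def flag_short_durations(records, threshold=0.6):
    """Flag enumerators whose mean duration is below threshold * overall median."""
    per_enumerator = mean_by(records, "enumerator_id", "duration_min")
    benchmark = median(r["duration_min"] for r in records)
    return sorted(e for e, m in per_enumerator.items() if m < threshold * benchmark)


by_district = mean_by(surveys, "district", "duration_min")
flagged_enumerators = flag_short_durations(surveys)
print(by_district)
print(flagged_enumerators)
```

A flag here is a prompt for follow-up rather than proof of a problem: short interviews can reflect skipped sections, but also legitimately smaller households or skip patterns.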
There are further considerations to take when conducting HFCs on remote survey data, including figuring out optimal call times and tracking the number of call attempts. See more in the “Best practices for WFH-CATI data quality” section below.
A back-check is when previously interviewed respondents are re-interviewed by a new enumerator using a shortened version of the original survey. The responses to the back-check survey are then compared to the respondent’s original responses to detect discrepancies. Back-checks are used for three main purposes: i) to hold surveyors accountable by verifying that surveys are actually occurring, ii) to assess how well surveyors are administering the survey, and iii) to gauge the reliability of a survey measure by seeing how respondents’ answers change between the main and back-check surveys.
An important limitation to back-checks, however, is that it is sometimes difficult to distinguish between these three explanations (or other potential explanations) for a given discrepancy.
Variables to be included fall into three distinct categories, defined below. For each question (or variable) included in the back-check survey, you will need to determine the range of acceptable deviation. For example, you might expect consumption to vary by as much as 10% from one survey to the next, while some variables (e.g., age, gender) should not vary within the timeframe of your survey.
Once you have your list of back-check questions, follow standard survey procedures and have your back-check team administer it. This team should not be the same team conducting the original survey; you may have to hire and train additional staff. As such, back-checking surveys can carry a high cost. One money-saving alternative can be to record telephone numbers of respondents so that surveyors can call respondents instead of traveling to their locations. See more on phone survey logistics in Survey Logistics, J-PAL South Asia’s transitioning to CATI checklist, and the “Best practices for WFH-CATI data quality” section below. At the very least, the enumerator conducting the back-check should not be the same enumerator who conducted the original interview.
After the back-check surveys are complete, compare the responses in the original survey to the responses in the back-check survey. This can be done through a custom do-file (J-PAL staff and affiliates: see J-PAL’s template) or tools like IPA’s user-written commands. Responses that vary between the two surveys by more than the acceptable deviation (as defined above) should be flagged as errors. SurveyCTO also has tools for conducting back-checks within the Monitor tab.
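The comparison logic can be sketched as follows. This is a minimal Python illustration, not a substitute for the J-PAL template or IPA's bcstats; the variables, the tolerances (0% for age and gender, 10% for consumption, echoing the example given earlier), and the data are hypothetical.

```python
from collections import Counter

# Assumed acceptable relative deviation per back-checked variable.
TOLERANCES = {"age": 0.0, "gender": 0.0, "consumption": 0.10}

# Hypothetical responses keyed by respondent ID.
original = {
    101: {"enumerator_id": "E1", "age": 34, "gender": "F", "consumption": 1000.0},
    102: {"enumerator_id": "E2", "age": 51, "gender": "M", "consumption": 800.0},
}
backcheck = {
    101: {"age": 34, "gender": "F", "consumption": 1080.0},
    102: {"age": 45, "gender": "M", "consumption": 1000.0},
}


def is_discrepancy(variable, first, second):
    """True if the back-check value deviates by more than the tolerance."""
    if isinstance(first, str) or isinstance(second, str):
        return first != second            # categorical: must match exactly
    if first == second:
        return False
    if first == 0:
        return True                       # relative deviation is undefined
    return abs(second - first) / abs(first) > TOLERANCES[variable]


def compare(original, backcheck):
    """List (respondent, enumerator, variable, original, back-check) flags."""
    flags = []
    for rid, bc in backcheck.items():
        orig = original[rid]
        for var in TOLERANCES:
            if is_discrepancy(var, orig[var], bc[var]):
                flags.append((rid, orig["enumerator_id"], var, orig[var], bc[var]))
    return flags


def error_rate_by_enumerator(original, backcheck, flags):
    """Share of compared items flagged, per original enumerator."""
    compared = Counter(original[rid]["enumerator_id"]
                       for rid in backcheck for _ in TOLERANCES)
    flagged = Counter(flag[1] for flag in flags)
    return {enum: flagged[enum] / n for enum, n in compared.items()}


flags = compare(original, backcheck)
rates = error_rate_by_enumerator(original, backcheck, flags)
print(flags)
print(rates)
```

In this example, respondent 101's consumption differs by 8% and is not flagged, while respondent 102's age and consumption are both flagged. The per-enumerator error rate is one simple way to summarize the results.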
J-PAL’s Research Protocols encourage research teams to back-check at least 10% of respondents, as a best practice. Each enumerator should have at least one of their respondents back-checked, and any differences should be well-documented and reconciled.
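One way to draw a back-check sample that meets both conditions (at least 10% of respondents, and at least one respondent per enumerator) is sketched below. The function name, the fixed seed, and the sample data are assumptions for illustration; a real draw may also need to stratify by other characteristics.

```python
import math
import random
from collections import defaultdict

# Hypothetical list of completed interviews: (respondent_id, enumerator_id).
interviews = [(i, f"E{i % 3 + 1}") for i in range(1, 41)]


def select_backchecks(interviews, share_pct=10, seed=12345):
    """Pick at least share_pct% of respondents, with one or more per enumerator."""
    rng = random.Random(seed)             # fixed seed makes the draw reproducible
    by_enumerator = defaultdict(list)
    for respondent_id, enumerator_id in interviews:
        by_enumerator[enumerator_id].append(respondent_id)

    # Step 1: guarantee one back-check per enumerator.
    selected = {rng.choice(ids) for ids in by_enumerator.values()}

    # Step 2: top up at random until the overall target is reached.
    target = math.ceil(len(interviews) * share_pct / 100)
    remaining = [rid for rid, _ in interviews if rid not in selected]
    extra = max(0, target - len(selected))
    selected.update(rng.sample(remaining, extra))
    return sorted(selected)


sample = select_backchecks(interviews)
print(sample)
```

The draw should be made by the research team and not shared in advance with the enumerators whose work is being verified.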
Spot-checks are when research staff observe surveyors conducting interviews. These are usually conducted by higher-level members of the research staff, such as the Research Manager, Research Assistant, Field Coordinator, or senior surveyors. According to J-PAL’s Research Protocols, it is a suggested best practice that 15% of surveys are spot-checked. One method for doing spot-checks is to check a higher percentage of surveys at the beginning of a survey to catch errors early, then to decrease the percentage checked over time (Robert, 2019).
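To see how a front-loaded schedule can still meet an overall target, the short calculation below uses made-up weekly survey counts and spot-check rates (30%, 15%, 10%, and 5%); only the 15% target comes from the text above.

```python
import math

# Hypothetical plan: surveys expected each week and the share to spot-check.
surveys_per_week = [100, 100, 100, 100]
rate_pct_per_week = [30, 15, 10, 5]      # higher early on, declining over time


def spot_checks_per_week(surveys, rates_pct):
    """Number of spot-checks each week, rounded up to whole surveys."""
    return [math.ceil(n * p / 100) for n, p in zip(surveys, rates_pct)]


def overall_share(surveys, checks):
    """Share of all surveys that are spot-checked across the whole period."""
    return sum(checks) / sum(surveys)


checks = spot_checks_per_week(surveys_per_week, rate_pct_per_week)
share = overall_share(surveys_per_week, checks)
print(checks, share)
```

With these numbers, 60 of 400 surveys are observed, which matches the suggested 15% overall while concentrating observation in the first weeks.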
The goals of spot-checks are:
Plan your spot-checks so that they are at least unpredictable (if not random) to the enumerators. You want to observe enumerators doing surveys as they would in the absence of observation. Therefore, surveyors should not know ahead of time which surveys will be observed. Upon arriving at the survey, enumerators should be asked if they are comfortable being observed. If enumerators are uncomfortable, you should consider why this is the case (e.g., are they concerned that they will be fired for poor surveying?).
Next, all observers must be introduced to the respondent:
Finally, the data from the spot-check forms should not be accessible by the enumerators.
Spot-checks can also be conducted, often with more ease, in remote survey settings. See the logistical considerations for this in the “Best practices for WFH-CATI data quality” section below, and remember that IRB approvals are necessary if you plan to have a third person listen in on calls for monitoring purposes. This information should be part of the consent administered to the respondent.
Spot-check data can be used to test for enumerator effects: because the spot-check form includes a question rating the enumerator’s quality, you can see whether responses differ based on how the enumerator is rated. You may also need to retrain enumerators who consistently earn low quality ratings. Spot-checks also allow research teams to directly observe how respondents answer questions. Questions that cause respondents to become upset, uncomfortable, or confused should be reworked to avoid this.
While the principles behind data quality checks remain the same for remote surveys, there are some logistical differences. Below are some extra considerations to take when implementing remote surveys, especially from a work-from-home (WFH) context. This material is heavily drawn from J-PAL South Asia’s Quality assurance best practices for CATI.
It is critical to ensure that SurveyCTO forms are coded in a manner that eliminates or minimizes the possibility of logical inconsistencies, entry or input errors, and incomplete responses or sections. You can refer to SurveyCTO's CATI guide to program your forms.
A few points to keep in mind for CATI, in addition to the standard checks done as part of HFCs, are highlighted below:
Live Monitoring Incoming Data:
You can use SurveyCTO’s Data Explorer to monitor incoming data quickly. For encrypted forms, it is possible to view either just the variables that are marked publishable or all variables by temporarily allowing the web browser to use your private encryption key. With SurveyCTO 2.70, you can share view-only access to form data with external viewers who are not registered on your SurveyCTO account or server. This can be helpful to share with study partners (if required as part of an agreement or contract) or with field team members for monitoring when secure data transfer by other means is not feasible.
Call success rates, respondent availability, and respondent fatigue could prove to be major challenges in implementing a back-check survey productively for CATI. It is recommended that the target for back-checking is set to be much higher than the 10% to 15% of the sample that is typical for in-person surveys.
Spot-checks (accompaniments) for phone surveys are just as important as for in-person surveying, and can generally be done more often than the 15% recommended for in-person surveys due to lower cost. They can be conducted through call conferencing or a three-way call. The enumerator first connects with the person monitoring the call and then ‘conferences-in’ the respondent onto the same call. This is also possible when using a calling application, like Exotel, in which case the enumerator will connect with the monitor and then proceed with the Exotel steps, conferencing-in the incoming Exotel call.
Note: IRB approvals are necessary if you plan to have a third person listen in on calls for monitoring purposes. This information should be part of the consent administered to the respondent.
Further considerations:
Audio audits:
SurveyCTO has an audio audit feature that allows it to capture all or a part of the survey administration through audio recordings. However, there are challenges with using this feature for phone calls. Not all Android versions allow recording when a phone call is ongoing.
SurveyCTO notes in its guide on the CATI starter kit:
Depending on the Android version installed on your device, audio audits might record a.) both sides of the conversation, b.) only the interviewer, or c.) neither. In brief, Android versions 4 through 7 allow for recording both sides of the conversation, and some success can be had with Android version 8.
SurveyCTO’s early release version of SurveyCTO Collect provides improvements on call recording for the Android versions that allow it. See the release notes for further information.
Call Recording:
Calls can be recorded when using a third-party calling application like Exotel. Recording is enabled by default when using the web version of Exotel, but when using the Exotel field plug-in for SurveyCTO, call recordings can be toggled on or off.
Note: IRB approvals are necessary if you plan to record phone calls (or enable audio audits to record both sides of the conversation). This information should also be provided to the respondent during the consenting process.
Last updated March 2021.
We thank Maya Duru and Jack Cavanagh for helpful comments. Any errors are our own.
IPA User-written command: bcstats
IPA User-written command: ipacheck
J-PAL HFC exercises (J-PAL internal resource)
J-PAL Template Back-check do-file (J-PAL internal resource)
J-PAL Template HFC do-file and R script (J-PAL internal resource)
J-PAL Template monitoring form (J-PAL internal resource)
J-PAL South Asia's SurveyCTO-Exotel plugin
J-PAL South Asia's Quality assurance best practices for CATI
J-PAL South Asia's Transitioning to CATI checklist
SurveyCTO's Android release notes
SurveyCTO's Audio audit guide
SurveyCTO's CATI starter kit
SurveyCTO's Guide to using its data explorer tool
SurveyCTO: Survey design for data quality