Considerations for Reproducibility
Depending on the requirements of the data activity, contractors may conduct analysis at multiple points in time (baseline, interim, and/or final). Contractors should consider the following:
- Prepare one complete data package. When there are multiple rounds of data (baseline, interim, final), MCC prefers that all data be prepared as one complete data package for data sharing. MCC aims for public and/or restricted-access data to be as complete as possible: the package should include all data collected as part of the data activity, not just the constructed variables produced for the analysis report or the sub-sections of questionnaires used in the final analysis. Unless otherwise agreed with MCC staff, contractors should plan to package data from all rounds (baseline, interim(s), and final) together. Packaging rounds together ensures consistency in how de-identification is managed across rounds, minimizes the risk of re-identification across rounds, and reduces costs.
- Establish a reproducible workflow. In accordance with the contractual requirements, contractors should establish and maintain a reproducible workflow for analysis to ensure a direct link (to the extent feasible) between the future public and/or restricted-access data, the analysis code, and the analysis results presented in baseline, interim, and/or final analysis reports (see the first sketch after this list).[[This can also help contractors meet evolving requirements for journal publications (for example, see the AEA data and code submission requirements: https://www.aeaweb.org/journals/policies/data-code).]]
- Separate de-identification code from analysis code. As a standard contract deliverable, MCC requests that analysis code (the code written in a statistical software program to produce the analysis) be submitted as part of the final data package. Contractors should therefore write any de-identification code separately from analysis code, because de-identification code will not be publicly shared (see the first sketch after this list).
- Run analysis code on de-identified data. When possible, contractors should run analysis code on the de-identified data files to demonstrate reproducibility successes and/or challenges. Doing so improves the documentation associated with reports and data, and informs what the Transparency Statement reports can, and cannot, be reproduced using the public-use and/or restricted-access data (see the second sketch after this list).
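The workflow and code-separation practices above can be illustrated together. Below is a minimal sketch in Python with pandas; all file, column, and function names are hypothetical, and the same structure applies in Stata, R, or other statistical software. The de-identification code lives in its own script that stays restricted, while the analysis code is written to run on the de-identified files and is submitted with the data package.

```python
# Minimal sketch of a reproducible workflow (hypothetical file and column names;
# the "restricted/" and "package/" directories are assumed to exist).
# deidentify.py is kept separate and is NOT included in the public data package;
# analysis.py is submitted as a deliverable alongside the de-identified data.

import pandas as pd

# --- deidentify.py: restricted; never shared publicly -----------------------
DIRECT_IDENTIFIERS = ["respondent_name", "phone", "gps_lat", "gps_lon"]

def deidentify(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop direct identifiers and coarsen quasi-identifiers."""
    out = raw.drop(columns=DIRECT_IDENTIFIERS)
    out["age"] = (out["age"] // 5) * 5  # coarsen age into 5-year bands
    return out

# --- analysis.py: submitted with the data package ---------------------------
def table_1(df: pd.DataFrame) -> pd.DataFrame:
    """Reproduce a headline table from the analysis report."""
    return df.groupby("treatment_arm")["outcome"].agg(["mean", "count"])

if __name__ == "__main__":
    # One entry point runs every step in order, so each published result
    # traces directly back to the data files that produced it.
    raw = pd.read_csv("restricted/baseline_raw.csv")
    deidentify(raw).to_csv("package/baseline_deid.csv", index=False)
    print(table_1(pd.read_csv("package/baseline_deid.csv")))
```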
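To make reproducibility successes and challenges concrete for the Transparency Statement, the contractor might recompute each published estimate from the de-identified files and log whether it matches. A minimal sketch under the same hypothetical names follows; the reported value is a placeholder, not a real result.

```python
# Sketch of a reproducibility check: re-run a report estimate on the
# de-identified data and record the match/mismatch for the Transparency
# Statement. File names, column names, and numbers are placeholders.

import pandas as pd

def check_estimate(name: str, reported: float, recomputed: float,
                   tol: float = 1e-6) -> str:
    """Record whether a published estimate reproduces from the shared data."""
    status = "reproduced" if abs(recomputed - reported) <= tol else "NOT reproduced"
    return f"{name}: reported={reported}, recomputed={recomputed:.4f} -> {status}"

if __name__ == "__main__":
    deid = pd.read_csv("package/baseline_deid.csv")  # de-identified file only
    recomputed = deid.loc[deid["treatment_arm"] == "treated", "outcome"].mean()
    # 0.42 stands in for the value printed in the analysis report.
    print(check_estimate("mean outcome, treated arm", 0.42, recomputed))
```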
The Standard Evaluation Firm SOW provides specific detail on contractor requirements for ensuring appropriate review, feedback, and dissemination of analysis reports.