The EQUATOR (Enhancing the QUAlity and Transparency Of health Research) Network is an international initiative that seeks to improve the reliability and value of published health research literature by promoting transparent and accurate reporting and wider use of robust reporting guidelines.
This website serves as a channel of communication to (i) update scholars about the state of replications in economics, and (ii) establish a network for the sharing of information and ideas.
The goal is to encourage economists and their journals to publish replications. You are welcome to join THE REPLICATION NETWORK and contribute to its content.
There has been an increasing interest in the predictors of reproducibility of research results, and how low reproducibility may inhibit efficient accumulation of knowledge. The best way to gain better understanding of reproducibility is to study it by trying to replicate published results.
Science Exchange provides a platform to rapidly and cost-effectively validate studies, methods and reagents via independent replication of key experimental results. This allows the identification of high-quality, reproducible research and reagents.
Studies, protocols or reagents can be submitted for independent validation by expert labs in the Science Exchange network. Submitted experiments are matched with an appropriate verified lab, which reproduces them on a fee-for-service basis. The Science Exchange network consists of 800+ verified labs from top institutions, providing easy access to specialized scientists who can validate research.
National Academies of Sciences, Engineering, and Medicine; Policy and Global Affairs; Committee on Science, Engineering, Medicine, and Public Policy; Board on Research Data and Information; Division on Engineering and Physical Sciences; Committee on Applied and Theoretical Statistics; Board on Mathematical Sciences and Analytics; Division on Earth and Life Studies; Nuclear and Radiation Studies Board; Division of Behavioral and Social Sciences and Education; Committee on National Statistics; Board on Behavioral, Cognitive, and Sensory Sciences; Committee on Reproducibility and Replicability in Science.
Published in Science in 2015 (OA), the Transparency and Openness Promotion guidelines include eight modular standards, each with three levels of increasing stringency. Journals select which of the eight transparency standards they wish to implement and select a level of implementation for each. These features provide flexibility for adoption depending on disciplinary variation, but simultaneously establish community standards.
Rules for Reproducible Data Science
Excerpted From: Creating Reproducible Data Science Projects. Justin Boylan-Toomey, July 25, 2019. https://towardsdatascience.com/creating-reproducible-data-science-projects-1fa446369386
Use Version Control
Use a version control system such as Git, hosted on a platform like GitHub or GitLab, to provide a remote backup of your codebase, track changes in your code and collaborate effectively as a team. Try to follow Git best practices, frequently committing small changes that each solve a specific problem.
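A minimal sketch of that small-commit workflow (the repository path, file name and commit message are all hypothetical):

```shell
# Hypothetical demo repository illustrating small, focused commits.
rm -rf /tmp/repro-demo && mkdir /tmp/repro-demo && cd /tmp/repro-demo
git init -q
git config user.email "you@example.com"   # placeholder identity
git config user.name "Your Name"
echo "print('exploratory analysis')" > analysis.py
git add analysis.py
git commit -q -m "Add initial analysis script"
```

Each commit should solve one problem, so the history reads as a log of small, reviewable steps.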
Agree a Common Project Structure
Consider using tools such as Cookiecutter to generate a standard data science project folder structure for you. If a specific project's requirements mean you need a different structure from the one your team normally uses, document the new structure in your repository's README.md file.
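Where Cookiecutter is not available, the same idea can be scaffolded by hand; a minimal sketch, assuming folder names in the style of the common cookiecutter-data-science layout (adjust `FOLDERS` to your team's own convention):

```python
# Hypothetical scaffold: create a standard data science folder layout.
from pathlib import Path

FOLDERS = [
    "data/raw",        # immutable original data
    "data/processed",  # cleaned, transformed data
    "notebooks",       # exploratory Jupyter notebooks
    "src",             # importable project modules
    "models",          # serialised trained models
    "reports",         # generated figures and write-ups
]

def scaffold(root="."):
    """Create the standard project folders under root."""
    for folder in FOLDERS:
        Path(root, folder).mkdir(parents=True, exist_ok=True)
```

Because the layout is generated rather than hand-built, every project in the team starts from the same structure.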
Use Virtual Environments
Use conda or Python's built-in venv module to keep track of your project's dependencies and Python version information.
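A minimal sketch using the built-in venv module (the environment name and location are hypothetical; conda users would use `conda create` and `conda env export` instead):

```shell
# Create an isolated environment and pin its dependencies.
cd /tmp
python3 -m venv demo-env              # demo-env is a placeholder name
. demo-env/bin/activate               # activate the environment
python -m pip freeze > requirements.txt   # record exact package versions
```

Committing the generated `requirements.txt` lets collaborators recreate the same environment with `pip install -r requirements.txt`.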
Clearly Document Everything
Clearly documenting your projects and code will save you time if you have to revisit the project at a later date. It will also make it far easier for others to use your code or follow and build on your analysis. At a minimum include a README.md file at the root level of your repository. The contents will vary between projects but should include a description of the project and an overview of the methodology and techniques used. (see original article for additional information)
Use Jupyter Notebooks Wisely
Consider moving your core logic out of your Jupyter Notebooks and into separate importable Python module files. This enables code sharing across your team and avoids duplicate, slightly edited versions of core data science code being scattered across your team's notebooks. Code quality will also improve, as you can easily collaborate, run tests and conduct code reviews on your shared modules. (see original article for additional information)
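A minimal sketch of such a shared module; `normalise` is a hypothetical example of core logic that would otherwise be copy-pasted between notebooks:

```python
# clean_utils.py -- hypothetical shared module. Notebooks then run
# `from clean_utils import normalise` instead of redefining this inline.

def normalise(values):
    """Scale a sequence of numbers into the 0-1 range."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0.0 for _ in values]  # avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]
```

Because the function lives in one importable place, it can be unit-tested and code-reviewed like any other library code, and every notebook uses the same implementation.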
Keep Your Code Stylish
Agree on coding standards. Try to write Pythonic code in line with Python's PEP 8 style guide. Using a fully featured IDE such as PyCharm or Visual Studio Code with built-in linting will highlight poorly styled code and help identify syntactic errors. Using an automatic code formatter such as Black will ensure that the code in your team's projects has a consistent style, improving readability.
Test Your Code
Use a unit testing framework such as PyTest to catch any unexpected errors and test that your logic executes as expected. Where appropriate, consider using Test-Driven Development: writing tests before the code helps ensure it satisfies your requirements as you write it. It is also a good idea to use a tool such as Coverage to measure the proportion of your code covered by your unit tests. Python IDEs such as PyCharm have built-in testing and coverage support, even automatically highlighting which lines of code are covered by your tests.
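A minimal sketch of a PyTest-style test file; `mean_absolute_error` is a hypothetical project function, inlined here so the example is self-contained (in a real project it would be imported from a shared module):

```python
# test_metrics.py -- PyTest discovers functions named test_* and
# reports an assertion failure as a failed test.

def mean_absolute_error(actual, predicted):
    """Hypothetical project function under test."""
    if len(actual) != len(predicted):
        raise ValueError("length mismatch")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def test_mae_perfect_prediction():
    assert mean_absolute_error([1, 2, 3], [1, 2, 3]) == 0

def test_mae_known_value():
    assert mean_absolute_error([0, 0], [1, 3]) == 2
```

Running `pytest` in the project root collects and executes these tests; adding `--cov` (with the pytest-cov plugin) reports which lines they cover.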
Use Continuous Integration
Consider using continuous integration tools such as Travis CI or CircleCI to automatically test your code when it is merged to your master branch. Not only does this prevent broken code from reaching master, it also simplifies the code review process. You can even use Black with a pre-commit hook to automatically format committed code, removing debates over code style from the review process and ensuring a standard style across your repositories.
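A minimal sketch of a Travis CI configuration along these lines (the Python version, paths and tool choices are assumptions; adapt them to your project):

```yaml
# .travis.yml -- hypothetical configuration; adjust versions and paths.
language: python
python:
  - "3.8"
install:
  - pip install -r requirements.txt
  - pip install black pytest pytest-cov
script:
  - black --check .      # fail the build if any file is unformatted
  - pytest --cov=src     # run the unit tests with coverage
```

With this in place, every merge to master runs the formatter check and the test suite before the code lands.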
Sharing Data & Models
For larger or more complex projects, consider using a cloud storage solution such as AWS S3, Azure Blob Storage or locally hosted network storage to store your models and data. This can be combined with DVC, a version control system designed to version the outputs of machine learning projects without pushing large data and model files to Git.
Data Pipeline Management
Try to make your data pipeline code modular, breaking your pipeline into modules for each discrete process and unit testing each of them. For larger, more complex pipelines, consider using a workflow management tool such as Spotify's Luigi or Apache Airflow to execute your Python modules as chained batch jobs in a directed acyclic graph.
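A pure-Python sketch of the modular approach; the step functions are hypothetical, and in a real project each would live in its own module with a tool such as Luigi or Airflow chaining them as batch jobs:

```python
# Hypothetical modular pipeline: each step is a small, unit-testable
# function, and run_pipeline chains them in dependency order.

def extract(raw_lines):
    """Parse raw CSV-like lines into rows of floats."""
    return [[float(x) for x in line.split(",")] for line in raw_lines]

def transform(rows):
    """Keep only rows whose first value is non-negative."""
    return [r for r in rows if r[0] >= 0]

def load(rows):
    """Summarise the result; a real step would write to storage."""
    return {"rows": len(rows)}

def run_pipeline(raw_lines):
    """Chain the steps: extract -> transform -> load."""
    return load(transform(extract(raw_lines)))
```

Because each step is a plain function with explicit inputs and outputs, it can be unit tested in isolation, and a workflow manager can rerun only the steps whose inputs changed.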