Reproducible research in R
2022-03-16
Requirements
To get started you need the latest versions of R and RStudio (R Core Team 2021). The rmarkdown, tidyverse and tinytex packages are also required.
The code for installing those packages is as follows:
install.packages(c("rmarkdown", "tidyverse", "tinytex"))
0.1 Before starting
If you have never worked with R before this course, the swirl package (Kross et al. 2017) is a good tool for getting started. To begin, complete the first 7 modules of the course R Programming: The basics of programming in R, which include:
- Basic Building Blocks
- Workspace and Files
- Sequences of Numbers
- Vectors
- Missing Values
- Subsetting Vectors
- Arrays and Data Frames
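The swirl setup described above can be sketched as follows. This is a minimal sketch that assumes you have access to CRAN; the exact wording of the course menu shown by swirl may differ between versions.

```r
# Install swirl only if it is not already available, then load it.
if (!requireNamespace("swirl", quietly = TRUE)) {
  install.packages("swirl")
}
library(swirl)

# Start the interactive tutor. It asks for your name and lets you pick
# a course; choose "R Programming" and work through the modules listed above.
# The call is guarded because swirl only makes sense in an interactive session.
if (interactive()) {
  swirl()
}
```

Inside a lesson, typing bye() at the prompt exits swirl and saves your progress.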
0.2 Workshop description
This course focuses on the basic principles of reproducible research in R, with an emphasis on collecting and/or reading data in a reproducible and automated way. To this end, we will work with complex databases, which must be transformed and organized to optimize their analysis. Reproducible documents will be generated by integrating code, bibliography, data exploration and data analysis in a single document. The course will culminate with the generation of a reproducible manuscript, presentation and/or interactive document, together with the necessary metadata.
0.3 Course Objectives
Know and understand the concept of reproducible research as a form and philosophy of research that makes investigations more orderly and replicable, from data collection to the writing of results.
Know and apply the pipeline concept, which makes the work modular from data collection to the writing of results, so that correcting one step independently has a cascading effect on the final result.
Learn good practices for collecting and standardizing databases, in order to optimize data analysis and peer review.
Perform critical analyses of the nature of the data when conducting exploratory analyses, in order to determine the best way to test the hypotheses associated with these databases.
Generate metadata for your studies so that other researchers understand what your data mean.
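The pipeline idea in the objectives above can be illustrated with a small sketch in base R. The function names (collect_data, clean_data, summarise_data) and the toy data are hypothetical and chosen only for illustration; they are not part of the course material.

```r
# Hypothetical three-step pipeline. Each step is a small function, so
# correcting one step automatically changes everything downstream.

# Step 1: "collect" the data (a toy data frame stands in for a real file).
collect_data <- function() {
  data.frame(
    site   = c("A", "A", "B", "B"),
    height = c(10.2, NA, 8.5, 9.1)
  )
}

# Step 2: clean the data by dropping rows with a missing height.
clean_data <- function(df) {
  df[!is.na(df$height), ]
}

# Step 3: summarise the mean height per site.
summarise_data <- function(df) {
  aggregate(height ~ site, data = df, FUN = mean)
}

# Run the steps in order; each one consumes the output of the previous one.
raw    <- collect_data()
clean  <- clean_data(raw)
result <- summarise_data(clean)
print(result)
```

If the cleaning rule in clean_data() changes, rerunning the script updates result without touching the other steps, which is the cascading effect mentioned above.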
0.4 Contents
Chapter 1 Tidy Data: In this chapter you will learn how to optimize a database, about cleaning and transforming databases, what a tidy database is and how to manipulate these databases with the dplyr package (Wickham et al. 2022).
Chapter 2 Reproducible research: In this chapter we will work on making a document that combines R code and text to generate reproducible documents using the rmarkdown package (Allaire et al. 2018). In addition, you will see how to save projects to a GitHub repository using RStudio.
Chapter 4 Models in R: Learn how to generate models in R, from ANOVA to GLM.
Chapter ?? The tidyverse and the pipeline concept: In this chapter you will learn about cleaning complex data.
Chapter ?? Data visualization: Visualizing data vs. visualizing models. Inserting graphics with legends in an Rmd document.
Chapter ?? Loops: Writing your own functions and loops in R.
Writing scripts in R: Transforming Rmd documents into a script.
Presentations in R and interactive documents: Transforming data into a presentation or a Shiny app. Make a presentation or application in R.
0.5 Reference books
The principles of this course are explained in the following free books.
Gandrud, Christopher. Reproducible Research with R and R Studio. CRC Press, 2013. Available for free at the following link
Stodden, Victoria, Friedrich Leisch, and Roger D. Peng, eds. Implementing Reproducible Research. CRC Press, 2014. Available for free at the following link