1 Introduction
1.1 What you will learn
The goal of this book is to provide a solid foundation to non-disclosive data analysis using R and DataSHIELD/Opal through the resourcer R package. We illustrate how to preform such data analyses in two settings (’omics and geospatial) where the use of resources allows users to handle Big Data problems. We also present workflows of how to perform statistical analyses using data in formats other than simple tables. We aim to tackle key concepts covered in the manuscript, “Orchestrating non-disclosive big data analyses of shared data from different resources with R and DataSHIELD”, with each workflow covering these in varying detail, as well as essential preliminaries that are important for following along with the workflows on your own.
1.2 Preliminaries
For those unfamiliar with R (and those looking to learn more), in the R section we provide some links to R books. Also there are several R on-line courses such as this one at DataCamp or this at CodeAcademy that can help to introduce the main R concepts. Nonetheless, we assume that the readers of this book are already familiar with R.
For those interested in ’omic data analyses we recommend the book Omic association analysis with R and Bioconductor which provides a global overview of how to perform genomic, transcriptomic, epigenomic and multi-omic data analyses using R and Bioconductor packages. Bioconductor support provides the primary way to contact both Bioconductor developers and users and it is a great way to search for answers to your questions. Bioconductor courses provide excellent material to learn most of the Bioconductor basics as well as other advanced methods and R related topics.
[Omic data infrastructure][Bioconductor data infrastructures] deserves a mention here since understanding common data containers is an essential part of Bioconductor workflows that are used in our DataSHIELD packages designed for ’omic data analyses. This enables interoperability across packages, allowing for “plug and play” usage of cutting-edge methods. Geospatial data also requires specific data managment that is described the [Geospatial Workflow] {#GIS}
1.3 Acknowledgments
Firstly, we would like to thank OBiBa and DataSHIELD developers for providing such an impressive framework for non-disclosive data analysis. We would also like to thank all Bioconductor contributors for giving access to their packages for dealing with different omic and performing state-of-the-art data analyses. This has allowed us to create DataSHIELD packages easily without the need to re-program most of the ’omic association analyses. We also thank R package developers since their work will allow the community to implement other DataSHIELD packages specifically designed to address other biomedical, epidemiological and social science problems as we did with geospatial data.
Finally, we would like to thank the Bioconductor core team for inspiring us to write this book. We are happy to follow the two succesful works describing how to orchestrate High-Throughput Genomic Analysis and Single Cell Analysis with Bioconductor.