11 Setup

As described in a previous Chapter the resourcer R package allows users to deal with the main data sources (using tidyverse, DBI, dplyr, sparklyr, MongoDB, AWS S3, SSH etc.) and is easily extensible to new ones including specific data infrastructure in R or Bioconductor. So far ExpressionSet and RangedSummarizedExperiment objects saved in .RData files are accesible through the resourcer package. The dsOmics package contains a new extension that deals with VCF (Variant Calling Format) files which are coerced to a GDS (Genomic Data Storage) format (VCF2GDS).

In order to achieve this resourcer extension, two R6 classes have been implemented:

  • GDSFileResourceResolver class which handles file-base resources with data in GDS or VCF formats. This class is responsible for creating a GDSFileResourceClient object instance from an assigned resource.
  • GDSFileResourceClient class which is responsible for getting the referenced file and making a connection (created by GWASTools) to the GDS file (will also convert the VCF file to a GDS file on the fly, using SNPRelate). For the subsequent analysis, it is this connection handle to the GDS file that will be used.

11.1 Providing DataSHIELD packages in the opal server

The required DataSHIELD packages must be uploaded to Opal through the Administration site by accessing to DataSHIELD tab. In our case, both dsBase and dsOmics and resourcer packages must be installed as is illustrated in the figure.

Figure 11.1: Installed packages in the demo opal server

Installed packages in the demo opal server

The tab +Add package can be used to install a new package. The figure depicts how dsOmics was intalled on Opal

Figure 11.2: Description how dsOmics package was intalled into the demo opal server

Description how `dsOmics` package was intalled into the demo opal server

For reproducing this book the following packages must be installed on Opal

From CRAN: 
   - resourcer
From Github: 
   - datashield/dsBase
   - datashield/dsGeo (tombisho/dsGeo)
   - isglobal-brge/dsOmics

Note that the dsGeo package imports the sp, rgeos and rgdal R packages. rgeos and rgdal in turn require some additional libraries which can be installed as follows (on Ubuntu systems - see the notes in rgeos and rgdal for other operating systems):

sudo apt-get update
sudo apt-get install libgdal-dev libproj-dev libgeos++dev

11.2 Required R Packages in the client site (e.g. local machine)

Using DataSHIELD also requires some R packages to be installed on the client site. So far, the following R packages must be installed (in their development version):

install.packages("DSOpal", dependencies = TRUE)
install.packages("dsBaseClient", repos = c("https://cloud.r-project.org", "https://cran.obiba.org"), dependencies = TRUE)
devtools::install_github("isglobal-brge/dsOmicsClient", dependencies = TRUE)
devtools::install_github("tombisho/dsGeoClient", dependencies = TRUE)

The package dependencies are then loaded as follows:

library(DSOpal)
library(dsBaseClient)
library(dsOmicsClient)
library(dsGeoClient)