• OmicSHIELD
  • Welcome
  • I Preamble
  • 1 Introduction
    • 1.1 Materials to read beforehand
    • 1.2 What are “resources”: A very simple explanation without any technicalities
    • 1.3 Capabilities of OmicSHIELD
    • 1.4 Opal servers
  • 2 Omic data analysis: types of implemented analyses
  • 3 Datasets description
    • 3.1 CINECA project
    • 3.2 Data organization in the Opal server
    • 3.3 TCGA Liver project
    • 3.4 Epigenetic data from the GSE66351 project
    • 3.5 HELIX project
      • 3.5.1 Data organization in the Opal server
  • 4 Exploration of Genotype data
    • 4.1 Connection to the Opal server
    • 4.2 Assign the VCF resources
    • 4.3 Assign the phenotypes
    • 4.4 Merge VCF (genotype) and phenotype information
    • 4.5 Basic information of the Genotype data
    • 4.6 Basic statistics of the genotype data
    • 4.7 Principal component analysis
  • II Workflows
  • 5 Genome wide association study (GWAS) analysis
    • 5.1 Single cohort
      • 5.1.1 Connection to the Opal server
      • 5.1.2 Assign the VCF resources
      • 5.1.3 Assign the phenotypes
      • 5.1.4 Merge VCF (genotype) and phenotype information
      • 5.1.5 Performing the GWAS
    • 5.2 Multi cohorts
      • 5.2.1 Connection to the Opal server
      • 5.2.2 Assign the VCF resources
      • 5.2.3 Assign the phenotypes
      • 5.2.4 Merge VCF (genotype) and phenotype information
      • 5.2.5 Performing a meta - GWAS
      • 5.2.6 GWAS Plots
      • 5.2.7 Meta-level quality control
      • 5.2.8 Meta-analysis
      • 5.2.9 Ultra fast pooled GWAS analysis
    • 5.3 Post-omic analysis
      • 5.3.1 Enrichment analysis
  • 6 Polygenic risk scores
    • 6.1 Definition
    • 6.2 Study configuration
    • 6.3 Source of polygenic scores
    • 6.4 Connection to the Opal server
    • 6.5 Assign the VCF resources
    • 6.6 Calculate the PRS
      • 6.6.1 PGS Catalog Example
      • 6.6.2 Custom prs_table example
    • 6.7 Study of the PRS results
  • 7 PLINK
    • 7.1 Connection to the Opal server
    • 7.2 Build the PLINK call
    • 7.3 Call the analysis
  • 8 SNPTEST
    • 8.1 Connection to the Opal server
    • 8.2 Build the SNPTEST call
    • 8.3 Call the analysis
  • 9 Differential gene expression (DGE) analysis
    • 9.1 Connection to the Opal server
    • 9.2 Assign the RSE resource
    • 9.3 Inspect the RSE
    • 9.4 Pre-processing for RNAseq data
    • 9.5 DGE analysis
    • 9.6 Surrogate variable analysis
  • 10 Epigenome-wide association analysis (EWAS)
    • 10.1 Initial steps for all use cases
      • 10.1.1 Connection to the Opal server
      • 10.1.2 Assign the ExpressionSet resource
      • 10.1.3 Inspect the ExpressionSet
    • 10.2 Single CpG pooled analysis
    • 10.3 Multiple CpGs pooled analysis
    • 10.4 Full genome meta-analysis
    • 10.5 Adjusting for surrogate variables
  • III Showcase on a real dataset
  • 11 Transcriptomic analysis in the HELIX cohort.
    • 11.1 Getting started.
    • 11.2 Data formatting and manipulation in DataSHIELD.
    • 11.3 Differential gene expression analysis.
    • 11.4 Surrogate variable analysis.
    • 11.5 Enrichment analysis of functional annotations.
  • 12 Epigenetic analysis in the HELIX cohort.
    • 12.1 Getting started.
    • 12.2 Data formatting and manipulation in DataSHIELD.
    • 12.3 Full genome meta-analysis.
    • 12.4 Adjusting for surrogate variables.
    • 12.5 Enrichment analysis of functional annotations.
  • IV Technical details
  • 13 Fast GWAS
  • 14 PCA
  • 15 Differential privacy
  • 16 Internal structure of the PRS functionality
    • 16.1 Client
    • 16.2 Server resolver
    • 16.3 Server analysis
  • 17 Compression of GDS files vs. performance
  • 18 Recommended R server specs
    • 18.1 Limma + voom
    • 18.2 GWAS + PCA
    • 18.3 General remarks
  • 19 Disclosure filter configurations
  • V Appendix
  • 20 Session info
  • 21 Contributors
  • VI References
  • 22 References
  • Published with bookdown

Privacy protected federated omic data analysis in multi-center studies with DataSHIELD

7 PLINK

A simple association test use case will be illustrated in this section to portrait the usage of PLINK in OmicSHIELD.

⚠️ RESOURCES USED ALONG THIS SECTION

From https://opal-demo.obiba.org/ :

STUDY

TABLE

PROFILE

cohort1

RSRC.brge_plink

omics

The structure followed is illustrated on the following figure.

Proposed infrastructure for PLINK analysis.

Figure 7.1: Proposed infrastructure for PLINK analysis.

The data analyst corresponds to the “RStudio” session, which through DataSHIELD Interface (DSI) connects with the Opal server located at the cohort network. This Opal server contains a resource with the URL where the computational resource is located, also the access credentials are kept secret inside the resource and only the Opal administrator has access to them. A connection to this computational resource is created on the Opal server and the aggregated results are passed to the analyst.

7.1 Connection to the Opal server

We have to create an Opal connection object to the cohort server. We do that using the following functions.

library(DSOpal)
library(dsBaseClient)
library(dsOmicsClient)
builder <- newDSLoginBuilder()
builder$append(server = "study1", url = "https://opal-demo.obiba.org",
               user = "dsuser", password = "P@ssw0rd",
               resource = "RSRC.brge_plink",
               profile = "omics")
logindata <- builder$build()
conns <- datashield.login(logins = logindata, assign = TRUE,
                          symbol = "client")

7.2 Build the PLINK call

Now, we are ready to run any PLINK command from the client site. Notice that in this case we want to assess association between the genotype data in bed format and use as phenotype the variable ‘asthma’ that is in the file ‘brge.phe’ in the 6th column. The sentence in a PLINK command would be (NOTE: we avoid –out to indicate the output file since the file will be available in R as a tibble). The arguments must be encapsulated in a single character without the command ‘plink’.

plink.arguments <- "--bfile brge --logistic --pheno brge.phe --mpheno 6 --covar brge.phe --covar-name gender,age"

7.3 Call the analysis

Then, the analyses are performed by:

ans.plink <- ds.PLINK("client", plink.arguments)

The object ans.plink contains the PLINK results at each server as well as the outuput provided by PLINK

  lapply(ans.plink, names)
$study1
[1] "results"   "plink.out"
  head(ans.plink$study1$results)
# A tibble: 6 x 10
    CHR SNP           BP A1    TEST   NMISS    OR   STAT        P X10  
  <dbl> <chr>      <dbl> <chr> <chr>  <dbl> <dbl>  <dbl>    <dbl> <chr>
1     0 MitoC3993T  3993 T     ADD     2286 0.752 -1.33  0.182    <NA> 
2     0 MitoC3993T  3993 T     gender  2286 0.742 -3.27  0.00107  <NA> 
3     0 MitoC3993T  3993 T     age     2286 1.00   0.565 0.572    <NA> 
4     0 MitoG4821A  4821 A     ADD     2282 2.68   1.71  0.0879   <NA> 
5     0 MitoG4821A  4821 A     gender  2282 0.740 -3.31  0.000940 <NA> 
6     0 MitoG4821A  4821 A     age     2282 1.00   0.465 0.642    <NA> 
  ans.plink$study$plink.out
$status
[1] 0

$output
 [1] ""                                                                                   
 [2] "@----------------------------------------------------------@"                       
 [3] "|        PLINK!       |     v1.07      |   10/Aug/2009     |"                       
 [4] "|----------------------------------------------------------|"                       
 [5] "|  (C) 2009 Shaun Purcell, GNU General Public License, v2  |"                       
 [6] "|----------------------------------------------------------|"                       
 [7] "|  For documentation, citation & bug-report instructions:  |"                       
 [8] "|        http://pngu.mgh.harvard.edu/purcell/plink/        |"                       
 [9] "@----------------------------------------------------------@"                       
[10] ""                                                                                   
[11] "Skipping web check... [ --noweb ] "                                                 
[12] "Writing this text to log file [ /tmp/ssh-7484/out.log ]"                            
[13] "Analysis started: Mon Apr  4 15:55:21 2022"                                         
[14] ""                                                                                   
[15] "Options in effect:"                                                                 
[16] "\t--bfile brge"                                                                      
[17] "\t--logistic"                                                                        
[18] "\t--pheno brge.phe"                                                                  
[19] "\t--mpheno 6"                                                                        
[20] "\t--covar brge.phe"                                                                  
[21] "\t--covar-name gender,age"                                                           
[22] "\t--noweb"                                                                           
[23] "\t--out /tmp/ssh-7484/out"                                                           
[24] ""                                                                                   
[25] "Reading map (extended format) from [ brge.bim ] "                                   
[26] "100000 markers to be included from [ brge.bim ]"                                    
[27] "Reading pedigree information from [ brge.fam ] "                                    
[28] "2312 individuals read from [ brge.fam ] "                                           
[29] "2312 individuals with nonmissing phenotypes"                                        
[30] "Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)"                              
[31] "Missing phenotype value is also -9"                                                 
[32] "725 cases, 1587 controls and 0 missing"                                             
[33] "1097 males, 1215 females, and 0 of unspecified sex"                                 
[34] "Reading genotype bitfile from [ brge.bed ] "                                        
[35] "Detected that binary PED file is v1.00 SNP-major mode"                              
[36] "Reading alternate phenotype from [ brge.phe ] "                                     
[37] "2312 individuals with non-missing alternate phenotype"                              
[38] "Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)"                              
[39] "Missing phenotype value is also -9"                                                 
[40] "725 cases, 1587 controls and 0 missing"                                             
[41] "Reading 6 covariates from [ brge.phe ] with nonmissing values for 2199 individuals" 
[42] "Selected subset of 2 from 6 covariates"                                             
[43] "For these, nonmissing covariate values for 2312 individuals"                        
[44] "Before frequency and genotyping pruning, there are 100000 SNPs"                     
[45] "2312 founders and 0 non-founders found"                                             
[46] "6009 heterozygous haploid genotypes; set to missing"                                
[47] "Writing list of heterozygous haploid genotypes to [ /tmp/ssh-7484/out.hh ]"         
[48] "7 SNPs with no founder genotypes observed"                                          
[49] "Warning, MAF set to 0 for these SNPs (see --nonfounders)"                           
[50] "Writing list of these SNPs to [ /tmp/ssh-7484/out.nof ]"                            
[51] "Total genotyping rate in remaining individuals is 0.994408"                         
[52] "0 SNPs failed missingness test ( GENO > 1 )"                                        
[53] "0 SNPs failed frequency test ( MAF < 0 )"                                           
[54] "After frequency and genotyping pruning, there are 100000 SNPs"                      
[55] "After filtering, 725 cases, 1587 controls and 0 missing"                            
[56] "After filtering, 1097 males, 1215 females, and 0 of unspecified sex"                
[57] "Converting data to Individual-major format"                                         
[58] "Writing logistic model association results to [ /tmp/ssh-7484/out.assoc.logistic ] "
[59] ""                                                                                   
[60] "Analysis finished: Mon Apr  4 15:57:15 2022"                                        
[61] ""                                                                                   

$error
character(0)

$command
[1] "cd /home/master/brge && plink1 --bfile brge --logistic --pheno brge.phe --mpheno 6 --covar brge.phe --covar-name gender,age --noweb --out /tmp/ssh-7484/out"

attr(,"class")
[1] "resource.exec"
datashield.logout(conns)