Get Polygenic Risk Score

ds.PRS(
  resources,
  pgs_id = NULL,
  prs_table = NULL,
  table = NULL,
  table_id_column = NULL,
  table_prs_name = NULL,
  snp_threshold = 90,
  snp_assoc = FALSE,
  datasources = NULL
)

Arguments

resources: list of all the VCF resources with biallelic genotype information. It is advised to have one VCF resource per chromosome; a single large VCF file with all the information is slower to use.
pgs_id: character (default NULL) ID of the PGS Catalog score to be used to calculate the polygenic risk score. See the "Polygenic Score ID & Name" column at https://www.pgscatalog.org/browse/scores/
prs_table: data.frame (default NULL) User-provided table of SNPs and weights used to calculate the PRS instead of a PGS Catalog ID. See details for the required structure.
table: character (default NULL) If not NULL, the name of the table (on the server(s)) into which the PRS results will be merged (typically a phenotypes table).
table_id_column: character (default NULL) Only used when the table argument is supplied. Name of the column of the table that contains the individual IDs used to perform the merge.
table_prs_name: character (default NULL) If not NULL, the name used to build the names of the columns added to the table. See details for further information.
snp_threshold: numeric (default 90) Threshold (percentage) used to drop individuals. See details for further information.
datasources: list of DSConnection-class objects (default NULL) obtained after login.

Value

The function returns nothing to the client. The results are stored on the server(s).

Details

This function resolves a list of resources and subsets them by the SNPs of risk; this does not ensure that all the SNPs of risk will be found in the data. Among the SNPs of risk that are found, an individual with data for less than 'snp_threshold' (percentage) of them is dropped (a SNP with no data is marked in the VCF as ./.). If an individual passes this threshold filter but still has SNPs with no data, those SNPs are counted in the polygenic risk score as non-risk alleles. To take this information into account, the number of SNPs with data for each individual is returned as 'n_snps'.
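The filtering and scoring rules described above can be illustrated with a small base R sketch. This is a toy re-implementation under stated assumptions, not the code that runs on the server(s): the dosage matrix `geno`, the `weights` vector and the output column names `prs` and `prs_nw` are invented for the example, and a ./. call is represented as NA.

```r
# Toy dosage matrix (hypothetical data): rows are individuals, columns are the
# SNPs of risk found in the data. Values count effect alleles (0, 1 or 2);
# NA stands for a "./." call in the VCF.
geno <- matrix(
  c(0,  1,  2, 1, 0, 2, 1, 1, 0, 2,
    1,  NA, 2, 0, 1, 1, 0, 2, 1, 0,
    NA, NA, 1, 2, 0, 1, 1, 0, 2, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("id1", "id2", "id3"), paste0("snp", 1:10))
)
weights <- c(0.10, 0.20, 0.05, 0.30, 0.15, 0.25, 0.10, 0.05, 0.20, 0.10)
snp_threshold <- 90

# Number and percentage of SNPs with data for each individual
n_snps <- rowSums(!is.na(geno))
pct_with_data <- 100 * n_snps / ncol(geno)

# Individuals with less than snp_threshold percent of SNPs with data are dropped
keep <- pct_with_data >= snp_threshold
geno_kept <- geno[keep, , drop = FALSE]

# Remaining SNPs with no data are counted as non-risk alleles (dosage 0)
geno_kept[is.na(geno_kept)] <- 0

result <- data.frame(
  id     = rownames(geno_kept),
  prs    = as.vector(geno_kept %*% weights),  # weighted score
  prs_nw = unname(rowSums(geno_kept)),        # score with all weights equal to 1
  n_snps = unname(n_snps[keep]),              # SNPs with data per individual
  stringsAsFactors = FALSE
)
print(result)
```

In this toy data, `id3` has data for 8 of the 10 SNPs and is dropped, while `id2` has data for 9 of 10 and is kept with its missing SNP scored as zero risk alleles.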

When using a user-provided prs_table instead of a PGS Catalog ID to calculate the PRS, the provided data.frame must follow a strict structure regarding column names (order is not relevant). Please follow one of these two schemas:
- Schema 1 (provide SNP positions):
+ "chr_name", "chr_position", "effect_allele", "reference_allele", "effect_weight"

- Schema 2 (provide SNP IDs):
+ "rsID", "effect_allele", "reference_allele", "effect_weight"

It is important to note that this "effect_weight" corresponds to the beta value of the SNP (log(OR)).

As a rule of thumb, use Schema 1 (provide SNP positions) when possible, as the implementation that subsets the VCF files is considerably faster for it.
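A minimal sketch of a Schema 1 table built in base R follows. The SNP positions, alleles and odds ratios are made-up illustrative values, and `which_schema()` is a hypothetical helper written for this example (it is not part of the package). It assumes the table must contain exactly the listed columns, in any order.

```r
# Column names required by each schema (taken from the text above)
schema_1 <- c("chr_name", "chr_position", "effect_allele",
              "reference_allele", "effect_weight")
schema_2 <- c("rsID", "effect_allele", "reference_allele", "effect_weight")

# Illustrative values only; effect_weight is the beta, i.e. log(OR)
odds_ratios <- c(1.05, 1.20)
prs_table <- data.frame(
  chr_name         = c(1, 2),
  chr_position     = c(1000000, 2000000),
  effect_allele    = c("T", "C"),
  reference_allele = c("A", "T"),
  effect_weight    = log(odds_ratios),
  stringsAsFactors = FALSE
)

# Hypothetical helper: report which schema a table follows, ignoring column order
which_schema <- function(tbl) {
  cols <- colnames(tbl)
  if (setequal(cols, schema_1)) return("schema 1")
  if (setequal(cols, schema_2)) return("schema 2")
  stop("column names do not match Schema 1 or Schema 2")
}

which_schema(prs_table)
```

Checking the column names on the client before calling the function is only a convenience; the actual validation performed by the package may differ.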

Since the actual results of the PRS are sensitive information, they are not returned to the client; however, they can be merged into a table on the server(s). The main use of this is to add the PRS results to a phenotypes table and assess relationships between the PRS and the phenotypes. The merge is performed on the individual IDs, using the column given in the table_id_column argument; the table is given in the table argument. When merging the results into a table, by default the column names will be:
- When using pgs_id:
+ prs_pgs_id
+ prs_nw_pgs_id
- When using prs_table:
+ prs_prs_custom_results
+ prs_nw_prs_custom_results

If another designation is desired, use the table_prs_name argument, which is NULL by default. Note that this parameter only changes the tail of the names: the two columns added will still begin with prs_ and prs_nw_. These columns correspond to the actual PRS calculated and to the PRS without weights (that is, the PRS where all weights equal 1).
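As a hedged sketch of how these arguments fit together, the block below assembles the arguments for a call that merges the results into a server-side table. The resource symbols, the table name `phenotypes`, the ID column `subject_id`, the tail `my_score` and the connection object `conns` are all hypothetical, and `PGS000008` is used only as an example of the ID format. The call itself is left commented out because it needs an active DataSHIELD login.

```r
# Hypothetical symbols: one VCF resource per chromosome, assumed to be already
# assigned on the server(s) under these names
resources <- as.list(paste0("chr", 1:22))

prs_args <- list(
  resources       = resources,
  pgs_id          = "PGS000008",   # example ID format only
  table           = "phenotypes",  # hypothetical server-side table
  table_id_column = "subject_id",  # hypothetical ID column in that table
  table_prs_name  = "my_score",    # custom tail for the new column names
  snp_threshold   = 90
)

# Names of the two columns expected in the table after the merge:
# fixed heads "prs_" and "prs_nw_", followed by the custom tail
expected_columns <- paste0(c("prs_", "prs_nw_"), prs_args$table_prs_name)
print(expected_columns)

# Not run: requires connections obtained after login, stored in `conns`
# do.call(ds.PRS, c(prs_args, list(datasources = conns)))
```

Keeping the arguments in a list makes it easy to reuse the same settings across several scores by changing only pgs_id and table_prs_name.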