ds.PRS: Get Polygenic Risk Score
ds.PRS(
resources,
pgs_id = NULL,
prs_table = NULL,
table = NULL,
table_id_column = NULL,
table_prs_name = NULL,
snp_threshold = 90,
snp_assoc = FALSE,
datasources = NULL
)
resources: list of all the VCF resources with biallelic genotype information. It is advised to
have one VCF resource per chromosome; a single large VCF file with all the information is always
slower to use.
pgs_id: character (default NULL) ID of the PGS catalog to be used to calculate the polygenic risk score.
Polygenic Score ID & Name from https://www.pgscatalog.org/browse/scores/
table: character (default NULL) If not NULL, the name of the table (on the server(s))
into which the PRS results will be merged (typically a phenotypes table).
table_id_column: character (default NULL) Only used when the table argument
is supplied. It is the name of the column of the table that contains the individual IDs used to perform
the merge.
table_prs_name: character (default NULL) If not NULL, the name that will be
used to build the names of the columns added to table. See the details for further information.
snp_threshold: numeric (default 90) Threshold used to drop individuals. See the details for
further information.
datasources: a list of DSConnection-class objects obtained after login (default NULL).
The function returns nothing to the client. The results are stored on the server(s).
This function resolves a list of resources and subsets them by the risk SNPs; this does not ensure that all the risk SNPs will be found in the data. Among the risk SNPs that are found, if an individual has data for less than 'snp_threshold' (a percentage) of them, that individual is dropped (a SNP with no data is marked in the VCF as ./.). If an individual passes this threshold filter but still has SNPs with no data, those SNPs are counted in the polygenic risk score as non-risk alleles. To take this information into account, the number of SNPs with data for each individual is returned as 'n_snps'.
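The filtering and scoring rule described above can be sketched in base R on a toy example. This is only an illustration under stated assumptions, not the code that runs on the server: the dosage matrix, the weights, the threshold of 70 and every variable name are invented, and genotypes are assumed to be encoded as counts of the effect allele (0, 1, 2) with NA standing for a missing call (./.).

```r
# Toy dosage matrix: rows = individuals, columns = risk SNPs found in the data.
# Entries count effect alleles (0, 1, 2); NA stands for a missing call (./.).
dosage <- matrix(
  c( 2,  1, 0,  1,
    NA,  1, 2,  0,
    NA, NA, 1, NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("id1", "id2", "id3"), paste0("snp", 1:4))
)
weights <- c(0.10, 0.20, -0.05, 0.30)  # invented effect weights (betas)
snp_threshold <- 70                     # percentage, chosen for this toy example

# Number and percentage of SNPs with data per individual
n_snps <- rowSums(!is.na(dosage))
pct_with_data <- 100 * n_snps / ncol(dosage)

# Drop individuals with less than snp_threshold percent of SNPs with data
keep <- pct_with_data >= snp_threshold
kept <- dosage[keep, , drop = FALSE]

# Remaining missing calls are counted as non-risk alleles (dosage 0)
kept[is.na(kept)] <- 0

result <- data.frame(
  id     = rownames(kept),
  prs    = as.vector(kept %*% weights),  # weighted score
  prs_nw = unname(rowSums(kept)),        # unweighted score (all weights equal 1)
  n_snps = unname(n_snps[keep]),         # SNPs with data per kept individual
  row.names = NULL
)
print(result)
```

In this toy data, id3 has data for only 1 of 4 SNPs (25%) and is dropped, while id2 is kept with one missing SNP counted as a non-risk allele and n_snps equal to 3.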
When using a user-provided prs_table instead of a PGS catalog ID to calculate the PRS, it is important to note that
the provided data.frame must follow a strict structure regarding column names (order is not relevant). Please
follow one of these two schemas:
- Schema 1 (provide SNP positions):
+ "chr_name", "chr_position", "effect_allele", "reference_allele", "effect_weight"
- Schema 2 (provide SNP IDs):
+ "rsID", "effect_allele", "reference_allele", "effect_weight"
Note that "effect_weight" corresponds to the beta value of the SNP (log(OR)).
As a rule of thumb, use Schema 1 (provide SNP positions) whenever possible, as the implementation that subsets the VCF files is much faster with it.
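As an illustration of Schema 1, a custom table could be assembled on the client as sketched below. All SNP positions, alleles and odds ratios are invented, and the `matches_schema` helper is hypothetical (it is not part of the package); it assumes that the column names must match one schema exactly, regardless of order.

```r
# Hypothetical custom PRS table following Schema 1 (SNP positions).
# All values below are invented for illustration only.
prs_table <- data.frame(
  chr_name         = c(1, 2, 6),
  chr_position     = c(1234567, 7654321, 2345678),
  effect_allele    = c("A", "T", "G"),
  reference_allele = c("G", "C", "A"),
  effect_weight    = log(c(1.20, 0.85, 1.05)),  # beta = log(OR)
  stringsAsFactors = FALSE
)

schema1 <- c("chr_name", "chr_position", "effect_allele",
             "reference_allele", "effect_weight")
schema2 <- c("rsID", "effect_allele", "reference_allele", "effect_weight")

# Hypothetical helper: column order is not relevant, so compare names as sets
matches_schema <- function(df, schema) setequal(names(df), schema)

matches_schema(prs_table, schema1)
matches_schema(prs_table, schema2)
```

Checking the column names before passing the table on is a cheap way to catch a misspelled column on the client side.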
Since the actual PRS results are sensitive information, they are not returned to the client;
however, they can be merged into a table on the server(s). The main use of this is to add the PRS results
to a phenotypes table and assess relationships between the PRS and the phenotypes. The merge is performed
on the individual IDs, whose column is specified with the table_id_column argument; the table is specified with
the table argument. When merging the results into a table, the default column names are:
- When using pgs_id:
+ prs_pgs_id
+ prs_nw_pgs_id
- When using prs_table:
+ prs_prs_custom_results
+ prs_nw_prs_custom_results
If another designation is desired, use the table_prs_name argument, which is NULL by default.
Note that this parameter only changes the tail
of the names: the two added columns will still begin with prs_ and prs_nw_. These columns correspond to the
actual PRS and to the PRS without weights (that is, the PRS where all weights equal 1).