1. Creating an interface object
Important note: The dsOMOPClient package serves as a client-side interface to DataSHIELD servers. It retrieves tables from OMOP CDM databases based on resources and integrates those tables into the DataSHIELD workflow. However, users must then manually transform the resulting tables into a single dataset using the DataSHIELD base functions of dsBaseClient.
There are auxiliary packages that automate this integration, which is why we recommend taking a look at packages like dsOMOPHelper instead of using dsOMOPClient directly. We advise regular researchers against using dsOMOPClient directly unless they face an edge case that requires the flexibility of operating with the dsOMOPClient data interface at a more primitive level. Direct use is also appropriate for developers who are building their own packages on top of dsOMOPClient.
For more information on dsOMOPHelper, please visit its GitHub repository.
Prerequisites
Before using dsOMOPClient, it is recommended to have a basic understanding of:
- The OMOP CDM structure and its standardized clinical data format. You can learn more about OMOP CDM in the OHDSI Book chapter ‘The Common Data Model’.
- OMOP Vocabularies and how they standardize medical concepts (like diagnoses, medications, procedures) across different coding systems (ICD-9, ICD-10, SNOMED CT, etc.) into a common representation. The OHDSI Book chapter ‘Standardized Vocabularies’ provides a comprehensive overview of this standardization process.
- Basic DataSHIELD concepts and workflow. The DataSHIELD Beginner’s Tutorial is a good starting point.
This knowledge will help you better understand how to effectively query and work with OMOP CDM data through the DataSHIELD infrastructure.
1.1 Creating an interface object
The ds.omop function creates an interface object that allows users to interact with the OMOP CDM database based on a resource. We can use this object to obtain tables from the database, applying the desired filters, and to query data catalogs for information present in the database.
In this example, we will use the MIMIC IV data available on OBiBa’s public Opal demo server. This server is publicly accessible, so all users can reproduce the examples in this guide by executing the same commands in their R session. The access credentials are:
- Server URL: https://opal-demo.obiba.org
- User: dsuser
- Password: P@ssw0rd
- Profile: omop
First, we will establish a connection to the demo server using DSI with the provided credentials:
library(DSI)
library(DSOpal)
library(dsBaseClient)
library(dsOMOPClient)
builder <- newDSLoginBuilder()
builder$append(
  server = "opal_demo",
  url = "https://opal-demo.obiba.org",
  user = "dsuser",
  password = "P@ssw0rd",
  profile = "omop"
)
logindata <- builder$build()
conns <- datashield.login(logins = logindata)
Once we have successfully established a connection with the server, we will create the object with ds.omop. The function requires the following parameters:
- connections: A list of established DataSHIELD connections.
- resource: The name of the resource of the OMOP CDM database in the DataSHIELD server.
Our server contains the database connection resource under the name mimiciv within the omop_demo project. Therefore, we need to specify that, from the connection we have established, we want to take the omop_demo.mimiciv resource.
o <- ds.omop(
  connections = conns,
  resource = "omop_demo.mimiciv"
)
1.1.1 Establishing multiple connections
If we wish to establish connections with multiple OMOP CDM databases simultaneously in a connections pool, the resource parameter accepts a named list in which the name of each element corresponds to the server name and its value corresponds to the resource name. For example:
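A minimal sketch of such a named list, assuming two hypothetical servers called server_a and server_b (neither name comes from this guide), each hosting a resource named omop_demo.mimiciv. The call to ds.omop is left commented out because it needs live connections to both servers:

```r
# Hypothetical server names: they are assumed to match the `server` values
# passed to builder$append() when the connections were created.
resources <- list(
  server_a = "omop_demo.mimiciv",
  server_b = "omop_demo.mimiciv"
)

# Inspect the server-to-resource mapping
print(resources)

# Assumes `conns` holds open connections to both servers (not run here):
# o <- ds.omop(connections = conns, resource = resources)
```

Keeping the mapping in one named list makes it easy to check that every server in the connections pool has a resource assigned before the interface object is created.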
1.2 Interface object functions
The ds.omop interface object provides several functions to interact with the OMOP CDM database. These functions allow users to query information from the database and extract tables, which will then be transformed and integrated into the DataSHIELD workflow.
1.2.1 Querying information
- tables(): This function lists all the available tables in the OMOP CDM database. It allows users to understand the structure of the database and identify which tables might be relevant for their analysis.
o$tables()
## $opal_demo
## [1] "attribute_definition" "care_site" "cdm_source"
## [4] "cohort" "cohort_attribute" "cohort_definition"
## [7] "concept" "concept_relationship" "condition_era"
## [10] "condition_occurrence" "cost" "death"
## [13] "device_exposure" "dose_era" "drug_era"
## [16] "drug_exposure" "fact_relationship" "location"
## [19] "measurement" "metadata" "note"
## [22] "note_nlp" "observation" "observation_period"
## [25] "payer_plan_period" "person" "procedure_occurrence"
## [28] "provider" "specimen" "visit_detail"
## [31] "visit_occurrence" "vocabulary"
- columns(tableName): Given a table name, this function returns the columns available in that table. This is useful for users to identify the specific data fields they might want to analyze or use in their queries.
o$columns("measurement")
## $opal_demo
## [1] "measurement_id" "person_id"
## [3] "measurement_concept_id" "measurement_date"
## [5] "measurement_datetime" "measurement_time"
## [7] "measurement_type_concept_id" "operator_concept_id"
## [9] "value_as_number" "value_as_concept_id"
## [11] "unit_concept_id" "range_low"
## [13] "range_high" "provider_id"
## [15] "visit_occurrence_id" "visit_detail_id"
## [17] "measurement_source_value" "measurement_source_concept_id"
## [19] "unit_source_value" "value_source_value"
- concepts(tableName): This function retrieves the concepts present in a given table and returns them as a data frame with two columns: concept_id and the associated concept_name. Concepts in OMOP CDM are standardized terms that represent clinical events, measurements, observations, and other entities. Understanding the concepts available in a table helps users formulate data filtering queries.
o$concepts("measurement")
## $opal_demo
## concept_id concept_name
## 1 0 No matching concept
## 2 1175625 Breath rate spontaneous
## 3 3000067 Parathyrin.intact [Mass/volume] in Serum or Plasma
## 4 3000068 oxyCODONE [Presence] in Urine
## 5 3000099 Nuclear Ab [Units/volume] in Serum by Immunoassay
## 6 3000285 Sodium [Moles/volume] in Blood
## 7 3000330 Specific gravity of Urine by Test strip
## 8 3000348 Leukocyte esterase [Presence] in Urine by Test strip
## 9 3000456 Dacrocytes [Presence] in Blood by Light microscopy
## 10 3000461 Pressure support setting Ventilator
## [ reached 'max' / getOption("max.print") -- omitted 343 rows ]
The numeric values in the concept_id column are the standardized identifiers for the concepts in the OMOP CDM. For example, as the table above shows, the identifier 1175625 refers to the concept Breath rate spontaneous. We will use these identifiers to filter the data in the following sections.
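To find the identifier for a concept of interest, the concept names can be searched with base R. The following is a minimal sketch that rebuilds a few rows of the table above as a local data frame so it runs without a server. In a live session, the data frame would presumably be available as o$concepts("measurement")$opal_demo, which is an assumption based on the printed structure:

```r
# Local copy of a few rows from the printed output above. In a live session
# this is assumed to come from o$concepts("measurement")$opal_demo.
concepts <- data.frame(
  concept_id = c(0, 1175625, 3000067, 3000285),
  concept_name = c(
    "No matching concept",
    "Breath rate spontaneous",
    "Parathyrin.intact [Mass/volume] in Serum or Plasma",
    "Sodium [Moles/volume] in Blood"
  ),
  stringsAsFactors = FALSE
)

# Case-insensitive search for a term within the concept names
matches <- concepts[grepl("breath", concepts$concept_name, ignore.case = TRUE), ]
print(matches)

# Keep the identifiers so they can be reused as a concept filter
concept_ids <- matches$concept_id
```

Searching by name and keeping the resulting identifiers avoids copying numeric IDs by hand from a long printed table.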
1.2.2 Retrieving tables
- get(tableName): This function enables users to extract a specific table from the OMOP CDM database. The extracted table becomes available within the DataSHIELD environment, where it can be manipulated using other DataSHIELD functions and potentially combined with other tables from the database. Users have the flexibility to apply filters and specify columns to customize the extracted data according to their research needs. Below are some examples of how to use the get function:
Getting a complete table
We simply specify the desired table’s name as a string in the get function:
o$get("person")
# We can use the `ds.summary` function to get a summary of the retrieved table
ds.summary("person")
## $opal_demo
## $opal_demo$class
## [1] "data.frame"
##
## $opal_demo$`number of rows`
## [1] 100
##
## $opal_demo$`number of columns`
## [1] 11
##
## $opal_demo$`variables held`
## [1] "person_id" "gender_concept_id" "year_of_birth"
## [4] "month_of_birth" "day_of_birth" "birth_datetime"
## [7] "race_concept_id" "ethnicity_concept_id" "location_id"
## [10] "provider_id" "care_site_id"
However, this is only advised for small tables or in contexts where we need the full table, as retrieving a complete table can be memory-intensive and slow. For larger tables, it is recommended to apply filters to the get function to retrieve a specific subset of the table.
Getting a filtered table
We can apply filters to the get function to retrieve a specific subset of the table, which is the recommended approach for larger tables, such as the measurement table. For example, we can set the parameters of the get function to retrieve only specific columns (in this case, value_as_number and measurement_date) and specific measurement types (in this case, the concept IDs of Heart rate and Body weight, which are 3027018 and 3025315):
o$get(table = "measurement",
      columnFilter = c("value_as_number", "measurement_date"),
      conceptFilter = c(3027018, 3025315))
ds.summary("measurement")
## $opal_demo
## $opal_demo$class
## [1] "data.frame"
##
## $opal_demo$`number of rows`
## [1] 100
##
## $opal_demo$`number of columns`
## [1] 5
##
## $opal_demo$`variables held`
## [1] "person_id" "heart_rate.measurement_date"
## [3] "heart_rate.value_as_number" "body_weight.measurement_date"
## [5] "body_weight.value_as_number"
As we can observe, the resulting table contains only the information about the measurements of Heart rate and Body weight of all patients in the database, with each requested column prefixed by the concept name (for example, heart_rate.value_as_number).
These functions form the core of the ds.omop interface object’s capabilities, which enable users to interact with OMOP CDM databases within the DataSHIELD environment. The resulting tables are expected to be joined with the ds.merge function on their common identifiers, such as person_id.
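The idea of joining on person_id can be illustrated locally with base R’s merge. This is a minimal sketch using made-up rows, not data from the demo server. The commented ds.merge call shows what the server-side equivalent might look like, assuming the tables were retrieved under the names person and measurement and that conns is still open:

```r
# Made-up rows standing in for two retrieved tables (illustrative only)
person <- data.frame(
  person_id = c(1, 2, 3),
  year_of_birth = c(1950, 1962, 1981)
)
measurement <- data.frame(
  person_id = c(1, 2),
  heart_rate.value_as_number = c(72, 88)
)

# Left join on the shared identifier: every person is kept, and people
# without a measurement get NA in the measurement columns
merged <- merge(person, measurement, by = "person_id", all.x = TRUE)
print(merged)

# Assumed server-side equivalent with dsBaseClient (not run here):
# ds.merge(x.name = "person", y.name = "measurement",
#          by.x.names = "person_id", by.y.names = "person_id",
#          all.x = TRUE, newobj = "merged", datasources = conns)
```

A left join keeps one row per person even when a measurement is missing, which is usually what an analysis dataset needs. An inner join (all.x = FALSE) would drop those people instead.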
In the next article, we will explore the internal workings of the table processing operations and see how to manipulate and combine tables into single datasets ready for analysis. We will also take a deeper look at the customization possibilities of the get function and its potential applications.