Important note: The dsOMOPClient package is a client-side interface to DataSHIELD servers. It retrieves tables from OMOP CDM databases through resources and integrates them into the DataSHIELD workflow. However, users must then combine the resulting tables into a single dataset themselves, using the base DataSHIELD functions of dsBaseClient.

Auxiliary packages such as dsOMOPHelper automate this integration, and we recommend them over using dsOMOPClient directly. Direct use of dsOMOPClient is advised only for researchers with an edge case that requires working with the dsOMOPClient data interface at a more primitive level, or for developers building their own packages on top of dsOMOPClient.

For more information on dsOMOPHelper, please visit its GitHub repository.

Prerequisites

Before using dsOMOPClient, it is recommended to have a basic understanding of DataSHIELD and the OMOP CDM. This knowledge will help you query and work with OMOP CDM data through the DataSHIELD infrastructure.

1.1 Creating an interface object

The ds.omop function creates an interface object that allows users to interact with the OMOP CDM database based on a resource. We can use this object to obtain tables from the database by applying the desired filters and querying data catalogs for information present in the database.

In this example, we will use the MIMIC IV data available on OBiBa’s public Opal demo server. The server is publicly accessible, so anyone can reproduce the examples in this guide by running the same commands in their R session. The access credentials are:

  • Server URL: https://opal-demo.obiba.org
  • User: dsuser
  • Password: P@ssw0rd
  • Profile: omop

First, we will establish a connection to the demo server using DSI with the provided credentials:

library(DSI)
library(DSOpal)
library(dsBaseClient)
library(dsOMOPClient)

builder <- newDSLoginBuilder()
builder$append(
  server = "opal_demo",
  url = "https://opal-demo.obiba.org",
  user = "dsuser",
  password = "P@ssw0rd",
  profile = "omop"
)
logindata <- builder$build()
conns <- datashield.login(logins = logindata)

Once we have successfully established a connection with the server, we will create the object with ds.omop. The function requires the following parameters:

  • connections: A list of established DataSHIELD connections.
  • resource: The name of the resource of the OMOP CDM database in the DataSHIELD server.

Our server contains the database connection resource under the name mimiciv within the omop_demo project. Therefore, we need to specify that, from the connection we have established, we want to take the omop_demo.mimiciv resource.

o <- ds.omop(
  connections = conns,
  resource = "omop_demo.mimiciv"
)

1.1.1 Establishing multiple connections

To connect to multiple OMOP CDM databases at once through a pool of connections, pass a named list as the resource parameter: each element's name is a server name, and its value is the name of the resource on that server. For example:

o <- ds.omop(
  connections = conns,
  resource = list(opal_demo = "omop_demo.mimiciv",
                  another_server = "project_name.resource_name")
)
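
For the named list above to resolve, conns needs one connection per server name. The following is a minimal sketch of how the login data for two servers might be built with the DSI login builder. The another_server entry is purely illustrative: its URL, user, password, and profile are placeholder assumptions, not a real deployment.

```r
library(DSI)

# Build login data for two servers. The names passed as `server` are the
# same names used as keys in the `resource` list given to ds.omop.
builder <- newDSLoginBuilder()
builder$append(
  server = "opal_demo",
  url = "https://opal-demo.obiba.org",
  user = "dsuser",
  password = "P@ssw0rd",
  profile = "omop"
)
# Placeholder entry: replace every value with those of your own server.
builder$append(
  server = "another_server",
  url = "https://another-server.example.org",
  user = "your_user",
  password = "your_password",
  profile = "omop"
)
logindata <- builder$build()

# The login data holds one row per server.
print(logindata$server)
```

Passing this logindata to datashield.login, as in the single-server example, would then open both connections in one call, provided the second server actually exists and accepts the credentials.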

1.2 Interface object functions

The ds.omop interface object provides several functions to interact with the OMOP CDM database. These functions allow users to query information from the database and extract tables, which will then be transformed and integrated into the DataSHIELD workflow.

1.2.1 Querying information

  • tables(): This function lists all the available tables in the OMOP CDM database. It allows users to understand the structure of the database and identify which tables might be relevant for their analysis.
o$tables()
## $opal_demo
##  [1] "attribute_definition" "care_site"            "cdm_source"          
##  [4] "cohort"               "cohort_attribute"     "cohort_definition"   
##  [7] "concept"              "concept_relationship" "condition_era"       
## [10] "condition_occurrence" "cost"                 "death"               
## [13] "device_exposure"      "dose_era"             "drug_era"            
## [16] "drug_exposure"        "fact_relationship"    "location"            
## [19] "measurement"          "metadata"             "note"                
## [22] "note_nlp"             "observation"          "observation_period"  
## [25] "payer_plan_period"    "person"               "procedure_occurrence"
## [28] "provider"             "specimen"             "visit_detail"        
## [31] "visit_occurrence"     "vocabulary"
  • columns(tableName): Given a table name, this function returns the columns available in that table. This is useful for users to identify the specific data fields they might want to analyze or use in their queries.
o$columns("measurement")
## $opal_demo
##  [1] "measurement_id"                "person_id"                    
##  [3] "measurement_concept_id"        "measurement_date"             
##  [5] "measurement_datetime"          "measurement_time"             
##  [7] "measurement_type_concept_id"   "operator_concept_id"          
##  [9] "value_as_number"               "value_as_concept_id"          
## [11] "unit_concept_id"               "range_low"                    
## [13] "range_high"                    "provider_id"                  
## [15] "visit_occurrence_id"           "visit_detail_id"              
## [17] "measurement_source_value"      "measurement_source_concept_id"
## [19] "unit_source_value"             "value_source_value"
  • concepts(tableName): This function retrieves the concepts present in a given table and returns them as a data frame with two columns: concept_id and their associated concept_name. Concepts in OMOP CDM are standardized terms that represent clinical events, measurements, observations, and other entities. Understanding the concepts available in a table can help users to formulate data filtering queries.
o$concepts("measurement")
## $opal_demo
##    concept_id                                         concept_name
## 1           0                                  No matching concept
## 2     1175625                              Breath rate spontaneous
## 3     3000067   Parathyrin.intact [Mass/volume] in Serum or Plasma
## 4     3000068                        oxyCODONE [Presence] in Urine
## 5     3000099    Nuclear Ab [Units/volume] in Serum by Immunoassay
## 6     3000285                       Sodium [Moles/volume] in Blood
## 7     3000330              Specific gravity of Urine by Test strip
## 8     3000348 Leukocyte esterase [Presence] in Urine by Test strip
## 9     3000456   Dacrocytes [Presence] in Blood by Light microscopy
## 10    3000461                  Pressure support setting Ventilator
##  [ reached 'max' / getOption("max.print") -- omitted 343 rows ]

The numeric values in the concept_id column are the standardized identifiers of the concepts in the OMOP CDM. For example, in the table above, the identifier 1175625 refers to the concept Breath rate spontaneous. We will use these identifiers to filter the data in the following sections.
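
Because the printed catalog is truncated, it can help to search it by name rather than read it in full. The sketch below uses a hypothetical helper, find_concepts, which is not part of dsOMOPClient; it relies only on the two columns described above. A small local data frame, copied from rows of the output shown earlier, stands in for the real catalog so the example runs without a server connection.

```r
# Return the rows of a concept catalog whose name matches a pattern.
# `find_concepts` is a hypothetical helper, not part of dsOMOPClient.
find_concepts <- function(concepts, pattern) {
  hits <- grepl(pattern, concepts$concept_name, ignore.case = TRUE)
  concepts[hits, , drop = FALSE]
}

# Local stand-in for o$concepts("measurement")$opal_demo.
catalog <- data.frame(
  concept_id = c(0, 1175625, 3000285, 3000330, 3000348),
  concept_name = c(
    "No matching concept",
    "Breath rate spontaneous",
    "Sodium [Moles/volume] in Blood",
    "Specific gravity of Urine by Test strip",
    "Leukocyte esterase [Presence] in Urine by Test strip"
  ),
  stringsAsFactors = FALSE
)

# Case-insensitive search for urine-related concepts.
urine <- find_concepts(catalog, "urine")
print(urine)
```

With a live connection, the same helper could be applied to o$concepts("measurement")$opal_demo, since the result is a list keyed by server name.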

1.2.2 Retrieving tables

  • get(tableName): This function enables users to extract a specific table from the OMOP CDM database. The extracted table becomes available within the DataSHIELD environment, where it can be manipulated using other DataSHIELD functions and potentially combined with other tables from the database. Users have the flexibility to apply filters and specify columns to customize the extracted data according to their research needs. Below are some examples of how to use the get function:

Getting a complete table

We simply specify the desired table’s name as a string in the get function:

o$get("person")

# We can use the `ds.summary` function to get a summary of the retrieved table
ds.summary("person")
## $opal_demo
## $opal_demo$class
## [1] "data.frame"
## 
## $opal_demo$`number of rows`
## [1] 100
## 
## $opal_demo$`number of columns`
## [1] 11
## 
## $opal_demo$`variables held`
##  [1] "person_id"            "gender_concept_id"    "year_of_birth"       
##  [4] "month_of_birth"       "day_of_birth"         "birth_datetime"      
##  [7] "race_concept_id"      "ethnicity_concept_id" "location_id"         
## [10] "provider_id"          "care_site_id"

However, retrieving a full table is advised only for small tables or when the whole table is genuinely needed, because it can be slow and memory-intensive. For larger tables, apply filters in the get function to retrieve only the required subset.

Getting a filtered table

We can apply filters to the get function to retrieve a specific subset of a table, which is the recommended approach for larger tables such as measurement. For example, we can restrict the result to specific columns (here, value_as_number and measurement_date) and to specific concepts (here, Heart rate and Body weight, whose concept IDs are 3027018 and 3025315):

o$get(table = "measurement",
      columnFilter = c("value_as_number", "measurement_date"),
      conceptFilter = c(3027018, 3025315))

ds.summary("measurement")
## $opal_demo
## $opal_demo$class
## [1] "data.frame"
## 
## $opal_demo$`number of rows`
## [1] 100
## 
## $opal_demo$`number of columns`
## [1] 5
## 
## $opal_demo$`variables held`
## [1] "person_id"                    "heart_rate.measurement_date" 
## [3] "heart_rate.value_as_number"   "body_weight.measurement_date"
## [5] "body_weight.value_as_number"

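As we can observe, the resulting table contains only the Heart rate and Body weight measurements of the patients in the database: alongside person_id, each requested column appears once per concept, prefixed with the concept name (for example, heart_rate.value_as_number).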

These functions form the core of the ds.omop interface object and enable users to interact with OMOP CDM databases within the DataSHIELD environment. The resulting tables are expected to be joined relationally with the ds.merge function, using common identifiers such as person_id.
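
As a rough illustration of such a join, the sketch below wraps ds.merge from dsBaseClient in a small function. The wrapper name merge_by_person and the choice of all.x = TRUE are assumptions made for this example; they are not part of dsOMOPClient or dsBaseClient, and other join settings may suit a given analysis better.

```r
# Hypothetical convenience wrapper (not part of dsOMOPClient or dsBaseClient):
# joins two server-side tables on person_id and stores the result under
# the name given in `newobj`.
merge_by_person <- function(x, y, newobj, conns) {
  dsBaseClient::ds.merge(
    x.name = x,
    y.name = y,
    by.x.names = "person_id",
    by.y.names = "person_id",
    all.x = TRUE,        # keep every row of x, even without a match in y
    newobj = newobj,
    datasources = conns
  )
}
```

After retrieving the person table and the filtered measurement table as shown earlier, a call such as merge_by_person("person", "measurement", "person_measurement", conns) would create a server-side object named person_measurement, which ds.summary("person_measurement") can then inspect.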

In the next article, we will explore the internal workings of the table processing operations, understanding how to manipulate and combine tables into single datasets ready for analysis, along with a deeper dive into the customization possibilities of the get function and its potential applications.