4 Opal
4.1 Introduction
Opal is OBiBa’s core database application for epidemiological studies. Participant data, collected by questionnaires, medical instruments, sensors, administrative databases etc. can be integrated and stored in a central data repository under a uniform model. Opal is a web application that can import, process, copy data and has advanced features for cataloging the data (fully described, annotated and searchable data dictionaries) as recommended by the Maelstrom Research group at McGill University, Canada. Opal is typically used by a research center to analyze the data acquired from assessment centres. Its ultimate purpose is to achieve seamless data-sharing among epidemiological studies. Opal is the reference implementation of the DataSHIELD infrastructure. More information on Opal can be found in the Opal description on OBiBa.
Opal provides the following main features:
- Use of MongoDB, Mysql, MariaDB and/or PostgreSQL as database software backends,
- Import of data from various file formats (CSV, SPSS, SAS, Stata etc.) and from SQL databases,
- Export of data to various file formats (CSV, SPSS, SAS, Stata etc.) and to SQL databases,
- Plugin architecture to extend import/export capabilities, for instance by connecting to data source software such as REDCap, LimeSurvey etc.,
- Storage of data about any type of “entity”, such as subject, sample, geographic area, etc.,
- Storage of data of any type (e.g., texts, numbers, geo-localisation, images, videos, etc.),
- Advanced authentication and authorization features,
- Reporting using R markdown,
- Data analysis plugins using R,
- Web services can be accessed using R, Python, Java, Javascript,
- DataSHIELD middleware reference implementation (configuration, access controls, R session management).
4.2 Data Management
In Opal the data sets are represented by tables, which are grouped by projects. A table has variables (columns) and entity values (rows). Opal also has the concept of views, which are logical tables where the variables are derived from physical tables via scripts. The storage of the data and of the meta-data (data dictionaries) is managed in a database (for example, a SQL database such as MySQL, MariaDB or PostgreSQL, or a document-oriented database such as MongoDB).
Detailed concepts and tutorials for tables can be found here:
4.3 Security
All Opal operations are accessible through web services that require authentication and proper authorization. The permissions can be granted to a specific user or a group of users, can be applied to a project or to a table and have different levels: read-only meta-data (access to the data dictionary without access to the individual-level data), read-only, or write permissions. The programmatic authentication can make use of username/password credentials, token or 2-way SSL authentication methods. Opal can also integrate with the hosting institution’s users registry using the OpenID Connect standard.
4.4 R Integration
Opal connects to a R server to perform different kind of operations:
- data import/export (using R packages),
- data analysis (by transfering data from Opal’s database into a R server session and using R packages).
The R server is based on the Rserve R package. The user R sessions that are running in this R server are managed by Opal.
This Opal/R integration works well for small to mid-size datasets (usually less than 10M data points). For bigger datasets, extracting and transferring data from the database to the R server is time, CPU and memory intensive. In this work, we will present a more flexible data description paradigm called resources that enables Opal to manage access to Big Data sets, complex data structures and computation units for analysis purpose, while still having the security and the analysis features provided by Opal.
4.5 Opal demo site
We have set up an Opal demo site to illustrate how to perform some basic analyses using DataSHIELD as well as how to deal with different resources for ’omic data. The Opal server can be accessed with the credentials:
- username:
administrator
- password:
password
In this figure we can see all the projects available.
This vignette will mainly make use of the resources available at RSRC
project
In order to make the reader familiar with Opal we recommend visiting the Opal online documentation.