3 Dealing with omic data in Bioconductor

This chapter offers a summary of the main data structures that are implemented in Bioconductor for dealing with genomic, transcriptomic, methylomic and exposomic data. The structures are objects to which methods are applied. Omic data is typically composed of three datasets: one containing the actual high-dimensional data of omic variables per individuals, annotation data that specifies the characteristics of the variables and phenotypic information that encodes the subject’s traits of interest, covariates and sampling characteristics. For instance, transcriptomic data is stored in a ExpressionSet object, which is a data structure that contains the transcription values of individuals at each transcription probe, the genomic information for the transcription probes and the phenotypes of the individuals. Specific data is accessed, processed and analyzed with specific functions from diverse packages, conceived as methods acting on the ExpressionSet object. The aim of this chapter is then to introduce the specific omic objects available in Bioconductor. In the following chapters, we will introduce the packages that have been implemented to process and analyze these objects.

The R code to reproduce the results and figures of this chapter can be found here.