Introduction to ribiosExpression

Introduction

The ribiosExpression package provides data structures and utility functions for gene expression analysis. It extends the Biobase ExpressionSet class with tools for:

  • Study design and contrasts: The DesignContrast class encapsulates design matrices, contrast matrices, and grouping information commonly used in differential expression analysis with limma.
  • I/O operations: Import and export expression data in GCT/CLS formats, tab-delimited files, and GMT gene set formats.
  • Probeset summarization and filtering: Collapse multiple probesets per gene and filter by summary statistics.
  • Expression data transformation: Convert expression matrices and ExpressionSet objects to long-format data frames for downstream analysis and visualization.

Installation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("ribiosExpression")

Quick start

Loading the package

library(ribiosExpression)
library(Biobase)
#> Loading required package: BiocGenerics
#> Loading required package: generics
#> 
#> Attaching package: 'generics'
#> The following objects are masked from 'package:base':
#> 
#>     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#>     setequal, union
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#>     get, grep, grepl, is.unsorted, lapply, Map, mapply, match, mget,
#>     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#>     rbind, Reduce, rownames, sapply, saveRDS, table, tapply, unique,
#>     unsplit, which.max, which.min
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.

Working with DesignContrast objects

The DesignContrast class is the central data structure for representing study designs and contrasts. You can build one from existing design and contrast matrices, or parse it from comma-separated strings:

## One-way ANOVA design from strings
dc <- parseDesignContrast(
  sampleGroups = "Control,Treatment,Control,Treatment,Control,Treatment",
  groupLevels = "Control,Treatment",
  dispLevels = "Ctrl,Trt",
  contrasts = "Treatment-Control"
)
dc
#> DesignContrast object:
#> - 6 samples in 2 groups
#>     Levels: Control, Treatment
#> - Design matrix (6 samples x 2 variables)
#>     Variables: Control, Treatment
#>   Call 'designMatrix(object)' to get the design matrix.
#> - Contrast matrix (2 variables x 1 contrasts)
#>     Contrasts: Treatment-Control
#>   Call 'contrastMatrix(object)' to get the contrast matrix.
#>   Call 'contrastAnnotation(object) to get the contrast annotation.

Access the components:

designMatrix(dc)
#>   Control Treatment
#> 1       1         0
#> 2       0         1
#> 3       1         0
#> 4       0         1
#> 5       1         0
#> 6       0         1
#> attr(,"assign")
#> [1] 1 1
#> attr(,"contrasts")
#> attr(,"contrasts")$groups
#> [1] "contr.treatment"
contrastMatrix(dc)
#>            Contrasts
#> Levels      Treatment-Control
#>   Control                  -1
#>   Treatment                 1
groups(dc)
#> [1] Control   Treatment Control   Treatment Control   Treatment
#> Levels: Control Treatment
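
To make the roles of the design and contrast matrices concrete, here is a minimal base-R sketch that does not use ribiosExpression at all. The expression values in `y` are hypothetical, and the fit uses `lm.fit` rather than limma, so it only illustrates what the two matrices encode for a single gene.

```r
## Base-R sketch; the expression values below are hypothetical
groups <- factor(
  c("Control", "Treatment", "Control", "Treatment", "Control", "Treatment"),
  levels = c("Control", "Treatment")
)

## Cell-means design: one indicator column per group, no intercept
design <- model.matrix(~ 0 + groups)
colnames(design) <- levels(groups)

## Contrast vector matching the 'Treatment-Control' column shown above
contrast <- c(Control = -1, Treatment = 1)

## One gene measured in the six samples (made-up numbers)
y <- c(5.0, 7.1, 4.8, 6.9, 5.2, 7.0)

## With a cell-means design the least-squares coefficients are the group means
coefs <- lm.fit(design, y)$coefficients

## The contrast combines them: Treatment mean minus Control mean (7 - 5 = 2)
effect <- sum(contrast[names(coefs)] * coefs)
effect
```

The same arithmetic is what a linear-model fit followed by a contrast fit would report as the estimated effect for this gene; moderated statistics are outside the scope of this sketch.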

Building from design and contrast matrices

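## Three groups with three samples each
myFac <- gl(3, 3, labels = c("baseline", "treat1", "treat2"))
## Intercept model: column 1 is the baseline mean, columns 2 and 3 are
## the differences of treat1 and treat2 from baseline
myDesign <- model.matrix(~myFac)
colnames(myDesign) <- c("baseline", "treat1", "treat2")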
myContrast <- limma::makeContrasts(
  contrasts = c("treat1", "treat2"),
  levels = myDesign
)
dc2 <- DesignContrast(myDesign, myContrast, groups = myFac)
dc2
#> DesignContrast object:
#> - 9 samples in 3 groups
#>     Levels: baseline, treat1, treat2
#> - Design matrix (9 samples x 3 variables)
#>     Variables: baseline, treat1, treat2
#>   Call 'designMatrix(object)' to get the design matrix.
#> - Contrast matrix (3 variables x 2 contrasts)
#>     Contrasts: treat1, treat2
#>   Call 'contrastMatrix(object)' to get the contrast matrix.
#>   Call 'contrastAnnotation(object) to get the contrast annotation.
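
Because the design above includes an intercept, the coefficients named treat1 and treat2 are already differences from baseline, which is why they can be used directly as contrasts. The following base-R sketch checks this on hypothetical values for one gene; `y` and the use of `lm.fit` are illustrative assumptions, not part of the package.

```r
## Base-R sketch; the expression values below are hypothetical
myFac <- gl(3, 3, labels = c("baseline", "treat1", "treat2"))
design <- model.matrix(~ myFac)
colnames(design) <- c("baseline", "treat1", "treat2")

## One gene measured in the nine samples (made-up numbers)
y <- c(1, 2, 3,  4, 5, 6,  9, 10, 11)

## Least-squares coefficients under the intercept parameterization
coefs <- lm.fit(design, y)$coefficients

## Plain group means for comparison
groupMeans <- tapply(y, myFac, mean)

## The intercept equals the baseline mean; the other two coefficients
## equal the differences of each treatment mean from the baseline mean
diffs <- groupMeans[c("treat1", "treat2")] - groupMeans["baseline"]
rbind(coefficient = coefs[c("treat1", "treat2")], difference = diffs)
```

With a no-intercept design (`~ 0 + myFac`) the same comparisons would instead be written as explicit differences such as `treat1-baseline`.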

Visualizing the design

plot(dc2, title = "Example Design")

Reading and writing expression data

Read an expression matrix into an ExpressionSet

idir <- system.file("extdata", package = "ribiosExpression")
eset <- readExprsMatrix(file.path(idir, "sample_eset_exprs.txt"))
eset
#> ExpressionSet (storageMode: lockedEnvironment)
#> assayData: 500 features, 26 samples 
#>   element names: exprs 
#> protocolData: none
#> phenoData: none
#> featureData: none
#> experimentData: use 'experimentData(object)'
#> Annotation:

Read GCT/CLS files

gct_eset <- readGctCls(file.base = file.path(idir, "test"))
gct_eset
#> ExpressionSet (storageMode: lockedEnvironment)
#> assayData: 500 features, 26 samples 
#>   element names: exprs 
#> protocolData: none
#> phenoData
#>   sampleNames: A B ... Z (26 total)
#>   varLabels: cls
#>   varMetadata: labelDescription
#> featureData
#>   featureNames: AFFX-MurIL2_at AFFX-MurIL10_at ... 31739_at (500 total)
#>   fvarLabels: desc
#>   fvarMetadata: labelDescription
#> experimentData: use 'experimentData(object)'
#> Annotation:
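
For readers unfamiliar with the two file types, the sketch below writes and re-reads a tiny GCT/CLS pair using only base R. It assumes the commonly described GCT 1.2 layout (a version line, a dimension line, then a table with NAME and Description columns) and a CLS file with numeric class indices. It is a hand-rolled illustration of the layout, not the parser used by `readGctCls`, and all values are made up.

```r
## Base-R sketch of the GCT/CLS layout; all data are hypothetical
mat <- matrix(c(1.5, 2.5, 3.5, 4.5, 5.5, 6.5), nrow = 2,
              dimnames = list(c("probe1", "probe2"), c("A", "B", "C")))
desc <- c("first probe", "second probe")
cls <- factor(c("Control", "Case", "Control"), levels = c("Control", "Case"))

## GCT: version line, "<rows> <columns>" line, header, one row per feature
gct_file <- tempfile(fileext = ".gct")
writeLines(c(
  "#1.2",
  paste(nrow(mat), ncol(mat), sep = "\t"),
  paste(c("NAME", "Description", colnames(mat)), collapse = "\t"),
  paste(rownames(mat), desc, apply(mat, 1, paste, collapse = "\t"), sep = "\t")
), gct_file)

## CLS: "<samples> <classes> 1", "# <class names>", zero-based class indices
cls_file <- tempfile(fileext = ".cls")
writeLines(c(
  paste(length(cls), nlevels(cls), 1),
  paste("#", paste(levels(cls), collapse = " ")),
  paste(as.integer(cls) - 1L, collapse = " ")
), cls_file)

## Read the GCT back: skip the two leading lines, split off the annotation
gct <- read.delim(gct_file, skip = 2, check.names = FALSE,
                  stringsAsFactors = FALSE)
mat2 <- as.matrix(gct[, -(1:2)])
rownames(mat2) <- gct$NAME

## Read the CLS back: class names from line 2, indices from line 3
cls_raw <- readLines(cls_file)
cls_levels <- strsplit(sub("^#\\s*", "", cls_raw[2]), "\\s+")[[1]]
cls_idx <- as.integer(strsplit(cls_raw[3], "\\s+")[[1]])
cls2 <- factor(cls_levels[cls_idx + 1L], levels = cls_levels)
```

In the ExpressionSet shown above, the class labels appear as the `cls` column of the phenoData and the Description column as `desc` in the featureData.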

Write ExpressionSet to files

exprs_file <- tempfile(fileext = ".tsv")
writeEset(eset, exprs_file, exprs.file.format = "tsv")

Transforming expression data to long format

data(ribios.ExpressionSet)
longTbl <- eSetToLongTable(ribios.ExpressionSet[1:5, 1:3])
head(longTbl)
#>      exprs         ProbeID GeneID GeneSymbol isSingleGeneID      Chip    sex
#> 1 192.7420            <NA>     NA       <NA>             NA      <NA> Female
#> 2  97.1370 AFFX-MurIL10_at  16153       Il10           TRUE HG_U95AV2 Female
#> 3  45.8192  AFFX-MurIL4_at  16189        Il4           TRUE HG_U95AV2 Female
#> 4  22.5445  AFFX-MurFAS_at  14102        Fas           TRUE HG_U95AV2 Female
#> 5  96.7875            <NA>     NA       <NA>             NA      <NA> Female
#> 6  85.7533            <NA>     NA       <NA>             NA      <NA>   Male
#>      type score
#> 1 Control  0.75
#> 2 Control  0.75
#> 3 Control  0.75
#> 4 Control  0.75
#> 5 Control  0.75
#> 6    Case  0.40
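
The idea behind the long format can be sketched in base R: each row of the result holds one feature-sample pair, with the sample annotation repeated alongside the expression value. The matrix, the annotation table, and the column names `feature` and `sample` below are hypothetical and chosen for illustration; they are not the column names produced by `eSetToLongTable`.

```r
## Base-R sketch; matrix and annotation are hypothetical
mat <- matrix(1:6, nrow = 2,
              dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))
pheno <- data.frame(sample = colnames(mat),
                    type = c("Control", "Control", "Case"),
                    stringsAsFactors = FALSE)

## as.vector() walks the matrix column by column, so features vary fastest
long <- data.frame(
  feature = rep(rownames(mat), times = ncol(mat)),
  sample  = rep(colnames(mat), each = nrow(mat)),
  exprs   = as.vector(mat),
  stringsAsFactors = FALSE
)

## Attach the sample annotation without reordering the rows
long$type <- pheno$type[match(long$sample, pheno$sample)]
long
```

A table in this shape can be passed directly to plotting or modelling functions that expect one observation per row.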

Probeset summarization

When multiple probesets map to the same gene, you can collapse them into a single feature per gene. Here, probesets that share a GeneID are summarized by their mean:

data(ribios.ExpressionSet)
summarized <- summarizeProbesets(
  ribios.ExpressionSet,
  index.name = "GeneID",
  fun = mean
)
summarized
#> ExpressionSet (storageMode: lockedEnvironment)
#> assayData: 329 features, 26 samples 
#>   element names: exprs, se.exprs 
#> protocolData: none
#> phenoData
#>   sampleNames: A B ... Z (26 total)
#>   varLabels: sex type score
#>   varMetadata: labelDescription
#> featureData
#>   featureNames: 20 32 ... 100128124 (329 total)
#>   fvarLabels: ProbeID GeneID ... Chip (5 total)
#>   fvarMetadata: labelDescription
#> experimentData: use 'experimentData(object)'
#> Annotation: hgu95av2
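
The averaging step can be illustrated on a small matrix with base R. This is a sketch of the general technique (group rows by a gene identifier, then average within groups), not the implementation of `summarizeProbesets`. The data are hypothetical, and how the package treats probesets without a GeneID is not covered here.

```r
## Base-R sketch; probes, samples, and gene IDs are hypothetical
exprs_mat <- matrix(c(1, 3, 10,    # sample s1
                      2, 4, 20),   # sample s2
                    nrow = 3,
                    dimnames = list(c("p1", "p2", "p3"), c("s1", "s2")))
gene_id <- c("100", "100", "200")  # p1 and p2 map to the same gene

## Sum the rows within each gene, then divide by the number of probes
sums <- rowsum(exprs_mat, group = gene_id)
counts <- as.vector(table(gene_id)[rownames(sums)])
gene_means <- sums / counts
gene_means
```

Gene "100" ends up with the average of p1 and p2 in each sample, while gene "200" keeps the values of its only probe.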

Filtering probesets

For each gene, keep only the probeset with the largest standard deviation across samples (equivalently, the largest variance):

filtered <- keepMaxStatProbe(
  ribios.ExpressionSet,
  probe.index.name = "GeneID",
  stat = sd,
  na.rm = TRUE
)
filtered
#> ExpressionSet (storageMode: lockedEnvironment)
#> assayData: 453 features, 26 samples 
#>   element names: exprs, se.exprs 
#> protocolData: none
#> phenoData
#>   sampleNames: A B ... Z (26 total)
#>   varLabels: sex type score
#>   varMetadata: labelDescription
#> featureData
#>   featureNames: AFFX-MurIL2_at AFFX-MurIL10_at ... 31739_at (453 total)
#>   fvarLabels: ProbeID GeneID ... Chip (5 total)
#>   fvarMetadata: labelDescription
#> experimentData: use 'experimentData(object)'
#> Annotation: hgu95av2
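
The selection rule can be sketched in base R as follows. This illustrates the technique of keeping the row with the highest statistic per group and is not the implementation of `keepMaxStatProbe`. The matrix and gene IDs are hypothetical.

```r
## Base-R sketch; probes, samples, and gene IDs are hypothetical
exprs_mat <- matrix(c(1, 1, 5,    # sample s1
                      2, 9, 5,    # sample s2
                      3, 2, 5),   # sample s3
                    nrow = 3,
                    dimnames = list(c("p1", "p2", "p3"),
                                    c("s1", "s2", "s3")))
gene_id <- c("100", "100", "200")  # p1 and p2 compete for gene "100"

## Per-probe statistic: standard deviation across samples
probe_sd <- apply(exprs_mat, 1, sd, na.rm = TRUE)

## Within each gene, pick the row index with the largest statistic
keep <- tapply(seq_along(probe_sd), gene_id,
               function(i) i[which.max(probe_sd[i])])

## Subset in the original row order
filtered_mat <- exprs_mat[sort(as.integer(keep)), , drop = FALSE]
rownames(filtered_mat)
```

Here p2 is retained for gene "100" because it varies more than p1, and p3 is retained as the only probe of gene "200". Ranking by `var` instead of `sd` selects the same probes, since the variance is the square of the standard deviation.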

Session info

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] Biobase_2.73.1         BiocGenerics_0.59.1    generics_0.1.4        
#> [4] ribiosExpression_1.3.5 BiocStyle_2.41.0      
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6           circlize_0.4.18        shape_1.4.6.1         
#>  [4] ggplot2_4.0.3          rjson_0.2.23           xfun_0.57             
#>  [7] bslib_0.10.0           ribiosArg_1.5.0        GlobalOptions_0.1.4   
#> [10] lattice_0.22-9         vctrs_0.7.3            tools_4.6.0           
#> [13] stats4_4.6.0           parallel_4.6.0         tibble_3.3.1          
#> [16] cluster_2.1.8.2        pkgconfig_2.0.3        Matrix_1.7-5          
#> [19] RColorBrewer_1.1-3     S7_0.2.2               S4Vectors_0.51.1      
#> [22] lifecycle_1.0.5        farver_2.1.2           compiler_4.6.0        
#> [25] statmod_1.5.1          codetools_0.2-20       ComplexHeatmap_2.29.0 
#> [28] clue_0.3-68            htmltools_0.5.9        sys_3.4.3             
#> [31] buildtools_1.0.0       sass_0.4.10            yaml_2.3.12           
#> [34] pillar_1.11.1          crayon_1.5.3           jquerylib_0.1.4       
#> [37] tidyr_1.3.2            mongolite_4.0.0        cachem_1.1.0          
#> [40] limma_3.69.0           ribiosAnnotation_3.8.0 iterators_1.0.14      
#> [43] foreach_1.5.2          tidyselect_1.2.1       zip_2.3.3             
#> [46] digest_0.6.39          stringi_1.8.7          dplyr_1.2.1           
#> [49] purrr_1.2.2            maketools_1.3.2        fastmap_1.2.0         
#> [52] grid_4.6.0             colorspace_2.1-2       cli_3.6.6             
#> [55] magrittr_2.0.5         scales_1.4.0           rmarkdown_2.31        
#> [58] matrixStats_1.5.0      ribiosIO_1.1.0         png_0.1-9             
#> [61] GetoptLong_1.1.1       openxlsx_4.2.8.1       evaluate_1.0.5        
#> [64] knitr_1.51             IRanges_2.47.0         ribiosUtils_1.7.9     
#> [67] ribiosPlot_1.3.0       doParallel_1.0.17      rlang_1.2.0           
#> [70] Rcpp_1.1.1-1.1         glue_1.8.1             BiocManager_1.30.27   
#> [73] jsonlite_2.0.0         R6_2.6.1