| Title: | Data Structures and Utilities for Gene Expression Analysis |
|---|---|
| Description: | Provides data structures and utility functions for gene expression analysis. It includes the DesignContrast class for representing study designs and contrasts used in differential expression analysis, functions for importing and exporting expression data in GCT/CLS formats, tools for probeset summarization and filtering, and interfaces to limma-based differential gene expression workflows. The package works with Biobase ExpressionSet objects and integrates with the limma framework. |
| Authors: | Jitao David Zhang [aut, cre] (ORCID: <https://orcid.org/0000-0002-3085-0909>) |
| Maintainer: | Jitao David Zhang <[email protected]> |
| License: | GPL-3 |
| Version: | 1.3.5 |
| Built: | 2026-05-16 08:34:13 UTC |
| Source: | https://github.com/bedapub/ribiosExpression |
The function annotates an object of eSet, or a vector of
characters representing probesets.
annotate(object, target, check.target, ...)
object |
An object of |
target |
Chip type to be annotated |
check.target |
Logical, with |
... |
Currently not implemented |
Once successfully annotated, the annotation slot of the
eSet object is set to the value of target.
An eSet, or a data.frame containing
annotation information of the probesets.
Jitao David Zhang <[email protected]>
data(ribios.ExpressionSet)
myset <- ribios.ExpressionSet[100:105,]
## eSet
## Not run:
annotate(myset, "HG_U95AV2")
annotate(myset, "HG_U_95AV2", check.target=TRUE)
## End(Not run)
## characters
## Not run:
annotate(featureNames(myset), "HG_U95AV2")
## End(Not run)
Assert whether a matrix is of full rank numerically
assertFullRank(matrix)
matrix |
Numeric matrix |
If the matrix is not of full rank, the function stops. Otherwise, an invisible TRUE is returned.
myMat <- matrix(c(1,1,1,0,1,1), ncol=2, byrow=FALSE)
assertFullRank(myMat)
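A numerical full-rank check of this kind can be sketched in base R with a QR decomposition. The helper `isFullColumnRank` and its tolerance are hypothetical and only illustrate the idea; they are not the package's implementation, and the sketch assumes that "full rank" means full column rank.

```r
## Hypothetical helper (not part of ribiosExpression):
## TRUE if the numeric matrix has full column rank, judged by a QR decomposition
isFullColumnRank <- function(mat, tol = 1e-7) {
  stopifnot(is.matrix(mat), is.numeric(mat))
  qr(mat, tol = tol)$rank == ncol(mat)
}

fullMat <- matrix(c(1, 1, 1, 0, 1, 1), ncol = 2, byrow = FALSE)
## the third column is the sum of the first two, so the matrix is rank deficient
deficientMat <- cbind(fullMat, fullMat[, 1] + fullMat[, 2])

isFullColumnRank(fullMat)
isFullColumnRank(deficientMat)
```

A rank-deficient design matrix usually means that at least one coefficient cannot be estimated, which is why such a check is useful before fitting a linear model.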
Extract the contrast annotation data.frame from an object
contrastAnnotation(object)
## S4 method for signature 'DesignContrast'
contrastAnnotation(object)
## S4 replacement method for signature 'DesignContrast'
contrastAnnotation(object) <- value
object |
An object, see supported methods below |
value |
A contrast annotation data.frame |
A data.frame annotating contrasts
contrastAnnotation(DesignContrast): Return the contrast annotation data.frame
from a DesignContrast object
contrastAnnotation(DesignContrast) <- value: Assign a contrast annotation data.frame
to a DesignContrast object
Assign contrast annotation to an object
contrastAnnotation(object) <- value
object |
An object, see supported methods below |
value |
Contrast annotation data.frame |
The modified object
Extract the contrast matrix from an object
contrastMatrix(object)
## S4 method for signature 'DesignContrast'
contrastMatrix(object)
## S4 replacement method for signature 'DesignContrast'
contrastMatrix(object) <- value
## S4 method for signature 'MArrayLM'
contrastMatrix(object)
object |
An object, see supported methods below |
value |
A contrast matrix |
A numeric contrast matrix
contrastMatrix(DesignContrast): Return the contrast matrix from a DesignContrast
object
contrastMatrix(DesignContrast) <- value: Assign a contrast matrix to a DesignContrast
object
contrastMatrix(MArrayLM): Extract contrast matrix from an object of MArrayLM
Assign contrast matrix to an object
contrastMatrix(object) <- value
object |
An object, see supported methods below |
value |
Contrast matrix |
The modified object
Extract contrastNames from an object
contrastNames(object)
## S4 method for signature 'DesignContrast'
contrastNames(object)
## S4 method for signature 'MArrayLM'
contrastNames(object)
object |
An object, see supported methods below |
A character vector of contrast names
contrastNames(DesignContrast): Return contrast names, i.e., column names of the
contrast matrix
contrastNames(MArrayLM): Extract contrast names from an object of MArrayLM
Return indices of samples involved in the given contrast of two or more coefficients
contrastSampleIndices(object, contrast)
## S4 method for signature 'DesignContrast,character'
contrastSampleIndices(object, contrast)
## S4 method for signature 'DesignContrast,numeric'
contrastSampleIndices(object, contrast)
object |
A |
contrast |
Either a contrast name or an integer indicating the index of the contrast |
An integer vector, indices of samples that are involved, sorted by the ascending order of the coefficients of the contrast
contrastSampleIndices(object = DesignContrast, contrast = character): Use character string to specify the contrast
contrastSampleIndices(object = DesignContrast, contrast = numeric): Use integer indices to specify the contrast
## one-way ANOVA
myDesCon <- parseDesignContrast(sampleGroups="As,Be,As,Be,As,Be",
                                groupLevels="Be,As",
                                dispLevels="Beryllium,Arsenic",
                                contrasts="As-Be")
contrastSampleIndices(myDesCon, 1L)
myInterDesCon <- DesignContrast(
  designMatrix=matrix(c(rep(1,6), rep(0,2), rep(1,2), rep(0,2), rep(0,4), rep(1,2)),
                      nrow=6, byrow=FALSE),
  contrastMatrix=matrix(c(0,1,0, 0,0,1, 0,-1,1), byrow=FALSE, nrow=3),
  groups=factor(rep(c("As", "Be", "Cd"), each=2)),
  dispLevels=c("Arsenic", "Beryllium", "Cadmium"))
cont1Ind <- contrastSampleIndices(myInterDesCon, 1L)
cont2Ind <- contrastSampleIndices(myInterDesCon, 2L)
cont3Ind <- contrastSampleIndices(myInterDesCon, 3L)
stopifnot(identical(cont1Ind, 1:4))
stopifnot(identical(cont2Ind, c(1:2, 5:6)))
stopifnot(identical(cont3Ind, c(3:6)))
Build a data.frame from two vectors of potentially different lengths
dataFrameTwoVecs(vec1, vec2, col.names = c("Vec1", "Vec2"))
vec1 |
A vector |
vec2 |
Another vector |
col.names |
A character vector of length 2 giving column names of the output. The shorter of the two vectors is extended to the length of the longer one by appending empty strings. |
A data.frame of two columns. The row count matches the longer vector
dataFrameTwoVecs(LETTERS[1:5], letters[2:9])
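The padding behaviour described above can be illustrated with a few lines of base R. The helper `padTwoVecs` is a hypothetical stand-in written for this sketch; it assumes that both vectors are coerced to character before padding, which the documentation does not state.

```r
## Hypothetical base-R sketch (not the package's code):
## pad the shorter vector with empty strings, then bind both into a data.frame
padTwoVecs <- function(vec1, vec2, col.names = c("Vec1", "Vec2")) {
  n <- max(length(vec1), length(vec2))
  pad <- function(v) c(as.character(v), rep("", n - length(v)))
  res <- data.frame(pad(vec1), pad(vec2), stringsAsFactors = FALSE)
  colnames(res) <- col.names
  res
}

padded <- padTwoVecs(LETTERS[1:5], letters[2:9])
padded
```

The result has as many rows as the longer input, and the last rows of the shorter column hold empty strings.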
Infer groups from a design matrix
design2group(designMatrix)
designMatrix |
A design matrix |
A factor vector giving the groups inferred from the design matrix
A naive logic is used: samples of the same design vectors are of the same group.
The inference is known to fail when control variables, such as age or RIN numbers, vary between samples of the same group.
myDesign <- model.matrix(~gl(3,3))
design2group(myDesign)
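The naive logic and its known failure mode can be reproduced with a small base-R sketch. The helper `groupsFromDesignSketch` is hypothetical; it assumes that two samples belong to the same group exactly when their rows in the design matrix are identical, and it may differ from the package's implementation in how levels are named and ordered.

```r
## Hypothetical sketch: samples with identical design rows form one group
groupsFromDesignSketch <- function(designMatrix) {
  key <- unname(apply(designMatrix, 1, paste, collapse = "_"))
  factor(key, levels = unique(key))
}

myDesign <- model.matrix(~gl(3, 3))
inferred <- groupsFromDesignSketch(myDesign)
table(inferred)

## A covariate that varies within groups (here a made-up 'age' column)
## makes every design row unique, so each sample ends up in its own group
covDesign <- cbind(myDesign, age = seq_len(9))
nlevels(groupsFromDesignSketch(covDesign))
```

In the second case it is safer to pass the sample groups explicitly instead of inferring them.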
Construct a DesignContrast object
DesignContrast(
  designMatrix,
  contrastMatrix = NULL,
  groups = NULL,
  dispLevels = NULL,
  contrastAnnotation = NULL
)
designMatrix |
A design matrix |
contrastMatrix |
A contrast matrix. If NULL, no comparisons can be made. |
groups |
A factor vector of the same length as the number of rows (samples) of the design matrix.
If missing, |
dispLevels |
A character vector of the same length as the number of levels encoded by |
contrastAnnotation |
A data.frame or NULL, annotating contrasts |
A DesignContrast object
myFac <- gl(3,3, labels=c("baseline", "treat1", "treat2"))
myDesign <- model.matrix(~myFac)
colnames(myDesign) <- c("baseline", "treat1", "treat2")
myContrast <- limma::makeContrasts(contrasts=c("treat1", "treat2"), levels=myDesign)
DesignContrast(myDesign, myContrast, groups=myFac)
DesignContrast(myDesign, myContrast, groups=myFac, dispLevels=c("C", "T1", "T2"))
The DesignContrast class represents key information in a designed experiment
## S4 method for signature 'DesignContrast'
show(object)
object |
An object of |
show(DesignContrast): The show method
design: A numeric matrix. The number of rows equals the sample size. The columns correspond to the design variables.
contrasts: A numeric matrix. The number of rows equals the number of columns in the design matrix. The columns correspond to the comparisons one wishes to make.
groups: A factor vector, giving sample groups. The length equals the number of samples.
dispLevels: A character vector, used for displaying sample groups. The length equals the number of levels of the groups factor.
contrastAnnotation: A data.frame, used to annotate the contrasts.
Objects can be created by calling the
function DesignContrast. However, users should normally not call this
function directly; parseDesignContrast should be called instead.
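The dimension constraints between the slots can be checked with plain matrices before any object is built. The sketch below uses only base R; the variable names and the hand-built contrast matrix are illustrative assumptions, not output of the package.

```r
## Illustrative inputs for the slots described above (base R only)
myFac <- gl(3, 3, labels = c("baseline", "treat1", "treat2"))
design <- model.matrix(~myFac)
colnames(design) <- levels(myFac)

## hand-built contrast matrix: one row per design column, one column per comparison
contrasts <- cbind(treat1 = c(0, 1, 0), treat2 = c(0, 0, 1))
rownames(contrasts) <- colnames(design)
dispLevels <- c("C", "T1", "T2")

## the constraints stated in the slot descriptions
stopifnot(nrow(design) == length(myFac))        # one design row per sample
stopifnot(nrow(contrasts) == ncol(design))      # contrast rows match design columns
stopifnot(length(dispLevels) == nlevels(myFac)) # one display label per group level
dim(design)
dim(contrasts)
```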
Extract the design matrix from an object
designMatrix(object)
## S4 method for signature 'DesignContrast'
designMatrix(object)
## S4 replacement method for signature 'DesignContrast'
designMatrix(object) <- value
## S4 method for signature 'MArrayLM'
designMatrix(object)
object |
An object, see supported methods below |
value |
A design matrix |
A numeric design matrix
designMatrix(DesignContrast): Return the design matrix from a DesignContrast
object
designMatrix(DesignContrast) <- value: Assign a design matrix to a DesignContrast
object
designMatrix(MArrayLM): Extract design matrix from an object of MArrayLM
Assign design matrix to an object
designMatrix(object) <- value
object |
An object, see supported methods below |
value |
Design matrix |
The modified object
Extract design variable names from an object
designVariables(object)
## S4 method for signature 'DesignContrast'
designVariables(object)
object |
An object, see supported methods below |
A character vector of variable names
designVariables(DesignContrast): Return the names of variables (column names)
in the design matrix of a DesignContrast object
Extract displayed group labels from an object
dispGroups(object)
## S4 method for signature 'DesignContrast'
dispGroups(object)
object |
An object, see supported methods below |
A factor of display group labels
dispGroups(DesignContrast): Return the sample groups from a DesignContrast object, using display labels
Transform eSet to long data.frame
eSetToLongTable(
  x,
  exprsFun = function(eset) Biobase::exprs(eset),
  includeOtherAssayData = FALSE
)
x |
An |
exprsFun |
A function to extract expression values, by default |
includeOtherAssayData |
Logical, whether other elements in the |
The function extracts exprs (and, optionally, other values in the assayData environment) and returns them in a long data.frame format together with the phenotypic data.
A data.frame in long format.
data(ribios.ExpressionSet, package="ribiosExpression")
exprsLongTbl <- eSetToLongTable(ribios.ExpressionSet)
seLongTbl <- eSetToLongTable(ribios.ExpressionSet,
                             exprsFun=function(eset) Biobase::assayData(eset)$se.exprs)
Transform an expression matrix to long table
exprsToLong(x, ...)
## S4 method for signature 'matrix'
exprsToLong(
  x,
  idvar = "illID",
  timevar = "hybridID",
  valuevar = "value",
  ids = rownames(x),
  valueType = "raw"
)
## S4 method for signature 'eSet'
exprsToLong(x)
x |
A matrix or an ExpressionSet object |
... |
Other parameters |
idvar |
Variable name of the feature identifier, passed to |
timevar |
The time variable, passed to |
valuevar |
The value variable |
ids |
Feature identifiers |
valueType |
Character string, value type |
A data.frame
exprsToLong(matrix): The method for matrix as input
exprsToLong(eSet): The method for eSet as input
Fix design matrix colnames so that they are legal variable names
fixDesignMatrixColnames(
  designMatrix,
  interceptChar = "_",
  removeContrastNames = TRUE
)
designMatrix |
A design matrix, produced by |
interceptChar |
Character string, the value the interaction symbol (:) should be replaced with |
removeContrastNames |
Logical, whether the contrast variable name should be removed. |
The matrix with fixed column names.
myFac1 <- gl(6,2, labels=sprintf("Fac1_%d", 1:6))
myFac2 <- gl(2,6, labels=c("Ctrl", "Dis"))
myVar <- rnorm(12)
myDesign <- model.matrix(~myFac1 * myFac2 + myVar)
head(myDesign)
head(fixDesignMatrixColnames(myDesign))
Detect if any column has an empty string as name and fix
fixEmptyColumnName(df, prefix = "X")
df |
A |
prefix |
A character string, the prefix to be used if a column's name is empty. |
If any column has an empty string as its name, the name is replaced by the prefix followed by an index starting from 1.
A data.frame with fixed column names.
testDf <- data.frame("Col1"=LETTERS[1:3], "Col2"=letters[2:4])
colnames(testDf) <- c("", "")
testDf
fixEmptyColumnName(testDf)
fixEmptyColumnName(testDf, prefix="fData")
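The renaming rule can be sketched in base R as follows. The helper `fixEmptyNamesSketch` is hypothetical. It assumes that the index counts only the empty columns (first empty column gets index 1, the second gets 2, and so on); the documentation does not say whether the package counts this way or uses the column position instead.

```r
## Hypothetical sketch: replace empty (or NA) column names by prefix + running index
fixEmptyNamesSketch <- function(df, prefix = "X") {
  isEmpty <- is.na(colnames(df)) | colnames(df) == ""
  colnames(df)[isEmpty] <- paste0(prefix, seq_len(sum(isEmpty)))
  df
}

testDf <- data.frame(Col1 = LETTERS[1:3], Col2 = letters[2:4], Col3 = 1:3)
colnames(testDf) <- c("", "Kept", "")
fixedNames <- colnames(fixEmptyNamesSketch(testDf))
fixedNames
```

Columns that already have a name are left untouched.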
The resulting string(s) can be exported to a GMT file with
writeLines
formatGmt(title, comment, genes)
## S4 method for signature 'character,character,character'
formatGmt(title, comment, genes)
## S4 method for signature 'character,missing,character'
formatGmt(title, genes)
## S4 method for signature 'character,character,list'
formatGmt(title, comment, genes)
## S4 method for signature 'character,missing,list'
formatGmt(title, genes)
title |
Character, title(s) of gene set(s) |
comment |
Character, comment(s) of gene set(s). Can be of the same
length as the |
genes |
A character vector of gene names, or a list of such vectors. In
the former case, one GMT line is produced; otherwise multiple lines are
returned. In the latter case, the length of the list must match the length
of |
One or more lines of GMT file
formatGmt(title = character, comment = character, genes = character): title and comment are single character strings and genes is a character vector of gene names; one GMT line is produced
formatGmt(title = character, comment = missing, genes = character): as above, but the comment is missing
formatGmt(title = character, comment = character, genes = list): title and comment are both vectors of character
strings, genes is a list of the same length
formatGmt(title = character, comment = missing, genes = list): title is a vector of character strings, the comment is
missing, genes is a list of the same length as the title
Jitao David Zhang <[email protected]>
formatGmt(title="GeneSet0", comment="My geneset", genes=c("MAPT", "MAPK", "AKT1"))
formatGmt(title="GeneSet0", genes=c("MAPT", "MAPK", "AKT1"))
formatGmt(title=c("GeneSet0", "GeneSet1"),
          comment=c("My geneset 0", "My geneset 1"),
          genes=list(c("MAPT", "MAPK", "AKT1"), c("EGFR", "CDC42")))
formatGmt(title=c("GeneSet0", "GeneSet1"),
          comment="My genesets",
          genes=list(c("MAPT", "MAPK", "AKT1"), c("EGFR", "CDC42")))
formatGmt(title=c("GeneSet0", "GeneSet1"),
          genes=list(c("MAPT", "MAPK", "AKT1"), c("EGFR", "CDC42")))
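For readers unfamiliar with the GMT format, each line holds the gene-set name, a description, and the genes, all separated by tabs. The base-R sketch below builds such lines and writes them with writeLines. The helper `gmtLineSketch` is hypothetical, and using an empty string for a missing comment is an assumption; the package may fill the field differently.

```r
## Hypothetical sketch of one GMT line: name <TAB> description <TAB> gene1 <TAB> gene2 ...
gmtLineSketch <- function(title, comment = "", genes) {
  paste(c(title, comment, genes), collapse = "\t")
}

gmtLines <- mapply(gmtLineSketch,
                   title = c("GeneSet0", "GeneSet1"),
                   comment = c("My geneset 0", "My geneset 1"),
                   genes = list(c("MAPT", "MAPK", "AKT1"), c("EGFR", "CDC42")),
                   USE.NAMES = FALSE)

## write the lines to a temporary GMT file and read them back
outFile <- tempfile(fileext = ".gmt")
writeLines(gmtLines, outFile)
readBack <- readLines(outFile)
readBack
```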
Extract sample groups from an object
groups(object)
## S4 method for signature 'DesignContrast'
groups(object)
object |
An object, see supported methods below |
A factor of sample groups
groups(DesignContrast): Return the raw sample groups from a DesignContrast object
GRP files are used by the Connectivity Map online tool and store a
rank-ordered list of probesets. They are simply one-column
text files, each line containing one probeset. grpFiles2gmt converts
GRP files into GMT-formatted strings, which can be written to GMT files to
be used by GSEA and other tools.
grp2gmt(txt, chiptype, name)
grpFiles2gmt(..., chiptype, n = -1L)
txt |
A vector of character strings, each containing one probeset |
chiptype |
Chip type, normally character representing the microarray
chip type. If the option is missing, or is of value |
name |
Character, name of the gene set (the first field of the GMT file) |
... |
GRP file names |
n |
Integer, number of lines to be read; |
The function grp2gmt, called by grpFiles2gmt internally,
annotates probesets when chiptype is supported by GTI, and transforms
them into the GMT format.
If chiptype is provided, the annotate function is called to
fetch probeset annotations from the databank.
A vector of character strings, each containing one line of a GMT
file. They can be written to a file with the writeLines
function.
It is the user's responsibility to check that all GRP files exist and are readable.
Jitao David Zhang <[email protected]>
See https://www.broadinstitute.org/connectivity-map-cmap for the use of GRP files in the Connectivity Map web tool.
up.file <- system.file("extdata/tags_up.grp", package="ribiosExpression")
down.file <- system.file("extdata/tags_down.grp", package="ribiosExpression")
grp2gmt(readLines(up.file, n=-1))
grpFiles2gmt(c(up.file, down.file), n=3)
## Not run:
grp2gmt(readLines(up.file, n=-1), chiptype="HG_U95AV2")
grpFiles2gmt(c(up.file, down.file), n=-1L, chiptype="HG_U95AV2")
## End(Not run)
Test whether the input design matrix is consistent with the sample names
isInputDesignConsistent(descon, sampleNames)
descon |
A DesignContrast object |
sampleNames |
A vector of character strings, specifying sample names |
If the sample names in the DesignContrast object are identical to the given sample names, an invisible TRUE is returned. If the two sets are identical but the order of the sample names does not match, a warning is raised and an invisible FALSE is returned. If the two sets differ, the mismatching sample names are printed for diagnosis.
An invisible logical value. TRUE if and only if the sample names match perfectly.
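The three cases described for this check can be sketched in base R on plain character vectors. The helper `sampleNamesConsistentSketch` is hypothetical, and it assumes that FALSE is returned invisibly whenever the names do not match perfectly, which follows from the return value stated above but is not spelled out for each case.

```r
## Hypothetical sketch of the three cases: perfect match, same set in a different order, different sets
sampleNamesConsistentSketch <- function(designNames, sampleNames) {
  if (identical(designNames, sampleNames)) {
    return(invisible(TRUE))
  }
  if (setequal(designNames, sampleNames)) {
    warning("Sample names match as sets, but their order differs")
  } else {
    message("Only in design: ",
            paste(setdiff(designNames, sampleNames), collapse = ", "))
    message("Only in input: ",
            paste(setdiff(sampleNames, designNames), collapse = ", "))
  }
  invisible(FALSE)
}

perfect <- sampleNamesConsistentSketch(c("S1", "S2", "S3"), c("S1", "S2", "S3"))
reordered <- suppressWarnings(
  sampleNamesConsistentSketch(c("S1", "S2", "S3"), c("S3", "S1", "S2")))
different <- sampleNamesConsistentSketch(c("S1", "S2", "S3"), c("S1", "S2", "S4"))
c(perfect = perfect, reordered = reordered, different = different)
```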
The function filters features (commonly probesets) in an
ExpressionSet object. It does not affect genes with only one feature
present, or genes without a valid annotation (see details below). For genes
with multiple probesets, the function calculates the statistic of each
probeset across all samples and keeps only the probeset with the maximum
value of the statistic (for example, the mean or the variance). An
ExpressionSet returned by the function therefore has only one probeset
matching each gene.
keepMaxStatProbe(
  eset,
  probe.index.name,
  keepNAprobes = TRUE,
  stat = function(x) mean(x, na.rm = TRUE),
  ...
)
eset |
An |
probe.index.name |
The column name of the |
keepNAprobes |
Logical, determines whether genes without a valid index name should be kept or left out. See details below. |
stat |
Function or character, a function (or the name referring to it)
which takes a vector of numerical values, and returns one value as the
statistic, e.g. |
... |
Parameters passed to the |
Names of probesets are determined by the featureNames(eset) function.
The column of probe.index.name in the fData(eset) data.frame
determines the index of genes, for example the Entrez GeneID, to which
probesets are matched. Those genes without a valid index, whose index is
either an empty string or NA, can be set to be left out by
keepNAprobes=FALSE. If the option is set as TRUE, then these
genes are kept in the returning object.
The stat function should take a vector of numerical values and return only
one statistic, preferably not NA. Most statistics can be
calculated robustly by setting na.rm=TRUE, and this option should
be used whenever possible. Otherwise, when a probeset has one or more missing
values, its statistic will probably be NA, which leads to the probeset
being discarded. Even worse, when all probesets matching a
gene have NAs, the gene will be filtered out entirely, which is
usually not desired. Therefore, set na.rm=TRUE through the ...
option (see examples below) whenever possible.
A filtered ExpressionSet.
Note that when the statistics of two or more probesets tie (have the same value), the choice among them is arbitrary: the probeset whose name ranks first when the names are converted into a factor vector is kept.
Jitao David Zhang <[email protected]>
library("Biobase")
example.mat <- matrix(c(1,1,3,4, 2,2,3,3, 4,5,6,7, 7,8,9,10), ncol=4, byrow=TRUE)
example.eset <- new("ExpressionSet", exprs=example.mat)
featureNames(example.eset) <- c("1a","1b","2","3")
fData(example.eset)$geneid <- c(1,1,2,3)
## keep probesets with the maximal variance
example.sd <- keepMaxStatProbe(example.eset, probe.index.name="geneid", stat=sd)
featureNames(example.sd)
## keep probesets with the maximal Median Absolute Deviation (MAD)
example.mad <- keepMaxStatProbe(example.eset, probe.index.name="geneid", stat=mad)
featureNames(example.mad)
## keep probesets with the maximal mean value
example.mean <- keepMaxStatProbe(example.eset, probe.index.name="geneid", stat=mean)
featureNames(example.mean)
## note that NA value may cause problems, it is a good practice to make
## the stat function _resist_ to NA
na.eset <- example.eset
exprs(na.eset)[1,1] <- NA
## Not run:
## prone to error
na.mean <- keepMaxStatProbe(na.eset, probe.index.name="geneid",stat=mean)
featureNames(na.mean)
## better
na.mean.narm <- keepMaxStatProbe(na.eset, probe.index.name="geneid",na.rm=TRUE)
featureNames(na.mean.narm)
## End(Not run)
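The selection rule itself does not depend on ExpressionSet objects and can be sketched on a plain matrix in base R. The helper `keepMaxStatRowSketch` is a hypothetical illustration, not the package's code; it always keeps rows without a valid gene identifier and, on ties, keeps the first row, which may differ from the package's tie handling.

```r
## Hypothetical sketch: per gene, keep the row with the largest statistic
keepMaxStatRowSketch <- function(mat, geneId,
                                 stat = function(x) mean(x, na.rm = TRUE)) {
  rowStat <- apply(mat, 1, stat)
  hasId <- !is.na(geneId) & geneId != ""
  ## within each gene, the index of the row with the largest statistic
  keepIdx <- tapply(which(hasId), geneId[hasId],
                    function(i) i[which.max(rowStat[i])])
  ## rows without a valid gene identifier are kept as they are
  mat[sort(c(as.vector(keepIdx), which(!hasId))), , drop = FALSE]
}

exampleMat <- matrix(c(1,1,3,4, 2,2,3,3, 4,5,6,7, 7,8,9,10), ncol = 4, byrow = TRUE,
                     dimnames = list(c("1a", "1b", "2", "3"), NULL))
geneId <- c(1, 1, 2, 3)

bySd <- keepMaxStatRowSketch(exampleMat, geneId, stat = sd)
byMean <- keepMaxStatRowSketch(exampleMat, geneId)
rownames(bySd)
rownames(byMean)
```

With these toy data the two statistics disagree for gene 1: probeset "1a" varies more, while "1b" has the higher mean, so the choice of stat determines which probeset survives.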
Return a dgeTable from an MArrayLM object
limmaDgeTable(marrayLM, contrast = NULL, confint = TRUE)
marrayLM |
An object returned by ‘lmFit’ and ‘eBayes’ |
contrast |
NULL, or a character string indicating the contrast of interest |
confint |
Logical, whether confidence intervals should be returned |
A data.frame
Transform limma::topTable results to a DGEtable
limmaTopTable2dgeTable(limmaTopTable)
limmaTopTable |
topTable returned by limma::topTable |
A data.frame known as DGEtable which has controlled column names
example.sd <- 0.3*sqrt(4/rchisq(100,df=4))
example.y <- matrix(rnorm(100*6,sd=example.sd),100,6)
example.y[1:2,4:6] <- example.y[1:2,4:6] + 2
rownames(example.y) <- paste("Gene",1:100)
example.design <- cbind(Grp1=1,Grp2vs1=c(0,0,0,1,1,1))
example.fit <- limma::lmFit(example.y,example.design)
example.fit <- limma::eBayes(example.fit)
example.tt <- limma::topTable(example.fit, coef=2)
example.dt <- limmaTopTable2dgeTable(example.tt)
head(example.dt)
Transform a matrix to long table
matrixToLongTable(x, valueLabel = "value", rowLabel = "row", colLabel = "col")
x |
A matrix |
valueLabel |
Character string, the label of the value |
rowLabel |
Character string, the name of the column holding the row names |
colLabel |
Character string, the name of the column holding the column names |
A data.frame
myMatrix <- matrix(rnorm(24), nrow=4, dimnames=list(LETTERS[1:4], letters[1:6]))
matrixToLongTable(myMatrix)
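The wide-to-long transformation can be sketched in base R as below. The helper `matrixToLongSketch` is hypothetical; the order of the output columns and the column-major order of the rows are assumptions and may differ from what the package returns.

```r
## Hypothetical sketch: one output row per matrix cell, with its row name, column name and value
matrixToLongSketch <- function(x, valueLabel = "value",
                               rowLabel = "row", colLabel = "col") {
  res <- data.frame(rownames(x)[as.vector(row(x))],
                    colnames(x)[as.vector(col(x))],
                    as.vector(x),
                    stringsAsFactors = FALSE)
  colnames(res) <- c(rowLabel, colLabel, valueLabel)
  res
}

## integers instead of random numbers, so that each cell is easy to trace
myMatrix <- matrix(seq_len(24), nrow = 4,
                   dimnames = list(LETTERS[1:4], letters[1:6]))
longTbl <- matrixToLongSketch(myMatrix)
head(longTbl)
```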
Merge two eSets by column binding
mergeEset(eset1, eset2, by.x, by.y, normalization = "quantile")
eset1 |
An |
eset2 |
Another |
by.x |
Column index of feature annotation of |
by.y |
Column index of feature annotation of |
normalization |
|
A new eSet object
Extract the number of contrasts from an object
nContrast(object)
## S4 method for signature 'DesignContrast'
nContrast(object)
object |
An object, see supported methods below |
An integer, number of contrasts
nContrast(DesignContrast): Return the number of contrasts in a DesignContrast
object
Parse contrast from strings
parseContrastStr(contrastStr)
contrastStr |
A vector of character strings |
A contrast matrix
Parse study design and asked questions encoded in design and contrast matrices or in one-way ANOVA designs
parseDesignContrast(
  designFile = NULL,
  contrastFile = NULL,
  sampleGroups = NULL,
  groupLevels = NULL,
  dispLevels = NULL,
  contrasts = NULL,
  expSampleNames = NULL
)
designFile |
A plain tab-delimited file with headers encoding the design matrix, or NULL |
contrastFile |
A plain tab-delimited file with headers encoding the contrast matrix, or NULL |
sampleGroups |
A character string concatenated by commas (e.g. A,B,C), or a plain text file containing one string per line (e.g. A, B, and C on separate lines), encoding sample group memberships. |
groupLevels |
Similar format as 'sampleGroups', encoding levels (e.g. order) of the sampleGroups |
dispLevels |
Similar format as 'sampleGroups', encoding the display of the groupLevels. Must match 'groupLevels' |
contrasts |
Similar format as 'sampleGroups', encoding contrasts in case of one-way ANOVA designs |
expSampleNames |
A vector of character strings giving the expected sample names (e.g. those in the input matrix) |
An S4 object of class 'DesignContrast'
## one-way ANOVA
parseDesignContrast(sampleGroups="As,Be,As,Be,As,Be",
                    groupLevels="Be,As",
                    dispLevels="Beryllium,Arsenic",
                    contrasts="As-Be")
## design/contrast matrix
designFile <- system.file("extdata/example-designMatrix.txt", package="ribiosExpression")
contrastFile <- system.file("extdata/example-contrastMatrix.txt", package="ribiosExpression")
# minimal information
parseDesignContrast(designFile=designFile, contrastFile=contrastFile)
# with extra information about sample groups
parseDesignContrast(designFile=designFile, contrastFile=contrastFile,
                    sampleGroups="As,Be,As,Be,As,Be",
                    groupLevels="Be,As",
                    dispLevels="Beryllium,Arsenic")
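To make the one-way ANOVA input more concrete, the base-R sketch below shows how comma-separated strings of this kind could be turned into a design matrix and a contrast matrix. It is an illustration only: it assumes a design without intercept (one column per group), and the contrast vector is written by hand rather than parsed, so the package's internal representation may differ.

```r
## Illustrative inputs in the same string format as the arguments above
sampleGroups <- "As,Be,As,Be,As,Be"
groupLevels <- "Be,As"
contrasts <- "As-Be"

## split the strings and build a factor with the requested level order
groups <- factor(strsplit(sampleGroups, ",")[[1]],
                 levels = strsplit(groupLevels, ",")[[1]])

## assumed no-intercept design: one indicator column per group
design <- model.matrix(~0 + groups)
colnames(design) <- levels(groups)

## "As-Be" written by hand: -1 for Be, +1 for As (rows follow the level order)
contrastMat <- matrix(c(-1, 1), ncol = 1,
                      dimnames = list(levels(groups), contrasts))

## per-sample weight in the contrast: +1 for As samples, -1 for Be samples
sampleWeights <- as.vector(design %*% contrastMat)
sampleWeights
```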
Parse design and contrast from files
parseDesignContrastFile(designFile, contrastFile, groupsStr = NULL, levelStr = NULL, dispLevelStr)
designFile |
A tab-delimited file encoding the design matrix |
contrastFile |
A tab-delimited file encoding the contrast matrix |
groupsStr |
A vector of character strings, giving sample groups |
levelStr |
A vector of level strings |
dispLevelStr |
A vector of strings to be used as display labels, if any |
A DesignContrast object
Parse design and contrast from strings
parseDesignContrastStr(groupsStr, levelStr, dispLevelStr, contrastStr)
groupsStr |
A factor vector indicating sample groups |
levelStr |
Level strings |
dispLevelStr |
Display level strings |
contrastStr |
A vector of character strings indicating contrasts |
A DesignContrast object
The function plots a ComplexHeatmap containing two matrices, one of the design matrix and the other of the contrast matrix.
## S3 method for class 'DesignContrast'
plot(x, y = NULL, title = NULL, clusterDesign = FALSE, clusterSamples = FALSE, clusterContrasts = FALSE, designRange = NULL, contrastRange = NULL, designParams = NULL, contrastParams = NULL, ...)
x |
A |
y |
NULL, ignored |
title |
Title of the object, used in the title of the heatmaps |
clusterDesign |
Logical, cluster rows of the design matrix |
clusterSamples |
Logical, cluster columns of the design matrix |
clusterContrasts |
Logical, cluster rows of the contrast matrix (notice that the contrast matrix required by limma is the transposed contrast matrix) |
designRange |
NULL or a vector of length 2, giving range of the design matrix to be visualized. If NULL, the whole range is used. |
contrastRange |
NULL or a vector of length 2, giving the range of the contrast matrix to be visualized. If NULL, a symmetric range spanning the largest absolute value and its negation is used. |
designParams |
Other parameters passed to |
contrastParams |
Other parameters passed to |
... |
Ignored |
Heatmap object
myFac <- gl(3,3, labels=c("baseline", "treat1", "treat2"))
myDesign <- model.matrix(~myFac)
colnames(myDesign) <- c("baseline", "treat1", "treat2")
myContrast <- limma::makeContrasts(contrasts=c("treat1", "treat2"), levels=myDesign)
res1 <- DesignContrast(myDesign, myContrast, groups=myFac)
res2 <- DesignContrast(myDesign, myContrast, groups=myFac, dispLevels=c("C", "T1", "T2"))
plot(res1, title="DesCon 1")
plot(res2, title="DesCon 1 (identical)")
## grid::gpar uses 'col' (not 'color') for the text colour
plot(res2, title="DesCon 1 (identical)",
     designRange=c(-2,2), contrastRange=c(-2,1),
     designParams=list(row_names_gp=grid::gpar(fontsize=8)),
     contrastParams=list(column_names_gp=grid::gpar(fontsize=12, col="red")))
Read in an annotation file in the tsv-format, with or without row names
readAnnotationFile(file, outputKeyName = "FeatureName", ...)
file |
A tab-delimited file without quotes, the first column must contain identifiers (key names). In case the first column has no column name and it contains row names, they will be used as feature names. |
outputKeyName |
The key name used in the output |
... |
Other parameters passed to |
A data.frame containing the annotation, with the first
column named as outputKeyName that contains feature identifiers as character
strings. In case the input table contains the column with the same name,
the content in that column must match the row names, otherwise an error
is reported.
This function is called by readFeatureAnnotationFile and
readSampleAnnotationFile. Normal users are unlikely to use it.
f1 <- system.file("extdata", "featureAnnotation/featureAnnotationFile-withRowNames.txt", package="ribiosExpression")
f2 <- system.file("extdata", "featureAnnotation/featureAnnotationFile-withoutRowNames.txt", package="ribiosExpression")
# 'FeatureName' does not exist in the column names of f1
f1Read <- readAnnotationFile(f1, outputKeyName="FeatureName")
# 'GeneID' exists in the column names of f2, and it is the first column.
f2Read <- readAnnotationFile(f2, outputKeyName="GeneID")
head(f1Read)
head(f2Read)
Read eSet object from plain files
readEset(exprs.file, fData.file, pData.file, exprs.file.format = c("gct", "tsv"), sep = "\t", header = TRUE, ...)
exprs.file |
Character, file name where |
fData.file |
Character, optional, file name where |
pData.file |
Character, optional, file name where |
exprs.file.format |
Character, write |
sep |
Character, separator |
header |
Logical, whether a header line is present |
... |
Passed to |
The function can read in an eSet object saved by writeEset by parsing
three plain text files: exprs.file, fData.file, and pData.file.
Currently both tsv and gct formats are supported for the expression
file.
See writeEset for limitations of these functions.
An ExpressionSet object.
data(sample.ExpressionSet, package="Biobase")
fData(sample.ExpressionSet) <- data.frame(
  ProbeID=featureNames(sample.ExpressionSet),
  row.names=featureNames(sample.ExpressionSet))
exprs.file <- tempfile()
fData.file <- tempfile()
pData.file <- tempfile()
writeEset(sample.ExpressionSet, exprs.file, fData.file, pData.file, exprs.file.format="gct")
testRead1 <- readEset(exprs.file, fData.file, pData.file, exprs.file.format="gct")
writeEset(sample.ExpressionSet, exprs.file, fData.file, pData.file, exprs.file.format="tsv")
testRead2 <- readEset(exprs.file, fData.file, pData.file, exprs.file.format="tsv")
The function reads in an expression matrix into an ExpressionSet object. The
expression matrix should be saved in the file format supported by the
read_exprs_matrix function: currently supported formats
include tab-delimited files and gct files.
readExprsMatrix(x)
x |
A file containing an expression matrix |
The function is a wrapper of the read_exprs_matrix function in
the ribiosIO package. The difference is that it returns a valid
ExpressionSet object instead of a primitive matrix.
An ExpressionSet object holding the expression matrix. Both
pData and fData are empty except for the feature/sample names recorded in
the expression matrix.
Jitao David Zhang <[email protected]>
read_exprs_matrix in the ribiosIO package.
idir <- system.file("extdata", package="ribiosExpression")
myeset <- readExprsMatrix(file.path(idir, "sample_eset_exprs.txt"))
myeset2 <- readExprsMatrix(file.path(idir, "test.gct"))
Read in feature annotation file in the tsv-format, with or without row names
readFeatureAnnotationFile(file, ...)
file |
A tab-delimited file without quotes, the first column must contain feature identifiers. In case the first column has no column name and it contains row names, they will be used as feature names. |
... |
Other parameters passed to |
A data.frame containing feature annotation, with the first
column named as 'FeatureName' that contains feature identifiers as character
strings. In case the input table contains the column 'FeatureName', the
content in that column must match the row names, otherwise an error
is reported.
f1 <- system.file("extdata", "featureAnnotation/featureAnnotationFile-withRowNames.txt", package="ribiosExpression")
f2 <- system.file("extdata", "featureAnnotation/featureAnnotationFile-withoutRowNames.txt", package="ribiosExpression")
f1Read <- readFeatureAnnotationFile(f1)
f2Read <- readFeatureAnnotationFile(f2)
head(f1Read)
head(f2Read)
The concept of foreign keys comes from relational database systems. These
keys can be used to cross-reference tables. Say we have two
data.frames, one contains gene annotations and the other contains
protein annotations. A column named mRNArefseqID may be the foreign
key that can be used to specify relationships between genes and proteins.
readFKtable(file, fk, strict.order = FALSE, ...)
file |
A table file. |
fk |
Characters, foreign keys. |
strict.order |
Logical, whether the foreign keys must have the same order as they appear in the file. |
... |
Other parameters passed to the |
readFKtable reads a table from file and checks whether it contains the
provided foreign keys, either as row.names or in the first column.
A data.frame if the FK-matching was successful, otherwise the
function will print an error message and stop.
Jitao David Zhang <[email protected]>
test.file <- tempfile()
fk.teams <- c("HSV", "FCB", "BVB")
## FK in row names
test.mat <- matrix(rnorm(9), nrow=3, dimnames=list(fk.teams, NULL))
write.table(test.mat, test.file)
readFKtable(test.file, fk=fk.teams)
## or: FK can be in the first column
test.df <- data.frame(team=fk.teams, pts=c(15,14,15), plc=c("H", "G", "H"))
write.table(test.df, test.file)
readFKtable(test.file, fk=fk.teams)
## try strict.order=TRUE
test.df <- data.frame(pts=c(15,14,13), plc=c("H", "G", "H"), row.names=rev(fk.teams))
write.table(test.df, test.file)
readFKtable(test.file, fk=fk.teams, strict.order=FALSE)
## Not run: readFKtable(test.file, fk=fk.teams, strict.order=TRUE)
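The foreign-key check described above can be sketched in base R as follows. The helper check_fk is hypothetical and is not part of ribiosExpression. It assumes the keys are stored as row names, and that a successful match returns the rows in the order of the keys, which may differ from what readFKtable actually does.

```r
# Hypothetical helper, not part of ribiosExpression
check_fk <- function(tbl, fk, strict.order = FALSE) {
  keys <- rownames(tbl)
  missing.fk <- setdiff(fk, keys)
  if (length(missing.fk) > 0)
    stop("Foreign keys not found: ", paste(missing.fk, collapse = ", "))
  # With strict.order, the keys must appear in the table in the given order
  if (strict.order && !identical(keys[seq_along(fk)], fk))
    stop("Foreign keys are present but not in the requested order")
  # Assumption: return the rows in the order of the foreign keys
  tbl[fk, , drop = FALSE]
}

fk.teams <- c("HSV", "FCB", "BVB")
tbl <- data.frame(pts = c(15, 14, 13), row.names = rev(fk.teams))

# Keys are present in reversed order: fine without strict ordering
ordered <- check_fk(tbl, fk.teams)

# The same table fails when the order is enforced
strict.res <- tryCatch(check_fk(tbl, fk.teams, strict.order = TRUE),
                       error = function(e) "error")
```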
As complementary functions to writeGctCls, readGctCls reads a
pair of gct and cls files (with same base names) into an
ExpressionSet object.
readGct(gct.file)
readGctCls(file.base, gct.file, cls.file, add.fData.file, add.pData.file)
gct.file |
The name of the gct file (only valid when file.base is missing). |
file.base |
The full file name of gct/cls files without suffix; if not
in the current directory, it must contain the path (dirname) as well. For
instance if it is set as |
cls.file |
The name of the cls file (only valid when file.base is missing). |
add.fData.file |
Optional, file of additional feature data, see details. |
add.pData.file |
Optional, file of additional phenotype (sample) data, see details. |
The readGctCls function internally calls the readGct and
read_cls functions to read in the two formats, respectively.
readGct returns a barely annotated ExpressionSet object, and
read_cls returns a vector of levels encoding sample groups.
Since gct/cls contains only one property of features and samples each
(Description in the gct file as well as sample groups/levels in the cls
file), readGctCls allows users to provide additional fData/pData
files. They should be tab-delimited files, with the first column matching
exactly the names of features or samples. They must be within the path
specified by the path option, namely in the same directory as the gct/cls
files.
See example below.
An ExpressionSet object. The Description column in the
gct file is encoded in the desc column in the featureData of the
resulting object. The sample groups in the cls file are encoded in the
cls column in the phenoData.
readGct(): readGct uses the C implementation of reading in a gct file
The readGct function is a wrapper of the
read_gct_matrix function in the ribiosIO package;
it converts the GCT matrix into an ExpressionSet object.
Jitao David Zhang <[email protected]>
writeGctCls. See
read_gct_matrix for underlying C code to import GCT
files.
idir <- system.file("extdata", package="ribiosExpression")
sample.eset <- readGctCls(file.base=file.path(idir, "test"))
ext.eset <- readGctCls(file.base=file.path(idir, "test"),
                       add.fData.file=file.path(idir, "test.add.fData.txt"),
                       add.pData.file=file.path(idir, "test.add.pData.txt"))
stopifnot(identical(exprs(sample.eset), exprs(ext.eset)))
## try to compare pData(sample.eset) with pData(ext.eset), and similarly
## fData(sample.eset) with fData(ext.eset)
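For readers unfamiliar with the GCT layout, the sketch below writes a tiny GCT file by hand and parses it with base R only. It does not use readGct or the C code in ribiosIO. The file content is made up for illustration, and the layout assumed is the GSEA GCT version 1.2 format: a version line, a line with the matrix dimensions, and a header starting with NAME and Description.

```r
# A made-up GCT file with 2 features and 3 samples
gct.lines <- c("#1.2",
               "2\t3",
               "NAME\tDescription\tS1\tS2\tS3",
               "g1\tgene one\t1.5\t2.5\t3.5",
               "g2\tgene two\t4\t5\t6")
gct.file <- tempfile(fileext = ".gct")
writeLines(gct.lines, gct.file)

# Skip the version and dimension lines, then read the table
tbl <- read.table(gct.file, skip = 2, header = TRUE, sep = "\t",
                  quote = "", check.names = FALSE, stringsAsFactors = FALSE)

# Columns 1 and 2 hold feature names and descriptions; the rest is the matrix
mat <- as.matrix(tbl[, -(1:2)])
rownames(mat) <- tbl$NAME
desc <- tbl$Description

unlink(gct.file)
```

The Description column is what ends up in the desc column of the featureData when the file is read by readGctCls.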
Read in sample annotation file in the tsv-format, with or without row names
readSampleAnnotationFile(file, ...)
file |
A tab-delimited file without quotes, the first column must contain sample identifiers. In case the first column has no column name and it contains row names, they will be used as sample names. |
... |
Other parameters passed to |
A data.frame containing sample annotation, with the first
column named as 'ExperimentName' that contains sample identifiers as character
strings. In case the input table contains the column 'ExperimentName', the
content in that column must match the row names, otherwise an error
is reported.
f1 <- system.file("extdata", "sampleAnnotation/sampleAnnotationFile-withRowNames.txt", package="ribiosExpression")
f2 <- system.file("extdata", "sampleAnnotation/sampleAnnotationFile-withoutRowNames.txt", package="ribiosExpression")
f1Read <- readSampleAnnotationFile(f1)
f2Read <- readSampleAnnotationFile(f2)
head(f1Read)
head(f2Read)
The function is used to transform an eSet object, which is annotated by Bioconductor annotation packages, into an object with annotation information from GTI.
reannotate(object, check.target, ...)
object |
An |
check.target |
Logical, with |
... |
Currently not implemented |
The translation between Bioconductor annotation package names and GTI chip
types is performed by the bioc2gti function in the
ribiosAnnotation package.
Once the re-annotation succeeds, the annotation slot of the
eSet object will be overwritten by the corresponding chip
type name in GTI.
An eSet object with feature annotations updated by
GTI, and the annotation slot is changed to the chip type in GTI.
Jitao David Zhang <[email protected]>
annotate to annotate an eSet object
without prior information of bioc-annotation, or if that information is not
saved in the annotation slot.
data(ribios.ExpressionSet)
print(ribios.ExpressionSet)
## Not run: 
gti.eSet <- reannotate(ribios.ExpressionSet)
gti.eSet <- reannotate(ribios.ExpressionSet, check.target=FALSE)
print(gti.eSet)
## End(Not run)
Remove all-zero variables from design matrix and the corresponding contrast matrix
removeAllZeroVar(obj, contrasts)
## S3 method for class 'matrix'
removeAllZeroVar(obj, contrasts)
## S3 method for class 'DesignContrast'
removeAllZeroVar(obj, contrasts = NULL)
obj |
Either a design matrix, rows are samples, columns are
independent variables, and values are coefficients. Or a
|
contrasts |
Either |
Either a list of two matrices (design and
contrasts), or a DesignContrast object,
depending on the input parameter type. The design matrix and contrast
matrix have an attribute each, notEstCoefs and
notEstContrasts, that keep track of filtered variables and contrasts.
removeAllZeroVar(matrix): S3 function for matrix as input
removeAllZeroVar(DesignContrast): S3 function for DesignContrast as input
myTestDesign <- matrix(c(1,1,1,1, 1,1,0,0, 0,0,1,1, 0,0,0,0),
                       byrow=FALSE, nrow=4L,
                       dimnames=list(sprintf("S%d", 1:4), c("Baseline", "Trt1", "Trt2", "Trt3")))
myTestContrast <- matrix(c(0,1,0,0, 0,0,1,0, 0,0,0,1),
                         nrow=4L, byrow=FALSE,
                         dimnames=list(colnames(myTestDesign), c("Trt1", "Trt2", "Trt3")))
removeAllZeroVar(myTestDesign, myTestContrast)
removeAllZeroVar(DesignContrast(myTestDesign, myTestContrast))
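The filtering idea can be sketched in base R on the same matrices. This is an illustration under assumptions, not the package's code: it assumes that a contrast is dropped whenever it has a non-zero weight on a removed coefficient, and it does not set the notEstCoefs and notEstContrasts attributes.

```r
design <- matrix(c(1,1,1,1, 1,1,0,0, 0,0,1,1, 0,0,0,0), nrow = 4,
                 dimnames = list(sprintf("S%d", 1:4),
                                 c("Baseline", "Trt1", "Trt2", "Trt3")))
contrast <- matrix(c(0,1,0,0, 0,0,1,0, 0,0,0,1), nrow = 4,
                   dimnames = list(colnames(design),
                                   c("Trt1", "Trt2", "Trt3")))

# Design columns that are zero for every sample cannot be estimated
zeroVar <- colnames(design)[colSums(design != 0) == 0]
keepDesign <- design[, setdiff(colnames(design), zeroVar), drop = FALSE]

# Assumption: a contrast using any removed coefficient is dropped as well
affected <- colSums(contrast[zeroVar, , drop = FALSE] != 0) > 0
keepContrast <- contrast[colnames(keepDesign), !affected, drop = FALSE]
```

In this example Trt3 has no samples, so both the Trt3 column of the design matrix and the Trt3 contrast are removed.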
Return rank of the matrix and the ranks of resulting matrices when each column is removed
removeColRank(matrix)
matrix |
A numeric matrix |
A data.frame with n+1 rows, where n is the column count of the input matrix
myMat <- matrix(c(1,1,1, 0,1,1, 0,0,1, 1,0,0), ncol=4, byrow=FALSE)
removeColRank(myMat)
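The quantity reported here can be reproduced with the QR decomposition in base R. The sketch below is an assumption about how such ranks can be computed, not a copy of removeColRank, and the output layout of the real function may differ.

```r
myMat <- matrix(c(1,1,1, 0,1,1, 0,0,1, 1,0,0), ncol = 4, byrow = FALSE)

# Numerical rank of the full matrix
fullRank <- qr(myMat)$rank

# Rank of the matrix after removing each column in turn
dropRanks <- vapply(seq_len(ncol(myMat)),
                    function(j) qr(myMat[, -j, drop = FALSE])$rank,
                    numeric(1))

data.frame(removed = c("none", paste0("col", seq_len(ncol(myMat)))),
           rank = c(fullRank, dropRanks))
```

Only removing the third column lowers the rank (from 3 to 2), because the first column equals the sum of the second and fourth columns, so any one of those three is redundant given the other two.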
This object is adapted from the sample.ExpressionSet object, with
feature annotations from GTI (data as of December 2011). It is used in case
studies where functionalities of the ribiosExpression package are
demonstrated.
An ExpressionSet object.
Jitao David Zhang <[email protected]>
data(ribios.ExpressionSet)
tbl <- eSetToLongTable(ribios.ExpressionSet)
ribiosExpressionSet: An example of ExpressionSet with artificial expression data
An ExpressionSet object.
Jitao David Zhang <[email protected]>
data(ribiosExpressionSet)
Perform row-wise scaling to an ExpressionSet object
## S3 method for class 'ExpressionSet'
rowscale(x, center = TRUE, scale = TRUE)
x |
An ExpressionSet object. |
center |
Logical, whether the mean values of rows should be set to zero. |
scale |
Logical, whether the standard deviations of rows should be normalised to one. |
An ExpressionSet object with row-scaled expression values.
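Row-wise scaling of a plain matrix can be sketched in base R by transposing around scale(), which works column-wise. This only illustrates the operation on the expression matrix. Whether the rowscale method is implemented this way is not stated here, and the toy matrix is made up.

```r
mat <- matrix(c(1, 2, 3,
                10, 20, 60), nrow = 2, byrow = TRUE,
              dimnames = list(c("g1", "g2"), c("S1", "S2", "S3")))

# scale() centers and scales columns, so transpose before and after
scaled <- t(scale(t(mat), center = TRUE, scale = TRUE))

rowMeans(scaled)       # approximately 0 for every feature
apply(scaled, 1, sd)   # approximately 1 for every feature
```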
Sniff the feature type of an object that implements the featureNames method
sniffFeatureType(object, majority = 0.5)
object |
Any object that |
majority |
A numeric value, used for majority voting, passed to
|
A character string indicating likely feature type.
Split an eSet object and run PCA on each split, return PCA scores as one data.frame
splitPCA(eset, factor, func = function(e) exprs(e), ...)
eset |
An eSet object |
factor |
One or more factor vectors, used to split the eSet object |
func |
Function to retrieve values from split sub-eset objects |
... |
Passed to |
A data.frame of PCA scores combined from all splits.
data(ribios.ExpressionSet, package="ribiosExpression")
fac1 <- gl(2,13)
pcaScore1 <- splitPCA(ribios.ExpressionSet, fac1)
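The split-then-PCA idea can be sketched on a plain matrix with base R. The example below uses simulated data and prcomp from the stats package. The column names of the result and the choice of prcomp are assumptions for illustration and need not match what splitPCA returns.

```r
set.seed(1)
# Simulated expression matrix: 10 features (rows) by 6 samples (columns)
mat <- matrix(rnorm(60), nrow = 10,
              dimnames = list(paste0("g", 1:10), paste0("s", 1:6)))
fac <- gl(2, 3)  # two groups of three samples

# Run a PCA on the samples of each split, then stack the scores
scores <- do.call(rbind, lapply(split(seq_len(ncol(mat)), fac), function(idx) {
  pca <- prcomp(t(mat[, idx, drop = FALSE]))  # samples as observations
  data.frame(sample = colnames(mat)[idx],
             PC1 = pca$x[, 1],
             PC2 = pca$x[, 2])
}))
```

Because each split is decomposed separately, the principal components of different splits are not on a common axis and should be compared with care.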
The summarizeRows function summarizes (collapses) rows of a numeric
matrix by calculating summarizing statistics of rows that belong to the same
factor level.
summarizeProbesets(eset, index.name, fun = mean, keep.nonindex = FALSE, keep.featureNames = FALSE, ...)
eset |
An |
index.name |
Character, one column name in the |
fun |
Function or character, the function used to summarize probes,
|
keep.nonindex |
Logical, whether probesets without valid indices should be kept or not. |
keep.featureNames |
Logical, whether the featureNames of the input
object should be kept whenever possible. When multiple probesets are
summarized into one value representing, for example, one gene (by GeneID),
one arbitrary probeset is used to name the value when this option is set to
|
... |
Further parameters passed to the function |
summarizeRows is called internally by summarizeProbesets to
collapse probesets that belong to one index (e.g. GeneID).
The action of this function is univariate: fun is applied
to all probesets on each sample independently. For example, if fun is
mean, the average value of multiple probesets is taken for each
sample. With this function, there is no way to choose among probesets based on
their expression profiles (for instance, to find the probeset with the maximum
average signal).
An ExpressionSet, with probesets summarized by indices
specified.
Jitao David Zhang <[email protected]>
summarizeRows in the ribiosUtils package.
data(ribios.ExpressionSet, package="ribiosExpression")
ribios.mean <- summarizeProbesets(ribios.ExpressionSet, index.name="GeneID", fun=mean)
ribios.mean
data(ribios.ExpressionSet, package="ribiosExpression")
ribios.mean.keepFeatureNames <- summarizeProbesets(ribios.ExpressionSet,
    index.name="GeneID", fun=mean, keep.featureNames=TRUE)
ribios.mean
ribios.inval.mean <- summarizeProbesets(ribios.ExpressionSet,
    index.name="GeneID", fun=mean, keep.nonindex=TRUE)
## the underlying method
ribios.meanMat <- ribiosUtils::summarizeRows(exprs(ribios.ExpressionSet),
    fData(ribios.ExpressionSet)$GeneID, mean)
stopifnot(identical(exprs(ribios.mean), ribios.meanMat))
## keep old featureNames
ribios.inval.mean.old <- summarizeProbesets(ribios.ExpressionSet,
    index.name="GeneID", fun=mean, keep.nonindex=TRUE, keep.featureNames=TRUE)
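The univariate behaviour described above, where probesets of one index are averaged within each sample, can be sketched in base R with rowsum. This is an illustration on a made-up matrix and does not reproduce summarizeRows from ribiosUtils.

```r
# Three probesets (rows), two samples (columns); p1 and p2 map to gene G1
mat <- matrix(c(1, 3, 5,
                2, 4, 6), nrow = 3,
              dimnames = list(c("p1", "p2", "p3"), c("S1", "S2")))
gene <- c("G1", "G1", "G2")

# Sum rows per gene, then divide by the number of probesets per gene
sums <- rowsum(mat, group = gene)
counts <- as.vector(table(gene)[rownames(sums)])
means <- sums / counts

means
```

Each column is handled independently, so the result for G1 in sample S1 depends only on the S1 values of p1 and p2.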
The function takes an eSet object and a factor of the same length as
the object, and summarizes samples of the same factor level by applying the
function.
summarizeSamples(eset, indSamples = eset$SAMPLEID, removeInvarCols = TRUE, fun = sum, ...)
poolReplicates(eset, indSamples = eset$SAMPLEID, removeInvarCols = TRUE)
avgReplicates(eset, indSamples = eset$SAMPLEID, removeInvarCols = TRUE)
medianReplicates(eset, indSamples = eset$SAMPLEID, removeInvarCols = TRUE)
eset |
An |
indSamples |
A factor of the same length as the sample number of the object |
removeInvarCols |
Logical, whether invariant columns of the resulting
|
fun |
The function to be applied to summarize samples |
... |
Other parameters passed to the function |
poolReplicates and avgReplicates are two specific forms of the
more generic summarizeSamples function: they take the sum and the average of
replicates given by the factor, respectively.
From version 1.1-7, the function summarizes not only exprs, but also
all other objects in assayData.
An eSet object.
Jitao David Zhang <[email protected]>
The function calls summarizeColumns
internally.
Also see summarizeProbesets.
data(ribios.ExpressionSet, package="ribiosExpression")
index <- factor(c(gl(12,2), 13, 14))
(ss.eset1 <- summarizeSamples(ribios.ExpressionSet, index))
(ss.eset2 <- summarizeSamples(ribios.ExpressionSet, index, fun=mean, na.rm=TRUE))
## equivalently
(ss.eset2 <- poolReplicates(ribios.ExpressionSet, index))
(ss.eset3 <- avgReplicates(ribios.ExpressionSet, index))
Truncate dgeTable into tables of positively and negatively differentially expressed genes according to the pre-defined criteria
truncateDgeTable(dgeTable)
dgeTable |
A DGEtable defined in ribiosExpression. Note that the column names returned by limma::topTable are remapped (see limmaTopTable2dgeTable). |
A list of two elements: 'pos' and 'neg'. Each contains a dgeTable of positively/negatively regulated genes
Gct/Cls file formats are required by the Gene Set Enrichment Analysis (GSEA)
tool. Functions writeGct and writeCls export files in the two
formats, respectively, and writeGctCls calls the two functions
internally to write both files.
writeCls(eset, file = stdout(), sample.group.col = "group")
writeGctCls(eset, file.base, feat.name, feat.desc, sample.group.col, write.add.fData.file = TRUE, write.add.pData.file = TRUE)
eset |
An object of the |
file |
Name of the Gct/Cls file. If left missing, the file is printed on the standard output. |
sample.group.col |
Integer, character or a factor vector of the same length as the sample number, indicating classes (groups) of samples. See details. |
file.base |
For writeGctCls, the base name of the two files: the suffix (.gct and .cls) will be appended |
feat.name |
Integer or character, indicating which column of the
featureData should be used as feature names; if missing, results of the
|
feat.desc |
Integer or character, indicating which column of the featureData should be used as feature descriptions. If the value is missing, the Description column of the Gct file will be left blank |
write.add.fData.file |
Logical, whether additional featureData should
be written into a file named |
write.add.pData.file |
Logical, whether additional phenoData should be
written into a file named |
The feat.name option specifies what identifiers should be used for
features (probesets). When the value is missing, featureNames is
called to provide feature identifiers.
In contrast, sample.group.col cannot be missing: cls files
encode groups (classes) of samples, and it is usually impossible
to derive class information from sampleNames.
Internally writeCls calls the dfFactor function to
determine the factor of samples. Therefore sample.group.col is to a
certain degree generic: it can be a character string or integer index of the
pData(eset) data matrix, or a factor vector of the same length as
ncol(eset).
Functions are used for their side effects.
writeCls(): writeCls
Jitao David Zhang <[email protected]>
https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideTEXT.htm
See dfFactor for possible values of the
sample.group.col option.
See readGctCls for importing functions.
data(sample.ExpressionSet, package="Biobase")
writeGct(sample.ExpressionSet[1:5, 1:4], file=stdout())
writeCls(sample.ExpressionSet, file=stdout(), sample.group.col="type")
tmpfile <- tempfile()
writeGctCls(sample.ExpressionSet, file.base=tmpfile, sample.group.col="type")
readLines(paste(tmpfile, ".cls", sep=""))
unlink(c(paste(tmpfile, ".cls", sep=""), paste(tmpfile, ".gct", sep="")))
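As a rough guide to what a categorical cls file contains, the base-R sketch below composes its three lines from a factor: a header with the number of samples, the number of classes and the constant 1, a line naming the classes, and a line of zero-based class indices. The factor is made up, and writeCls may format the lines differently (for instance, GSEA also accepts class names on the third line).

```r
# Made-up sample groups; the level order defines the class indices
groups <- factor(c("ctrl", "ctrl", "trt", "trt"), levels = c("ctrl", "trt"))

cls.lines <- c(
  # <number of samples> <number of classes> 1
  paste(length(groups), nlevels(groups), 1),
  # '#' followed by the class names
  paste("#", paste(levels(groups), collapse = " ")),
  # zero-based class index of every sample
  paste(as.integer(groups) - 1L, collapse = " ")
)

writeLines(cls.lines)
```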
Export an ExpressionSet object as tab-delimited (or gct) files
writeEset(eset, exprs.file, fData.file, pData.file, exprs.file.format = c("gct", "tsv"), feat.name = NULL, feat.desc = NULL)
eset |
The |
exprs.file |
Character, file name where |
fData.file |
Character, optional, file name where |
pData.file |
Character, optional, file name where |
exprs.file.format |
Character, write |
feat.name |
Character, feature names or a column in |
feat.desc |
Character, feature descriptions or a column in
|
NULL, only side effect is used
One limitation of the readEset and writeEset functions is that
they only support the export and import of exactly one expression
matrix from one ExpressionSet object. Although an
ExpressionSet can hold further matrices in addition to the
one known as exprs, these are currently not handled by writeEset
or readEset. If such an ExpressionSet object is first
written to plain files and then read back as an ExpressionSet,
matrices other than the one accessible by exprs are discarded.
Similarly, other pieces of information saved in an ExpressionSet,
e.g. experimental data, are lost after a cycle of exporting
and subsequent importing. If keeping this information is important for you,
consider alternatives to readEset and
writeEset, for instance saving an image in a binary file with
the save function.
Another limitation is that factor information is lost. This particularly affects the phenoData, where factor information such as sample grouping and the order of levels may be important.
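The loss of factor information can be reproduced with base R alone. The sketch below does not call readEset or writeEset; it assumes that a plain tab-delimited round trip behaves comparably, and contrasts it with a binary image written by save.

```r
# Sample annotation with a deliberately non-alphabetical level order
pd <- data.frame(
  sample = paste0("S", 1:4),
  group  = factor(c("treated", "control", "treated", "control"),
                  levels = c("treated", "control"))
)

# Round trip through a tab-delimited text file: the level order is not stored
tsv <- tempfile(fileext = ".tsv")
write.table(pd, tsv, sep = "\t", quote = FALSE, row.names = FALSE)
fromText <- read.table(tsv, sep = "\t", header = TRUE,
                       stringsAsFactors = TRUE)

# Round trip through a binary image written by save(): the factor survives
rda <- tempfile(fileext = ".RData")
save(pd, file = rda)
env <- new.env()
load(rda, envir = env)

levels(pd$group)        # original order: treated, control
levels(fromText$group)  # rebuilt in alphabetical order from the text
levels(env$pd$group)    # original order preserved

unlink(c(tsv, rda))
```

If the level order matters downstream, for example for the reference level of a contrast, it has to be stored separately or restored by hand after import.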
data(sample.ExpressionSet, package="Biobase")
exprs.file <- tempfile()
fData.file <- tempfile()
pData.file <- tempfile()
writeEset(sample.ExpressionSet, exprs.file, fData.file, pData.file, exprs.file.format="gct")
writeEset(sample.ExpressionSet, exprs.file, fData.file, pData.file, exprs.file.format="tsv")
Export a matrix, or an eSet that can be coerced to one, into gct/cls files
writeGct(obj, file, feat.name, feat.desc)
## S4 method for signature 'matrix'
writeGct(obj, file, feat.name, feat.desc)
## S4 method for signature 'eSet'
writeGct(obj, file, feat.name, feat.desc)
obj |
The input object, see methods below for supported data types |
file |
The output file |
feat.name |
Specifying feature names |
feat.desc |
Specifying feature descriptions |
Used for its side effect of writing files; returns invisibly.
writeGct(matrix): Method for a matrix as input;
feat.name and feat.desc are passed to write_gct.
writeGct(eSet): Use eSet as input. feat.name and feat.desc are
variable (column) names in fData.
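To make the roles of feat.name and feat.desc concrete, the following sketch builds the lines of a GCT file from a matrix in base R. The helper gctLines is hypothetical and is not the package's write_gct; the layout in the comments is the GCT version 1.2 format as assumed here.

```r
# Hypothetical helper illustrating the assumed GCT (version 1.2) layout:
#   line 1: #1.2
#   line 2: <number of features> TAB <number of samples>
#   line 3: NAME, Description, then the sample names
#   then one row per feature: name, description, expression values
gctLines <- function(mat,
                     feat.name = rownames(mat),
                     feat.desc = rep("", nrow(mat))) {
  stopifnot(is.matrix(mat),
            length(feat.name) == nrow(mat),
            length(feat.desc) == nrow(mat))
  header <- paste(c("NAME", "Description", colnames(mat)), collapse = "\t")
  rows <- vapply(seq_len(nrow(mat)), function(i) {
    paste(c(feat.name[i], feat.desc[i], mat[i, ]), collapse = "\t")
  }, character(1))
  c("#1.2", paste(nrow(mat), ncol(mat), sep = "\t"), header, rows)
}

# Toy 2 x 2 expression matrix (filled column by column)
mat <- matrix(c(1.5, 2, 3.25, 4), nrow = 2,
              dimnames = list(c("geneA", "geneB"), c("S1", "S2")))
out <- gctLines(mat, feat.desc = c("first gene", "second gene"))
writeLines(out)
```

In this sketch feat.name fills the NAME column and feat.desc fills the Description column, which mirrors how the two arguments are described above.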
Write sample groups and group levels into plain text files
writeSampleGroups(sampleGroups, sampleGroups.file, sampleGroupLevels.file)writeSampleGroups(sampleGroups, sampleGroups.file, sampleGroupLevels.file)
sampleGroups |
Factor, encoding sample groups. |
sampleGroups.file |
Character, file name where the information of sample groups is written to. |
sampleGroupLevels.file |
Character, file name where the information of sample group levels is written to. |
The function is used to export sample group and group level information for differential gene expression analysis.
Used for its side effect of writing files. Returns invisibly.
writeSampleGroups(gl(3,4), stdout(), stdout())
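Keeping group levels in a separate file lets the factor be rebuilt with its original level order. The base-R sketch below illustrates that idea only; it does not call writeSampleGroups, and the one-value-per-line layout of both files is an assumption rather than the documented output format.

```r
# A factor whose level order differs from alphabetical order
groups <- factor(c("high", "low", "mid", "high"),
                 levels = c("low", "mid", "high"))

groupsFile <- tempfile()
levelsFile <- tempfile()

# Assumed layout: one value per line in each file
writeLines(as.character(groups), groupsFile)
writeLines(levels(groups), levelsFile)

# Rebuild the factor from the two plain text files
restored <- factor(readLines(groupsFile), levels = readLines(levelsFile))
identical(restored, groups)

unlink(c(groupsFile, levelsFile))
```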
writeVarMetadata: write an AnnotatedDataFrame to an xlsx file
writeVarMetadata(x, path = tempfile(fileext = ".xlsx"), overwrite = TRUE)
## Default S3 method:
writeVarMetadata(x, path = tempfile(fileext = ".xlsx"), overwrite = TRUE)
x |
An |
path |
The xlsx file name to be written to. |
overwrite |
Logical, whether the file should be overwritten if it exists. |
An invisible TRUE in case the file is successfully created, else FALSE.
We also tried the writexl package, but it does not support comments well; the function therefore uses openxlsx.
data("ribiosExpressionSet", package="ribiosExpression")
outfile <- tempfile()
writeVarMetadata(ribiosExpressionSet, path=outfile)
writeVarMetadata(phenoData(ribiosExpressionSet), path=outfile)