| Title: | Utilities from and Interface to the 'Bioinfo-C' ('BIOS') Library |
|---|---|
| Description: | Provides interface to the 'Bioinfo-C' (internal name: 'BIOS') library and utilities. 'ribiosUtils' is a Swiss-knife for computational biology in drug discovery, providing functions and utilities with minimal external dependency and maximal efficiency. |
| Authors: | Jitao David Zhang [aut, cre, ctb] (ORCID: <https://orcid.org/0000-0002-3085-0909>), Clemens Broger [aut, ctb], F.Hoffmann-La Roche AG [cph], Junio C Hamano [cph], Jean Thierry-Mieg [cph], Konrad Rudolph [cph], Richard Durbin [cph] |
| Maintainer: | Jitao David Zhang <[email protected]> |
| License: | GPL-3 |
| Version: | 1.7.9 |
| Built: | 2026-05-16 08:31:33 UTC |
| Source: | https://github.com/bedapub/ribiosUtils |
Given several objects, the function tests whether all of them are identical.
allIdentical(...)
... |
Objects to be tested. They can be given as a list, or simply as arguments separated by commas; see the examples. |
Logical, whether all objects are the same
Jitao David Zhang <[email protected]>
test1 <- test2 <- test3 <- LETTERS[1:3]
allIdentical(test1, test2, test3)
allIdentical(list(test1, test2, test3))
num1 <- num2 <- num3 <- num4 <- sqrt(3)
allIdentical(num1, num2, num3, num4)
Apply isTopOrIncAndNotExcl filter to a matrix
applyTopOrIncAndNotExclFilter(matrix, MARGIN, top = 1, falseValue = 0, ...)
matrix |
A matrix. |
MARGIN |
Integer, 1 stands for row and 2 stands for column, passed to |
top |
Integer, how many top elements should be kept, passed to |
falseValue |
The same type as data in the matrix, used to replace values that is |
... |
Further parameters passed to |
A matrix with the same dimnames but with elements not satisfying isTopOrIncAndNotExcl replaced by falseValue.
myMat <- matrix(c(1,2,3,4,8,7,6,5,12,9,11,10), nrow=3, byrow=TRUE,
                dimnames=list(c("A", "B", "C"), c("Alpha", "Beta", "Gamma", "Delta")))
print(myMat)
applyTopOrIncAndNotExclFilter(myMat, 1, top=2, falseValue=-1)
applyTopOrIncAndNotExclFilter(myMat, 2, top=2, falseValue=-1)
applyTopOrIncAndNotExclFilter(myMat, 2, top=2, falseValue=-1, decreasing=FALSE)
applyTopOrIncAndNotExclFilter(myMat, 1, top=2, falseValue=-1, incFunc=function(x) x%%2==0)
applyTopOrIncAndNotExclFilter(myMat, 1, top=2, falseValue=-1, incFunc=function(x) x%%2==0,
                              excFunc=function(x) x<5)
Convert string-valued data frame or matrix into a numeric matrix
asNumMatrix(x)
x |
A data.frame or matrix, most likely with string values |
A numeric matrix with the same dimension
Jitao David Zhang <[email protected]>
testDf <- data.frame(a=c("2.34", "4.55"), b=c("7.33", "9.10"))
asNumMatrix(testDf)
testMatrix <- matrix(c("2.34", "4.55", "9E-3","-2.44", "7.33", "9.10"), nrow=2)
asNumMatrix(testMatrix)
The function calls matchColumnName internally to match the
column names.
assertColumnName(data.frame.cols, reqCols, ignore.case = FALSE)
data.frame.cols |
Column names of a data.frame. A data.frame can also be provided, although this may degrade performance because the data.frame is copied |
reqCols |
required columns |
ignore.case |
Logical, whether case should be ignored when matching column names |
If all required column names are present, their indices are returned *invisibly*. Otherwise an error message is printed.
myTestDf <- data.frame(HBV=1:3, VFB=0:2, BVB=4:6, FCB=2:4)
myFavTeams <- c("HBV", "BVB")
assertColumnName(myTestDf, myFavTeams)
myFavTeamsCase <- c("hbv", "bVb")
assertColumnName(myTestDf, myFavTeamsCase, ignore.case=TRUE)
Check dimensionality of contrast matrix
assertContrast(design, contrast)
design |
Design matrix |
contrast |
Contrast matrix |
Only the side effect is used: the function stops if ncol(design) does not equal nrow(contrast).
design <- matrix(1:20, ncol=5)
contrast <- matrix(c(-1,1,0,0,0, 0,1,0,-1,0), nrow=5)
assertContrast(design, contrast)
Check dimensionality of design matrix
assertDesign(nsample, design)
nsample |
Integer, number of samples |
design |
Design matrix |
Only the side effect is used: the function stops if the sample size does not equal nrow(design), i.e. the number of rows of the design matrix.
nsample <- 4
design <- matrix(1:20, ncol=5)
assertDesign(nsample, design)
Check dimensionality of both design and contrast matrix
assertDesignContrast(nsample, design, contrast)
nsample |
Integer, number of samples |
design |
Design matrix |
contrast |
Contrast matrix |
Only the side effect is used: the function stops if the dimensions of the design and contrast matrices are inconsistent with each other or with the number of samples.
nsample <- 4
design <- matrix(1:20, ncol=5)
contrast <- matrix(c(-1,1,0,0,0, 0,1,0,-1,0), nrow=5)
assertDesignContrast(nsample, design, contrast)
Print BEDA project information
bedaInfo()
A list, including the pstore path, URL, git address, and user ID. The function is used at the end of an R Markdown report to print information that helps colleagues find relevant resources.
bedaInfo()
Translate BiOmics-Pathology pstore path to URL
biomicsPstorePath2URL(path)
path |
Unix path |
Character string, the URL corresponding to the biomics pstore path. The URL is only accessible inside Roche.
biomicsPstorePath2URL("/pstore/data/biomics/")
The basic concepts of these functions are borrowed from the bound
function in the Qt framework.
bound(x, low, high)
boundNorm(x, low = min(x, na.rm = TRUE), high = max(x, na.rm = TRUE))
x |
A numeric vector or matrix |
low |
New lower boundary |
high |
New higher boundary |
bound sets values smaller than low, or larger than
high, to low and high respectively. If no
such values exist, the vector or matrix is returned unchanged.
boundNorm performs a 0-1 normalization: the input vector or matrix is
transformed linearly so that low is mapped to 0 and high is mapped
to 1. Input values outside [low, high] are therefore mapped outside [0, 1].
A numeric vector or matrix, the same type as input.
Jitao David Zhang <[email protected]>
myVec <- c(2,4,3,-1,9,5,3,4)
bound(myVec, 0, 8)
boundNorm(myVec)
## boundNorm returns negative values if input values lie out of the
## given region between low and high
boundNorm(myVec, 0, 8)
myMat <- matrix(myVec, nrow=2)
myMat
bound(myMat, 0, 8)
boundNorm(myMat)
boundNorm(myMat, 0, 8)
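The two transformations described above can be sketched in base R as follows. `boundSketch` and `boundNormSketch` are hypothetical names used only for illustration, not the package source; the sketch assumes that boundNorm computes (x - low) / (high - low), which is consistent with the example showing negative values for inputs below low.

```r
# Hypothetical re-implementations for illustration only (not ribiosUtils code).
boundSketch <- function(x, low, high) {
  # Clamp values below `low` to `low` and above `high` to `high`.
  # Subset assignment keeps attributes, so a matrix stays a matrix.
  x[x < low] <- low
  x[x > high] <- high
  x
}

boundNormSketch <- function(x, low = min(x, na.rm = TRUE),
                            high = max(x, na.rm = TRUE)) {
  # Linear map sending `low` to 0 and `high` to 1; values outside
  # [low, high] end up outside [0, 1].
  (x - low) / (high - low)
}

myVec <- c(2, 4, 3, -1, 9, 5, 3, 4)
boundSketch(myVec, 0, 8)
boundNormSketch(myVec)
boundNormSketch(myVec, 0, 8)
```

With the default boundaries, the minimum of the input maps to 0 and the maximum to 1.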
Column bind by rownames
cbindByRownames(..., type = c("intersect", "union"))
rbindByColnames(..., type = c("intersect", "union"))
... |
Two or more matrices, or a list of matrices. |
type |
Character string, how row names that are not shared by all
items are handled, either "intersect" or "union" |
A matrix
mat1 <- matrix(1:9, nrow=3, byrow=FALSE, dimnames=list(LETTERS[1:3], LETTERS[1:3]))
mat2 <- matrix(1:9, nrow=3, byrow=FALSE, dimnames=list(LETTERS[2:4], LETTERS[4:6]))
mat3 <- matrix(1:9, nrow=3, byrow=FALSE, dimnames=list(LETTERS[c(2,4,5)], LETTERS[7:9]))
cbindByRownames(mat1, mat2, mat3, type="intersect")
cbindByRownames(mat1, mat2, mat3, type="union")
## it is also possible to pass a list
cbindByRownames(list(mat1, mat2, mat3), type="union")
mat4 <- matrix(1:9, nrow=3, byrow=FALSE, dimnames=list(LETTERS[1:3], LETTERS[1:3]))
mat5 <- matrix(1:9, nrow=3, byrow=FALSE, dimnames=list(LETTERS[4:6], LETTERS[2:4]))
mat6 <- matrix(1:9, nrow=3, byrow=TRUE, dimnames=list(LETTERS[7:9], LETTERS[c(2,4,6)]))
rbindByColnames(mat4, mat5, mat6, type="intersect")
rbindByColnames(mat4, mat5, mat6, type="union")
## it is also possible to pass a list
rbindByColnames(list(mat4, mat5, mat6), type="union")
checkFile checks whether files exist; assertFile stops the
program if any of the files does not exist.
checkFile(...)
assertFile(...)
... |
Files to be checked |
assertFile is often used in scripts where a missing file would cause
the script to fail.
checkFile returns a logical vector. assertFile returns
an invisible TRUE if the files exist; otherwise it halts and prints error
messages.
Jitao David Zhang <[email protected]>
myDesc <- system.file("DESCRIPTION", package="ribiosUtils")
myNEWS <- system.file("NEWS", package="ribiosUtils")
checkFile(myDesc, myNEWS)
assertFile(myDesc, myNEWS)
Print the chosen few (the first and the last) items of a long vector
chosenFew(vec, start = 3, end = 1, collapse = ",")
vec |
A vector of characters or other types that can be cast into characters |
start |
Integer, how many elements at the start shall be printed |
end |
Integer, how many elements at the end shall be printed |
collapse |
Character used to separate elements |
A character string ready to be printed
In case the vector is shorter than the sum of start and
end, the whole vector is printed.
Jitao David Zhang <[email protected]>
lvec1 <- 1:100
chosenFew(lvec1)
chosenFew(lvec1, start=5, end=3)
svec <- 1:8
chosenFew(svec)
chosenFew(svec, start=5, end=4)
Close connections to all loggers
This function closes all open connections set up by loggers.
It is run automatically at the end of the R session (set up by registerLog).
closeLoggerConnections()
Invisible NULL. Only the side effect is used.
Pairwise overlap coefficient of binary matrix by column
Pairwise Jaccard/overlap coefficients can be calculated efficiently using matrix operations.
columnOverlapCoefficient(x, y = NULL)
x |
An integer matrix; other objects will be coerced into a matrix |
y |
An integer matrix; other objects will be coerced into a matrix. In case of
|
A matrix of column-wise pairwise overlap coefficients of the binary matrix. NaN
is reported when neither of the columns have any non-zero element.
set.seed(1887)
testMatrix1 <- matrix(rbinom(120, 1, 0.2), nrow=15)
columnOverlapCoefficient(testMatrix1)
testMatrix2 <- matrix(rbinom(150, 1, 0.2), nrow=15)
testMatrix12Poe <- columnOverlapCoefficient(testMatrix1, testMatrix2)
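The remark that pairwise coefficients can be computed efficiently with matrix operations can be illustrated with crossprod. This is a minimal sketch assuming the standard definition of the overlap coefficient (intersection size divided by the size of the smaller set), with any non-zero entry treated as membership; `columnOverlapSketch` is a hypothetical name and not the package's implementation.

```r
# Hypothetical sketch: column-wise overlap coefficients via one matrix product.
columnOverlapSketch <- function(x, y = NULL) {
  x <- (as.matrix(x) != 0) * 1                  # binarise to a numeric 0/1 matrix
  y <- if (is.null(y)) x else (as.matrix(y) != 0) * 1
  # crossprod(x, y)[i, j] counts rows where column i of x and column j of y are both 1
  inter <- crossprod(x, y)
  # size of the smaller of the two sets, for every column pair
  minSize <- outer(colSums(x), colSums(y), pmin)
  inter / minSize                               # 0/0 yields NaN if the smaller set is empty
}

m <- cbind(a = c(1, 1, 0, 0), b = c(1, 0, 1, 0), c = c(0, 0, 0, 0))
columnOverlapSketch(m)
```

A single crossprod call computes all intersection sizes at once, which avoids looping over column pairs in R.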
Basic set operations are used to compare two vectors
compTwoVecs(vec1, vec2)
vec1 |
A vector of atomic types, e.g. integers, characters, etc. |
vec2 |
A vector of the same type as vec1 |
A vector of six integer elements
vec1.setdiff |
Number of unique items only in vec1 |
intersect |
Number of items in both vec1 and vec2 |
vec2.setdiff |
Number of unique items only in vec2 |
vec1.ulen |
Number of unique items in vec1 |
vec2.ulen |
Number of unique items in vec2 |
union |
Number of unique items in either vec1 or vec2 |
Jitao David Zhang <[email protected]>
year1 <- c("HSV", "FCB", "BVB", "S04", "FCN")
year2 <- c("HSV", "FCK", "S04")
compTwoVecs(year1, year2)
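The six counts listed above can be reproduced with base R set operations. This is an illustrative sketch; `compTwoVecsSketch` is a hypothetical name, and the element order simply follows the order documented above.

```r
# Hypothetical sketch of the six documented counts, using base set operations.
compTwoVecsSketch <- function(vec1, vec2) {
  u1 <- unique(vec1)
  u2 <- unique(vec2)
  c(vec1.setdiff = length(setdiff(u1, u2)),   # only in vec1
    intersect    = length(intersect(u1, u2)), # in both
    vec2.setdiff = length(setdiff(u2, u1)),   # only in vec2
    vec1.ulen    = length(u1),
    vec2.ulen    = length(u2),
    union        = length(union(u1, u2)))
}

year1 <- c("HSV", "FCB", "BVB", "S04", "FCN")
year2 <- c("HSV", "FCK", "S04")
compTwoVecsSketch(year1, year2)
```

For these inputs, three teams appear only in year1, two in both, and one only in year2, giving a union of six.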
Calculate correlation coefficients using common rows of the two matrices
corByRownames(mat1, mat2, ...)
mat1 |
A numeric matrix |
mat2 |
Another numeric matrix |
... |
Passed |
A matrix of dimension m x n, where m and n are the
numbers of columns in mat1 and mat2, respectively. The matrix has an attribute, commonRownames, giving the
common rownames shared by the two matrices.
myMat1 <- matrix(rnorm(24), nrow=6, byrow=TRUE,
                 dimnames=list(sprintf("R%d", 1:6), sprintf("C%d", 1:4)))
myMat2 <- matrix(rnorm(35), nrow=7, byrow=TRUE,
                 dimnames=list(sprintf("R%d", 7:1), sprintf("C%d", 1:5)))
corByRownames(myMat1, myMat2)
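The idea of correlating only the shared rows can be sketched in base R as below. `corByRownamesSketch` is a hypothetical name; the sketch assumes that the extra arguments are forwarded to `cor`, which the original text does not state explicitly.

```r
# Hypothetical sketch: restrict both matrices to common row names, then correlate columns.
corByRownamesSketch <- function(mat1, mat2, ...) {
  common <- intersect(rownames(mat1), rownames(mat2))  # order follows mat1
  res <- cor(mat1[common, , drop = FALSE], mat2[common, , drop = FALSE], ...)
  attr(res, "commonRownames") <- common                 # record which rows were used
  res
}

set.seed(1)
m1 <- matrix(rnorm(24), nrow = 6, dimnames = list(sprintf("R%d", 1:6), sprintf("C%d", 1:4)))
m2 <- matrix(rnorm(35), nrow = 7, dimnames = list(sprintf("R%d", 7:1), sprintf("C%d", 1:5)))
corByRownamesSketch(m1, m2)
```

Indexing both matrices by the same character vector aligns their rows even when they are stored in different orders.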
Count tokens by splitting strings
countTokens(str, split = "\t", ...)
str |
A character string vector |
split |
Character used to split the strings |
... |
Other parameters passed to the |
Integer vector: count of tokens in the strings
Jitao David Zhang <[email protected]>
See also strsplit to split strings, or its convenient wrapper
strtoken in this package.
myStrings <- c("HSV\t1887\tFavorite", "FCB\t1900", "FCK\t1948")
countTokens(myStrings)
## the function deals with factors as well
countTokens(factor(myStrings))
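Counting tokens amounts to splitting each string and taking the length of each piece list. The following is a minimal sketch with the hypothetical name `countTokensSketch`; it is not the package code.

```r
# Hypothetical sketch: split each string, then count the pieces.
countTokensSketch <- function(str, split = "\t", ...) {
  # as.character() lets factors be handled as well
  lengths(strsplit(as.character(str), split = split, ...))
}

myStrings <- c("HSV\t1887\tFavorite", "FCB\t1900", "FCK\t1948")
countTokensSketch(myStrings)
countTokensSketch(factor(myStrings))
```

Note that strsplit drops a trailing empty token, so in this sketch "a\t" counts as one token.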
The function is particularly useful for scripting.
createDir(dir, showWarnings = FALSE, recursive = TRUE, mode = "0777")
dir |
Directory name |
showWarnings |
Passed to dir.create |
recursive |
Passed to dir.create |
mode |
Passed to dir.create |
Directory name (invisible)
Jitao David Zhang <[email protected]>
tempdir <- tempdir()
createDir(tempdir)
This function copies the skeleton RMarkdown template shipped with
ribiosUtils to the current working directory (or any specified path)
under a user-given file name. When no file name is provided, a default name
based on the current date is used.
createRmdTemplate(filename = NULL, overwrite = FALSE)
filename |
Character string, the name (or path) of the destination
file. If |
overwrite |
Logical, whether to overwrite an existing file. Default is
|
The path of the copied file (invisible).
Jitao David Zhang <[email protected]>
## Not run:
## copy with default file name
createRmdTemplate()
## copy with a custom file name
createRmdTemplate("my_analysis.Rmd")
## End(Not run)
## copy to a temporary directory
dest <- createRmdTemplate(file.path(tempdir(), "test_report.Rmd"))
file.exists(dest)
unlink(dest)
Cumulative Jaccard Index
cumJaccardIndex(list)
cumJaccardDistance(list)
list |
A list of characters or integers |
The cumulative Jaccard Index, a vector of values between 0 and 1, of the same length as the input list
The cumulative Jaccard Index of element i is the Jaccard Index between
element i and the union of elements 1 to i-1. The cumulative Jaccard
Index of the first element is set to 0.0.
The cumulative Jaccard distance is defined in almost the same way, the only difference being that the distance (one minus the index) is returned. The value of the first element is 1.0.
An advantage of using the cumulative overlap coefficient over the cumulative Jaccard Index is that it is monotonic: the value is guaranteed to decrease from 1 to 0, whereas the cumulative Jaccard Index may not be monotonic.
myList <- list(first=LETTERS[1:5], second=LETTERS[6:10], third=LETTERS[8:12], fourth=LETTERS[1:12])
cumJaccardIndex(myList)
cumJaccardDistance(myList)
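The definition above (each element compared with the union of all previous elements) can be sketched as a simple loop. `cumJaccardSketch` is a hypothetical name used for illustration; the package may implement this differently.

```r
# Hypothetical sketch of the cumulative Jaccard Index as defined above.
cumJaccardSketch <- function(list) {
  n <- length(list)
  res <- numeric(n)                  # first element keeps the documented 0.0
  if (n == 0) return(res)
  seen <- unique(list[[1]])          # union of all elements seen so far
  for (i in seq_len(n)[-1]) {
    cur <- unique(list[[i]])
    res[i] <- length(intersect(cur, seen)) / length(union(cur, seen))
    seen <- union(seen, cur)
  }
  names(res) <- names(list)
  res
}

myList <- list(first = LETTERS[1:5], second = LETTERS[6:10],
               third = LETTERS[8:12], fourth = LETTERS[1:12])
cumJaccardSketch(myList)      # index
1 - cumJaccardSketch(myList)  # distance
```

For the third element, H to L shares H, I and J with the union A to J, and the union of both is A to L, giving 3/12 = 0.25.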
Cumulative overlap coefficient
cumOverlapCoefficient(list)
cumOverlapDistance(list)
list |
A list of characters or integers |
The cumulative overlap coefficients, a vector of values between 0 and 1, of the same length as the input list
The cumulative overlap coefficient of element i is the overlap coefficient
between element i and the union of elements 1 to i-1. The cumulative
overlap coefficient of the first element is set to 0.0.
The cumulative overlap distance is defined in almost the same way, the
only difference being that the distance is returned. The value of the first element is
1.0. In practice it is calculated as 1 - cumOverlapCoefficient.
Since the denominator of the overlap coefficient is the size of the smaller
set of the two, which is bound to be the size of element i, the
cumulative overlap distance can be interpreted as the proportion of new
items in each new element that are unseen in previous elements. Similarly,
the cumulative overlap coefficient can be interpreted as the proportion of
items in each new element that have been seen in previous elements. See
examples below.
An advantage of using the cumulative overlap coefficient over the cumulative Jaccard Index is that it is monotonic: the value is guaranteed to decrease from 1 to 0, whereas the cumulative Jaccard Index may not be monotonic.
myList <- list(first=LETTERS[1:5], second=LETTERS[6:10], third=LETTERS[8:12], fourth=LETTERS[1:12])
cumOverlapCoefficient(myList)
cumOverlapDistance(myList)
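The cumulative overlap coefficient can be sketched with the same loop as a cumulative Jaccard computation, replacing the denominator by the size of the smaller set. `cumOverlapSketch` is a hypothetical name, not the package's function.

```r
# Hypothetical sketch of the cumulative overlap coefficient as defined above.
cumOverlapSketch <- function(list) {
  n <- length(list)
  res <- numeric(n)                  # first element keeps the documented 0.0
  if (n == 0) return(res)
  seen <- unique(list[[1]])          # union of all elements seen so far
  for (i in seq_len(n)[-1]) {
    cur <- unique(list[[i]])
    # overlap coefficient: intersection size / size of the smaller set
    res[i] <- length(intersect(cur, seen)) / min(length(cur), length(seen))
    seen <- union(seen, cur)
  }
  names(res) <- names(list)
  res
}

myList <- list(first = LETTERS[1:5], second = LETTERS[6:10],
               third = LETTERS[8:12], fourth = LETTERS[1:12])
cumOverlapSketch(myList)      # coefficient
1 - cumOverlapSketch(myList)  # distance
```

For the third element, three of its five items (H, I, J) were already seen, so the coefficient is 0.6 and the distance 0.4: 40% of its items are new.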
Proportion of cumulative sum over sum
cumsumprop(x)
x |
Numeric vector |
The proportion of the cumulative sum over the total sum
x <- 1:4
cumsumprop(x) ## 0.1, 0.3, 0.6, 1
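The quantity returned is simply the cumulative sum divided by the total. A one-line sketch, using the hypothetical name `cumsumpropSketch`:

```r
# Hypothetical sketch: cumulative sum as a proportion of the total sum.
cumsumpropSketch <- function(x) cumsum(x) / sum(x)

cumsumpropSketch(1:4)  # cumsum is 1, 3, 6, 10; divided by 10
```

The last element is always 1 for a vector with a non-zero sum.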
Three types of labels (levels) are supported: “cut.default” (Interval
labels returned by cut as default), “left” (Left boundary of
intervals), and “right” (Right boundary of intervals).
cutInterval(
  x,
  step = 1,
  labelOption = c("cut.default", "left", "right"),
  include.lowest = FALSE,
  right = TRUE,
  dig.lab = 3,
  ordered_result = FALSE,
  ...
)
x |
A vector of numbers |
step |
Step size. |
labelOption |
How the label is displayed. See |
include.lowest |
Logical, passed to |
right |
Logical, passed to |
dig.lab |
See |
ordered_result |
See |
... |
Other parameters that are passed to |
A factor vector
Jitao David Zhang <[email protected]>
testNum <- rnorm(100)
(testFac <- cutInterval(testNum, step=1, labelOption="cut.default"))
## compare the result to
(testFacCut <- cut(testNum, 10))
Cut a tree into groups of ordered sizes
cutreeIntoOrderedGroups(tree, k = NULL, h = NULL, decreasing = TRUE)
tree |
|
k |
an integer scalar or vector with the desired number of groups |
h |
numeric scalar or vector with heights where the tree should be cut. |
decreasing |
Logical, should the first group be the largest? |
Cut a tree, e.g. as resulting from |
A named integer vector of cluster assignments, ordered by cluster size (largest first by default). If multiple values of k or h are provided, a matrix with one column per value.
hc <- hclust(dist(USArrests))
hck5 <- cutreeIntoOrderedGroups(hc, k = 5)
table(hck5)
## compare with cutree, which does not order the groups
table(cutree(hc, k=5))
hck25 <- cutreeIntoOrderedGroups(hc, k = 2:5)
apply(hck25, 2, table)
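Relabelling cutree clusters by size can be sketched as follows. `cutreeOrderedSketch` is a hypothetical name; unlike the documented function, it handles only a single k and does not support h.

```r
# Hypothetical sketch: relabel cutree() clusters so that 1 is the largest group.
cutreeOrderedSketch <- function(tree, k, decreasing = TRUE) {
  cl <- cutree(tree, k = k)                        # original labels 1..k
  sizes <- table(cl)
  # original labels, sorted by group size
  byOrder <- as.integer(names(sizes))[order(as.integer(sizes), decreasing = decreasing)]
  newCl <- match(cl, byOrder)                      # new label = rank of the group size
  names(newCl) <- names(cl)
  newCl
}

hc <- hclust(dist(USArrests))
table(cutreeOrderedSketch(hc, k = 5))
table(cutree(hc, k = 5))
```

The partition itself is unchanged; only the labels are permuted so that the group sizes appear in decreasing order.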
The function tries to derive a factor vector for a data.frame object.
See details below.
dfFactor(df, sample.group)
df |
A |
sample.group |
A character, number or a vector of factors, from which the factor vector should be deciphered. See details below. |
The function tries to get a factor vector of the same length as the number
of rows in the data.frame. The determination is done in the following
order:
Step 1: It tries to find a column in the data.frame with the
name given by sample.group. If found, this column is converted
into a factor (if it is not one already) and returned.
Step 2: It tries to interpret
sample.group as an integer, the index of the column in the
data.frame giving the factor.
Step 3: If sample.group
itself is a vector of the same length as the data.frame, it is converted
into a factor (if it is not one already) and returned.
Otherwise the program stops with an error.
A factor vector with the same length as the data.frame
Jitao David Zhang <[email protected]>
df <- data.frame(gender=c("M", "M", "F", "F", "M"), age=c(12,12,14,12,14),
                 score=c("A", "B-", "C", "B-", "A"))
dfFactor(df, "gender")
dfFactor(df, "score")
dfFactor(df, 1L)
dfFactor(df, 2L)
dfFactor(df, df$score)
Convert factor columns in a data.frame into character strings
dfFactor2Str(df)
df |
A data.frame |
A data.frame with factor columns coerced into character strings
exampleDf <- data.frame(Teams=c("HSV", "FCB", "FCB", "HSV"),
                        Player=c("Mueller", "Mueller", "Robben", "Holtby"),
                        scores=c(3.5, 1.5, 1.5, 1.0), stringsAsFactors=TRUE)
strDf <- dfFactor2Str(exampleDf)
stopifnot(identical(strDf[,1], c("HSV", "FCB", "FCB", "HSV")))
stopifnot(identical(exampleDf[,1], factor(c("HSV", "FCB", "FCB", "HSV"))))
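The conversion can be sketched by replacing only the factor columns and leaving all other columns untouched. `dfFactor2StrSketch` is a hypothetical name used for illustration.

```r
# Hypothetical sketch: convert factor columns to character, keep other columns.
dfFactor2StrSketch <- function(df) {
  isFac <- vapply(df, is.factor, logical(1))       # which columns are factors
  df[isFac] <- lapply(df[isFac], as.character)     # replace them in place
  df
}

exampleDf <- data.frame(Teams = c("HSV", "FCB", "FCB", "HSV"),
                        scores = c(3.5, 1.5, 1.5, 1.0), stringsAsFactors = TRUE)
str(dfFactor2StrSketch(exampleDf))
```

Because R copies on modification, the input data.frame keeps its factor columns.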
Format labels for wells in microwell plates to equal widths
equateWellLabelWidth(wells)
wells |
A vector of character strings indicating well positions, they may be of different widths, for instance A1, A10, A12 |
A vector of the same length, with all labels adjusted to the equal width, with left-padding zeros added whenever it makes sense. If the input labels are already of the same length, no change is applied.
equateWellLabelWidth(c("A1", "C10", "D12"))
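Zero-padding well labels can be sketched by splitting each label into its row letters and column number. This assumes labels of the form letters-then-digits (e.g. "A1"); `equateWellSketch` is a hypothetical name, and the package may handle other formats differently.

```r
# Hypothetical sketch: zero-pad the numeric part of well labels to a common width.
# Assumes labels consist of leading letters followed by digits, e.g. "A1", "C10".
equateWellSketch <- function(wells) {
  rows <- sub("^([A-Za-z]+).*$", "\\1", wells)      # leading letters, e.g. "A"
  cols <- as.integer(sub("^[A-Za-z]+", "", wells))  # trailing number, e.g. 1
  width <- max(nchar(as.character(cols)))
  paste0(rows, formatC(cols, width = width, flag = "0"))
}

equateWellSketch(c("A1", "C10", "D12"))
```

Labels that already share the same width are returned unchanged by this sketch.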
Many files have a base and an extension in their names; for instance, for the
file mybook.pdf, the base is mybook and the extension is
pdf. The functions basefilename and extname extract this
information from one or more file names.
extname(x, ifnotfound = NA, lower.case = FALSE)
x |
Character vector of file names; other classes will be coerced to characters |
ifnotfound |
If no extension name was found, the value to be returned.
Default is NA |
lower.case |
Logical, should the names be returned in lower case? |
The base file names or the extensions as a character vector of the same
length as the input. If a file name does not
contain an extension, extname returns ifnotfound (NA by default).
In case there are multiple dots in the input file name, the last field
will be taken as the extension, and the rest as the base name. For instance
for file test.out.txt, returned base name is test.out and
extension is txt.
Jitao David Zhang <[email protected]>
extname("mybook.pdf")
extname("sequence.in.fasta")
extname(c("/path/mybook.pdf", "test.doc"))
extname("README")
extname("README", ifnotfound="")
extname("/path/my\\ home/Holiday Plan.txt")
basefilename("mybook.pdf")
basefilename("sequence.in.fasta")
basefilename(c("/path/mybook.pdf", "test.doc"))
basefilename("README")
basefilename("/path/my\\ home/Holiday Plan.txt")
basefilename("myBook.pdf", lower.case=TRUE)
extname("myBook.PDF", lower.case=TRUE)
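The rule that the last dot-separated field is the extension can be sketched with a regular expression. `extnameSketch` is a hypothetical name; it assumes that dots in directory names should be ignored (hence `basename`), which the original text does not state.

```r
# Hypothetical sketch: everything after the last dot of the file name is the extension.
extnameSketch <- function(x, ifnotfound = NA, lower.case = FALSE) {
  base <- basename(as.character(x))               # assumption: ignore dots in directories
  hasExt <- grepl(".", base, fixed = TRUE)
  res <- ifelse(hasExt, sub("^.*\\.", "", base), ifnotfound)  # greedy match up to last dot
  if (lower.case) res <- tolower(res)
  res
}

extnameSketch(c("/path/mybook.pdf", "sequence.in.fasta"))
extnameSketch("README", ifnotfound = "")
extnameSketch("myBook.PDF", lower.case = TRUE)
```

The greedy `^.*\\.` pattern consumes everything up to the last dot, so "sequence.in.fasta" yields "fasta".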
Make the first letter of strings uppercase
firstUp(str)
str |
A vector of character strings |
A vector of the same length, with the first letter in uppercase
firstUp('test string')
firstUp(strsplit('many many years ago', ' ')[[1]])
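Capitalizing the first letter can be sketched with substr and toupper. `firstUpSketch` is a hypothetical name, not the package function.

```r
# Hypothetical sketch: uppercase the first character, keep the rest unchanged.
firstUpSketch <- function(str) {
  str <- as.character(str)
  paste0(toupper(substr(str, 1, 1)), substring(str, 2))
}

firstUpSketch("test string")
firstUpSketch(strsplit("many many years ago", " ")[[1]])
```

Only the first character of each string changes, so "test string" becomes "Test string" rather than title case.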
Format strings to a fixed width in characters
fixWidthStr(str, nchar = 8, align = c("left", "right"))
str |
A vector of strings |
nchar |
The fixed width |
align |
Character, how to align
Strings with more or fewer characters than |
A vector of strings with fixed widths
NA will be converted to characters and the same fixed width
will be applied. The behavior is different from shortenStr,
where NA is kept as it is.
inputStrs <- c("abc", "abcd", "abcde", "abcdefg", "NA", NA)
outputStrs <- fixWidthStr(inputStrs, nchar=4)
stopifnot(all(nchar(outputStrs)==4))
If any of the expressions in ‘...’ are not all TRUE, stop is called, producing an error message defined by the msg parameter.
haltifnot(..., msg = "Error undefined. Please contact the developer")
... |
any number of ‘logical’ R expressions, which should
evaluate to |
msg |
Error message. |
The function is adapted from the stopifnot function, with the
difference that the error message can be defined by the programmer. With
haltifnot, error messages can be more informative, which is desirable for
diagnostic and user-interaction purposes.
NULL if all statements in ... are TRUE
Jitao David Zhang <[email protected]>
haltifnot(1==1, all.equal(pi, 3.14159265), 1<2) ## all TRUE
m <- matrix(c(1,3,3,1), 2,2)
haltifnot(m == t(m), diag(m) == rep(1,2)) ## all TRUE
op <- options(error = expression(NULL))
# "disable stop(.)"  << Use with CARE! >>
haltifnot(all.equal(pi, 3.141593), 2 < 2, all(1:10 < 12), "a" < "b",
          msg="not all conditions are TRUE. Please contact the developer")
options(op) # revert to previous error handler
These two functions resemble head and tail, showing the
first rows and columns of 2D data structures, e.g. matrix or data.frame.
headhead(x, m = 6L, n = 6L)
x |
A |
m |
Integer, number of rows to show |
n |
Integer, number of columns to show |
While head and tail can be applied to data.frame or
matrix as well, they show all columns of the first (last) rows even
if the matrix has a large number of columns. These two functions,
headhead and tailtail, circumvent this problem by showing only
the first rows AND the first columns.
The first rows/columns of the input object
Jitao David Zhang <[email protected]>
myMat <- matrix(rnorm(10000), nrow=10L)
head(myMat)
headhead(myMat)
tailtail(myMat)
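Restricting both dimensions can be sketched by indexing rows and columns at the same time. `headheadSketch` and `tailtailSketch` are hypothetical names; the sketch assumes that tailtail shows the last rows and the last columns.

```r
# Hypothetical sketches: show the first (last) m rows and n columns only.
headheadSketch <- function(x, m = 6L, n = 6L) {
  x[head(seq_len(nrow(x)), m), head(seq_len(ncol(x)), n), drop = FALSE]
}
tailtailSketch <- function(x, m = 6L, n = 6L) {
  # assumption: tailtail mirrors headhead at the bottom-right corner
  x[tail(seq_len(nrow(x)), m), tail(seq_len(ncol(x)), n), drop = FALSE]
}

myMat <- matrix(rnorm(10000), nrow = 10L)   # 10 rows, 1000 columns
headheadSketch(myMat)
tailtailSketch(myMat)
```

Using `drop = FALSE` keeps the result two-dimensional even when m or n is 1, and the same indexing works for data.frames.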
This function prints head and tail elements of a vector for visualization purposes. See examples for its usage.
headtail(vec, head = 2, tail = 1, collapse = ", ")
vec |
A vector of native types (e.g. character strings) |
head |
Integer, number of head elements to be printed |
tail |
Integer, number of tail elements to be printed |
collapse |
Character string, used to collapse elements |
Head and tail elements are concatenated with ellipsis, if there are any elements that are not shown in the vector.
A character string representing the vector
Jitao David Zhang <[email protected]>
testVec1 <- LETTERS[1:10]
headtail(testVec1)
headtail(testVec1, head=3, tail=3)
headtail(testVec1, head=3, tail=3, collapse="|")
testVec2 <- letters[1:3]
headtail(testVec2, head=1, tail=1)
headtail(testVec2, head=2, tail=1)
Test whether two matrices are identical by values and by dim names
identicalMatrix(x, y, epsilon = 1e-12)
x |
a matrix |
y |
another matrix |
epsilon |
accuracy threshold: absolute differences below this threshold are ignored |
Logical
set.seed(1887); x <- matrix(rnorm(1000), nrow=10, dimnames=list(LETTERS[1:10],NULL))
set.seed(1887); y <- matrix(rnorm(1000), nrow=10, dimnames=list(LETTERS[1:10],NULL))
set.seed(1887); z <- matrix(rnorm(1000), nrow=10, dimnames=list(letters[1:10],NULL))
stopifnot(identicalMatrix(x,y))
stopifnot(!identicalMatrix(x,z))
Test whether two matrices have the same numeric values within a given accuracy
identicalMatrixValue(x, y, epsilon = 1e-12)
x |
a matrix |
y |
another matrix |
epsilon |
accuracy threshold: absolute differences below this threshold are ignored |
Logical
set.seed(1887); x <- matrix(rnorm(1000), nrow=10)
set.seed(1887); y <- matrix(rnorm(1000), nrow=10)
set.seed(1882); z <- matrix(rnorm(1000), nrow=10)
stopifnot(identicalMatrixValue(x,y))
stopifnot(!identicalMatrixValue(x,y+1E-5))
stopifnot(!identicalMatrixValue(x,y-1E-5))
stopifnot(!identicalMatrixValue(x,z))
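The two tests above, value comparison within `epsilon` and the additional dimnames check, can be sketched in base R. The `*Sketch` names are hypothetical, and the sketch assumes numeric matrices without `NA`; the package may treat missing values or dimension mismatches differently.

```r
# Hypothetical sketches; assume numeric matrices without NA values.
identicalMatrixValueSketch <- function(x, y, epsilon = 1e-12) {
  # same shape, and every element-wise absolute difference below epsilon
  identical(dim(x), dim(y)) && all(abs(x - y) < epsilon)
}

identicalMatrixSketch <- function(x, y, epsilon = 1e-12) {
  # additionally require identical row and column names
  identicalMatrixValueSketch(x, y, epsilon) &&
    identical(dimnames(x), dimnames(y))
}

x <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), NULL))
identicalMatrixValueSketch(x, x + 1e-13)   # TRUE: difference below epsilon
identicalMatrixValueSketch(x, x + 1e-5)    # FALSE
identicalMatrixSketch(x, unname(x))        # FALSE: dimnames differ
```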
Case-insensitive match and pmatch functions, especially useful
in parsing user inputs, e.g. from command line.
imatch(x, table, ...)
x |
String vector |
table |
A vector to be matched |
... |
Other parameters passed to |
imatch and ipmatch work similarly to match and
pmatch, except that they are case-insensitive.
matchv, imatchv and ipmatchv are shortcuts to get the
matched value (hence the ‘v’) if the match succeeded, or
NA if not. matchv(x, table) is equivalent to
table[match(x, table)]. See examples.
imatch and ipmatch return matching indices, or
NA (by default) if the match failed.
matchv, imatchv and ipmatchv return the matching
element in table, or NA if the match failed. Note that when
x and table differ in case, the element in table
is returned. This is especially useful when the user's input differs
in case from the internal options.
Jitao David Zhang <[email protected]>
user.input <- c("hsv", "BvB")
user.input2 <- c("HS", "BV")
internal.options <- c("HSV", "FCB", "BVB", "FCN")
match(user.input, internal.options)
imatch(user.input, internal.options)
ipmatch(user.input, internal.options)
ipmatch(user.input2, internal.options)
matchv(user.input, internal.options)
matchv(tolower(user.input), tolower(internal.options))
imatchv(user.input, internal.options)
ipmatchv(user.input, internal.options)
ipmatchv(user.input2, internal.options)
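The case-insensitive behaviour described above can be approximated by lower-casing both sides before calling base R's `match` and `pmatch`. This is a minimal sketch with hypothetical `*Sketch` names, not necessarily how the package implements it; the key point is that the "v" variant indexes the original `table`, so the internal spelling is returned.

```r
# Hypothetical base-R sketches of case-insensitive matching.
imatchSketch <- function(x, table, ...) match(tolower(x), tolower(table), ...)
ipmatchSketch <- function(x, table, ...) pmatch(tolower(x), tolower(table), ...)
# the "v" variant returns the element of `table`, keeping its original case
imatchvSketch <- function(x, table) table[imatchSketch(x, table)]

internal.options <- c("HSV", "FCB", "BVB", "FCN")
imatchSketch(c("hsv", "BvB"), internal.options)    # 1 3
ipmatchSketch(c("HS", "BV"), internal.options)     # 1 3
imatchvSketch(c("hsv", "BvB"), internal.options)   # "HSV" "BVB"
```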
Invert the names and elements of a list
invertList(inputList, simplify = FALSE)
inputList |
a list; objects of other classes (e.g. named vectors) will be converted to lists |
simplify |
Logical; if TRUE and there are no duplicated names, a vector is returned |
A list with values from the input becoming names and vice versa. When simplify=TRUE and there are no duplicated names, a named character vector is returned instead.
myList <- list("A"=c("a", "alpha"), "B"=c("b", "Beta"), "C"="c")
invertList(myList)
invertList(myList, simplify=TRUE)
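The inversion can be sketched with base R's `split`: each name is repeated once per element it holds, and the names are then grouped by element. This hypothetical `invertListSketch` covers only the `simplify = FALSE` case and keeps values in order of first appearance; the package's ordering may differ.

```r
# Hypothetical sketch (simplify = FALSE case only); not the package code.
invertListSketch <- function(inputList) {
  inputList <- as.list(inputList)
  values <- unlist(inputList, use.names = FALSE)
  # repeat each name once per element it holds
  keys <- rep(names(inputList), lengths(inputList))
  # group the old names by value, keeping values in order of first appearance
  split(keys, factor(values, levels = unique(values)))
}

myList <- list(A = c("a", "alpha"), B = c("b", "Beta"), C = "c")
invertListSketch(myList)
```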
Checks whether given character strings point to valid directories
isDir(...)
checkDir(...)
assertDir(...)
... |
One or more character strings giving directory names to be tested |
isDir tests whether each given string represents a valid, existing
directory. assertDir performs the same test, and stops the program
if any given string does not point to an existing directory.
checkDir is a synonym of isDir.
isDir returns a logical vector.
assertDir returns an invisible TRUE if directories exist,
otherwise halts and prints error messages.
Jitao David Zhang <[email protected]>
file.info, checkFile and
assertFile
dir1 <- tempdir()
dir2 <- tempdir()
isDir(dir1, dir2)
assertDir(dir1, dir2)
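The test-then-stop pattern can be sketched on top of `file.info`, which the section lists under "see also". The `*Sketch` names are hypothetical; base R's `dir.exists()` (R >= 3.2.0) is a built-in alternative for the test itself.

```r
# Hypothetical sketches built on base R's file.info(); not the package code.
isDirSketch <- function(...) {
  paths <- unlist(list(...))
  isdir <- file.info(paths)$isdir
  # file.info() reports NA for paths that do not exist
  !is.na(isdir) & isdir
}

assertDirSketch <- function(...) {
  paths <- unlist(list(...))
  ok <- isDirSketch(paths)
  if (!all(ok)) {
    stop("Directory not found: ", paste(paths[!ok], collapse = ", "))
  }
  invisible(TRUE)
}

isDirSketch(tempdir(), tempfile())   # TRUE FALSE (tempfile() is not created)
```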
Determines whether an object is of class try-error
isError(x)
x |
Any object, potentially produced within a |
Logical value, TRUE if x inherits from the try-error
class.
Jitao David Zhang <[email protected]>
if(exists("nonExistObj")) rm(nonExistObj)
myObj <- try(nonExistObj/5, silent=TRUE)
isError(myObj)
Whether an integer is odd (or even)
isOdd(x)
isEven(x)
x |
An integer. |
Logical, whether the input number is odd or even.
isOdd and isEven return whether an integer is odd or even,
respectively.
isOdd(3)
isEven(4)
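A parity test can be sketched with the modulo operator. These hypothetical `*Sketch` functions rely on R's `%%` returning a result with the sign of the divisor, so negative odd numbers also give 1; whether the package handles negatives the same way is an assumption.

```r
# Hypothetical sketch; in R, -3 %% 2 is 1, so negative odd numbers work too.
isOddSketch <- function(x) x %% 2 == 1
isEvenSketch <- function(x) x %% 2 == 0

isOddSketch(c(3, -3, 4))   # TRUE TRUE FALSE
```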
Tell whether a character string is a Roche compound ID
isRocheCompoundID(str)
str |
Character string(s) |
A logical vector of the same length as str, indicating whether each element is a Roche compound ID or not.
Short versions (RO[1-9]{2,7}) are supported.
isRocheCompoundID(c("RO1234567", "RO-1234567", "RO1234567-000", "RO1234567-000-000", "ROnoise-000-000"))
Logical vector indicating whether elements are among the top or included, and not excluded
isTopOrIncAndNotExcl(x, top = 1, incFunc, excFunc, decreasing = TRUE)
x |
An atomic vector that can be sorted by |
top |
Integer, number of top elements that we want to consider. |
incFunc |
Function, applied to |
excFunc |
Function, applied to |
decreasing |
Logical, passed to |
A logical vector of the same length as the input x, indicating whether each element is either among the top or included, and not excluded.
The function can be used to keep top elements of a vector while considering both inclusion and exclusion criteria.
myVal <- c(2, 4, 8, 7, 1)
isTopOrIncAndNotExcl(myVal, top=1)
isTopOrIncAndNotExcl(myVal, top=3)
isTopOrIncAndNotExcl(myVal, top=3, incFunc=function(x) x>=2)
isTopOrIncAndNotExcl(myVal, top=3, excFunc=function(x) x%%2==1)
isTopOrIncAndNotExcl(myVal, top=3, incFunc=function(x) x>=2, excFunc=function(x) x%%2==1)
myVal2 <- c("a", "A", "a", "A", "A")
isTopOrIncAndNotExcl(myVal2, 2)
isTopOrIncAndNotExcl(myVal2, 2, incFunc=function(x) x=="A")
isTopOrIncAndNotExcl(myVal2, 4)
isTopOrIncAndNotExcl(myVal2, 4, excFunc=function(x) x=="a")
## the function returns all TRUEs if top is larger than the length of the vector
isTopOrIncAndNotExcl(myVal, top=9)
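The combined rule, (top OR included) AND NOT excluded, can be sketched in base R. The name `isTopOrIncSketch` is hypothetical; ties here are broken by position through `order()`, which the package may handle differently.

```r
# Hypothetical sketch of (top | included) & !excluded; ties broken by position.
isTopOrIncSketch <- function(x, top = 1, incFunc, excFunc, decreasing = TRUE) {
  n <- length(x)
  # positions of the first `top` elements after sorting
  ord <- order(x, decreasing = decreasing)
  isTop <- seq_len(n) %in% ord[seq_len(min(top, n))]
  # optional inclusion / exclusion criteria
  isInc <- if (missing(incFunc)) rep(FALSE, n) else incFunc(x)
  isExc <- if (missing(excFunc)) rep(FALSE, n) else excFunc(x)
  (isTop | isInc) & !isExc
}

myVal <- c(2, 4, 8, 7, 1)
isTopOrIncSketch(myVal, top = 3)                                  # F T T T F
isTopOrIncSketch(myVal, top = 3, excFunc = function(x) x %% 2 == 1)  # F T T F F
```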
Calculate the Jaccard Index between two vectors
jaccardIndex(x, y)
jaccardDistance(x, y)
x |
A vector |
y |
A vector |
The Jaccard Index, a number between 0 and 1
jaccardDistance is defined as 1 - jaccardIndex.
myX <- 1:6
myY <- 4:9
jaccardIndex(myX, myY)
jaccardDistance(myX, myY)
myX <- LETTERS[1:5]
myY <- LETTERS[6:10]
jaccardIndex(myX, myY)
jaccardDistance(myX, myY)
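The Jaccard Index is the size of the intersection divided by the size of the union. A minimal base-R sketch with hypothetical names follows; `intersect` and `union` already drop duplicates, and two empty vectors would give `NaN` here, which may not match the package.

```r
# Hypothetical sketch: |intersect(x, y)| / |union(x, y)| on unique elements.
jaccardIndexSketch <- function(x, y) {
  length(intersect(x, y)) / length(union(x, y))
}
jaccardDistanceSketch <- function(x, y) 1 - jaccardIndexSketch(x, y)

jaccardIndexSketch(1:6, 4:9)                   # 3 shared / 9 in the union = 1/3
jaccardIndexSketch(LETTERS[1:5], LETTERS[6:10])  # 0, no shared elements
```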
A common task in expression analysis is to collapse multiple features that are mapped to the same gene by some statistic. This function does so by keeping, for each gene, the matrix row (normally a feature) with the highest statistic specified by the user.
keepMaxStatRow(
  matrix,
  keys,
  keepNArows = TRUE,
  stat = function(x) mean(x, na.rm = TRUE),
  levels = c("rownames", "attribute", "discard"),
  ...
)
matrix |
A numeric matrix |
keys |
A character vector giving the keys the rows are mapped to. A common scenario is that each row represents one probeset, while keys gives the genes that the probesets are mapped to. Thus keys can be redundant, namely multiple probesets can map to the same gene. |
keepNArows |
Logical, whether rows with |
stat |
The function to calculate the univariate statistic. By default
the |
levels |
How the information on the levels of keys (e.g. unique
keys) should be kept. |
... |
Other parameters passed to the |
isMaxStatRow returns a logical vector, with the row having the maximal
statistic for each key set to TRUE and all other rows set to FALSE.
keepMaxStatRowInd returns the integer indices of such rows. Finally,
keepMaxStatRow returns the resulting matrix.
See examples for usage.
A numeric matrix with rows mapped to unique keys, selected by the maximum statistics. See examples below
Jitao David Zhang <[email protected]>
myFun1 <- function(x) mean(x, na.rm=TRUE)
myFun2 <- function(x) sd(x, na.rm=TRUE)
mat1 <- matrix(c(1,3,4,-5, 0,1,2,3, 7,9,5,3, 0,1,4,3), ncol=4, byrow=TRUE)
keys1 <- c("A", "B", "A", "B")
isMaxStatRow(mat1, keys1, stat=myFun1)
isMaxStatRow(mat1, keys1, stat=myFun2)
keepMaxStatRowInd(mat1, keys1, stat=myFun1)
keepMaxStatRowInd(mat1, keys1, stat=myFun2)
keepMaxStatRow(mat1, keys1, stat=myFun1)
keepMaxStatRow(mat1, keys1, stat="myFun2")
keepMaxStatRow(mat1, keys1, stat="myFun2", levels="discard")
keepMaxStatRow(mat1, keys1, stat="myFun2", levels="attribute")
mat2 <- matrix(c(1,3,4,5, 0,1,2,3, 7,9,5,3, 0,1,4,3, 4,0,-1,3.1, 9,4,-3,2, 8,9,1,2, 0.1,0.2,0.5,NA, NA, 4, 3,NA), ncol=4, byrow=TRUE, dimnames=list(LETTERS[1:9], NULL))
keys2 <- c("A", "B", "A", "B", NA, NA, "C", "A", "D")
isMaxStatRow(mat2, keys2, keepNArows=FALSE, stat=myFun1)
keepMaxStatRowInd(mat2, keys2, keepNArows=FALSE, stat=myFun1)
keepMaxStatRow(mat2, keys2, keepNArows=FALSE, stat=myFun1)
keepMaxStatRow(mat2, keys2, keepNArows=TRUE, stat=myFun1)
keepMaxStatRow(mat2, keys2, keepNArows=TRUE, stat=myFun1, levels="discard")
keepMaxStatRow(mat2, keys2, keepNArows=TRUE, stat=myFun1, levels="attribute")
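The core idea, compute a per-row statistic and keep the row with the largest value for each key, can be sketched with `apply` and `tapply`. `keepMaxStatRowSketch` is hypothetical: it drops rows whose key is `NA` (similar to `keepNArows=FALSE`), always stores keys as row names, and assumes each key has at least one row with a non-`NA` statistic.

```r
# Hypothetical sketch; rows with NA keys are dropped, and every key is assumed
# to have at least one row with a non-NA statistic.
keepMaxStatRowSketch <- function(mat, keys,
                                 stat = function(x) mean(x, na.rm = TRUE)) {
  stats <- apply(mat, 1, stat)
  idx <- which(!is.na(keys))
  # for each key, the index of the row with the largest statistic
  best <- tapply(idx, keys[idx], function(i) i[which.max(stats[i])])
  res <- mat[as.integer(best), , drop = FALSE]
  rownames(res) <- names(best)
  res
}

mat1 <- matrix(c(1,3,4,-5, 0,1,2,3, 7,9,5,3, 0,1,4,3), ncol = 4, byrow = TRUE)
keepMaxStatRowSketch(mat1, c("A", "B", "A", "B"))   # rows 3 (A) and 4 (B)
```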
Return the last character of each string
lastChar(str)
str |
A vector of character strings |
A vector of the same length, containing last characters
lastChar("Go tell it on the mountain")
lastChar(c("HSV", "FCB", "BVB"))
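Extracting the last character is a single vectorized call to base R's `substr` and `nchar`. The name `lastCharSketch` is hypothetical and the package may implement it differently.

```r
# Hypothetical sketch using base R substr() and nchar(); fully vectorized.
lastCharSketch <- function(str) substr(str, nchar(str), nchar(str))

lastCharSketch(c("HSV", "FCB", "BVB"))   # "V" "B" "B"
```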
The specified package is loaded silently, with all messages suppressed. If the
package is not found, or its version is lower than
minVer, the R session quits with a message.
libordie(package, minVer, missing.quit.status = 1, ver.quit.status = 1)
package |
One package name (either a character string or an unquoted symbol; see examples) |
minVer |
Optional, character string, the minimum working version |
missing.quit.status |
Integer, the exit status used when the package is not found |
ver.quit.status |
Integer, the exit status used when the package is found but is older than the minimum working version |
Only one package can be tested at a time.
NULL on success; otherwise the session is terminated.
Jitao David Zhang <[email protected]>
The function calls qqmsg internally to kill the
session
libordie(stats)
libordie("methods")
libordie(base, minVer="2.15-1")
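The load-or-quit behaviour described above can be sketched with `require`, `packageVersion` and `quit`. `libordieSketch` is hypothetical (the package calls its own `qqmsg` instead); note that the failure branches really end the R session, so only the success path is safe to try interactively.

```r
# Hypothetical sketch; the failure branches end the R session via quit().
libordieSketch <- function(package, minVer,
                           missing.quit.status = 1, ver.quit.status = 1) {
  # accept both quoted strings and unquoted symbols
  package <- as.character(substitute(package))
  loaded <- suppressWarnings(suppressPackageStartupMessages(
    require(package, character.only = TRUE, quietly = TRUE)))
  if (!loaded) {
    message("Package '", package, "' could not be loaded")
    quit(save = "no", status = missing.quit.status)
  }
  # numeric_version comparison accepts strings such as "2.15-1"
  if (!missing(minVer) && utils::packageVersion(package) < minVer) {
    message("Package '", package, "' is older than version ", minVer)
    quit(save = "no", status = ver.quit.status)
  }
  invisible(NULL)
}

libordieSketch(stats)
libordieSketch("base", minVer = "2.15-1")
```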
Transform a list of character strings into a data.frame
list2df(list, names = NULL, col.names = c("Name", "Item"))
list |
A list of character strings |
names |
Values in the 'Name' column of the result, used if the input list has no names |
col.names |
Column names of the |
A data.frame
myList <- list(HSV=c("Mueller", "Papadopoulos", "Wood"), FCB=c("Lewandowski", "Robben", "Hummels"), BVB=c("Reus", "Goetze", "Kagawa"))
list2df(myList, col.names=c("Club", "Player"))
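The transformation amounts to repeating each list name once per item and unlisting the items. A minimal sketch with the hypothetical name `list2dfSketch`, assuming the `names` argument is used only when the list itself has no names:

```r
# Hypothetical sketch; `names` is used only when the list has no names.
list2dfSketch <- function(list, names = NULL, col.names = c("Name", "Item")) {
  listNames <- if (is.null(base::names(list))) names else base::names(list)
  res <- data.frame(rep(listNames, lengths(list)),   # one name per item
                    unlist(list, use.names = FALSE),
                    stringsAsFactors = FALSE)
  colnames(res) <- col.names
  res
}

myList <- list(HSV = c("Mueller", "Wood"), FCB = "Robben")
list2dfSketch(myList, col.names = c("Club", "Player"))
```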
Pairwise overlap coefficient of lists
listOverlapCoefficient(x, y = NULL, checkUniqueNonNA = TRUE)
x |
A list of vectors that are interpreted as sets of elements |
y |
A list of vectors that are interpreted as sets of elements. In case of |
checkUniqueNonNA |
Logical, should vectors in the list be first cleaned up so that NA values
are removed and the elements are made unique? Default is set as |
A matrix of column-wise pairwise overlap coefficients.
set.seed(1887)
testSets1 <- sapply(rbinom(10, size=26, prob=0.3), function(x) sample(LETTERS, x, replace=FALSE))
names(testSets1) <- sprintf("List%d", seq(along=testSets1))
testSets1Poe <- listOverlapCoefficient(testSets1)
testSets1PoeNoCheck <- listOverlapCoefficient(testSets1, checkUniqueNonNA=FALSE)
stopifnot(identical(testSets1Poe, testSets1PoeNoCheck))
testSets2 <- sapply(rbinom(15, size=26, prob=0.3), function(x) sample(LETTERS, x, replace=FALSE))
names(testSets2) <- sprintf("AnotherList%d", seq(along=testSets2))
testSets12Poe <- listOverlapCoefficient(testSets1, testSets2)
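Pairwise overlap coefficients for many sets can be computed without an explicit double loop: encode each set as a 0/1 column of an incidence matrix, and `crossprod()` then counts the shared elements of every pair at once. This is a hypothetical matrix-based sketch, not the package implementation; it always cleans the sets (as with `checkUniqueNonNA=TRUE`) and assumes no set is empty.

```r
# Hypothetical matrix-based sketch; assumes no empty sets.
toIncidence <- function(sets, universe) {
  # rows = elements of the universe, columns = sets; 1 marks membership
  m <- matrix(0, nrow = length(universe), ncol = length(sets),
              dimnames = list(NULL, names(sets)))
  for (j in seq_along(sets)) m[, j] <- as.numeric(universe %in% sets[[j]])
  m
}

listOverlapCoefficientSketch <- function(x, y = NULL) {
  # drop NAs and duplicates, mirroring checkUniqueNonNA = TRUE
  clean <- function(sets) lapply(sets, function(s) unique(s[!is.na(s)]))
  x <- clean(x)
  y <- if (is.null(y)) x else clean(y)
  universe <- unique(c(unlist(x), unlist(y)))
  # crossprod() counts shared elements for every (x_i, y_j) pair at once
  shared <- crossprod(toIncidence(x, universe), toIncidence(y, universe))
  shared / outer(lengths(x), lengths(y), pmin)
}

sets <- list(a = c("A", "B"), b = c("B", "C", "D"))
listOverlapCoefficientSketch(sets)   # off-diagonal: 1 shared / min(2, 3) = 0.5
```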
Input data.frame must contain at least three columns: one contains row names
(specified by row.col), one contains column names
(column.col), and one contains values in matrix cells
(value.col). The output is a 2D matrix.
longdf2matrix(
  df,
  row.col = 1L,
  column.col = 2L,
  value.col = 3L,
  missingValue = NULL
)
df |
Long-format data frame |
row.col |
Character or integer, which column of the input data.frame contains row names? |
column.col |
Character or integer, which column contains column names? |
value.col |
Character or integer, which column contains matrix values? |
missingValue |
Values assigned in case of missing data |
A 2D matrix equivalent to the long-format data frame
Jitao David Zhang <[email protected]>
matrix2longdf
test.df <- data.frame(H=c("HSV", "BVB", "HSV", "BVB"), A=c("FCB", "S04", "S04", "FCB"), score=c(3, 1, 1, 0))
longdf2matrix(test.df, row.col=1L, column.col=2L, value.col=3L)
data(Indometh)
longdf2matrix(Indometh, row.col="time", column.col="Subject",value.col="conc")
longdf2matrix(Indometh, row.col="Subject", column.col="time", value.col="conc")
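The long-to-wide conversion can be sketched in base R by indexing a pre-filled matrix with a two-column (row, column) index matrix. `longdf2matrixSketch` is hypothetical: rows and columns follow first appearance, a duplicated (row, column) pair keeps its last value, and `NULL` for `missingValue` is assumed to mean `NA`. The package may order or fill cells differently.

```r
# Hypothetical base-R sketch; rows/columns follow first appearance and a
# duplicated (row, column) pair keeps its last value.
longdf2matrixSketch <- function(df, row.col = 1L, column.col = 2L,
                                value.col = 3L, missingValue = NULL) {
  rows <- as.character(df[[row.col]])
  cols <- as.character(df[[column.col]])
  rowLevels <- unique(rows)
  colLevels <- unique(cols)
  fill <- if (is.null(missingValue)) NA else missingValue
  res <- matrix(fill, nrow = length(rowLevels), ncol = length(colLevels),
                dimnames = list(rowLevels, colLevels))
  # a two-column index matrix fills each (row, column) cell in one step
  res[cbind(match(rows, rowLevels), match(cols, colLevels))] <- df[[value.col]]
  res
}

test.df <- data.frame(H = c("HSV", "BVB", "HSV", "BVB"),
                      A = c("FCB", "S04", "S04", "FCB"),
                      score = c(3, 1, 1, 0))
longdf2matrixSketch(test.df)
```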
Given a vector known as the master vector, a data.frame and one column of the
data.frame, the function matchColumnIndex matches the values in the
column to the master vector, and returns the indices of each value in the
column with respect to the vector. The function matchColumn returns
the whole data.frame or a subset of it, with the matching column in the exact
order of the vector.
matchColumn(vector, data.frame, column, multi = FALSE)
vector |
A vector, probably of character strings. |
data.frame |
A |
column |
The column name (character) or index (integer between 1 and
the column number), indicating the column to be matched. Exceptionally
|
multi |
Logical, deciding what to do if a value in the vector is
matched to several values in the data.frame column. If set to |
See more details below.
The function is used to address the following question: how can one order a
data.frame by values of one of its columns, the order for which is
given in a vector (known as “master vector”). matchColumnIndex
and matchColumn provide a thoroughly tested implementation to address
this question.
For one-to-one cases, where both the column and the vector have no
duplicates and can be matched one-to-one, the question is straightforward to
solve with the match function in R. In one-to-many or
many-to-many matching cases, the parameter multi determines
whether multiple rows matching the same value should be shown. If
multi=FALSE, the returned data.frame has exactly
the same number of rows as the input vector has elements; otherwise, the returned data.frame
can have more rows. See the examples below.
In either case, in the data.frame returned by
matchColumn, values in the column used for matching are overwritten
by the master vector. If multi=TRUE, the order of values in the column
also follows the order of the master vector, except for repeated
values caused by multiple matching.
The column parameter can be either a character string or a non-negative
integer. In the exceptional case where column=0L (“L”
indicates integer), the row names of the data.frame are used for
matching instead of any of the columns.
Both functions are NA-friendly: NAs in either the vector or the column do not break the code.
For matchColumnIndex, if multi is set to FALSE, the function returns
an integer vector of the same length as the master vector, indicating the
order of the data.frame rows by which the column can be reorganized
into the master vector. When multi is TRUE, the returned
object is a list of the same length as the master vector, each item
containing the index (or indices) of the data.frame rows that match the
corresponding element of the master vector.
For matchColumn, a data.frame is always returned. If
multi=FALSE, the returned data frame has the same number of rows as
the length of the input master vector, and the column specified for
matching contains the master vector in its order. If multi=TRUE, the
returned data frame can contain as many rows as the master vector or more,
and items matched multiple times are repeated.
When multi=TRUE, the indices within each list element
(for matchColumnIndex) are returned in ascending order.
Jitao David Zhang <[email protected]>
See match for basic matching operations.
df <- data.frame(Team=c("HSV", "BVB", "HSC", "FCB", "HSV"), Pkt=c(25,23,12,18,21), row.names=c("C", "B", "A", "F", "E"))
teams <- c("HSV", "BVB", "BRE", NA)
ind <- c("C", "A", "G", "F", "C", "B", "B", NA)
matchColumnIndex(teams, df, 1L, multi=FALSE)
matchColumnIndex(teams, df, 1L, multi=TRUE)
matchColumnIndex(teams, df, "Team", multi=FALSE)
matchColumnIndex(teams, df, "Team", multi=TRUE)
matchColumnIndex(teams, df, 0, multi=FALSE)
matchColumnIndex(ind, df, 0, multi=FALSE)
matchColumnIndex(ind, df, 0, multi=TRUE)
matchColumn(teams, df, 1L, multi=FALSE)
matchColumn(teams, df, 1L, multi=TRUE)
matchColumn(teams, df, "Team", multi=FALSE)
matchColumn(teams, df, "Team", multi=TRUE)
matchColumn(ind, df, 0, multi=FALSE)
matchColumn(ind, df, 0, multi=TRUE)
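The matching logic described above can be sketched in base R: `match()` for the first hit (multi=FALSE) and `which()` over `%in%` for all hits (multi=TRUE), with `column = 0` selecting row names. The `*Sketch` names and argument names are hypothetical. Unmatched values give `integer(0)` in the multi=TRUE case, which may not be how the package represents them, and only the multi=FALSE case of `matchColumn` is sketched.

```r
# Hypothetical sketch; not the package code. column = 0 selects row names.
matchColumnIndexSketch <- function(vec, df, column, multi = FALSE) {
  useRownames <- is.numeric(column) && column == 0
  colVals <- if (useRownames) rownames(df) else df[[column]]
  if (!multi) {
    return(match(vec, colVals))      # first match only, NA if none
  }
  # all matching rows in ascending order; %in% lets NA match NA
  lapply(vec, function(v) which(colVals %in% v))
}

matchColumnSketch <- function(vec, df, column) {
  # multi = FALSE case: one output row per element of vec
  res <- df[matchColumnIndexSketch(vec, df, column), , drop = FALSE]
  # overwrite the matched column with the master vector
  if (!(is.numeric(column) && column == 0)) res[[column]] <- vec
  res
}

df <- data.frame(Team = c("HSV", "BVB", "HSC", "FCB", "HSV"),
                 Pkt = c(25, 23, 12, 18, 21),
                 row.names = c("C", "B", "A", "F", "E"))
matchColumnIndexSketch(c("HSV", "BVB", "BRE", NA), df, "Team")   # 1 2 NA NA
```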
Match a given vector to column names of a data.frame or matrix
matchColumnName(data.frame.cols, reqCols, ignore.case = FALSE)
data.frame.cols |
column names of a data.frame. One can also provide a data.frame, which may however cause worse performance since the data.frame is copied |
reqCols |
required columns |
ignore.case |
logical, whether case should be ignored |
A vector of integers as indices
myTestDf <- data.frame(HBV=1:3, VFB=0:2, BVB=4:6, FCB=2:4)
myFavTeams <- c("HBV", "BVB")
matchColumnName(myTestDf, myFavTeams)
myFavTeamsCase <- c("hbv", "bVb")
matchColumnName(myTestDf, myFavTeamsCase, ignore.case=TRUE)
## NA will be returned in this case if ignore.case is set to FALSE
matchColumnName(myTestDf, myFavTeamsCase, ignore.case=FALSE)
The function converts a matrix into a long-format, three-column data.frame, containing row, column and value. Such ‘long’ data.frames can be useful in data visualization and modelling.
matrix2longdf(
  mat,
  row.names,
  col.names,
  longdf.colnames = c("row", "column", "value")
)
mat |
A matrix |
row.names |
Character, row names to appear in the |
col.names |
Character, column names to appear in the |
longdf.colnames |
Character, column names of the output long data frame |
The function converts a matrix into a three-column, ‘long’ format data.frame containing row names, column names, and values of the matrix.
A data.frame object with three columns: row,
column and value. If the input matrix has dimension
M x N, the returned data.frame has dimension MN x 3.
The lengths of row.names and col.names should match the
corresponding matrix dimensions; otherwise the function raises warnings.
Jitao David Zhang <[email protected]>
test.mat <- matrix(1:12, ncol=4, nrow=3, dimnames=list(LETTERS[1:3], LETTERS[1:4]))
print(test.mat)
print(matrix2longdf(test.mat))
print(matrix2longdf(test.mat, longdf.colnames=c("From", "To", "Time")))
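Because R stores matrices column by column, the long format can be sketched by repeating row names `ncol` times and column names `nrow` times each alongside `as.vector(mat)`. `matrix2longdfSketch` is hypothetical, assumes the matrix has dimnames, and omits the package's length warnings.

```r
# Hypothetical sketch: column-major order, as produced by as.vector().
matrix2longdfSketch <- function(mat, row.names = rownames(mat),
                                col.names = colnames(mat),
                                longdf.colnames = c("row", "column", "value")) {
  res <- data.frame(rep(row.names, times = ncol(mat)),  # A B C A B C ...
                    rep(col.names, each = nrow(mat)),   # A A A B B B ...
                    as.vector(mat),
                    stringsAsFactors = FALSE)
  colnames(res) <- longdf.colnames
  res
}

test.mat <- matrix(1:12, ncol = 4, nrow = 3,
                   dimnames = list(LETTERS[1:3], LETTERS[1:4]))
head(matrix2longdfSketch(test.mat))
```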
Merge infrequent levels by setting a threshold on the cumulative proportion of counts, a.k.a. cumsumprop
mergeInfreqLevelsByCumsumprop(
  classes,
  thr = 0.9,
  mergedLevel = "others",
  returnFactor = TRUE
)
classes |
Character strings or factor. |
thr |
Numeric, between 0 and 1, how to define frequent levels. Default: 0.9, namely levels which make up over 90% of all instances. |
mergedLevel |
Character, how the merged level should be named. |
returnFactor |
Logical, whether the value returned should be coerced into a factor. |
A character string vector or a factor, of the same length as the
input classes, but with potentially fewer levels.
If only one class is deemed infrequent, its label is unchanged.
set.seed(1887)
myVals <- sample(c(rep("A", 4), rep("B", 3), rep("C", 2), "D"))
## in the example below, since A, B, C make up 90% of the total,
## D is infrequent. Since it is alone, it is not merged
mergeInfreqLevelsByCumsumprop(myVals, 0.9)
mergeInfreqLevelsByCumsumprop(myVals, 0.9, returnFactor=FALSE) ## return characters
## in the example below, since A and B make up 70% of the total,
## and A, B, C 90%, they are all frequent and D is infrequent.
## Following the logic above, no merging happens
mergeInfreqLevelsByCumsumprop(myVals, 0.8)
mergeInfreqLevelsByCumsumprop(myVals, 0.7) ## A and B are left, C and D are merged
mergeInfreqLevelsByCumsumprop(myVals, 0.5) ## A and B are left, C and D are merged
mergeInfreqLevelsByCumsumprop(myVals, 0.4) ## A is left
mergeInfreqLevelsByCumsumprop(myVals, 0.3) ## A is left
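The rule illustrated by the examples can be sketched as follows: sort levels by frequency, and treat a level as frequent while the more frequent levels before it cover less than `thr` of all instances. `mergeInfreqLevelsSketch` is a hypothetical name; the ordering of factor levels and the handling of ties are assumptions and may differ from the package.

```r
# Hypothetical sketch of the cumulative-proportion rule; not the package code.
mergeInfreqLevelsSketch <- function(classes, thr = 0.9, mergedLevel = "others",
                                    returnFactor = TRUE) {
  classes <- as.character(classes)
  counts <- sort(table(classes), decreasing = TRUE)
  cumprop <- cumsum(counts) / sum(counts)
  # a level is frequent if the more frequent levels before it cover < thr
  isFrequent <- c(TRUE, utils::head(cumprop, -1) < thr)
  infrequent <- names(counts)[!isFrequent]
  # a single infrequent level keeps its own label
  doMerge <- length(infrequent) > 1
  if (doMerge) classes[classes %in% infrequent] <- mergedLevel
  if (returnFactor) {
    lev <- c(names(counts)[isFrequent], if (doMerge) mergedLevel else infrequent)
    classes <- factor(classes, levels = lev)
  }
  classes
}

vals <- c(rep("A", 4), rep("B", 3), rep("C", 2), "D")
table(mergeInfreqLevelsSketch(vals, 0.6))   # A, B kept; C and D merged
```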
Testing whether multiple objects are identical
midentical(
  ...,
  num.eq = TRUE,
  single.NA = TRUE,
  attrib.as.set = TRUE,
  ignore.bytecode = TRUE,
  ignore.environment = FALSE,
  ignore.srcref = TRUE,
  extptr.as.ref = FALSE
)
... |
Objects to be tested, or a list of them |
num.eq, single.NA, attrib.as.set, ignore.bytecode
|
See
|
ignore.environment, ignore.srcref
|
See |
extptr.as.ref |
See |
midentical extends identical to test multiple objects instead
of only two.
A logical value, TRUE if all objects are identical
Jitao David Zhang <[email protected]>
identical
set1 <- "HSV"
set2 <- set3 <- set4 <- c("HSV", "FCB")
midentical(set1, set2)
midentical(list(set1, set2))
midentical(set2, set3, set4)
midentical(list(set2, set3, set4))
## other options passed to identical
midentical(0, -0, +0, num.eq=FALSE)
midentical(0, -0, +0, num.eq=TRUE)
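Extending `identical` to many objects reduces to comparing every object with the first one. This hypothetical `midenticalSketch` forwards only `num.eq` to `identical`, and, like the documented interface, treats a single list argument as the list of objects to compare.

```r
# Hypothetical sketch; only num.eq is forwarded to identical() here.
midenticalSketch <- function(..., num.eq = TRUE) {
  objs <- list(...)
  # a single list argument is treated as the list of objects to compare
  if (length(objs) == 1L && is.list(objs[[1]])) objs <- objs[[1]]
  # compare every remaining object with the first one
  all(vapply(objs[-1], function(o) identical(objs[[1]], o, num.eq = num.eq),
             logical(1)))
}

midenticalSketch(0, -0, +0)                 # TRUE
midenticalSketch(0, -0, +0, num.eq = FALSE) # FALSE: bitwise comparison
```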
Multiple matching between two vectors. Unlike R's native match
function, which returns only the first match even if there are multiple
matches, mmatch returns all of them.
mmatch(x, table, nomatch = NA_integer_)
x |
vector or |
table |
vector or |
nomatch |
the value to be returned when no match is found. |
Multiple matches are useful in many cases, and there is no native R
function for this purpose. Users can write their own functions combining
lapply with match or %in%; our experience, however,
shows that such non-vectorized functions can be extremely slow, especially
as the x or table vector gets longer.
mmatch delegates the multiple-matching task to a C-level function,
which is optimized for speed. Internal benchmarking showed a roughly
hundredfold improvement, namely mmatch takes about 1/100 of the time
used by an R implementation.
A list of the same length as the input x vector. Each list
item contains the matching indices in ascending order (similar to
match).
Jitao David Zhang <[email protected]>, C-code was adapted from the program written by Roland Schmucki.
match
vec1 <- c("HSV", "BVB", "FCB", "HSV", "BRE", "HSV", NA, "BVB")
vec2 <- c("FCB", "FCN", "FCB", "HSV", "BVB", "HSV", "FCK", NA, "BRE", "BRE")
mmatch(vec1, vec2)
## compare to match
match(vec1, vec2)
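For comparison with the C-level implementation described above, a reasonably vectorized pure-R alternative is to group the positions of `table` by value once with `split()` and then look each element of `x` up by name. This `mmatchSketch` is hypothetical and, unlike `mmatch`, never matches `NA` to `NA` (because `split()` drops `NA` keys); it is not the package's algorithm.

```r
# Hypothetical pure-R sketch; NA in x never matches here (split() drops NA keys).
mmatchSketch <- function(x, table, nomatch = NA_integer_) {
  # positions of each distinct value in `table`, in ascending order
  byValue <- split(seq_along(table), as.character(table))
  res <- byValue[match(as.character(x), names(byValue))]
  # unmatched lookups come back as NULL; replace them with `nomatch`
  res <- lapply(res, function(i) if (is.null(i)) nomatch else i)
  unname(res)
}

vec2 <- c("FCB", "FCN", "FCB", "HSV", "BVB", "HSV")
mmatchSketch(c("HSV", "FCB", "XYZ"), vec2)   # list(c(4, 6), c(1, 3), NA)
```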
The set operation functions in the base package, union,
intersect and setdiff, operate on exactly two sets.
The following functions, munion,
mintersect and msetdiff, extend their basic versions to deal
with multiple sets.
munion(...)
... |
Vectors of items, or a list of them. See examples below. |
These functions apply set manipulations (union, intersection, or difference) sequentially: the first two sets are combined first, then the result is combined with the third set, the fourth and so on, until all sets have been visited.
A vector of set operation results. Can be an empty vector if no results were returned.
Jitao David Zhang <[email protected]>
set1 <- c("HSV", "FCB", "BVB", "FCN", "HAN")
set2 <- c("HSV", "FCB", "BVB", "HAN")
set3 <- c("HSV", "BVB", "FSV")
munion(set1, set2, set3)
mintersect(set1, set2, set3)
msetdiff(set1, set2, set3)
## sets can be given in a list as well
munion(list(set1, set2, set3))
mintersect(list(set1, set2, set3))
msetdiff(list(set1, set2, set3))
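The sequential application described above is a left fold, which base R expresses with `Reduce()`. The hypothetical `*Sketch` functions below fold `union`, `intersect` and `setdiff` over all sets; they illustrate the idea and are not the package code.

```r
# Hypothetical sketches: fold a binary base set function over all sets.
msetSketch <- function(op) {
  force(op)
  function(...) {
    sets <- list(...)
    # a single list argument is treated as the list of sets
    if (length(sets) == 1L && is.list(sets[[1]])) sets <- sets[[1]]
    Reduce(op, sets)   # op(op(set1, set2), set3), ...
  }
}
munionSketch <- msetSketch(union)
mintersectSketch <- msetSketch(intersect)
msetdiffSketch <- msetSketch(setdiff)

set1 <- c("HSV", "FCB", "BVB", "FCN", "HAN")
set2 <- c("HSV", "FCB", "BVB", "HAN")
set3 <- c("HSV", "BVB", "FSV")
mintersectSketch(set1, set2, set3)   # "HSV" "BVB"
msetdiffSketch(set1, set2, set3)     # "FCN"
```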
Replace NA in a vector with FALSE
na.false(x)
x |
A logical vector or matrix |
Logical vector or matrix with NAs replaced by FALSE
Jitao David Zhang <[email protected]>
myX <- c("HSV", "FCK", "FCN", NA, "BVB")
res <- myX == "HSV"
na.false(res)
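Replacing `NA` by `FALSE` is a single subassignment; because subassignment keeps attributes, matrices retain their dimensions and dimnames. The name `naFalseSketch` is hypothetical.

```r
# Hypothetical sketch; subassignment keeps dim and dimnames of matrices.
naFalseSketch <- function(x) {
  x[is.na(x)] <- FALSE
  x
}

naFalseSketch(c("HSV", "FCK", NA) == "HSV")   # TRUE FALSE FALSE
```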
Calculate pairwise distances between each pair of items in a list
naivePairwiseDist(list, fun = jaccardIndex)
list |
A list |
fun |
A function that receives two vectors (such as jaccardIndex) and returns a number (scalar) |
A symmetric matrix of dimension m x m, where m is the
length of the list
This function is inefficient compared with matrix-based methods. It is exported just for education and for verifying results of matrix-based methods.
myList <- list(first=LETTERS[3:5], second=LETTERS[1:3], third=LETTERS[1:5], fourth=LETTERS[6:10])
naivePairwiseDist(myList, fun=jaccardIndex)
## despite the name, any function that returns a number can work
naivePairwiseDist(myList, fun=jaccardDistance)
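The naive approach is a double loop that calls `fun` on every ordered pair of list items, so its cost grows with the square of the list length. `naivePairwiseDistSketch` is hypothetical, and the inline Jaccard function in the usage line stands in for the package's `jaccardIndex`.

```r
# Hypothetical sketch of the naive double loop; `fun` must return one number.
naivePairwiseDistSketch <- function(items, fun) {
  n <- length(items)
  res <- matrix(NA_real_, nrow = n, ncol = n,
                dimnames = list(names(items), names(items)))
  # fun() is evaluated for every ordered pair, i.e. n^2 calls
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      res[i, j] <- fun(items[[i]], items[[j]])
    }
  }
  res
}

jaccard <- function(x, y) length(intersect(x, y)) / length(union(x, y))
naivePairwiseDistSketch(list(first = LETTERS[3:5], second = LETTERS[1:3]), jaccard)
```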
Build a factor using the order of input character strings
ofactor(x, ...)
x |
A vector of character strings |
... |
Other parameters passed to |
Factor with levels in the order in which they first appear in the input strings.
Jitao David Zhang <[email protected]>
factor
testStrings <- c("A", "C", "B", "B", "C")
(testFac <- factor(testStrings))
(testOfac <- ofactor(testStrings))
stopifnot(identical(levels(testOfac), c("A", "C", "B")))
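The behaviour shown above (levels ordered by first appearance rather than alphabetically) can be reproduced in base R by passing unique(x) as the levels. This is a minimal sketch that assumes ofactor is equivalent to that call. ofactorSketch is a hypothetical name.

```r
## Hypothetical sketch: levels follow the order of first appearance in x
ofactorSketch <- function(x, ...) factor(x, levels = unique(x), ...)

testStrings <- c("A", "C", "B", "B", "C")
levels(factor(testStrings))         # "A" "B" "C" (alphabetical)
levels(ofactorSketch(testStrings))  # "A" "C" "B" (order of appearance)
```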
Reorder the groups by their group size
orderCutgroup(groups, decreasing = TRUE)
groups |
A named vector of integers giving group indices |
decreasing |
Logical, should the first group be the largest? The function permutes a vector of named integers so that names
sharing the same integer still share an integer (the same one or another one), while
ensuring that the permuted group matching the first integer
(or the last integer if |
Overlap coefficient, also known as Szymkiewicz-Simpson coefficient
overlapCoefficient(x, y, checkUniqueNonNA = FALSE)
overlapDistance(x, y, checkUniqueNonNA = FALSE)
x |
A vector |
y |
A vector |
checkUniqueNonNA |
Logical, if |
The overlap coefficient
overlapCoefficient calculates the overlap coefficient, and
overlapDistance is defined as 1-overlapCoefficient.
myX <- 1:6
myY <- 4:9
overlapCoefficient(myX, myY)
myY2 <- 4:10
overlapCoefficient(myX, myY2)
## compare the result with Jaccard Index
jaccardIndex(myX, myY2)
## overlapDistance
overlapDistance(myX, myY2)
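The Szymkiewicz-Simpson overlap coefficient divides the size of the intersection by the size of the smaller set: overlap(X, Y) = |X ∩ Y| / min(|X|, |Y|). The Jaccard index divides by the size of the union instead. The base R sketch below always treats its inputs as sets (duplicates are dropped), and this may differ from the package when checkUniqueNonNA = FALSE. All three function names are hypothetical.

```r
## Hypothetical sketch of the overlap coefficient and, for comparison, the Jaccard index
overlapCoefficientSketch <- function(x, y) {
  x <- unique(x)
  y <- unique(y)
  length(intersect(x, y)) / min(length(x), length(y))
}
overlapDistanceSketch <- function(x, y) 1 - overlapCoefficientSketch(x, y)
jaccardIndexSketch <- function(x, y) {
  length(intersect(x, y)) / length(union(x, y))
}

overlapCoefficientSketch(1:6, 4:10)  # 3 / min(6, 7) = 0.5
jaccardIndexSketch(1:6, 4:10)        # 3 / 10 = 0.3
overlapDistanceSketch(1:6, 4:10)     # 0.5
```

Extending the second set from 4:9 to 4:10 leaves the overlap coefficient unchanged, because only the smaller set enters the denominator. The Jaccard index, by contrast, decreases.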
Overwrite a directory
overwriteDir(dir, action = c("ask", "overwrite", "append", "no"))
dir |
Character, path to a directory. |
action |
Ask the user to input the option ( |
If action is set to overwrite, the directory will be
deleted recursively if it exists, a new directory with the same name will be
created, and the function returns TRUE. If append is set, the
function creates the directory if necessary and returns TRUE. If
no is set, the function does nothing and returns FALSE.
If action is set to ask, the user will be prompted for an action.
If overwrite is set, the directory will be removed and created anew.
If append is set, in contrast to overwrite, the directory and
the files in it are not removed if they exist. In this case, files with the
same name will be overwritten, while new directories or files
will simply be created. If the directory does not exist,
it will be created.
If no is set, no action will be taken, and the function returns
FALSE.
## Helper to create a test directory with files
createTestDir <- function() {
  testdir <- file.path(tempdir(), "overwriteDir_test")
  dir.create(testdir, showWarnings = FALSE)
  writeLines("First file", file.path(testdir, "file1.txt"))
  writeLines("Second file", file.path(testdir, "file2.txt"))
  return(testdir)
}
addFileToDir <- function(testdir) {
  writeLines("New file", tempfile(tmpdir=testdir))
}
## overwrite: delete the directory and create it anew
testdir <- createTestDir()
length(dir(testdir)) ## two files should be there
overwriteDir(testdir, action="overwrite")
addFileToDir(testdir)
length(dir(testdir)) ## now there should be only one file
## append: keep existing files, add new ones
overwriteDir(testdir, action="append")
addFileToDir(testdir)
length(dir(testdir)) ## now two files
## no: no action, returns FALSE
noRes <- overwriteDir(testdir, action="no")
noRes
## cleanup
unlink(testdir, recursive = TRUE)
## Not run:
## ask: prompts user for action (interactive only)
testdir <- createTestDir()
overwriteDir(testdir, action="ask")
## End(Not run)
The function maps p values between 0 and 1 to continuous scores on the real line by the following equation: S = sgn(s) · |log10(p)|, where s is the value of sign.
pAbsLog10Score(p, sign = 1, replaceZero = TRUE)
p |
p-value(s) between (0,1] |
sign |
Sign of the score: either positive (in case of positive
numbers), negative (in case of negative numbers), or zero. In case of a
logical vector, |
replaceZero |
Logical, whether to replace zero p-values with the
minimal double value specified by the machine. Default is |
A numeric vector of transformed p-values using signed -log10 transformation.
pQnormScore, pScore, replaceZeroPvalue
testPvals <- c(0.001, 0.01, 0.05, 0.1, 0.5, 1)
pAbsLog10Score(testPvals)
testPvalSign <- rep(c(-1,1), 3)
pAbsLog10Score(testPvals, sign=testPvalSign)
testLog <- rep(c(TRUE, FALSE),3)
pAbsLog10Score(testPvals, testLog)
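The signed -log10 transformation can be sketched in a few lines of base R. This sketch assumes three things: S = sgn(s) · |log10(p)|, zero p-values are replaced by .Machine$double.xmin, and a logical sign maps TRUE to +1 and FALSE to -1. The text above does not confirm the logical mapping, which is truncated. pAbsLog10ScoreSketch is a hypothetical name.

```r
## Hypothetical sketch of a signed |log10(p)| score.
## Assumption: logical signs map TRUE -> +1 and FALSE -> -1.
pAbsLog10ScoreSketch <- function(p, sign = 1, replaceZero = TRUE) {
  if (replaceZero) {
    p[p < .Machine$double.xmin] <- .Machine$double.xmin  # avoid log10(0) = -Inf
  }
  s <- if (is.logical(sign)) ifelse(sign, 1, -1) else base::sign(sign)
  abs(log10(p)) * s
}

pAbsLog10ScoreSketch(c(0.001, 0.01, 1))                      # 3 2 0
pAbsLog10ScoreSketch(c(0.001, 0.01, 1), sign = c(-1, 1, 1))  # -3 2 0
```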
Calculate pairwise Jaccard Indices between each pair of items in a list
pairwiseJaccardIndex(list)
pairwiseJaccardDistance(list)
list |
A list |
A symmetric m × m matrix, where m is the length of the list
pairwiseJaccardDistance is defined as 1-pairwiseJaccardIndex.
myList <- list(first=LETTERS[3:5], second=LETTERS[1:3], third=LETTERS[1:5], fourth=LETTERS[6:10])
pairwiseJaccardIndex(myList)
poormanPJI <- function(list) {
  sapply(list, function(x) sapply(list, function(y) jaccardIndex(x,y)))
}
stopifnot(identical(pairwiseJaccardIndex(myList), poormanPJI(myList)))
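A matrix-based alternative to the nested sapply loop builds a 0/1 incidence matrix (one row per element, one column per set). One crossprod call then gives all pairwise intersection sizes, and |A ∪ B| = |A| + |B| - |A ∩ B| gives the union sizes. This is a sketch of the general technique and is not necessarily how pairwiseJaccardIndex is implemented. pairwiseJaccardSketch is a hypothetical name.

```r
## Hypothetical matrix-based sketch of pairwise Jaccard indices
pairwiseJaccardSketch <- function(lst) {
  lst <- lapply(lst, unique)
  universe <- unique(unlist(lst))
  ## incidence matrix: one row per element of the universe, one column per set
  inc <- do.call(cbind, lapply(lst, function(s) as.numeric(universe %in% s)))
  inter <- crossprod(inc)                        # intersection size for every pair
  sizes <- diag(inter)                           # set sizes on the diagonal
  unionSize <- outer(sizes, sizes, "+") - inter  # |A| + |B| - |A intersect B|
  inter / unionSize
}

myList <- list(first=LETTERS[3:5], second=LETTERS[1:3],
               third=LETTERS[1:5], fourth=LETTERS[6:10])
round(pairwiseJaccardSketch(myList), 2)
```

Two empty sets would yield 0/0 = NaN. The sketch does not special-case this.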
Calculate pairwise overlap coefficients between each pair of items in a list
pairwiseOverlapDistance(list)
pairwiseOverlapCoefficient(list)
list |
A list |
A symmetric m × m matrix, where m is the length of the list
pairwiseOverlapDistance returns the pairwise overlap distances, defined as 1-pairwiseOverlapCoefficient.
myList <- list(first=LETTERS[3:5], second=LETTERS[1:3], third=LETTERS[1:5], fourth=LETTERS[6:10])
pairwiseOverlapCoefficient(myList)
pairwiseOverlapDistance(myList)
poormanPOC <- function(list) {
  sapply(list, function(x) sapply(list, function(y) overlapCoefficient(x,y)))
}
stopifnot(identical(pairwiseOverlapCoefficient(myList), poormanPOC(myList)))
Print a decimal number in percent format
percentage(x, fmt = "1.1")
x |
a decimal number, usually between -1 and 1 |
fmt |
format string, '1.1' means a digit before and after the decimal point will be printed |
Character string
percentage(c(0,0.1,0.25,1))
percentage(c(0,0.1,0.25,1), fmt="1.4")
percentage(c(0,-0.1,0.25,-1), fmt="+1.1")
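The fmt strings used above ("1.1", "1.4", "+1.1") look like sprintf width/precision flags. The sketch below assumes that fmt is spliced into a "%<fmt>f%%" sprintf format applied to 100 * x. This is an assumption about the implementation, and percentageSketch is a hypothetical name.

```r
## Hypothetical sketch: fmt becomes part of a sprintf() format, e.g. "%1.1f%%"
percentageSketch <- function(x, fmt = "1.1") {
  sprintf(paste0("%", fmt, "f%%"), x * 100)
}

percentageSketch(c(0, 0.1, 0.25, 1))            # "0.0%" "10.0%" "25.0%" "100.0%"
percentageSketch(c(-0.1, 0.25), fmt = "+1.1")   # "-10.0%" "+25.0%"
```

Under this reading, the "+" flag forces a sign on positive values, which is why the third example above uses it with a mix of negative and positive numbers.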
The quantile function of the normal distribution, also known as the inverse of its cumulative distribution function,
is used to map p-values to continuous scores ranging on the real line. The signs of the
resulting scores are positive by default and are determined by the parameter sign.
pQnormScore(p, sign = 1, replaceZero = TRUE)
p |
p-value(s) between |
sign |
Signs of the scores, either positive (in case of positive numbers),
negative (in case of negative numbers), or zero. In case of a logical vector,
|
replaceZero |
Logical, whether to replace zero p-values with the
minimal double value specified by the machine. Default is |
A numeric vector of transformed p-values using signed quantile normal transformation.
pAbsLog10Score, pScore, double
testPvals <- c(0.001, 0.01, 0.05, 0.1, 0.5, 1)
pQnormScore(testPvals)
testPvalSign <- rep(c(-1,1), 3)
pQnormScore(testPvals, sign=testPvalSign)
testLog <- rep(c(TRUE, FALSE),3)
pQnormScore(testPvals, testLog)
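The quantile-normal mapping can be sketched with stats::qnorm. The sketch assumes the p-values are two-sided, so the score magnitude is the upper-tail quantile of p/2. Under that assumption, p = 0.05 maps to about 1.96 and p = 1 maps to 0. If the package uses a one-sided convention instead, the magnitudes will differ. The mapping of logical signs (TRUE to +1, FALSE to -1) is also an assumption. pQnormScoreSketch is a hypothetical name.

```r
## Hypothetical sketch of a signed quantile-normal score.
## Assumptions: two-sided p-values; logical signs map TRUE -> +1, FALSE -> -1.
pQnormScoreSketch <- function(p, sign = 1, replaceZero = TRUE) {
  if (replaceZero) {
    p[p < .Machine$double.xmin] <- .Machine$double.xmin  # keep scores finite
  }
  s <- if (is.logical(sign)) ifelse(sign, 1, -1) else base::sign(sign)
  qnorm(p / 2, lower.tail = FALSE) * s
}

pQnormScoreSketch(c(0.05, 1))           # about 1.96 and 0
pQnormScoreSketch(0.05, sign = FALSE)   # about -1.96
```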
Print BEDAinfo object
## S3 method for class 'BEDAinfo'
print(x, ...)
x |
A BEDA info object, returned by |
... |
Ignored |
Invisible NULL; the function is called for its side effect.
print(bedaInfo())
The function wraps other functions to map p values ranging on (0, 1]
to continuous scores ranging on the real line in a number of ways.
pScore(p, sign = 1, method = c("qnorm", "absLog10"), replaceZero = TRUE)
p |
p-value between (0,1] |
sign |
Sign of the score: either positive (in case of positive
numbers), negative (in case of negative numbers), or zero. In case of a
logical vector, |
method |
Currently available methods include |
replaceZero |
Logical, whether to replace zero p-values with the
minimal double value specified by the machine. Default is |
A numeric vector of transformed p-values using the specified method.
testPvals <- c(0.001, 0.01, 0.05, 0.1, 0.5, 1)
pScore(testPvals, method="absLog10")
pScore(testPvals, method="qnorm")
testPvalSign <- rep(c(-1,1), 3)
pScore(testPvals, sign=testPvalSign, method="absLog10")
pScore(testPvals, sign=testPvalSign, method="qnorm")
testLog <- rep(c(TRUE, FALSE),3)
pScore(testPvals, testLog, method="absLog10")
pScore(testPvals, testLog, method="qnorm")
testPvals <- 10^seq(-5, 0, 0.05)
plot(pScore(testPvals, method="qnorm"), pScore(testPvals, method="absLog10"), xlab="pQnormScore", ylab="pAbsLog10Score")
abline(0,1, col="red", lty=2)
This function is helpful for exporting tables in which certain columns should be placed leftmost in the data.frame.
putColsFirst(data.frame, columns)
data.frame |
Data.frame |
columns |
Character vector, names of columns which are to be put to the left |
data.frame with re-arranged columns
Jitao David Zhang <[email protected]>
clubs <- data.frame(Points=c(21,23,28,24), Name=c("BVB", "FCB", "HSV", "FCK"), games=c(12,11,11,12))
putColsFirst(clubs, c("Name"))
putColsFirst(clubs, c("Name", "games"))
Decode a password encrypted with pwencode.
pwdecode(password)
password |
Character string to be decoded. If it starts with a space character, the string is sent for decoding; otherwise, it is deemed a clear-text password and returned unchanged. |
See the pwdecode function documentation in BIOS for implementation details.
Note that since R does not support strings with embedded null characters
(\000), the password to be decoded has to be given with two backslashes,
e.g. ‘ \001\000\129\235’.
Decoded character string, or empty string if decoding fails
Jitao David Zhang <[email protected]>. The C library code was written by Detlef Wolf.
mycode <- " \\001\\000\\141\\314\\033\\033\\033\\033\\033\\142\\303\\056\\166\\311\\037\\042"
pwdecode(mycode)
Encode a password
pwencode(label = "VAR", key)
label |
label used to encode the password |
key |
password key |
Character string, encoded password
Quietly quit R with messages in non-interactive sessions
qqmsg(..., status = 0, save = FALSE, runLast = TRUE)
... |
Messages to be passed to |
status |
Quit status |
save |
Logical, should current working environment be saved? |
runLast |
Logical, should |
The function prints messages in any case, and quits R if the current session is non-interactive, e.g. when running in command-line Rscript mode.
Invisible NULL; the function is called for its side effect.
Jitao David Zhang <[email protected]>
## the example should not run because it will lead the R session to quit
## Not run:
qqmsg()
qqmsg("die", status=0)
qqmsg("Avada kedavra", status=-1)
qqmsg("Crucio!", "\n", "Avada kedavra", status=-100)
## End(Not run)
Quietly runs a system command: the output is internalized and returned as an invisible variable, and the standard error output is ignored.
qsystem(command)
command |
A system command |
The function runs the system command in quiet mode. It can be useful in CGI scripts, for instance.
(Invisibly) the internalized output of the command
Jitao David Zhang <[email protected]>
dateIntern <- qsystem("date")
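The behaviour described above can be approximated with base R's system(). The sketch captures standard output with intern = TRUE, discards standard error with ignore.stderr = TRUE, and returns the captured output invisibly. It assumes a Unix-like shell for the echo example. qsystemSketch is a hypothetical name, and the package's implementation may differ.

```r
## Hypothetical sketch: capture stdout, drop stderr, return invisibly
qsystemSketch <- function(command) {
  out <- system(command, intern = TRUE, ignore.stderr = TRUE)
  invisible(out)
}

res <- qsystemSketch("echo hello")  # nothing is printed
res                                  # "hello"
```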
Factor variables with numbers as levels are alphabetically ordered by default, which requires rearrangements for various purposes, e.g. modelling or visualizations. This function re-orders levels of numeric factor variables numerically.
refactorNum(x, decreasing = FALSE)
x |
A factor variable with numeric values as levels |
decreasing |
Logical, should the levels be sorted in decreasing order? |
A factor variable, with sorted numeric values as levels
Jitao David Zhang <[email protected]>
(nums <- factor(c("2","4","24","1","2","125","1","2","125")))
(nums.new <- refactorNum(nums))
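The numeric reordering described above amounts to sorting the existing levels by their numeric value and rebuilding the factor with that level order. This minimal sketch assumes all levels can be converted with as.numeric. refactorNumSketch is a hypothetical name.

```r
## Hypothetical sketch: order levels by numeric value instead of alphabetically
refactorNumSketch <- function(x, decreasing = FALSE) {
  lev <- levels(x)
  factor(x, levels = lev[order(as.numeric(lev), decreasing = decreasing)])
}

nums <- factor(c("2","4","24","1","2","125","1","2","125"))
levels(nums)                     # "1" "125" "2" "24" "4"
levels(refactorNumSketch(nums))  # "1" "2" "4" "24" "125"
```

Only the level order changes. The values of the factor stay the same.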
registerLog and doLog provide a simple mechanism
to handle logging (printing text messages to files or other types of
connections) in R. Users can register an arbitrary number of loggers with registerLog, and
the functions take care of low-level details such as opening and closing
the connections.
registerLog(..., append = FALSE)
... |
An arbitrary number of file names (character strings) or connection objects (see example). |
append |
Logical, whether log messages are appended to an existing file instead of overwriting it. Only valid for files, not for connections such as standard output. |
Input parameters can be either character strings or connections (such as the
objects returned by stdout() or pipe()).
If a character string is registered as a logger, it is assumed to be a file
name (the user must make sure that it is writable/appendable). In case the file
exists, new logging messages will be appended; otherwise, if the file
does not exist, it will be created and the logging messages will be
written to the file.
A special case is the parameter value "-": it will be interpreted as
standard output.
If a connection is registered as a logger, it must be writable in order to write the logging messages.
Each parameter will be converted to a connection object, which will
be closed (when applicable) automatically before R quits.
If the parameter is missing (or set to NA or NULL), no logging
will take place.
No value is returned; the function is called for its side effect.
Currently, the loggers are stored in a variable named RIBIOS_LOGGERS in the namespace of
ribiosUtils. This is only for internal use of
the package and may change at any time; therefore, users are advised not to
manipulate this variable directly.
To clear the registered loggers, use clearLog. To flush the registered
loggers, use flushLog. Usually it is not necessary to use
flushLog in R scripts, since at program exit the active R session
will automatically flush and close the connections (in addition, frequent
flushing may decrease the program's efficiency). However, in
interactive sessions flushLog is sometimes needed to force R to write
all pending log messages to all registered connections.
Jitao David Zhang <[email protected]>
doLog writes messages iteratively to each connection
registered by registerLog.
## the following code section is not run to prevent issues with pkgdown
## Not run:
logfile1 <- tempfile()
logfile2 <- tempfile()
logcon3 <- stdout()
if(.Platform$OS.type == "unix") {
  registerLog("/dev/null")
} else {
  registerLog(tempfile())
}
registerLog(logfile1)
registerLog(logfile2)
registerLog(logcon3)
doLog("Start logging")
doLog("Do something...")
doLog("End logging")
flushLog() ## usually not needed, see notes
txt1 <- readLines(logfile1)
txt2 <- readLines(logfile2)
cat(txt1)
cat(txt2)
clearLog()
registerLog(logfile1, logfile2, logcon3)
doLog("Start logging - round 2")
doLog("Do something again ...")
doLog("End logging - for good")
flushLog() ## usually not needed, see notes
txt1 <- readLines(logfile1)
txt2 <- readLines(logfile2)
cat(txt1)
cat(txt2)
## clean up files and objects to close unused connections
closeLoggerConnections()
## End(Not run)
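The registry mechanism described above can be sketched in base R. File names become file connections, "-" becomes standard output, and missing values are skipped. Every message is then written to each registered connection. The sketch is illustrative only. The environment loggerEnv and the functions registerLogSketch, doLogSketch and clearLogSketch are hypothetical, do not mirror the package's internal RIBIOS_LOGGERS variable, and do not add timestamps or other decorations the real doLog might add.

```r
## Hypothetical sketch of a logger registry (not the package's internals)
loggerEnv <- new.env()
loggerEnv$cons <- list()

registerLogSketch <- function(..., append = FALSE) {
  for (target in list(...)) {
    if (is.null(target) || identical(target, NA)) next      # no logging for NULL/NA
    if (inherits(target, "connection")) {
      con <- target                                         # use connections as given
    } else if (identical(target, "-")) {
      con <- stdout()                                       # "-" means standard output
    } else {
      con <- file(target, open = if (append) "a" else "w")  # treat as a file name
    }
    loggerEnv$cons <- c(loggerEnv$cons, list(con))
  }
  invisible(NULL)
}

doLogSketch <- function(...) {
  msg <- paste0(...)
  for (con in loggerEnv$cons) writeLines(msg, con)          # same message to every logger
  invisible(NULL)
}

clearLogSketch <- function() {
  for (con in loggerEnv$cons) {
    if (!(as.integer(con) %in% 0:2)) close(con)             # never close stdin/stdout/stderr
  }
  loggerEnv$cons <- list()
  invisible(NULL)
}

logfile <- tempfile()
registerLogSketch(logfile, "-")
doLogSketch("Start logging")
doLogSketch("End logging")
clearLogSketch()    # closing the file connection also flushes it
readLines(logfile)  # "Start logging" "End logging"
```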
This function wraps relevelsByNamedVec for named vectors and
relevelsByNotNamedVec for unnamed vectors.
relevels(
  x,
  refs,
  missingLevels = c("pass", "warning", "error"),
  unrecognisedLevels = c("warning", "pass", "error")
)
x |
A factor, or a character vector that will be coerced into a factor |
refs |
A named vector or unnamed vector. |
missingLevels |
Actions taken in case existing levels are missing: 'pass', 'warning', or 'error'. |
unrecognisedLevels |
Actions taken in case unrecognised levels are found: 'pass', 'warning', or 'error'. |
A factor
relevelsByNamedVec and
relevelsByNotNamedVec
oldFactor <- factor(c("A", "B", "A", "C", "B"), levels=LETTERS[1:3])
refLevels <- c("B", "C", "A")
refDict <- c("A"="a", "B"="b", "C"="c")
newFactor <- relevels(oldFactor, refLevels)
stopifnot(identical(newFactor, factor(c("A", "B", "A", "C", "B"), levels=c("B", "C", "A"))))
newFactor2 <- relevels(oldFactor, refDict)
stopifnot(identical(newFactor2, factor(c("a", "b", "a", "c", "b"), levels=c("a", "b", "c"))))
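The dispatch described above can be sketched in a few lines: a named refs maps old levels (names) to new labels (values), in the order given, and an unnamed refs only reorders the existing levels. The sketch ignores the missingLevels and unrecognisedLevels checks entirely, and any old level that is absent from the names of refs becomes NA. relevelsSketch is a hypothetical name.

```r
## Hypothetical sketch; the missingLevels/unrecognisedLevels handling is omitted
relevelsSketch <- function(x, refs) {
  x <- factor(x)
  if (!is.null(names(refs))) {
    ## named: names are old levels, values are new labels in the desired order
    factor(unname(refs[as.character(x)]), levels = unique(unname(refs)))
  } else {
    ## unnamed: values are the old levels in the desired order
    factor(as.character(x), levels = refs)
  }
}

oldFactor <- factor(c("A", "B", "A", "C", "B"), levels = LETTERS[1:3])
relevelsSketch(oldFactor, c("B", "C", "A"))              # levels: B C A
relevelsSketch(oldFactor, c("A"="a", "B"="b", "C"="c"))  # levels: a b c
```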
If names contain character strings other than the levels in the old factor
and unrecognisedLevels is set to "warning", a warning will be raised.
relevelsByNamedVec(
  x,
  refs,
  missingLevels = c("pass", "warning", "error"),
  unrecognisedLevels = c("warning", "pass", "error")
)
x |
A factor |
refs |
A named vector. The names of the vector are all or a subset of the levels in the old factor, and the values are the new levels |
missingLevels |
Actions taken in case existing levels are missing: 'pass', 'warning', or 'error'. |
unrecognisedLevels |
Actions taken in case unrecognised levels are found: 'pass', 'warning', or 'error'. |
The names of the refs vector are levels of the old factor, and
the order of the refs vector matters: it determines the order of the levels of the new factor.
A factor
oldFactor <- factor(c("A", "B", "A", "C", "B"), levels=LETTERS[1:3])
factorDict <- c("A"="a", "B"="b", "C"="c")
newFactor <- relevelsByNamedVec(oldFactor, factorDict)
stopifnot(identical(newFactor, factor(c("a", "b", "a", "c", "b"), levels=c("a", "b", "c"))))
## TODO: test warning and error
If names contain character strings other than the levels in the old factor
and unrecognisedLevels is set to "warning", a warning will be raised.
relevelsByNotNamedVec(
  x,
  refs,
  missingLevels = c("pass", "warning", "error"),
  unrecognisedLevels = c("warning", "pass", "error")
)
x |
A factor |
refs |
An unnamed vector. The values of the vector are levels of
|
missingLevels |
Actions taken in case existing levels are missing: 'pass', 'warning', or 'error'. |
unrecognisedLevels |
Actions taken in case unrecognised levels are found: 'pass', 'warning', or 'error'. |
A factor
oldFactor <- factor(c("A", "B", "A", "C", "B"), levels=LETTERS[1:3])
refLevels <- c("B", "C", "A")
newFactor <- relevelsByNotNamedVec(oldFactor, refLevels)
stopifnot(identical(newFactor, factor(c("A", "B", "A", "C", "B"), levels=c("B", "C", "A"))))
## TODO: test warning and error
Reload a package by first detaching it and then loading it again.
reload(pkg)
pkg |
Character string, name of the package |
Side effect is used.
So far only a character string is accepted.
Jitao David Zhang <[email protected]>
## the example should not run because it will reload the package
## Not run:
reload(ribiosUtils)
## End(Not run)
Remove columns from a data.frame object
removeColumns(data.frame, columns, drop = FALSE)
data.frame |
data.frame |
columns |
names of columns to be removed |
drop |
Logical, whether the result should be dropped to a vector if only one column is left |
The function is equivalent to the subsetting operation with brackets. It provides a tidy programming interface to manipulate data.frames.
data.frame with specified columns removed
Jitao David Zhang <[email protected]>
clubs <- data.frame(Points=c(21,23,28,24), Name=c("BVB", "FCB", "HSV", "FCK"), games=c(12,11,11,12))
removeColumns(clubs,c("Name"))
Remove rows or columns by a function
removeColumnsByFunc(matrix, removeFunc)
removeRowsByFunc(matrix, removeFunc)
matrix |
A matrix |
removeFunc |
A function that returns a single logical value (TRUE or FALSE) for each row or column |
A matrix with the rows or columns for which removeFunc returns
TRUE removed
myMat <- matrix(c(1, 3 ,5, 4, 5, 6, 7, 9, 11), byrow=FALSE, nrow=3)
removeColumnsByFunc(myMat, removeFunc=function(x) any(x %% 2 == 0))
removeRowsByFunc(myMat, removeFunc=function(x) any(x %% 2 == 0))
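The same filtering can be sketched with apply(): evaluate removeFunc on every column (or row), then keep only the ones where it returned FALSE. drop = FALSE keeps the result a matrix even when a single row or column is left. This is an illustration, not the package source. removeColumnsByFuncSketch and removeRowsByFuncSketch are hypothetical names.

```r
## Hypothetical sketch: drop columns/rows for which removeFunc returns TRUE
removeColumnsByFuncSketch <- function(mat, removeFunc) {
  toRemove <- apply(mat, 2L, removeFunc)
  mat[, !toRemove, drop = FALSE]
}
removeRowsByFuncSketch <- function(mat, removeFunc) {
  toRemove <- apply(mat, 1L, removeFunc)
  mat[!toRemove, , drop = FALSE]
}

myMat <- matrix(c(1, 3, 5, 4, 5, 6, 7, 9, 11), nrow = 3)
hasEven <- function(x) any(x %% 2 == 0)
removeColumnsByFuncSketch(myMat, hasEven)  # keeps columns 1 and 3
removeRowsByFuncSketch(myMat, hasEven)     # keeps row 2 only: 3 5 9
```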
Remove columns in a matrix that contain one or more NAs
removeColumnsWithNA(mat)
mat |
A matrix |
A matrix, with columns containing one or more NAs removed
myMat <- matrix(c(1:9, NA, 10:17), nrow=6, byrow=TRUE, dimnames=list(sprintf("R%d", 1:6), sprintf("C%d", 1:3)))
removeColumnsWithNA(myMat)
Columns with one unique value are invariable. The functions help to remove such columns from a data frame (or matrix) in order to highlight the variables.
removeInvarCol(df)
df |
A data frame or matrix |
removeInvarCol returns the data frame with invariable column(s) removed.
isVarCol and isInvarCol are helper functions, returning a
logical vector indicating the variable and invariable columns respectively.
isVarCol and isInvarCol return a logical vector
indicating the variable and invariable columns respectively.
removeInvarCol removes invariable columns.
Jitao David Zhang <[email protected]>
testDf <- data.frame(a=1:4, b=7, c=LETTERS[1:4])
isVarCol(testDf)
isInvarCol(testDf)
removeInvarCol(testDf)
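A column is invariable when it contains only one unique value. This minimal sketch counts unique values per column (vapply for data frames, apply for matrices) and keeps the columns with more than one. In this sketch, NA counts as a value of its own. isVarColSketch, isInvarColSketch and removeInvarColSketch are hypothetical names.

```r
## Hypothetical sketch: a column is variable if it has more than one unique value
isVarColSketch <- function(df) {
  nUnique <- if (is.data.frame(df)) {
    vapply(df, function(col) length(unique(col)), integer(1))
  } else {
    apply(df, 2L, function(col) length(unique(col)))
  }
  nUnique > 1L
}
isInvarColSketch <- function(df) !isVarColSketch(df)
removeInvarColSketch <- function(df) df[, isVarColSketch(df), drop = FALSE]

testDf <- data.frame(a = 1:4, b = 7, c = LETTERS[1:4])
isVarColSketch(testDf)        # a TRUE, b FALSE, c TRUE
removeInvarColSketch(testDf)  # keeps columns a and c
```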
Remove rows in a matrix that contain one or more NAs
removeRowsWithNA(mat)
mat |
A matrix |
A matrix, with rows containing one or more NAs removed
myMat <- matrix(c(1:9, NA, 10:17), nrow=6, byrow=TRUE, dimnames=list(sprintf("R%d", 1:6), sprintf("C%d", 1:3)))
removeRowsWithNA(myMat)
Replace column names in data.frame
replaceColumnName(data.frame, old.names, new.names)
data.frame |
A data.frame |
old.names |
Old column names to be replaced |
new.names |
New column names |
Data.frame with column names updated
Jitao David Zhang <[email protected]>
clubs <- data.frame(Points=c(21,23,28,24), Name=c("BVB", "FCB", "HSV", "FCK"), games=c(12,11,11,12))
replaceColumnName(clubs, c("Points", "games"), c("Punkte", "Spiele"))
Replace p-values of zero
replaceZeroPvalue(p, factor = 1)
p |
A numeric vector, containing p-values. Zero values will be replaced by a small, non-zero value. |
factor |
A numeric vector, the minimal p-value will be multiplied by it. Useful for |
A numeric vector of the same length as the input vector, with zeros replaced by the minimal absolute double value defined by the machine multiplied by the factor.
Values under the minimal positive double value are considered zero and replaced.
ps <- seq(0,1,0.1)
replaceZeroPvalue(ps)
replaceZeroPvalue(ps, factor=2)
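In base R, the smallest positive normalised double is .Machine$double.xmin (about 2.2e-308). This sketch follows the description above: values below that threshold count as zero and are replaced by .Machine$double.xmin times factor. replaceZeroPvalueSketch is a hypothetical name.

```r
## Hypothetical sketch: treat p < .Machine$double.xmin as zero and replace it
replaceZeroPvalueSketch <- function(p, factor = 1) {
  minP <- .Machine$double.xmin
  p[p < minP] <- minP * factor
  p
}

ps <- seq(0, 1, 0.1)
replaceZeroPvalueSketch(ps)[1]              # 2.225074e-308
replaceZeroPvalueSketch(ps, factor = 2)[1]  # twice that value
```

After this replacement, log10(p) and qnorm(p) stay finite, which is why the score functions above offer a replaceZero argument.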
A temporary directory which (1) every machine in the cluster has access to and (2) has sufficient space
ribiosTempdir()
a character string of the directory name
A temporary file which (1) every machine in the cluster has access to and (2) resides in a location with sufficient space
ribiosTempfile(pattern = "file", tmpdir = ribiosTempdir(), fileext = "")
pattern |
Character string, file pattern |
tmpdir |
Character string, temp directory |
fileext |
Character string, file name extension (suffix) |
a character string of the file name
ribiosUtils is a Swiss-knife package providing miscellaneous utilities
Jitao David Zhang <[email protected]>, with inputs from Clemens Broger, Martin Ebeling, Laura Badi and Roland Schmucki
Schedule an at job to remove (probably temporary) files after
a specified time interval from now
rmat(..., days = NULL, hours = NULL, minutes = NULL, dry = TRUE)
... |
Files to be removed |
days |
Numeric, interval in days |
hours |
Numeric, interval in hours |
minutes |
Numeric, interval in minutes |
dry |
Logical, if set to |
The command will delete files, and there is usually no way to get deleted files back. So make sure you know what you are doing!
Days, hours, and minutes can be given in a mixed way: they will be summed up to give the interval.
(Invisibly) the output of the at job.
Since the command uses at internally, it is unlikely to
work on Windows systems “out of the box”.
Jitao David Zhang <[email protected]>
qsystem for running system commands quietly.
tmp1 <- tempfile()
tmp2 <- tempfile()
rmat(tmp1, tmp2, minutes=1)
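The interval arithmetic described above (days, hours, and minutes summed into one interval) and the resulting at call can be sketched as a pure string builder. The sketch assumes the job is expressed in minutes using at's "now + N minutes" time specification. It only builds the command and never executes it, similar in spirit to a dry run. rmatCommandSketch is a hypothetical name, and the package's actual command line may differ.

```r
## Hypothetical sketch: build (but do not run) an at command that removes files later
rmatCommandSketch <- function(..., days = NULL, hours = NULL, minutes = NULL) {
  files <- c(...)
  ## NULL arguments contribute numeric(0), i.e. nothing, to the sum
  total <- sum(c(days * 1440, hours * 60, minutes))
  rmCmd <- paste("rm -f", paste(shQuote(files, type = "sh"), collapse = " "))
  sprintf('echo "%s" | at now + %d minutes', rmCmd, as.integer(round(total)))
}

rmatCommandSketch("a.txt", "b.txt", hours = 1, minutes = 30)
## echo "rm -f 'a.txt' 'b.txt'" | at now + 90 minutes
```

Passing the result to system() would actually schedule the deletion, so it is deliberately left out here.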
Extract core identifiers from Roche compound IDs
rocheCore(str, short = FALSE)
str |
Character strings |
short |
Logical, if |
Core identifiers if the element is a Roche compound ID, the original element otherwise. Non-character input will be converted to character strings first.
rocheCore(c("RO1234567-001", "RO1234567-001-000", "RO1234567", "ROnoise-001", "anyOther-not-affected"))
rocheCore(c("RO1234567-001", "RO1234567-001-000", "RO1234567", "ROnoise-001","anyOther-not-affected"), short=TRUE)
S3 method for row-scaling
rowscale(x, center = TRUE, scale = TRUE)
x |
Any object |
center |
Logical, whether centering should be done before scaling |
scale |
Logical, whether scaling should be done |
The input object with rows scaled
Scaling a matrix by row can be slightly slower due to a transposing step.
## S3 method for class 'matrix'
rowscale(x, center = TRUE, scale = TRUE)
x |
An matrix |
center |
Logical, passed to |
scale |
Logical, passed to |
A matrix with each row scaled.
Jitao David Zhang <[email protected]>
mat <- matrix(rnorm(20), nrow=4)
rs.mat <- rowscale(mat)
print(mat)
print(rs.mat)
rowMeans(rs.mat)
apply(rs.mat, 1L, sd)
rowscale(mat, center=FALSE, scale=FALSE) ## equal to mat
rowscale(mat, center=TRUE, scale=FALSE)
rowscale(mat, center=FALSE, scale=TRUE)
Scaling a table by row can be slightly slower due to a transposing step.
## S3 method for class 'table'
rowscale(x, center = TRUE, scale = TRUE)

| Argument | Description |
|---|---|
| x | A table |
| center | Logical, passed to |
| scale | Logical, passed to |

A table with each row scaled.
Jitao David Zhang <[email protected]>
letterDf <- data.frame(from=c("A", "A", "B", "C"), to=c("A", "B", "C", "A"))
tbl <- table(letterDf$from, letterDf$to)
tblRowscale <- rowscale(tbl)
print(tbl)
print(tblRowscale)
rowMeans(tblRowscale)
apply(tblRowscale, 1L, sd)
rowscale(tbl, center=FALSE, scale=FALSE) ## equal to tbl
rowscale(tbl, center=TRUE, scale=FALSE)
rowscale(tbl, center=FALSE, scale=TRUE)
Reverse rank order
rrank(x, ...)

## Default S3 method:
rrank(x, ...)

| Argument | Description |
|---|---|
| x | A numeric, complex, character or logical vector |
| ... | Passed to |

A vector of numbers of the same length as the input, giving reverse rank orders.
The function returns the reverse rank order, i.e. ranks in descending order: the largest element receives rank 1.
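One way to compute reverse ranks in base R is to subtract the ordinary ranks from n + 1. The sketch below assumes that rank()'s default averaging of ties is acceptable and does not consider NA handling. rrankSketch is a hypothetical name, and the package may implement it differently.

```r
# Hypothetical sketch: reverse rank = n + 1 - ordinary rank.
# Works for any input rank() accepts (numeric, character, logical);
# extra arguments such as ties.method are forwarded to rank().
# NA handling is not considered in this sketch.
rrankSketch <- function(x, ...) {
  length(x) + 1 - rank(x, ...)
}

testVec <- c(3, 6, 4, 5)
rank(testVec)         # 1 4 2 3
rrankSketch(testVec)  # 4 1 3 2
```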
testVec <- c(3,6,4,5)
rank(testVec)
rrank(testVec)
Get reverse rank orders in each column
## S3 method for class 'matrix'
rrank(x, ...)

| Argument | Description |
|---|---|
| x | A matrix |
| ... | Passed to |

A matrix with the same dimensions and attributes as the input matrix, containing the reverse rank orders within each column.
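Applying the vector rule column by column gives a sketch of the matrix method. rrankMatrixSketch is a hypothetical name. Copying only dim and dimnames back from the input is an assumption about which attributes matter.

```r
# Hypothetical sketch: reverse ranks computed separately within each column.
rrankMatrixSketch <- function(x, ...) {
  res <- apply(x, 2L, function(col) length(col) + 1 - rank(col, ...))
  dim(res) <- dim(x)            # keep the shape even for one-row input
  dimnames(res) <- dimnames(x)  # keep row and column names
  res
}

testMatrix <- matrix(c(3, 6, 4, 5, 2, 4, 8, 3, 2, 5, 4, 7), ncol = 3)
rrankMatrixSketch(testMatrix)
```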
testMatrix <- matrix(c(3,6,4,5,2,4,8,3,2,5,4,7), ncol=3, byrow=FALSE)
rrank(testMatrix)
Return a matrix that highlights reverse rank orders of features of interest by column
rrankInd(matrix, ind, inValue = 1L, outValue = 0L, ...)

| Argument | Description |
|---|---|
| matrix | A matrix |
| ind | An integer vector or a logical vector that gives the index |
| inValue | Value to highlight the reverse ranks indexed by ind |
| outValue | Value assigned to positions not indexed by ind |
| ... | Passed to |

A matrix with the same dimensions and attributes as the input matrix,
in which each column contains only inValue and outValue.
Positions that match the reverse ranks of the matrix values indexed by
ind are assigned inValue; all other positions are assigned
outValue.
The function can be used to visualize the reverse ranks of features of interest (rows of the input matrix) in each sample (columns of the input matrix). This is for instance useful for rank plots of features for gene-set enrichment analysis.
For example, if in every sample the features indexed by ind are larger than
all other features, then, with the default inValue and outValue, the returned
matrix has the value 1 in its first k rows, where k is the number of features
indexed by ind, and 0 in the remaining rows.
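The construction can be sketched in base R. Within each column, the sketch computes reverse ranks and marks the rows whose positions equal the reverse ranks of the indexed features. rrankIndSketch is a hypothetical name. Ties are broken by order of appearance here, which is an assumption rather than the package's documented behaviour.

```r
# Hypothetical sketch of the rrankInd idea, not the ribiosUtils code.
rrankIndSketch <- function(mat, ind, inValue = 1L, outValue = 0L) {
  res <- apply(mat, 2L, function(col) {
    # reverse ranks within the column; ties broken by order of appearance
    revRank <- length(col) + 1L - rank(col, ties.method = "first")
    out <- rep(outValue, length(col))
    out[revRank[ind]] <- inValue  # mark positions taken by indexed features
    out
  })
  dim(res) <- dim(mat)
  dimnames(res) <- dimnames(mat)
  res
}

testMatrix <- matrix(c(3, 6, 4, 5, 2, 4, 8, 3, 2, 5, 4, 7), ncol = 3)
rrankIndSketch(testMatrix, c(2, 4))
rrankIndSketch(testMatrix, c(FALSE, TRUE, FALSE, TRUE))
```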
testMatrix <- matrix(c(3,6,4,5,2,4,8,3,2,5,4,7), ncol=3, byrow=FALSE)
print(testMatrix)
testInd <- c(2,4)
## verify that the command below returns 1 in positions occupied by
## the reverse ranks of the values indexed by testInd
rrankInd(testMatrix, testInd)
testIndBool <- c(FALSE, TRUE, FALSE, TRUE)
rrankInd(testMatrix, testIndBool)
Reverse setdiff, i.e. rsetdiff(x, y) equals setdiff(y, x)
rsetdiff(x, y)

| Argument | Description |
|---|---|
| x | a vector |
| y | another vector |

Like setdiff, but returns the elements that are in y but not in x.
Jitao David Zhang
testVec1 <- LETTERS[3:6]
testVec2 <- LETTERS[5:7]
rsetdiff(testVec1, testVec2)
The function prepares R for a non-interactive session (e.g. in a script). Currently it defines the behaviour in case of errors: a file named “ribios.dump” is written.
scriptInit()
Called for its side effect.
Jitao David Zhang <[email protected]>
## do not run unless the script mode is needed
scriptInit()
These functions are used to debug command-line executable Rscripts in R sessions
setDebug()
setDebug sets the environment variable RIBIOS_SCRIPT_DEBUG
to TRUE. unsetDebug unsets the variable. isDebugging
checks whether the variable is set. isIntDebugging tests
whether the script runs interactively or in debugging mode. The
last one can be useful when debugging an Rscript in an R session.
A programmer wishing to debug an Rscript can explicitly set (or unset) the
RIBIOS_SCRIPT_DEBUG variable in order to activate (deactivate)
certain chunks of code. This can be automated via isDebugging, or,
probably more conveniently, via isIntDebugging: if the script runs in
interactive mode, or the debugging flag is set, the function returns
TRUE.
setDebug and unsetDebug return an invisible value
indicating whether setting (unsetting) the variable was successful.
isDebugging and isIntDebugging return logical values.
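The mechanism can be sketched with base R's environment-variable functions. The sketch uses hypothetical names and assumes that "set" means the variable holds the string TRUE. The package may test the variable differently.

```r
# Hypothetical sketch built on Sys.setenv(), Sys.unsetenv() and Sys.getenv().
setDebugSketch <- function() invisible(Sys.setenv(RIBIOS_SCRIPT_DEBUG = "TRUE"))
unsetDebugSketch <- function() invisible(Sys.unsetenv("RIBIOS_SCRIPT_DEBUG"))
isDebuggingSketch <- function() identical(Sys.getenv("RIBIOS_SCRIPT_DEBUG"), "TRUE")
# TRUE in an interactive session or when the debugging flag is set
isIntDebuggingSketch <- function() interactive() || isDebuggingSketch()

unsetDebugSketch()
isDebuggingSketch()   # FALSE
setDebugSketch()
isDebuggingSketch()   # TRUE
unsetDebugSketch()
```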
Jitao David Zhang <[email protected]>
unsetDebug()
print(isDebugging())
setDebug()
print(isDebugging())
unsetDebug()
print(isDebugging())
print(isIntDebugging())
Shorten Roche compounds identifiers
shortenRocheCompoundID(str)

| Argument | Description |
|---|---|
| str | Character strings that contain one or more Roche core identifiers ( |

Character strings of the same length as the input, with all core identifiers shortened
In contrast to rocheCore, which only handles character strings that are valid Roche compound identifiers, this function takes any input
string and performs a gsub operation to shorten Roche core numbers. Therefore, it even works when only a substring matches the pattern of a Roche compound name.
shortenRocheCompoundID(c("RO1234567-001", "RO1234567-001-000", "RO1234567", "ROnoise-001", "anyOther-not-affected", "RO1234567 and RO9876543 are two imaginary compounds."))
Shorten strings to a given number of characters
shortenStr(str, nchar = 8)

| Argument | Description |
|---|---|
| str | A vector of strings |
| nchar | The maximal number of characters to keep |

A vector of strings of the same length as the input, with each string shortened to the desired length
Strings with more than nchar characters are shortened to their first nchar characters followed by "...".
NA values are kept as they are.
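A base-R sketch of this rule, matching the expected outcome given in the example, follows. shortenStrSketch is a hypothetical name, not the package's code.

```r
# Hypothetical sketch: keep the first nchar characters and append "...".
shortenStrSketch <- function(str, nchar = 8) {
  # base::nchar is called explicitly because the argument is also named nchar;
  # the is.na() check keeps NA untouched (nchar(NA) would otherwise be 2)
  tooLong <- !is.na(str) & base::nchar(str) > nchar
  str[tooLong] <- paste0(substr(str[tooLong], 1, nchar), "...")
  str
}

shortenStrSketch(c("abc", "abcd", "abcde", NA), nchar = 4)
# returns c("abc", "abcd", "abcd...", NA)
```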
inputStrs <- c("abc", "abcd", "abcde", NA)
shortenStr(inputStrs, nchar=4)
## expected outcome: abc, abcd, abcd..., NA
The function keeps a command silent by suppressing its warnings and messages.
silencio(...)

| Argument | Description |
|---|---|
| ... | Any function call |

The value of the function call.
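The "see also" entries point to suppressWarnings and suppressMessages, so nesting these two base functions is a natural way to sketch the behaviour. silencioSketch is a hypothetical name, and the actual implementation may differ.

```r
# Hypothetical sketch: evaluate the call with warnings and messages muffled.
silencioSketch <- function(expr) {
  suppressWarnings(suppressMessages(expr))
}

wsqrt <- function(x) {
  warning("Beep")
  message("Calculating square")
  x^2
}
silencioSketch(wsqrt(3))  # 9, without the warning or the message
```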
Jitao David Zhang <[email protected]>
suppressWarnings, suppressMessages
wsqrt <- function(x) {
  warning("Beep")
  message("Calculating square")
  return(x^2)
}
silencio(wsqrt(3))
Sort a numeric vector and filter by a threshold of cumsumprop
sortAndFilterByCumsumprop(x, thr = 0.9)

| Argument | Description |
|---|---|
| x | Numeric vector, usually named |
| thr | Threshold, default 0.9: after sorting in decreasing order, items whose cumulative proportion of the total sum is at most thr are kept. |

A numeric vector, usually shorter than x, containing the items whose
cumulative proportion (cumsumprop) is equal to or lower than thr. The remaining items are
summed into one new item named rest.
This function can be useful to extract from a long numeric vector the largest items that dominate its sum.
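A base-R sketch of the documented behaviour, consistent with the example below, follows. It sorts decreasingly, computes the cumulative proportion, keeps the items at or below thr, and collapses the rest. The name sortAndFilterSketch is hypothetical. Details such as the case where the first item alone exceeds thr are assumptions.

```r
# Hypothetical sketch of sorting and filtering by cumulative proportion.
sortAndFilterSketch <- function(x, thr = 0.9) {
  xs <- sort(x, decreasing = TRUE)   # largest items first, names kept
  cumProp <- cumsum(xs) / sum(xs)    # cumulative proportion of the total
  keep <- cumProp <= thr
  res <- xs[keep]
  if (any(!keep)) {
    res <- c(res, rest = sum(xs[!keep]))  # collapse the remainder
  }
  res
}

x <- c(A = 1, B = 2, C = 3, D = 4, E = 400, F = 500)
sortAndFilterSketch(x, thr = 0.99)   # F = 500, E = 400, rest = 10
```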
x <- c("A"=1,"B"=2,"C"=3,"D"=4,"E"=400,"F"=500)
sortAndFilterByCumsumprop(x, thr=0.99)
## F and E should be returned
Sort rows of a data.frame by values in specified columns.
sortByCol(
  data.frame,
  columns,
  na.last = TRUE,
  decreasing = TRUE,
  orderAsAttr = FALSE
)

| Argument | Description |
|---|---|
| data.frame | A data.frame |
| columns | Column(s) by which the rows should be ordered |
| na.last | Logical, whether NA should be sorted last |
| decreasing | Logical, whether the sorting should be in decreasing order |
| orderAsAttr | Logical, whether the order index vectors should be returned in the attribute “order” of the sorted data.frame |

Columns can be specified by integer indices, logical vectors or character names.
Sorted data.frame
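Multi-column sorting can be sketched with do.call(order, ...), which accepts one sort key per column. sortByColSketch is a hypothetical name, and the orderAsAttr option is not modelled.

```r
# Hypothetical sketch: sort rows by several columns with do.call(order, ...).
sortByColSketch <- function(df, columns, na.last = TRUE, decreasing = TRUE) {
  # df[, columns] accepts integer, logical or character column selectors;
  # unname() keeps column names from clashing with order()'s own arguments
  keys <- unname(as.list(df[, columns, drop = FALSE]))
  ord <- do.call(order, c(keys, list(na.last = na.last, decreasing = decreasing)))
  df[ord, , drop = FALSE]
}

sample.df <- data.frame(teams = c("HSV", "BVB", "FCB", "FCN"),
                        pts = c(18, 17, 17, 9), number = c(7, 7, 6, 6))
sortByColSketch(sample.df, c("pts", "teams"))
sortByColSketch(sample.df, 1L, decreasing = FALSE)
```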
Jitao David Zhang <[email protected]>
sample.df <- data.frame(teams=c("HSV", "BVB", "FCB", "FCN"),pts=c(18,17,17,9), number=c(7,7,6,6))
sortByCol(sample.df, 1L)
sortByCol(sample.df, 1L, decreasing=FALSE)
sortByCol(sample.df, c(3L, 1L))
sortByCol(sample.df, c(3L, 1L), decreasing=FALSE)
sortByCol(sample.df, c(3L, 2L))
sortByCol(sample.df, c(TRUE, FALSE, TRUE))
sortByCol(sample.df, c("teams", "pts"))
sortByCol(sample.df, c("pts", "number", "teams"))
sortByCol(sample.df, c("pts", "teams", "number"))
Rearrange rows and columns of a matrix by dim names
sortByDimnames(x, row.decreasing = FALSE, col.decreasing = FALSE)

| Argument | Description |
|---|---|
| x | A matrix or data.frame |
| row.decreasing | Logical, whether rows should be sorted in decreasing order |
| col.decreasing | Logical, whether columns should be sorted in decreasing order |

The matrix or data frame with rows and columns sorted by their names
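The rearrangement amounts to indexing with order() applied to the row and column names. sortByDimnamesSketch is a hypothetical name, not the package's code.

```r
# Hypothetical sketch: reorder rows and columns by their names.
sortByDimnamesSketch <- function(x, row.decreasing = FALSE, col.decreasing = FALSE) {
  x[order(rownames(x), decreasing = row.decreasing),
    order(colnames(x), decreasing = col.decreasing), drop = FALSE]
}

testMat <- matrix(1:16, nrow = 4,
                  dimnames = list(c("B", "D", "A", "C"), c("t", "f", "a", "g")))
sortByDimnamesSketch(testMat)
```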
Jitao David Zhang <[email protected]>
testMat <- matrix(1:16, nrow=4, dimnames=list(c("B", "D", "A", "C"), c("t", "f", "a", "g")))
sortByDimnames(testMat)
sortByDimnames(testMat, row.decreasing=TRUE, col.decreasing=FALSE)
Tokenize strings by a character, similar to the strsplit
function in the base package. The function returns a matrix of
tokenized items when index is missing. If index is given,
tokenized items in the selected position(s) are returned. See examples.
strtoken(x, split, index, ...)

| Argument | Description |
|---|---|
| x | A vector of character strings; non-character vectors are cast into characters. |
| split | A character to split the strings. |
| index | Numeric vector indicating which fields should be returned; if missing or set to NULL, all fields are returned as a matrix. |
| ... | Other parameters passed to |

A matrix if index is missing, NULL, or contains more
than one index; otherwise a character vector.
Jitao David Zhang <[email protected]>
The main body of the function is modified from the
strsplit2 function in the limma package.
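The following independent base-R sketch shows the general idea: split the strings, pad shorter results with NA into a matrix, then select fields. strtokenSketch is a hypothetical name. Treating split as a fixed string rather than a regular expression, and defaulting index to NULL instead of leaving it missing, are assumptions.

```r
# Hypothetical sketch: tokenize into a matrix, then optionally pick fields.
strtokenSketch <- function(x, split, index = NULL) {
  parts <- strsplit(as.character(x), split, fixed = TRUE)  # factors -> labels
  nField <- max(lengths(parts))
  # pad each token vector with NA so that all rows have nField columns
  mat <- do.call(rbind, lapply(parts, function(p) {
    c(p, rep(NA_character_, nField - length(p)))
  }))
  if (is.null(index)) return(mat)
  if (length(index) == 1L) mat[, index] else mat[, index, drop = FALSE]
}

myStr <- c("HSV\t1887", "FCB\t1900", "FCK\t1948")
strtokenSketch(myStr, "\t")
strtokenSketch(myStr, "\t", index = 2L)   # "1887" "1900" "1948"
strtokenSketch(factor(myStr), "\t", index = 1L)
```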
myStr <- c("HSV\t1887", "FCB\t1900", "FCK\t1948")
strsplit(myStr, "\t")
strtoken(myStr, "\t")
strtoken(myStr, "\t", index=1L)
strtoken(myStr, "\t", index=2L)
myFac <- factor(myStr)
strtoken(myFac, "\t")
strtoken(myFac, "\t", index=1L)
stubborngc repeats garbage collection until no more memory can be freed.
stubborngc(verbose = FALSE, reset = TRUE)

| Argument | Description |
|---|---|
| verbose | Logical, verbose or not |
| reset | Logical, reset or not. |

Called for its side effect.
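A possible loop is sketched below. Without confirmation from the source, it assumes that verbose and reset are forwarded to base R's gc(). It stops once the number of used cells no longer decreases. stubborngcSketch is a hypothetical name.

```r
# Hypothetical sketch: call gc() repeatedly until memory use stops dropping.
stubborngcSketch <- function(verbose = FALSE, reset = TRUE) {
  # column 1 of gc()'s result holds the cells in use (Ncells and Vcells)
  usedBefore <- sum(gc(verbose = verbose, reset = reset)[, 1])
  repeat {
    usedAfter <- sum(gc(verbose = verbose, reset = reset)[, 1])
    if (usedAfter >= usedBefore) break  # nothing more was freed
    usedBefore <- usedAfter
  }
  invisible(usedAfter)
}

stubborngcSketch()
```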
Jitao David Zhang <[email protected]>
stubborngc()
The function calls assertColumnName internally to match the
column names.
subsetByColumnName(data.frame, reqCols, ignore.case = FALSE)

| Argument | Description |
|---|---|
| data.frame | A data.frame object |
| reqCols | Required column names |
| ignore.case | Logical, whether case should be ignored when matching column names |

If all required column names are present, the data.frame is subset to include only these columns, and the resulting data.frame is returned. Otherwise an error message is printed.
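Case-insensitive matching can be sketched with match() and tolower(). The sketch does not call assertColumnName. It raises an error via stop() when a column is missing, which is an assumption about how the failure is reported. subsetByColumnNameSketch is a hypothetical name.

```r
# Hypothetical sketch: subset columns by name, optionally ignoring case.
subsetByColumnNameSketch <- function(df, reqCols, ignore.case = FALSE) {
  cn <- colnames(df)
  idx <- if (ignore.case) match(tolower(reqCols), tolower(cn)) else match(reqCols, cn)
  if (anyNA(idx)) {
    stop("Missing column(s): ", paste(reqCols[is.na(idx)], collapse = ", "))
  }
  df[, idx, drop = FALSE]
}

myTestDf <- data.frame(HBV = 1:3, VFB = 0:2, BVB = 4:6, FCB = 2:4)
subsetByColumnNameSketch(myTestDf, c("hbv", "bVb"), ignore.case = TRUE)
```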
myTestDf <- data.frame(HBV=1:3, VFB=0:2, BVB=4:6, FCB=2:4)
myFavTeams <- c("HBV", "BVB")
subsetByColumnName(myTestDf, myFavTeams)
myFavTeamsCase <- c("hbv", "bVb")
subsetByColumnName(myTestDf, myFavTeamsCase, ignore.case=TRUE)
Apply a function to summarize rows/columns that are assigned to the same level by a factor vector.
summarizeRows(matrix, factor, fun = mean, ...)

| Argument | Description |
|---|---|
| matrix | A numeric matrix |
| factor | A vector of factors, either of the length of |
| fun | A function or a name for a function, the summarizing function applied to rows/columns sharing the same level |
| ... | Further parameters passed to the function |

NA levels are ignored, and the corresponding rows/columns do not
contribute to the summarized matrix.
summarizeCols is synonymous with summarizeColumns.
A matrix whose dimension is determined by the number of levels of the factor vector.
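For the row case, the grouping can be sketched by looping over factor levels and applying fun column-wise to each group of rows. summarizeRowsSketch is a hypothetical name. Dropping unused levels, which factor() does here, is an assumption.

```r
# Hypothetical sketch: apply fun to the rows belonging to each factor level.
summarizeRowsSketch <- function(mat, fac, fun = mean, ...) {
  fac <- factor(fac)        # NA entries belong to no level
  fun <- match.fun(fun)     # accept a function or its name
  rows <- lapply(levels(fac), function(lv) {
    # which() drops NA, so rows with an NA level never contribute
    apply(mat[which(fac == lv), , drop = FALSE], 2L, fun, ...)
  })
  res <- do.call(rbind, rows)
  rownames(res) <- levels(fac)
  res
}

my.matrix <- matrix(1:25, nrow = 5)
summarizeRowsSketch(my.matrix, factor(c("A", "B", "A", "C", "B")))
summarizeRowsSketch(my.matrix, factor(c("A", "B", "A", "C", NA)), fun = "prod")
```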
Jitao David Zhang <[email protected]>
my.matrix <- matrix(1:25, nrow=5)
print(my.matrix)
my.factor <- factor(c("A", "B", "A", "C", "B"))
summarizeRows(matrix=my.matrix, factor=my.factor, fun=mean)
summarizeRows(matrix=my.matrix, factor=my.factor, fun=prod)
summarizeColumns(matrix=my.matrix, factor=my.factor, fun=mean)
summarizeColumns(matrix=my.matrix, factor=my.factor, fun=prod)
## NA values in factor
my.na.factor <- factor(c("A", "B", "A", "C", NA))
summarizeRows(matrix=my.matrix, factor=my.na.factor, fun=mean)
summarizeRows(matrix=my.matrix, factor=my.na.factor, fun=prod)
summarizeColumns(matrix=my.matrix, factor=my.na.factor, fun=mean)
summarizeColumns(matrix=my.matrix, factor=my.na.factor, fun=prod)
The function trims leading and/or trailing whitespace from string(s), using a C function implemented in the BIOS library.
trim(x, left = " \n\r\t", right = " \n\r\t")

| Argument | Description |
|---|---|
| x | A character string, or a vector of strings |
| left | Characters that are trimmed from the left side. |
| right | Characters that are trimmed from the right side. |

left and right can be set to NULL. In such cases no trimming
will be performed.
Trimmed string(s)
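The package does the trimming in C. For comparison, a pure-R sketch can use base R's trimws(), whose whitespace argument (available since R 3.6.0) takes a regular-expression character class. trimSketch is a hypothetical name. The sketch assumes that left and right contain no characters that are special inside a bracket expression, such as ], ^, - or a backslash. It also assumes that NULL disables trimming on that side only.

```r
# Hypothetical pure-R sketch; assumes left/right hold no regex-special
# characters, because they are pasted into a bracket expression.
trimSketch <- function(x, left = " \n\r\t", right = " \n\r\t") {
  if (!is.null(left)) {
    x <- trimws(x, which = "left", whitespace = paste0("[", left, "]"))
  }
  if (!is.null(right)) {
    x <- trimws(x, which = "right", whitespace = paste0("[", right, "]"))
  }
  x
}

trimSketch(c("This is a fine day\n", " Hallo Professor!", " NUR DER HSV "))
trimSketch("  both sides  ", left = NULL)   # only the right side is trimmed
```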
Jitao David Zhang <[email protected]>
myStrings <- c("This is a fine day\n", " Hallo Professor!", " NUR DER HSV ")
trim(myStrings)
Length of unique elements in a vector
uniqueLength(x, incomparables = FALSE)

| Argument | Description |
|---|---|
| x | A vector |
| incomparables | See unique |

An integer indicating the number of unique elements in the input vector
Jitao David Zhang <[email protected]>
unique
test.vec1 <- c("HSV", "FCB", "BVB", "HSV", "BVB")
uniqueLength(test.vec1)
test.vec2 <- c(1L, 2L, 3L, 5L, 3L, 4L, 2L, 1L, 5L)
ulen(test.vec2)
Make a vector free of NA and unique
uniqueNonNA(x)

| Argument | Description |
|---|---|
| x | A vector |

A vector of unique values without NA
testVec <- c(3,4,5,NA,3,5)
uniqueNonNA(testVec)
The verbose level is represented by a non-negative integer. The larger the number, the more verbose the program: it prints more messages for the user's information.
verbose(..., global = 1L, this = 1L)

| Argument | Description |
|---|---|
| ... | Messages to be printed, will be passed to the |
| global | Integer, the global verbose level |
| this | Integer, the verbose level of this message |

This function decides whether or not to print a message, depending on the global verbose level and the specific level of the message. If the specific level is larger than the global level, the message is suppressed; otherwise it is printed. An example follows.
Suppose the global verbose level is set to 5, and two messages have
levels of 1 and 7 respectively. Since 1 suggests a
low threshold of being verbose, the first message is printed, whereas the
message of level 7 is only printed when the program should run in a
more verbose way (7, 8, 9, ...); it is suppressed at the current
global verbose level.
The function is called for its side effect of printing messages.
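The decision rule fits in one comparison. The sketch below prints with message(), which is an assumption because the source does not say which function receives the messages. It also returns, invisibly, whether the message was shown, which the original is not documented to do. verboseSketch is a hypothetical name.

```r
# Hypothetical sketch: print only if this level does not exceed the global level.
verboseSketch <- function(..., global = 1L, this = 1L) {
  show <- this <= global
  if (show) message(...)
  invisible(show)
}

Gv <- 5L
verboseSketch("Slightly verbosing", global = Gv, this = 1L)    # printed
verboseSketch("Moderately verbosing", global = Gv, this = 5L)  # printed
verboseSketch("Heavily verbosing", global = Gv, this = 9L)     # suppressed
```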
Jitao David Zhang <[email protected]>
Gv <- 5L
verbose("Slightly verbosing", global=Gv, this=1L)
verbose("Moderately verbosing", global=Gv, this=5L)
verbose("Heavily verbosing", global=Gv, this=9L)
Translate well index numbers to well positions
wellIndex2position(ind, format = c("96", "384"))

| Argument | Description |
|---|---|
| ind | Well index, integer numbers starting from 1, running rowwise. Non-integer parameters will be coerced to integers. |
| format | Character string, well format |

A data.frame containing three columns: the input WellIndex, Row (characters) and Column (integers)
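Because indices run row-wise, the position follows from integer division by the number of columns per row. A standard 96-well plate has 8 rows by 12 columns, and a 384-well plate has 16 rows by 24 columns. For instance, index 13 on a 96-well plate gives row (13 - 1) %/% 12 + 1 = 2 (B) and column (13 - 1) %% 12 + 1 = 1. wellIndex2positionSketch is a hypothetical name, and it performs no range checking.

```r
# Hypothetical sketch: convert row-wise well indices into (row letter, column).
wellIndex2positionSketch <- function(ind, format = c("96", "384")) {
  format <- match.arg(format)
  nCol <- if (format == "96") 12L else 24L  # 96: 8 x 12, 384: 16 x 24
  ind <- as.integer(ind)                    # non-integers are coerced
  data.frame(WellIndex = ind,
             Row = LETTERS[(ind - 1L) %/% nCol + 1L],
             Column = (ind - 1L) %% nCol + 1L,
             stringsAsFactors = FALSE)
}

wellIndex2positionSketch(c(1, 12, 13, 96, NA), format = "96")
wellIndex2positionSketch(384, format = "384")   # row P, column 24
```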
wellIndex2position(1:96, format="96")
wellIndex2position(c(3,2,5,34,85, NA), format="96")
wellIndex2position(1:384, format="384")
System user name
whoami()
System user name
whoami()
The function writeLog can be used to log outputs and/or the running
status of scripts to one connection. It does not require registerLog
to be run first.
writeLog(fmt, ..., con = stdout(), level = 0)

| Argument | Description |
|---|---|
| fmt | Format string to be passed on to sprintf |
| ... | Parameters passed on to sprintf |
| con | A connection, for instance a file (or its name) or |
| level | Logging level: each higher level adds one extra space before the message. See examples |

In contrast, doLog can be used to log to multiple connections that
are registered by registerLog. Therefore, registering logger(s) with
registerLog is a prerequisite for calling doLog. Internally,
doLog calls writeLog sequentially to log to multiple connections.
Called for its side effect.
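A minimal version of the formatting and indentation rule can be written with sprintf(), strrep() and writeLines(). writeLogSketch is a hypothetical name. It does not reproduce the package's automatic handling of NA and NULL arguments, or any timestamp or prefix the real function may add.

```r
# Hypothetical sketch: format with sprintf(), indent by level spaces, write.
writeLogSketch <- function(fmt, ..., con = stdout(), level = 0) {
  msg <- sprintf(fmt, ...)
  writeLines(paste0(strrep(" ", level), msg), con = con)
}

writeLogSketch("This is the start of a log")
writeLogSketch("Message %d", 1L, level = 1)
writeLogSketch("Square root of 2 is %2.2f", sqrt(2), level = 2)
```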
Jitao David Zhang <[email protected]>
registerLog to register more than one logger so that
doLog can write to them sequentially.
## the following code section is not run to prevent issues with pkgdown
## Not run:
writeLog("This is the start of a log")
writeLog("Message 1", level=1)
writeLog("Message 1.1", level=2)
writeLog("Message 1.2", level=2)
writeLog("Message 2", level=1)
writeLog("Message 3", level=1)
writeLog("Message 3 (special)", level=4)
writeLog("End of the log")
## log with format
writeLog("This is Message %d", 1)
writeLog("Square root of 2 is %2.2f", sqrt(2))
## NA is handled automatically
writeLog("This is a not available value: %s", NA, level=1)
writeLog("This is a NULL value: %s", NULL, level=1)
## End(Not run)