| Title: | Annotation of Genes, RNAs, and Proteins in 'ribios' |
|---|---|
| Description: | Retrieves annotation information of genomic features including genes, RNAs, and proteins from databases. It supports querying by gene identifiers, gene symbols, UniProt accessions, Ensembl identifiers, and RefSeq identifiers, as well as mapping orthologs across species using NCBI data. |
| Authors: | Jitao David Zhang [aut, cre] (ORCID: <https://orcid.org/0000-0002-3085-0909>) |
| Maintainer: | Jitao David Zhang <[email protected]> |
| License: | GPL-3 |
| Version: | 3.8.0 |
| Built: | 2026-05-17 09:15:02 UTC |
| Source: | https://github.com/bedapub/ribiosAnnotation |
Annotates any identifiers that can be recognized by GTI.
annotateAnyIDs(ids, orthologue = FALSE, multiOrth = FALSE)
ids |
A vector of identifiers. They must be of the same type. Supported types include Entrez GeneID, GeneSymbol, Probesets, UniProt identifiers, NCBI RefSeq mRNA identifiers, and Ensembl gene identifiers (with possible version suffixes). |
orthologue |
Logical, is orthologous mapping needed? |
multiOrth |
Logical, whether more than one orthologue is allowed
A data.frame containing annotation information. At least the following
columns are present:
Input Input string, it will be in the first column.
IDType Input ID type
GeneID (Human) Entrez GeneID
GeneSymbol (Human) official gene symbol
GeneName (Human) gene name
TaxID NCBI taxonomy ID
Jitao David Zhang <[email protected]>
annotateGeneIDs, annotateGeneSymbols
## Not run:
# GeneID
annotateAnyIDs(ids=c(780, 5982, 3310, NA))
# GeneSymbol
annotateAnyIDs(ids=c("DDR1", "RFC2", "HSPA6", "HSAP6"))
# Probesets
myprobes <- c("1000_at", "1004_at", "1002_f_at", "nonsense_at")
annotateAnyIDs(myprobes)
# UniProt
annotateAnyIDs(ids=c("P38398", "Q8NDF8"))
# EnsEMBL
ensemblIDs <- c("ENSG00000197535", "ENST00000399231.7", "ENSP00000418960.2")
annotateAnyIDs(ensemblIDs)
# RefSeq
annotateAnyIDs(c("NM_000235", "NM_000498"))
## End(Not run)
Annotate Ensembl GeneIDs
annotateEnsemblGeneIDs(ids, orthologue = FALSE, multiOrth = FALSE)
ids |
A vector of EnsemblGeneIDs in form of
|
orthologue |
Logical, whether human orthologues should be returned.
Default: |
multiOrth |
Logical, whether multiple orthologues should be returned if they
exist. Default: |
A data.frame object containing the annotations:
* GeneID EntrezGeneID
* GeneSymbol Official gene symbol
* Description Gene description
* TaxID Taxonomy ID
* Type Gene type
If orthologue is TRUE, following columns are appended:
* HumanGeneID
* HumanGeneSymbol
* HumanDescription
* HumanType
annotateEnsemblGeneIDsWithoutHumanOrtholog and
annotateEnsemblGeneIDsWithHumanOrtholog
## Not run:
annotateEnsemblGeneIDs(ids=c("ENSG00000236453", "ENSG00000170782", "ENSG00000187867"))
annotateEnsemblGeneIDs(ids=c("ENSG00000236453", "ENSG00000170782", "ENSG00000187867", NA), orthologue=TRUE)
annotateEnsemblGeneIDs(ids=c("ENSG00000174827", "ENSMUSG00000038298", "ENSG00000198483", "ENSMUSG00000038354", "ENSRNOG00000054947", "ENSG00000278099"), orthologue=TRUE)
## End(Not run)
Annotate EnsEMBL GeneID with data from EnsEMBL
annotateEnsemblGeneIDsWithEnsembl(ids)
ids |
Character strings, Ensembl GeneIDs in form of
|
The ensembl_genes collection is used. Note that Ensembl
IDs often refer to novel transcripts which do not have identifiers in other
databases like NCBI Genes. If an EnsemblID is invalid or obsolete, the fields
GeneName and TaxID will be NA.
A data.frame containing following columns:
EnsemblID: The input EnsemblID
GeneID: NCBI GeneID
GeneSymbol: Official gene symbol
Description: Gene description
TaxID: Taxonomy ID
This function uses data from EnsEMBL to annotate EnsEMBL GeneIDs. For most
users, it is recommended to use annotateEnsemblGeneIDs,
because it uses both data from EnsEMBL and data from NCBI to perform the
task.
Function annotateEnsemblGeneIDsWithNCBI annotates
EnsEMBL GeneIDs with data from NCBI, and annotateEnsemblGeneIDs
annotates EnsEMBL GeneIDs with both data from EnsEMBL and data from NCBI.
## Not run:
ensIDs <- readLines(system.file(file.path("extdata/ribios_annotate_testdata", "ensemble_geneids.txt"), package="ribiosAnnotation"))
ensAnno <- annotateEnsemblGeneIDsWithEnsembl(ensIDs)
## End(Not run)
Annotate Ensembl GeneIDs while appending human orthologs
annotateEnsemblGeneIDsWithHumanOrtholog(ids, multiOrth = FALSE)
ids |
A vector of character strings, Ensembl GeneIDs in form of
|
multiOrth |
Logical, whether multiple orthologues should be returned if they
exist. Default: |
A data.frame containing following columns:
EnsemblID: The input EnsemblID
GeneID: NCBI GeneID
GeneSymbol: Official gene symbol
Description: Gene description
TaxID: Taxonomy ID
Type: Gene type
HumanGeneID: NCBI GeneID of the human orthologue
HumanGeneSymbol: Official gene symbol of the human orthologue
HumanDescription: Gene description of the human orthologue
HumanType: Gene type of the human orthologue
Currently the human orthologs are looked up in NCBI. The lookup has not yet been switched to EnsEMBL.
## Not run:
ensIDs <- readLines(system.file(file.path("extdata/ribios_annotate_testdata", "ensemble_geneids.txt"), package="ribiosAnnotation"))
enAnnoHumanOrt <- annotateEnsemblGeneIDsWithHumanOrtholog(ensIDs)
## End(Not run)
Annotate EnsEMBL GeneID with data from NCBI
annotateEnsemblGeneIDsWithNCBI(ids)
ids |
Character strings, Ensembl GeneIDs in form of
|
The ncbi_gene2ensembl collection is used.
A data.frame containing following columns:
EnsemblID: The input EnsemblID
GeneID: NCBI GeneID
GeneSymbol: Official gene symbol
Description: Gene description
TaxID: Taxonomy ID
Type: Gene type
This function uses data from NCBI to annotate EnsEMBL GeneIDs. For most
users, it is recommended to use annotateEnsemblGeneIDs,
because it uses both data from EnsEMBL and data from NCBI to perform the
task.
Function annotateEnsemblGeneIDsWithEnsembl annotates
EnsEMBL GeneIDs with data from Ensembl, and
annotateEnsemblGeneIDs annotates EnsEMBL GeneIDs with both
data from EnsEMBL and data from NCBI.
## Not run:
ensIDs <- readLines(system.file(file.path("extdata/ribios_annotate_testdata", "ensemble_geneids.txt"), package="ribiosAnnotation"))
ncbiAnno <- annotateEnsemblGeneIDsWithNCBI(ensIDs)
## End(Not run)
Annotate Ensembl GeneIDs with data from both EnsEMBL and NCBI
annotateEnsemblGeneIDsWithoutHumanOrtholog(ids)
ids |
A vector of character strings, Ensembl GeneIDs in form of
|
First, both EnsEMBL and NCBI annotation is queried. Next, we use the NCBI annotation as the template. Finally, we take the EnsEMBL annotation for those genes that are annotated by EnsEMBL but not by NCBI, merging the information from both sources.
A data.frame containing following columns:
EnsemblID: The input EnsemblID
GeneID: NCBI GeneID
GeneSymbol: Official gene symbol
Description: Gene description
TaxID: Taxonomy ID
Type: Gene type
## Not run:
ensIDs <- readLines(system.file(file.path("extdata/ribios_annotate_testdata", "ensemble_geneids.txt"), package="ribiosAnnotation"))
enAnno <- annotateEnsemblGeneIDsWithoutHumanOrtholog(ensIDs)
## End(Not run)
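The three-step merge described for annotateEnsemblGeneIDsWithoutHumanOrtholog can be illustrated with a small base R sketch. The helper `merge_annotations` and the toy tables below are hypothetical and only mimic the described logic (NCBI as template, EnsEMBL as fallback); they are not the package's implementation and do not touch the database.

```r
# Toy annotation tables; identifiers and values are illustrative only
ncbi <- data.frame(EnsemblID = c("ENSG_A", "ENSG_B"),
                   GeneID = c(1L, 2L),
                   GeneSymbol = c("GENEA", "GENEB"),
                   stringsAsFactors = FALSE)
ensembl <- data.frame(EnsemblID = c("ENSG_B", "ENSG_C"),
                      GeneID = c(2L, NA),
                      GeneSymbol = c("GENEB", "novelC"),
                      stringsAsFactors = FALSE)

# Hypothetical helper: NCBI rows are the template, EnsEMBL fills the gaps
merge_annotations <- function(ids, ncbi, ensembl) {
  # Use the NCBI annotation as the template (NA rows for unmatched IDs)
  res <- ncbi[match(ids, ncbi$EnsemblID), , drop = FALSE]
  res$EnsemblID <- ids
  # Take EnsEMBL annotation for IDs that NCBI does not annotate
  missing <- !ids %in% ncbi$EnsemblID
  if (any(missing)) {
    fill_cols <- setdiff(colnames(res), "EnsemblID")
    ens_rows <- match(ids[missing], ensembl$EnsemblID)
    res[missing, fill_cols] <- ensembl[ens_rows, fill_cols]
  }
  rownames(res) <- NULL
  res
}

merged <- merge_annotations(c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
                            ncbi, ensembl)
print(merged)
```

In this sketch an identifier unknown to both sources ("ENSG_D") is kept in the output with NA in all annotation columns, so the result has one row per input.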
Annotate Entrez GeneIDs
annotateGeneIDs(ids, orthologue = FALSE, multiOrth = FALSE)
ids |
A vector of integers or characters, encoding NCBI Entrez GeneIDs.
It can contain |
orthologue |
Logical, whether human orthologues should be returned.
Default: |
multiOrth |
Logical, whether multiple orthologues should be returned if they
exist. Default: |
A data.frame object containing the annotations:
* GeneID EntrezGeneID
* GeneSymbol Official gene symbol
* Description Gene description
* TaxID Taxonomy ID
* Type Gene type
If orthologue is TRUE, following columns are appended:
* HumanGeneID
* HumanGeneSymbol
* HumanDescription
* HumanType
annotateGeneIDsWithoutHumanOrtholog and
annotateGeneIDsWithHumanOrtholog
## Not run:
annotateGeneIDs(ids=c(780, 5982, 3310))
annotateGeneIDs(ids=c(780, 5982, 3310, NA), orthologue=TRUE)
annotateGeneIDs(ids=c(780, 1506, 1418, 114483548, 57300, 20, 1506, 102129055), orthologue=TRUE)
## End(Not run)
Annotate Entrez GeneIDs with the query of human orthologs
annotateGeneIDsWithHumanOrtholog(ids, multiOrth = FALSE)
ids |
Vector of integer or character strings, EntrezIDs to be annotated |
multiOrth |
Logical, whether multiple orthologues should be returned if they
exist. Default: |
A data.frame object containing the annotations:
* GeneID EntrezGeneID
* GeneSymbol Official gene symbol
* Description Gene description
* TaxID Taxonomy ID
* Type Gene type
* HumanGeneID
* HumanGeneSymbol
* HumanDescription
* HumanType
## Not run:
annotateGeneIDsWithHumanOrtholog(ids=c(780, 5982, 3310))
annotateGeneIDsWithHumanOrtholog(ids=c(780, 5982, 3310, NA))
annotateGeneIDsWithHumanOrtholog(ids=c(780, 5982, 3310, NULL))
annotateGeneIDsWithHumanOrtholog(ids=c(780, 5982, 3310, "1418"))
annotateGeneIDsWithHumanOrtholog(ids=c(780, 5982, 3310, "NotValidGeneID"))
annotateGeneIDsWithHumanOrtholog(ids=c(780, 5982, 3310, 1418, 5982))
annotateGeneIDsWithHumanOrtholog(ids=c(780, 5982, 1418, 5982, 25120, 114483548, 57300, 20, 1506, 1545, 102129055))
## End(Not run)
Annotate Entrez GeneIDs without querying human orthologs
annotateGeneIDsWithoutHumanOrtholog(ids)
ids |
A vector of integers or characters, encoding NCBI Entrez GeneIDs.
It can contain |
The collection ncbi_gene_info is used.
A data.frame object containing the annotations:
* GeneID EntrezGeneID
* GeneSymbol Official gene symbol
* Description Gene description
* TaxID Taxonomy ID
* Type Gene type
annotatemRNAs is an alias of annotateRefSeqs
Jitao David Zhang <[email protected]>
## Not run:
annotateGeneIDsWithoutHumanOrtholog(ids=c(780, 5982, 3310))
annotateGeneIDsWithoutHumanOrtholog(ids=c(780, 5982, 3310, NA))
annotateGeneIDsWithoutHumanOrtholog(ids=c(780, 5982, 3310, NULL))
annotateGeneIDsWithoutHumanOrtholog(ids=c(780, 5982, 3310, "1418"))
annotateGeneIDsWithoutHumanOrtholog(ids=c(780, 5982, 3310, "NotValidGeneID"))
annotateGeneIDsWithoutHumanOrtholog(ids=c(780, 5982, 3310, 1418, 5982))
annotateGeneIDsWithoutHumanOrtholog(ids=c(780, 5982, 1418, 5982, 25120, 114483548, 57300, 20, 1506, 1545, 102129055))
## End(Not run)
Annotate GeneSymbols
annotateGeneSymbols(ids, taxId = 9606, orthologue = FALSE, multiOrth = FALSE)
ids |
Character strings, gene symbols |
taxId |
Integer, NCBI taxonomy ID. Default value: 9606 (human). See |
orthologue |
Logical, whether orthologues are to be returned |
multiOrth |
Logical, only valid when orthologue is set to TRUE, whether multiple orthologues are returned |
A data.frame containing following columns
Entrez Gene ID
Official gene symbols
Description
NCBI Taxonomy ID
Gene type
If orthologue is TRUE, then additional columns are appended:
Human orthologue Entrez GeneID
Human orthologue official gene symbol
Human orthologue gene description
Human orthologue gene type
The function is a convenient wrapper of two functions: annotateGeneSymbolsWithoutHumanOrtholog and annotateGeneSymbolsWithHumanOrtholog.
## Not run:
annotateGeneSymbols(c("AKT1", "ERBB2", "NoSuchAGene", "TGFBR1"), 9606)
annotateGeneSymbols(c("Akt1", "Erbb2", "NoSuchAGene", "Tlr7"), taxId=10116, orthologue=FALSE)
annotateGeneSymbols(c("Akt1", "Erbb2", "NoSuchAGene", "Tlr7"), taxId=10116, orthologue=TRUE)
annotateGeneSymbols(c("Akt1", "Erbb2", "NoSuchAGene", "Tlr7"), taxId=10116, orthologue=TRUE, multiOrth=TRUE)
## End(Not run)
Annotate GeneSymbol with human ortholog
annotateGeneSymbolsWithHumanOrtholog(ids, taxId, multiOrth = FALSE)
ids |
Character strings, gene symbols |
taxId |
Integer, NCBI taxonomy ID. Default value: 9606 (human). See |
multiOrth |
Logical, only valid when orthologue is set to TRUE, whether multiple orthologues are returned |
A data.frame containing following columns:
Entrez Gene ID
Official gene symbols
Description
NCBI Taxonomy ID
Gene type
Human orthologue Entrez GeneID
Human orthologue official gene symbol
Human orthologue gene description
Human orthologue gene type
## Not run:
annotateGeneSymbolsWithHumanOrtholog(c("Akt1", "Erbb2", "NoSuchAGene", "Tgfbr1"), taxId=10090, multiOrth=FALSE)
## End(Not run)
Annotate gene symbols without human ortholog
annotateGeneSymbolsWithoutHumanOrtholog(ids, taxId = 9606)
ids |
Character vector, gene symbols to be queried |
taxId |
Integer, NCBI taxonomy ID of the species. Default value: 9606 (human). See |
A data.frame of following columns
Entrez Gene ID
Official gene symbols
Description
NCBI Taxonomy ID
Gene type
## Not run:
annotateGeneSymbolsWithoutHumanOrtholog(c("AKT1", "ERBB2", "NoSuchAGene", "TGFBR1"), 9606)
annotateGeneSymbolsWithoutHumanOrtholog(c("Akt1", "Erbb2", "NoSuchAGene", "Tgfbr1"), 10090)
## End(Not run)
Annotate human orthologs with data from NCBI
annotateHumanOrthologsWithNCBI(geneids, multiOrth = FALSE)
geneids |
Integer GeneIDs, can contain human GeneIDs. |
multiOrth |
Logical, whether one gene is allowed to map to multiple
human orthologs? Default value is |
ncbi_gene_info and ncbi_gene_orthologs collections
are used.
A data.frame containing following columns:
* GeneID: Input GeneID
* TaxID: Taxonomy ID of the input gene
* HumanGeneID: Human Entrez GeneID
This function annotates human orthologs for any GeneID, including human
genes, in which case the ortholog will be itself. Use
annotateNonHumanGenesHumanOrthologsWithNCBI if you are sure
that input GeneIDs do not come from human.
annotateNonHumanGenesHumanOrthologsWithNCBI
## Not run:
annotateHumanOrthologsWithNCBI(c(25120, 114483548, 57300, 20))
annotateHumanOrthologsWithNCBI(c(25120, 114483548, 57300, 20, 1506, 1545, 102129055))
## End(Not run)
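The behaviour described for annotateHumanOrthologsWithNCBI, where a human gene is its own ortholog, can be sketched without database access. The lookup tables, the GeneIDs in them, and the helper `human_orthologs_sketch` are all hypothetical stand-ins for the ncbi_gene_info and ncbi_gene_orthologs collections; this is a minimal illustration, not the package's code.

```r
# Toy stand-ins for the two collections; all IDs are made up for illustration
gene_info <- data.frame(GeneID = c(11L, 1001L, 2002L),
                        TaxID = c(9606L, 10090L, 10116L))
orth_table <- data.frame(GeneID = 1001L, HumanGeneID = 11L)

# Hypothetical helper returning the three columns named in the documentation
human_orthologs_sketch <- function(geneids, gene_info, orth_table) {
  taxid <- gene_info$TaxID[match(geneids, gene_info$GeneID)]
  human <- orth_table$HumanGeneID[match(geneids, orth_table$GeneID)]
  # A human gene (TaxID 9606) is mapped to itself
  is_human <- !is.na(taxid) & taxid == 9606L
  human[is_human] <- geneids[is_human]
  data.frame(GeneID = geneids, TaxID = taxid, HumanGeneID = human)
}

res <- human_orthologs_sketch(c(11L, 1001L, 2002L), gene_info, orth_table)
print(res)
```

Genes with no recorded ortholog (here the toy rat gene 2002) receive NA in HumanGeneID in this sketch.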
Annotate human orthologs with data from NCBI
annotateNonHumanGenesHumanOrthologsWithNCBI(geneids, multiOrth = FALSE)
geneids |
Integer GeneIDs, can contain human GeneIDs. |
multiOrth |
Logical, whether one gene is allowed to map to multiple
human orthologs? Default value is |
ncbi_gene_info and ncbi_gene_orthologs collections
are used.
A data.frame containing following columns:
* GeneID: Input GeneID
* TaxID: Taxonomy ID of the input gene
* HumanGeneID: Human Entrez GeneID
This function annotates human orthologs for any non-human genes. Use
annotateHumanOrthologsWithNCBI if you are not sure
whether all input GeneIDs are non-human.
annotateHumanOrthologsWithNCBI
## Not run:
annotateNonHumanGenesHumanOrthologsWithNCBI(c(25120, 114483548, 57300, 20))
annotateNonHumanGenesHumanOrthologsWithNCBI(c(25120, 114483548, 57300, 20, 1506, 1545, 102129055))
## End(Not run)
Annotate protein groups for proteomics studies
annotateProteinGroups(ids, delimiter = ";", orthologue = FALSE, multiOrth = FALSE)
ids |
Character, protein groups with identifiers (e.g. UniProt/SwissProt IDs) separated by the delimiter |
delimiter |
Character, delimiter, default semicolon |
orthologue |
Logical, whether human orthologues should be queried. |
multiOrth |
Logical, in case of multiple orthologues, whether all of them should be returned |
The function queries proteins in protein groups and annotates all proteins that can be annotated. For protein groups in which no protein can be annotated, all proteins will be returned as they are, without annotation.
A data.frame with following columns:
* ProteinGroup
* Protein
* GeneID
* GeneSymbol
* GeneName
* TaxID
In case orthologue is TRUE, human orthologue information is
returned as well.
## Not run:
annotateProteinGroups(c("A0A024RBG1;Q9NZJ9", "A0A0B4J2D5;P0DPI2", "A0A0B4J2F0;A0A0U1RRL7"))
## End(Not run)
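Before individual proteins can be queried, each protein group has to be split by the delimiter. The sketch below shows one way to do this in base R and produce the ProteinGroup and Protein columns listed above. The helper name `split_protein_groups` is hypothetical, and the one-row-per-protein layout is an assumption about the output rather than documented behaviour.

```r
# Hypothetical helper: expand protein groups into one row per protein
split_protein_groups <- function(ids, delimiter = ";") {
  # fixed = TRUE treats the delimiter literally, not as a regular expression
  parts <- strsplit(ids, delimiter, fixed = TRUE)
  data.frame(ProteinGroup = rep(ids, lengths(parts)),
             Protein = unlist(parts),
             stringsAsFactors = FALSE)
}

groups <- c("A0A024RBG1;Q9NZJ9", "A0A0B4J2D5;P0DPI2")
long <- split_protein_groups(groups)
print(long)
```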
The function returns annotations (see details below) of all features (probably probesets) associated with the given taxon.
annotateTaxID(taxId, orthologue = FALSE, multiOrth = FALSE)
taxId |
Integer, the TaxID of the species of interest. For instance ‘9606’ for Homo sapiens. |
orthologue |
Logical, whether human orthologues should be returned |
multiOrth |
Logical, in case |
The function reads from the backend, the MongoDB bioinfo database.
A data.frame object with very similar structure as the
EG_GENE_INFO table in the database. In case orthologue is TRUE, additional
columns containing human orthologue information are returned.
Rownames of the data.frame are set to NULL.
Jitao David Zhang <[email protected]>
## Not run:
hsAnno <- annotateTaxID("9606")
dim(hsAnno)
head(hsAnno)
hsMtAnno <- annotateTaxID("10092")
dim(hsMtAnno)
head(hsMtAnno)
mtOrthAnno <- annotateTaxID(10090, orthologue=TRUE)
dim(mtOrthAnno)
head(mtOrthAnno)
pigMultiOrthAnno <- annotateTaxID(9823, orthologue=TRUE, multiOrth=TRUE)
dim(pigMultiOrthAnno)
head(pigMultiOrthAnno)
## End(Not run)
Annotate UniProt accessions or names
annotateUniprotAccession(accessions, orthologue = FALSE, multiOrth = FALSE)
accessions |
Character strings, UniProt accessions or names |
orthologue |
Logical, whether orthologues are returned |
multiOrth |
Logical, only valid if |
## Not run:
annotateUniprotAccession(c("B4E0K5"))
## End(Not run)
Append human orthologs to an existing annotation dataframe
appendHumanOrthologsWithNCBI(anno, multiOrth = FALSE)
anno |
A |
multiOrth |
Logical, whether one row is allowed to map to multiple orthologues |
The function appends human orthologs to an existing annotation data.frame. It is usually called by another function. Please make sure you know what you are doing if you call it directly.
A data.frame with annotation and human orthologs appended.
The function does not sort the rows by GeneID. It is the responsibility of the calling function to do so.
## Not run:
anno <- data.frame(GeneID=c(780, 1506, 114483548, 102129055, NA), TaxID=c(9606, 9606, 10116, 9541, NA))
appendHumanOrthologsWithNCBI(anno)
tol_anno <- data.frame(GeneID=c(780, 1506, 114483548, 102129055, NA, "NotV"), TaxID=c(9606, 9606, 10116, 9541, NA, NA))
appendHumanOrthologsWithNCBI(tol_anno)
## End(Not run)
Check single integer Tax ID
checkSingleIntegerTaxId(taxId)
taxId |
Integer tax identifier, or character that can be converted to an integer |
An integer tax ID if successful; otherwise the function stops with an error.
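The check described here (a single value that is, or can be converted to, an integer) might look like the following. This is a minimal sketch with a hypothetical name, `check_taxid_sketch`, and self-chosen error messages; the package function may validate differently.

```r
# Hypothetical re-implementation of the described check
check_taxid_sketch <- function(taxId) {
  if (length(taxId) != 1) {
    stop("taxId must be a single value")
  }
  # Characters such as "9606" are accepted if they parse as a whole number
  num <- suppressWarnings(as.numeric(as.character(taxId)))
  if (is.na(num) || num != round(num)) {
    stop("taxId cannot be converted to an integer")
  }
  as.integer(num)
}

human_taxid <- check_taxid_sketch("9606")
# Invalid input raises an error, captured here with try()
bad <- try(check_taxid_sketch("human"), silent = TRUE)
```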
Common species taxonomy IDs
commonSpecies
A data.frame containing three columns:
NCBI taxonomy ID
Scientific name
Common name
Connect to a MongoDB instance
connectMongoDB(instance = "bioinfo_read", collection = "ncbi_gene_info", verbose = FALSE)
instance |
Character string, the MongoDB instance to connect to |
collection |
Character string, the collection to be used |
verbose |
Logical |
A pointer to a collection on the server, as returned by
mongo.
## Not run:
giCon <- connectMongoDB(instance="bioinfo_read", collection="ncbi_gene_info")
## End(Not run)
Prepare a vector for SQL SELECT query with the IN syntax
formatIn(x)
x |
A vector to be queried with the IN syntax |
A character string to be used after IN. See examples.
Jitao David Zhang <[email protected]>
myvec <- c("HH", "HM", "TH")
formatIn(myvec)
mysel <- "SELECT * FROM table WHERE city IN"
paste(mysel,formatIn(myvec))
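For readers without the package installed, the idea behind formatIn can be sketched as follows. The function `format_in_sketch` is hypothetical, and its exact output (single quotes, comma separators, doubled embedded quotes) is an assumption; formatIn itself may format the string differently.

```r
# Hypothetical sketch: build the parenthesised list that follows SQL IN
format_in_sketch <- function(x) {
  # Double embedded single quotes so values stay valid SQL string literals
  quoted <- paste0("'", gsub("'", "''", as.character(x), fixed = TRUE), "'")
  paste0("(", paste(quoted, collapse = ","), ")")
}

in_clause <- format_in_sketch(c("HH", "HM", "TH"))
query <- paste("SELECT * FROM table WHERE city IN", in_clause)
print(query)
```

When values come from untrusted input, parameterised queries are generally a safer choice than string pasting.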
Get all taxonomy ID and scientific names offered by NCBI
getAllTaxIDs()
A data.frame containing two columns, TaxID and ScientificName.
## Not run:
all_tax_ids <- getAllTaxIDs()
## End(Not run)
gti2bioc converts chip types from GTI array names into Bioconductor
names, and bioc2gti converts Bioconductor array names to GTI names.
If the array name is not valid or not found, NA will be returned.
gti2bioc(chipname)
chipname |
Character vector, chip names (types). If missing, chip types supported by both GTI and Bioconductor will be printed, see details. |
The translation table gtibioc was compiled manually in December 2011.
When the parameter ‘chipname’ is missing, chip types supported by
both GTI and Bioconductor will be printed: gti2bioc returns a
character vector of the Bioconductor names, and bioc2gti returns such
a vector of the GTI names. Both vectors have the chip types in the other
system as names. See examples.
Character vector of the same length as the input
Jitao David Zhang <[email protected]>
bioc2gti("hgu133plus2")
bioc2gti(c("hgu133plus2", "hgu95av2", "bad_array"))
gti2bioc("HG_U95AV2")
gti2bioc(c("HG_U95AV2", "CANINE", "HG_U95A"))
## supporting empty option
bioc2gti()
gti2bioc()
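The lookup behaviour described above (NA for unknown names, a named vector when the argument is missing) can be mimicked with a small translation table. The table `toy_map`, its pairings, and the function `gti2bioc_sketch` are hypothetical; the real gtibioc table shipped with the package should be consulted for actual mappings.

```r
# Toy translation table; the pairings are illustrative, not taken from gtibioc
toy_map <- data.frame(GTI = c("HG_U95AV2", "TOY_CHIP"),
                      Bioconductor = c("hgu95av2", "toychip"),
                      stringsAsFactors = FALSE)

# Hypothetical sketch of the GTI-to-Bioconductor direction
gti2bioc_sketch <- function(chipname, map = toy_map) {
  if (missing(chipname)) {
    # Return all Bioconductor names, named by their GTI counterparts
    res <- map$Bioconductor
    names(res) <- map$GTI
    return(res)
  }
  # match() yields NA for names absent from the table
  map$Bioconductor[match(chipname, map$GTI)]
}

print(gti2bioc_sketch(c("HG_U95AV2", "bad_array")))
print(gti2bioc_sketch())
```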
A data frame mapping GTI array names to Bioconductor array names.
gtibioc
A data frame with columns:
GTI chip type name
Bioconductor chip type name
Compiled manually in December 2011.
Guess feature ID type by majority voting and annotate them
guessAndAnnotate(featureIDs, majority = 0.5, orthologue = FALSE, multiOrth = FALSE, taxId = 9606)
featureIDs |
A vector of character strings. Other input types will be converted to character strings. |
majority |
Numeric value between 0 and 1. If the proportion of valid feature IDs in the input matching the pattern of a certain feature type exceeds this value, the function returns a character string representing the feature ID type. |
orthologue |
Logical, whether orthologue should be returned if the input features are not of human |
multiOrth |
Logical, in case multiple human orthologues are available, should they all be returned? |
taxId |
Integer, in case the input identifiers are gene
symbols, the user can specify the organism to be used with the NCBI taxonomy ID.
The option is passed to |
A data.frame containing annotations of one of the following ID types:
GeneID
GeneSymbol
RefSeq
EnsemblGeneID
Ensembl
UniProt
Unknown
In case of Unknown, a data.frame with one column (FeatureName), containing input ids, is returned.
The difference between guessAndAnnotate and annotateAnyIDs is that the latter does not assume that all IDs are of the same type.
## Not run:
guessAndAnnotate(c("AKT1", "AKT2", "MAPK14"))
guessAndAnnotate(c(1,2,14,149))
guessAndAnnotate(c("NM_000259", "NM_000331"))
guessAndAnnotate(c("ENST00000613858.4", "ENST00000553916.5", "ENST00000399229.6"))
guessAndAnnotate(c("O60583", "P05997", "Q7Z624"))
guessAndAnnotate(c("CM000677.2", "AB003434.2"))
## End(Not run)
Guess feature ID type by majority voting
guessFeatureType(featureIDs, majority = 0.5)
featureIDs |
A vector of character strings. Other input types will be converted to character strings. |
majority |
Numeric value between 0 and 1. If the proportion of valid feature IDs in the input matching the pattern of a certain feature type exceeds this value, the function returns a character string representing the feature ID type. |
A character string, one of the following values:
GeneID
GeneSymbol
RefSeq
EnsemblGeneID
Ensembl
UniProt
Unknown
The majority voting is done in the same order as listed above.
guessFeatureType(c("AKT1", "AKT2", "MAPK14"))
guessFeatureType(c(1,2,14,149))
guessFeatureType(c("NM_000259", "NM_000331"))
guessFeatureType(c("ENST00000613858.4", "ENST00000553916.5", "ENST00000399229.6"))
guessFeatureType(c("A2BC19", "P12345", "A0A023GPI8"))
guessFeatureType(c("CM000677.2"))
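Majority voting over ID patterns can be sketched as below: invalid IDs are dropped, then each candidate type is tested in order and the first one whose share of matches exceeds `majority` wins. The function `guess_type_sketch` and the deliberately simplified regular expressions in `toy_patterns` are hypothetical and cover only three of the types; the package uses its own like*() tests.

```r
# Simplified, hypothetical patterns, tested in the order given
toy_patterns <- c(GeneID = "^[0-9]+$",
                  RefSeq = "^[NX][MRP]_[0-9]+(\\.[0-9]+)?$",
                  Ensembl = "^ENS[A-Z]*[GTP][0-9]+(\\.[0-9]+)?$")

guess_type_sketch <- function(featureIDs, patterns, majority = 0.5) {
  ids <- as.character(featureIDs)
  # Drop invalid IDs (NA, "-", empty string) before voting
  ids <- ids[!is.na(ids) & !ids %in% c("", "-")]
  if (length(ids) == 0) return("Unknown")
  for (type in names(patterns)) {
    # Proportion of valid IDs matching this type's pattern
    if (mean(grepl(patterns[[type]], ids)) > majority) return(type)
  }
  "Unknown"
}

print(guess_type_sketch(c(1, 2, 14, 149), toy_patterns))
print(guess_type_sketch(c("NM_000259", "NM_000331", "-"), toy_patterns))
print(guess_type_sketch(c("CM000677.2"), toy_patterns))
```

Because types are tested in a fixed order, an input that satisfies several patterns is assigned to the first matching type.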
Retrieve human orthologs of genes of another species with its Taxonomy ID
humanOrthologsByTaxID(taxid)
taxid |
An integer, a NCBI taxonomy ID to identify a species,
for instance |
A data.frame containing the following columns:
GeneID: NCBI Gene ID of the query species
GeneSymbol: NCBI Gene symbol of the query species
Description: Gene description of the query species
HumanGeneID: NCBI Gene ID of the human homolog
HumanGeneSymbol: NCBI Gene symbol of the human homolog
HumanDescription: Gene description of the human homolog
To query NCBI taxonomy IDs from free-text search, visit [NCBI Taxonomy Browser](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi).
## Not run:
## human orthologs of rat genes
ratOrths <- humanOrthologsByTaxID(10116)
## human orthologs of mouse genes
mouseOrths <- humanOrthologsByTaxID(10090)
## human orthologs of cyno genes (crab-eating macaque, Macaca fascicularis)
cynoOrths <- humanOrthologsByTaxID(9541)
## End(Not run)
Whether input character strings are valid feature IDs
isValidFeatureID(featureIDs)
featureIDs |
A vector of character strings |
Logical vector of the same length as the input
Invalid feature IDs include NA, "-", and empty string.
Other features are deemed valid.
featureIDs <- c("AMPK", "", "ACTB", "-")
isValidFeatureID(featureIDs)
Whether input strings look like Entrez GeneIDs
likeGeneID(featureIDs) likeRefSeq(featureIDs) likeEnsembl(featureIDs) likeEnsemblGeneID(featureIDs) likeUniProt(featureIDs) likeGeneSymbol(featureIDs) likeHumanGeneSymbol(featureIDs)likeGeneID(featureIDs) likeRefSeq(featureIDs) likeEnsembl(featureIDs) likeEnsemblGeneID(featureIDs) likeUniProt(featureIDs) likeGeneSymbol(featureIDs) likeHumanGeneSymbol(featureIDs)
featureIDs |
Character strings. Inputs of other types are converted to character strings. |
A logical vector of the same length as input
likeRefSeq(): tests whether input strings look like NCBI RefSeq IDs
likeEnsembl(): tests whether input strings look like Ensembl IDs
likeEnsemblGeneID(): tests whether input strings look like EnsemblGeneIDs
likeUniProt(): tests whether input strings look like UniProt IDs
likeGeneSymbol(): tests whether input strings look like gene symbols
likeHumanGeneSymbol(): tests whether input strings look like human
gene symbols
The regular expression for UniProt accession numbers is available at https://www.uniprot.org/help/accession_numbers. We additionally require a whole-string match.
The HGNC guideline is available at https://www.genenames.org/about/guidelines/
feats <- c("1234", "LOX", "345", "-", "", "NKX-1", "CXorf21", "Snail", "A2BC19", "P12345", "A0A023GPI8", "NM_000259", "NM_000259.3", "ENSG00000197535", "ENST00000399231.7")
likeGeneID(feats)
likeGeneSymbol(feats)
likeRefSeq(feats)
likeEnsembl(feats)
likeUniProt(feats)
likeHumanGeneSymbol(feats)
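The whole-string match mentioned for UniProt accessions can be illustrated as follows. The pattern is the accession regular expression published on the UniProt help page, wrapped in `^(...)$` anchors. Whether likeUniProt() uses exactly this pattern is an assumption, and `like_uniprot_sketch` is a hypothetical name.

```r
# Hypothetical helper, not part of ribiosAnnotation.
# The two alternatives come from the UniProt accession number help page;
# the ^( ... )$ wrapper enforces a whole-string match.
uniprot_pattern <- paste0(
  "^(",
  "[OPQ][0-9][A-Z0-9]{3}[0-9]",
  "|",
  "[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}",
  ")$"
)

like_uniprot_sketch <- function(featureIDs) {
  grepl(uniprot_pattern, as.character(featureIDs))
}

# "P12345-2" contains a valid accession but is rejected by the anchors
like_uniprot_sketch(c("P12345", "A0A023GPI8", "A2BC19", "LOX", "P12345-2"))
```

Without the anchors, strings that merely contain an accession (such as an isoform identifier) would also match.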
Get secrets for MongoDB connections
loadMongodbSecrets(file = locateSecretsFile(), instance = "bioinfo_read")
file |
The secret JSON file. |
instance |
String, which must be found under the |
A list of the following items:
hostname: Hostname of the MongoDB
port: Port of the MongoDB
dbname: Database of the MongoDB
username: User name
password: Password
loadMongodbSecrets(instance="bioinfo_read")
## Not run:
loadMongodbSecrets(instance="decoy")
## End(Not run)
Guess whether the majority of members of a character vector look like human gene symbols
majorityLikeHumanGeneSymbol(x, majority = 0.8)
x |
A vector of character strings |
majority |
A numeric value between 0 and 1, the threshold of majority voting |
A logical value
TRUE is returned only if at least the proportion of members given by majority
look like human gene symbols.
majorityLikeHumanGeneSymbol(c("AKT1", "AKT2", "MYOA")) # TRUE
majorityLikeHumanGeneSymbol(c("Akt1", "Akt2", "Myoa")) # FALSE
majorityLikeHumanGeneSymbol(c("AKT1", "Akt2", "MYOA"), majority=0.5) # TRUE
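The majority vote can be sketched as a comparison between the proportion of matching members and the threshold. The symbol test below is a deliberately simplified stand-in (uppercase letters, digits, and hyphens, plus the `C<chromosome>orf<number>` form) and is an assumption, not the rule used by likeHumanGeneSymbol(). Both function names are hypothetical.

```r
# Hypothetical stand-in for likeHumanGeneSymbol(); the real rule may differ.
looks_like_symbol_sketch <- function(x) {
  grepl("^[A-Z][A-Z0-9-]*$|^C[0-9XY]+orf[0-9]+$", as.character(x))
}

# Hypothetical helper: TRUE if the share of symbol-like members
# reaches the threshold 'majority'.
majority_sketch <- function(x, majority = 0.8) {
  mean(looks_like_symbol_sketch(x)) >= majority
}

majority_sketch(c("AKT1", "AKT2", "MYOA"))                 # all three match
majority_sketch(c("Akt1", "Akt2", "Myoa"))                 # none match
majority_sketch(c("AKT1", "Akt2", "MYOA"), majority = 0.5) # two of three match
```

With the default threshold of 0.8, the mixed-case vector in the last call would not pass, because only two thirds of its members match.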
Remove version suffix from Ensembl IDs
removeEnsemblVersion(ensemblIDs)
ensemblIDs |
A vector of character strings. Other types of inputs are converted. |
A character vector of the same length as input
ensemblIDs <- c("ENSG00000197535", "ENST00000399231.7", "ENSP00000418960.2")
removeEnsemblVersion(ensemblIDs)
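Stripping a version suffix can be sketched with a single regular-expression substitution. It is an assumption here that the suffix is always a dot followed by digits at the end of the string; `remove_version_sketch` is a hypothetical name, not the package's implementation.

```r
# Hypothetical helper, not part of ribiosAnnotation.
# Assumption: the version suffix is "." followed by one or more digits.
remove_version_sketch <- function(ensemblIDs) {
  sub("\\.[0-9]+$", "", as.character(ensemblIDs))
}

remove_version_sketch(c("ENSG00000197535", "ENST00000399231.7", "ENSP00000418960.2"))
```

Identifiers without a suffix pass through unchanged, so the output always has the same length as the input.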
Construct a JSON string to indicate returned fields from a MongoDB query
returnFieldsJson(fields, include_id = FALSE)
fields |
A vector of character strings that should be included |
include_id |
Logical, whether |
A JSON string that represents the fields to be returned
returnFieldsJson(c("name", "birthday"))
returnFieldsJson(c("name", "birthday"), include_id=TRUE)
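A MongoDB projection document marks each field to return with `1`, and the `_id` field is returned by default unless it is explicitly set to `0`. The sketch below builds such a string in base R. It assumes that include_id refers to the `_id` field and that field names need no JSON escaping. The exact output format of returnFieldsJson() may differ, and `projection_json_sketch` is a hypothetical name.

```r
# Hypothetical helper, not part of ribiosAnnotation.
# Assumptions: 'include_id' controls the MongoDB "_id" field, and field
# names contain no characters that require JSON escaping.
projection_json_sketch <- function(fields, include_id = FALSE) {
  entries <- sprintf('"%s": 1', fields)
  if (!include_id) {
    # "_id" is returned by default, so it must be excluded explicitly
    entries <- c(entries, '"_id": 0')
  }
  paste0("{", paste(entries, collapse = ", "), "}")
}

projection_json_sketch(c("name", "birthday"))
projection_json_sketch(c("name", "birthday"), include_id = TRUE)
```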
ribiosAnnotation needs to access databases to fetch annotations, the process
of which requires credentials for these databases. The package looks for a
file in JSON format, either specified in environment variable
RIBIOS_ANNOTATION_SECRETS_JSON, or in the file
‘~/.credentials/ribiosAnnotation-secrets.json’, which
contains the credentials. If this file is not found, no queries can be made.
ribiosAnnotationSecretEnvVar
locateSecretsFile(path)
path |
Path to the secret file. If not set, in case the environmental
variable |
An object of class character of length 1.
The function locates the file and returns the normalized path of the file.
String, the normalized path of the file
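The lookup described above can be sketched in base R. The order of precedence (environment variable first, then the default file) and the error raised when no file exists are assumptions, since the source only states that no queries can be made without the file. `locate_secrets_sketch` is a hypothetical name.

```r
# Hypothetical helper, not part of ribiosAnnotation.
# Assumptions: the environment variable takes precedence over the default
# file, and a missing file is reported as an error.
locate_secrets_sketch <- function(
    path = Sys.getenv("RIBIOS_ANNOTATION_SECRETS_JSON", unset = NA)) {
  if (is.na(path) || !nzchar(path)) {
    path <- "~/.credentials/ribiosAnnotation-secrets.json"
  }
  if (!file.exists(path)) {
    stop("Secrets file not found: ", path)
  }
  normalizePath(path)
}

# Demonstration with a temporary file passed explicitly
demo_file <- tempfile(fileext = ".json")
writeLines("{}", demo_file)
locate_secrets_sketch(demo_file)
```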
ribiosAnnotation secret file
ribiosAnnotationSecretFile
An object of class character of length 1.
Sort the annotation table by query IDs
sortAnnotationByQuery(anno, ids, id_column = "GeneID", multi = FALSE)
anno |
A |
ids |
A vector of character or integer, identifiers used to query the annotation |
id_column |
Character, column of the data frame where ids can be found. |
multi |
In case that an identifier appears more than once in
|
A data.frame sorted by the query identifiers, with the
column id_column containing exactly the same value as ids.
If the identifiers are unique and if they do not contain NA, they are used as
the row names of the data.frame; otherwise, NULL will be used.
myAnno <- data.frame(GeneID=c(4,6,5), GeneName=c("Gene4", "Gene6", "Gene5"))
inputIds <- c("6", "5", "6", "4", "NotAGeneID")
sortAnnotationByQuery(myAnno, inputIds, "GeneID")
myAnno2 <- data.frame(GeneID=c(4,6,5, 5), GeneName=c("Gene4", "Gene6", "Gene5", "Gene5.V2"))
inputIds <- c("6", "5", "6", "4", "NotAGeneID")
sortAnnotationByQuery(myAnno2, inputIds, "GeneID")
sortAnnotationByQuery(myAnno2, inputIds, "GeneID", multi=TRUE)
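The reordering and the row-name rule can be sketched with `match()`. This sketch keeps only the first matching row per identifier, which is an assumption about the single-match case, and it does not cover multi = TRUE. `sort_by_query_sketch` is a hypothetical name, not the package's implementation.

```r
# Hypothetical helper, not part of ribiosAnnotation.
# Assumption: only the first matching row is kept for each identifier.
sort_by_query_sketch <- function(anno, ids, id_column = "GeneID") {
  # Position of each query ID in the annotation; NA if it is not found
  idx <- match(as.character(ids), as.character(anno[[id_column]]))
  res <- anno[idx, , drop = FALSE]
  # The ID column mirrors the query exactly, including unmatched IDs
  res[[id_column]] <- ids
  if (anyDuplicated(ids) == 0 && !anyNA(ids)) {
    rownames(res) <- as.character(ids)
  } else {
    rownames(res) <- NULL
  }
  res
}

myAnno <- data.frame(GeneID = c(4, 6, 5),
                     GeneName = c("Gene4", "Gene6", "Gene5"))
sort_by_query_sketch(myAnno, c("6", "5", "6", "4", "NotAGeneID"))
```

Unmatched identifiers such as "NotAGeneID" produce a row whose annotation columns are NA, so the output has one row per query identifier.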
Get UniProt annotation with NCBI Taxonomy ID
uniprotByTaxID(taxid, orthologue = FALSE, multiOrth = FALSE)
taxid |
NCBI Taxonomy ID |
orthologue |
Logical, whether human orthologues should be appended to the annotation |
multiOrth |
Logical, whether to return all orthologues or the (randomly)
selected top one if multiple exist. Only valid when |
A data.frame with UniProt accessions and gene annotations.
* annotateUniprotAccession, which annotates UniProt accessions
* annotateTaxID, which annotates genes given TaxID.
## Not run:
humanUniprot <- uniprotByTaxID(9606)
## End(Not run)
Return valid features in a vector
validFeatureIDs(featureIDs)
featureIDs |
A vector of character strings |
A filtered vector containing only valid feature IDs.
Factor input will remain factors as output, but with invalid levels dropped. The output class will remain the same in case of integer or character input.
featureIDs <- c("AMPK", "", "ACTB", "-")
validFeatureIDs(featureIDs)
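The type-preserving behaviour described above can be illustrated in base R: subsetting keeps the class of the input, and `droplevels()` removes factor levels that no longer occur. This is a sketch of the documented behaviour, not the package's code; `valid_ids_sketch` is a hypothetical name.

```r
# Hypothetical helper, not part of ribiosAnnotation.
# Assumption: invalid IDs are NA, "-", and the empty string.
valid_ids_sketch <- function(featureIDs) {
  keep <- !is.na(featureIDs) & !(as.character(featureIDs) %in% c("", "-"))
  res <- featureIDs[keep]
  # For factors, drop the levels that belonged to removed elements
  if (is.factor(res)) res <- droplevels(res)
  res
}

valid_ids_sketch(c("AMPK", "", "ACTB", "-"))          # character in, character out
valid_ids_sketch(factor(c("AMPK", "", "ACTB", "-")))  # factor in, factor out
valid_ids_sketch(c(1L, NA, 3L))                       # integer in, integer out
```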