Working with the geneset management system GeMS

Working with Geneset Management System (GeMS)

Background

The Geneset Managment System (GeMS, link visible within Roche), as its name suggests, is a system to manage genesets.

This document shows basic operations provided to work with the sytsem from R console.

Note that the evaluation only takes place when GeMS is reachable, because otherwise the compilation of this vignette will fail when the webservice is not available.

Examples

Task 1: show genesets provided by a user

The following command fetches genesets associated with a given user, test in this case. If not parameter is given, the genesets of the current user is provided.

getUserSetsFromGeMS(user="test")

Task 2: insert new genesets to GeMS

New genesets can be inserted into GeMS, using the GmtList data structure defined in the BioQC package, which is essentially a list of lists, which in turn contains three elements: name, desc, and genes. See example below.

testGmt <- BioQC::GmtList(list(
  list(name="Test gene set 1", 
       desc="Test", genes=c("CXCL13", "TNFRSF1B", "RGS2")),
  list(name="Test gene set 2", desc="Test", 
       genes=c("ACADSB", "AFTPH", "ARL1"))
))
insertGmtListToGeMS(testGmt, geneFormat=0, source="Test", xref="PMID:30397336", user="test")

The return value indicates the insertion was successful.

Now we check the genesets of the user test, we expect to see the two gene sets that we inserted.

getUserSetsFromGeMS(user="test")

Alternative to manual specificiation, GmtList can be constructed by reading a valid GMT file with the function readGmt in BioQC.

Task 3: Remove a geneset

The following example shows how a geneset can be removed

removeFromGeMS(setName=c("Test gene set 1", "Test gene set 2"),
               source="Test", user="test")

Now we check the genesets of the user test, we expect to see the two gene sets that we previously inserted should be gone.

getUserSetsFromGeMS(user="test")

Task 4: Retrieve gene-sets with names

The following example shows how genesest can be retrieved by their names. The example was kindly provided by Martha Serrano.

Warning: As shown in the example above, it is very easy to alter genesets in the database. Therefore always make gene set snapshots, never read live data from the database.

gmt_test1_filename <- "data/test1.gmt"
if (file.exists(gmt_test1_filename)) {
  gmt_test1 <- readGmt(gmt_test1_filename)
} else {
  gmt_test1 <- getSetsWithNamesFromGeMS(c("Plasma_sc", "Bcell_l_Danaher17"))
  writeGmt(gmt_test1, gmt_test1_filename)
}
gmt_test1

A caveat of using this function is that it envokes an API call for each geneset. If you need to extract multiple gene-sets, use getSetsWithPropertyFromGeMS (see below).

Task 5: Retrieve gene-sets with a certain property

The following example shows how genesest can be retrieved by matching properties, in this case all cell-marker gene-sets provided by the BESCA package.

Again, we are using GMT snapshots for reproducibility.

besca_markers_filename <- "data/besca_markers.gmt"
if (file.exists(besca_markers_filename)) {
  besca_markers <- readGmt(besca_markers_filename)
} else {
  besca_markers <- getSetsWithPropertyFromGeMS("meta.geneset", "besca_markers")
  writeGmt(besca_markers, besca_markers_filename)
}
print(besca_markers)

Conclusions

This is a very brief introduction of how to working with GeMS within R. Make sure to make snapshots instead of using live data from the database. In case of questions or suggestions, please reach out to Jitao David Zhang or Laura Badi.

Session information

sessionInfo()