--- title: "Working with the geneset management system GeMS" author: "Jitao David Zhang" date: '`r format(Sys.time(), "%d %B, %Y")`' output: rmarkdown::html_vignette: fig_caption: yes html_document: df_print: paged theme: spacelab mathjax: default code_folding: hide toc: true toc_depth: 3 number_sections: true toc_float: collapsed: false smooth_scroll: false pdf_document: number_sections: yes toc: yes toc_depth: 3 editor_options: chunk_output_type: inline params: echo: yes relative: FALSE vignette: > %\VignetteIndexEntry{Working with the geneset management system GeMS} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} \usepackage[utf8]{inputenc} --- ```{r setup, include=FALSE, message=FALSE, warning=FALSE} library(ribiosGSEA) library(BioQC) doEval <- FALSE ## ribiosGSEA::isGeMSReachable() knitr::opts_chunk$set(echo = TRUE, message=FALSE, warning=FALSE, eval = doEval, fig.height=6, fig.width=6) ``` # Working with Geneset Management System (GeMS) ## Background The Geneset Managment System ([GeMS, link visible within Roche](https://rochewiki.roche.com/confluence/display/BEDA/Gene+sets+and+signatures+-+GeMS)), as its name suggests, is a system to manage genesets. This document shows basic operations provided to work with the sytsem from R console. Note that the evaluation only takes place when GeMS is reachable, because otherwise the compilation of this vignette will fail when the webservice is not available. ## Examples ### Task 1: show genesets provided by a user The following command fetches genesets associated with a given user, *test* in this case. If not parameter is given, the genesets of the current user is provided. ```{r getGeneSets} getUserSetsFromGeMS(user="test") ``` ### Task 2: insert new genesets to GeMS New genesets can be inserted into GeMS, using the *GmtList* data structure defined in the *BioQC* package, which is essentially a list of lists, which in turn contains three elements: *name*, *desc*, and *genes*. See example below. ```{r insertGeneSets} testGmt <- BioQC::GmtList(list( list(name="Test gene set 1", desc="Test", genes=c("CXCL13", "TNFRSF1B", "RGS2")), list(name="Test gene set 2", desc="Test", genes=c("ACADSB", "AFTPH", "ARL1")) )) insertGmtListToGeMS(testGmt, geneFormat=0, source="Test", xref="PMID:30397336", user="test") ``` The return value \code{200} indicates the insertion was successful. Now we check the genesets of the user *test*, we expect to see the two gene sets that we inserted. ```{r getGeneSetsAfterInsert} getUserSetsFromGeMS(user="test") ``` Alternative to manual specificiation, *GmtList* can be constructed by reading a valid GMT file with the function *readGmt* in *BioQC*. ### Task 3: Remove a geneset The following example shows how a geneset can be removed ```{r removeGeneSet} removeFromGeMS(setName=c("Test gene set 1", "Test gene set 2"), source="Test", user="test") ``` Now we check the genesets of the user *test*, we expect to see the two gene sets that we previously inserted should be gone. ```{r getGeneSetsAfterRemove} getUserSetsFromGeMS(user="test") ``` ### Task 4: Retrieve gene-sets with names The following example shows how genesest can be retrieved by their names. The example was kindly provided by Martha Serrano. **Warning**: As shown in the example above, it is very easy to alter genesets in the database. Therefore always make gene set snapshots, never read live data from the database. ```{r include=FALSE} dir.create("data/", showWarnings = FALSE) ``` ```{r getSetsWithNames} gmt_test1_filename <- "data/test1.gmt" if (file.exists(gmt_test1_filename)) { gmt_test1 <- readGmt(gmt_test1_filename) } else { gmt_test1 <- getSetsWithNamesFromGeMS(c("Plasma_sc", "Bcell_l_Danaher17")) writeGmt(gmt_test1, gmt_test1_filename) } gmt_test1 ``` A caveat of using this function is that it envokes an API call for each geneset. If you need to extract multiple gene-sets, use getSetsWithPropertyFromGeMS (see below). ### Task 5: Retrieve gene-sets with a certain property The following example shows how genesest can be retrieved by matching properties, in this case all cell-marker gene-sets provided by the BESCA package. Again, we are using GMT snapshots for reproducibility. ```{r} besca_markers_filename <- "data/besca_markers.gmt" if (file.exists(besca_markers_filename)) { besca_markers <- readGmt(besca_markers_filename) } else { besca_markers <- getSetsWithPropertyFromGeMS("meta.geneset", "besca_markers") writeGmt(besca_markers, besca_markers_filename) } print(besca_markers) ``` ```{r include=FALSE} # cleanup unlink("data", recursive = TRUE) ``` ## Conclusions This is a very brief introduction of how to working with GeMS within R. Make sure to make snapshots instead of using live data from the database. In case of questions or suggestions, please reach out to [Jitao David Zhang](mailto:jitao_david.zhang@roche.com) or [Laura Badi](mailto:laura.badi@roche.com). # Session information ```{r} sessionInfo() ```