2.A: Enrichr & rbioapi
Moosa Rezwani
2024-03-30
Source: vignettes/rbioapi_enrichr.Rmd
Gene set library concept in Enrichr
Directly quoting from Enrichr’s help page:
A gene set library is a set of related gene sets or enrichment terms […] These libraries have been constructed from many sources such as published studies and major biological and biomedical online databases. Others have been created for and only available through Enrichr.
To get a list of the available libraries in Enrichr, use:
enrichr_libs <- rba_enrichr_libs()
In the returned data frame, you can find the names of the available Enrichr libraries in the “libraryName” column. As you will see in the following sections, you can use these names to request an enrichment analysis based on the selected library or libraries.
Enrichment analysis using Enrichr
To perform enrichment analysis on your gene-set with Enrichr using rbioapi, you can take two approaches. We will begin with the simple one. But first, we create a vector of gene identifiers (mostly gene symbols) to use as the input example in this article.
# Create a vector with our gene identifiers (mostly gene symbols)
genes <- c("p53", "BRCA1", "cdk2", "Q99835", "CDC42", "CDK1", "KIF23", "PLK1",
           "RAC2", "RACGAP1", "RHOA", "RHOB", "PHF14", "RBM3", "MSL1")
Approach 1: Using the one-step Wrapper function
The only required input for this function is your gene-set, supplied as a character vector. Optionally, you can also select one or more libraries. Please see the rba_enrichr() function’s manual for more details on the arguments.
# Request the enrichment analysis
results_all <- rba_enrichr(gene_list = genes)
Note that the default value of the gene_set_library argument in the rba_enrichr() function is “all”. This means that if you call the function as above, every Enrichr library will be used for the enrichment analysis of your uploaded gene list. In this case, the result is a named list in which each element is a data frame containing your genes’ analysis results for one Enrichr library.
Alternatively, you can use the gene_set_library argument to specify the library (or libraries) to use. Here we demonstrate using the “MSigDB_Hallmark_2020” library:
# Request the enrichment analysis by a specific library
results_msig_hallmark <- rba_enrichr(gene_list = genes,
                                     gene_set_library = "MSigDB_Hallmark_2020")
When you supply the gene_set_library argument, rbioapi assumes you are entering a regex pattern. You can disable this by setting regex_library_name to FALSE. However, this feature is useful if you need, for example, partial matches in the library names. Suppose you want to perform the enrichment analysis on every library available in Enrichr whose name contains “MSig”. You can do the following:
# Request the enrichment analysis
results_msig <- rba_enrichr(gene_list = genes,
                            gene_set_library = "msig",
                            regex_library_name = TRUE)
# You can drop `regex_library_name = TRUE`, as it is TRUE by default.
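To preview which libraries a pattern would select before sending a request, you could apply a similar match locally. The following is only a sketch under two assumptions: that the pattern is matched case-insensitively (suggested by “msig” selecting the MSigDB libraries in this vignette), and that a small hand-written vector of library names stands in for the “libraryName” column returned by rba_enrichr_libs(). How rbioapi matches the pattern internally may differ in detail.

```r
# Toy vector of library names (all of them appear in this vignette); in
# practice you would take the "libraryName" column of rba_enrichr_libs().
library_names <- c("MSigDB_Computational",
                   "MSigDB_Oncogenic_Signatures",
                   "MSigDB_Hallmark_2020",
                   "Table_Mining_of_CRISPR_Studies")

# Assumption: the pattern is applied case-insensitively.
matched <- grep(pattern = "msig", x = library_names,
                ignore.case = TRUE, value = TRUE)
print(matched)

# An anchored pattern selects exactly one library from this vector.
exact <- grep(pattern = "^MSigDB_Hallmark_2020$", x = library_names,
              value = TRUE)
print(exact)
```

Anchoring the pattern with ^ and $ is one way to avoid accidental partial matches while leaving regex matching enabled.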
Note that when only one Enrichr library is selected, a data frame with the enrichment analysis results will be returned.
str(results_msig_hallmark)
#> 'data.frame': 18 obs. of 9 variables:
#> $ Term : chr "Mitotic Spindle" "G2-M Checkpoint" "E2F Targets" "Apoptosis" ...
#> $ Overlap : chr "5/199" "4/200" "4/200" "3/161" ...
#> $ P.value : num 2.57e-07 1.22e-05 1.22e-05 2.17e-04 2.74e-03 ...
#> $ Adjusted.P.value : num 4.62e-06 7.29e-05 7.29e-05 9.76e-04 9.87e-03 ...
#> $ Old.P.value : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Old.Adjusted.P.value: int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Odds.Ratio : num 51 36.7 36.7 31.4 29.7 ...
#> $ Combined.Score : num 774 416 416 265 175 ...
#> $ Genes : chr "CDC42;RACGAP1;PLK1;CDK1;KIF23" "RACGAP1;PLK1;CDK1;KIF23" "RACGAP1;PLK1;CDK1;BRCA1" "CDK2;BRCA1;RHOB" ...
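Some columns of this data frame are stored as text and may need reshaping before further analysis. The sketch below rebuilds the first four rows shown in the output above as a toy data frame, so it runs without contacting Enrichr. It assumes that “Overlap” reads as the number of input genes found in the term over the term’s size; the 1e-4 cutoff and the new column names (n_hits, term_size, gene_list) are arbitrary choices for illustration.

```r
# Toy data frame: the first four rows shown in the str() output above.
res <- data.frame(
  Term = c("Mitotic Spindle", "G2-M Checkpoint", "E2F Targets", "Apoptosis"),
  Overlap = c("5/199", "4/200", "4/200", "3/161"),
  Adjusted.P.value = c(4.62e-06, 7.29e-05, 7.29e-05, 9.76e-04),
  Genes = c("CDC42;RACGAP1;PLK1;CDK1;KIF23", "RACGAP1;PLK1;CDK1;KIF23",
            "RACGAP1;PLK1;CDK1;BRCA1", "CDK2;BRCA1;RHOB"),
  stringsAsFactors = FALSE
)

# Split "hits/term size" into two integer columns.
overlap_parts <- strsplit(res$Overlap, "/", fixed = TRUE)
res$n_hits <- as.integer(vapply(overlap_parts, `[`, character(1), 1))
res$term_size <- as.integer(vapply(overlap_parts, `[`, character(1), 2))

# Turn the semicolon-separated gene string into a list column.
res$gene_list <- strsplit(res$Genes, ";", fixed = TRUE)

# Keep the terms below an (arbitrary) adjusted p-value cutoff.
significant <- res[res$Adjusted.P.value < 1e-4, ]
print(significant[, c("Term", "n_hits", "term_size")])
```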
But when multiple libraries have been selected, the function’s output will be a list where each element is a data frame corresponding to one of the selected libraries.
str(results_msig, 1)
#> List of 3
#> $ MSigDB_Computational :'data.frame': 195 obs. of 9 variables:
#> $ MSigDB_Oncogenic_Signatures:'data.frame': 26 obs. of 9 variables:
#> $ MSigDB_Hallmark_2020 :'data.frame': 18 obs. of 9 variables:
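If you prefer a single table over a list, the per-library data frames can be stacked with base R. This is a minimal sketch on made-up toy data: the library names, terms, and p-values below are placeholders, not Enrichr output. It relies only on the structure described above, namely a named list of data frames that share the same columns.

```r
# Toy stand-in for a multi-library result: a named list of data frames
# with identical columns. All values here are placeholders.
toy_results <- list(
  LibraryA = data.frame(Term = c("term 1", "term 2"),
                        Adjusted.P.value = c(0.001, 0.20)),
  LibraryB = data.frame(Term = "term 3",
                        Adjusted.P.value = 0.03)
)

# Prepend the library name to each data frame, then stack them.
combined <- do.call(
  rbind,
  Map(function(df, lib) cbind(Library = lib, df),
      toy_results, names(toy_results))
)

# Sort by adjusted p-value and reset the row names.
combined <- combined[order(combined$Adjusted.P.value), ]
rownames(combined) <- NULL
print(combined)
```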
Approach 2: Going step-by-step
rba_enrichr() is a wrapper function. It internally executes the sequence of functions necessary to run your analysis. Alternatively, you can go step by step, as we demonstrate in this section.
First, you need to retrieve the list of available Enrichr libraries. This step is optional. You can skip it if you already know the names of your desired libraries or if you want to run the analysis over every available library.
# Get a list of available Enrichr libraries
libs <- rba_enrichr_libs(store_in_options = TRUE)
Now, you need to upload your gene list to Enrichr. Enrichr will assign an identifier to your submitted list, which is needed for the next step.
# Submit your gene-set to enrichr
list_id <- rba_enrichr_add_list(gene_list = genes)
From the returned response, we need the numeric ID in the “userListId” element.
str(list_id)
#> List of 2
#> $ shortId : chr "cc214215133b306c3e355665fed0618b"
#> $ userListId: int 71014921
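In a script, it can be worth checking the response before passing the ID on. The sketch below uses a hand-built list that mirrors the output above instead of a live response, and get_user_list_id() is a hypothetical helper written for this example, not an rbioapi function.

```r
# Stand-in for the response shown above (not a live Enrichr response).
list_id <- list(shortId = "cc214215133b306c3e355665fed0618b",
                userListId = 71014921L)

# Hypothetical helper: return the ID, or stop with a clear message if the
# "userListId" element is missing or is not a single number.
get_user_list_id <- function(response) {
  id <- response[["userListId"]]
  if (is.null(id) || length(id) != 1L || !is.numeric(id) || is.na(id)) {
    stop("Response does not contain a single numeric 'userListId'.")
  }
  as.integer(id)
}

user_list_id <- get_user_list_id(list_id)
print(user_list_id)
```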
Finally, we are ready to submit the enrichment analysis request to Enrichr. As explained above for the wrapper function rba_enrichr(), we can supply the gene_set_library argument in different ways. Here we will only select the “Table_Mining_of_CRISPR_Studies” library:
# Request the analysis
results_crispr <- rba_enrichr_enrich(user_list_id = list_id$userListId,
                                     gene_set_library = "Table_Mining_of_CRISPR_Studies")
Working with Other Species
Enrichr also provides libraries for model organisms. Several rbioapi Enrichr functions have an organism argument that allows you to perform the analysis on species other than humans.
The available options for the organism argument are “human” (H. sapiens & M. musculus), “fly” (D. melanogaster), “yeast” (S. cerevisiae), “worm” (C. elegans), and “fish” (D. rerio).
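If you pass the organism through your own code, you may want to validate it against the options listed above before making a request. A minimal sketch with base R’s match.arg() follows; check_organism() is a hypothetical helper for illustration and is not part of rbioapi.

```r
# Hypothetical helper: validate an organism value against the options
# listed above. With no argument, match.arg() returns the first option.
check_organism <- function(organism = c("human", "fly", "yeast",
                                        "worm", "fish")) {
  match.arg(organism)
}

print(check_organism("yeast"))
print(check_organism())

# An unsupported value raises an error, caught here for demonstration.
unsupported <- try(check_organism("mouse"), silent = TRUE)
print(inherits(unsupported, "try-error"))
```

Note that, per the list above, mouse genes fall under the “human” option, so “mouse” itself is rejected by this check.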
See also in Functions’ manuals
Some rbioapi Enrichr functions were not covered in this vignette; be sure to check their manuals.
How to Cite?
To cite Enrichr (Please see https://maayanlab.cloud/Enrichr/help#terms):
- Chen, E.Y., Tan, C.M., Kou, Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics 14, 128 (2013). https://doi.org/10.1186/1471-2105-14-128
- Maxim V. Kuleshov, Matthew R. Jones, Andrew D. Rouillard, Nicolas F. Fernandez, Qiaonan Duan, Zichen Wang, Simon Koplev, Sherry L. Jenkins, Kathleen M. Jagodnik, Alexander Lachmann, Michael G. McDermott, Caroline D. Monteiro, Gregory W. Gundersen, Avi Ma’ayan, Enrichr: a comprehensive gene set enrichment analysis web server 2016 update, Nucleic Acids Research, Volume 44, Issue W1, 8 July 2016, Pages W90–W97, https://doi.org/10.1093/nar/gkw377
- Xie, Z., Bailey, A., Kuleshov, M. V., Clarke, D. J. B., Evangelista, J. E., Jenkins, S. L., Lachmann, A., Wojciechowicz, M. L., Kropiwnicki, E., Jagodnik, K. M., Jeon, M., & Ma’ayan, A. (2021). Gene set knowledge discovery with Enrichr. Current Protocols, 1, e90. https://doi.org/10.1002/cpz1.90
To cite rbioapi:
- Moosa Rezwani, Ali Akbar Pourfathollah, Farshid Noorbakhsh, rbioapi: user-friendly R interface to biologic web services’ API, Bioinformatics, Volume 38, Issue 10, 15 May 2022, Pages 2952–2953, https://doi.org/10.1093/bioinformatics/btac172
Over-representation analysis Using Other Services
Other services supported by rbioapi also provide Over-representation analysis tools. Please see the vignette article Do with rbioapi: Over-Representation (Enrichment) Analysis in R (link to the documentation site) for an in-depth review.
Session info
#> R version 4.3.3 (2024-02-29)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] rbioapi_0.8.0
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.6.5 httr_1.4.7 cli_3.6.2 knitr_1.45
#> [5] rlang_1.1.3 xfun_0.43 purrr_1.0.2 textshaping_0.3.7
#> [9] jsonlite_1.8.8 DT_0.32 htmltools_0.5.8 ragg_1.3.0
#> [13] sass_0.4.9 rmarkdown_2.26 crosstalk_1.2.1 evaluate_0.23
#> [17] jquerylib_0.1.4 fastmap_1.1.1 yaml_2.3.8 lifecycle_1.0.4
#> [21] memoise_2.0.1 compiler_4.3.3 fs_1.6.3 htmlwidgets_1.6.4
#> [25] systemfonts_1.0.6 digest_0.6.35 R6_2.5.1 curl_5.2.1
#> [29] magrittr_2.0.3 bslib_0.7.0 tools_4.3.3 pkgdown_2.0.7
#> [33] cachem_1.0.8 desc_1.4.3