2.E: Reactome & rbioapi
Moosa Rezwani
2024-03-30
Introduction
Directly quoting from Reactome:
REACTOME is an open-source, open access, manually curated and peer-reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic and clinical research, genome analysis, modeling, systems biology and education. Founded in 2003, the Reactome project is led by Lincoln Stein of OICR, Peter D’Eustachio of NYULMC, Henning Hermjakob of EMBL-EBI, and Guanming Wu of OHSU.
(source: https://reactome.org/what-is-reactome)
Reactome provides two RESTful API services: Reactome content services and Reactome analysis services. In rbioapi, the naming scheme is that any function belonging to the analysis services starts with rba_reactome_analysis*. Other rba_reactome_* functions without the ‘analysis’ infix correspond to the content services API.
Before reading further, it is a good idea to read Reactome’s Data Model page.
Reactome analysis services
This section mostly revolves around the rba_reactome_analysis() function, so naturally we will start with that. As explained in the function’s manual, you have considerable freedom in providing the main input for this function: you can supply an R object (a data frame, matrix, or simple vector), a URL, or a local file path. Note that the type of analysis is decided based on whether your input is 1-dimensional or 2-dimensional. This is explained in detail in the manual of rba_reactome_analysis(); see that for more information.
rba_reactome_analysis() is the API equivalent of Reactome’s analyse gene list tool. You can see that the function’s arguments correspond to what you would choose in the webpage’s wizard.
## 1 We create a simple vector with our genes
genes <- c("p53", "BRCA1", "cdk2", "Q99835", "CDC42", "CDK1", "KIF23", "PLK1", "RAC2", "RACGAP1", "RHOA", "RHOB", "MSL1", "PHF21A", "INSR", "JADE2", "P2RX7", "CCDC101", "PPM1B", "ANAPC16", "CDH8", "HSPA1L", "CUL2", "ZNF302", "CUX1", "CYTH2", "SEC22C", "EIF4E3", "ROBO2", "CXXC1", "LINC01314", "ATP5F1")
## 2 We call Reactome analysis with projection enabled and a p-value cutoff of 0.01
analyzed <- rba_reactome_analysis(input = genes,
                                  projection = TRUE,
                                  p_value = 0.01)
## 3 As always, we use str() to inspect the results
str(analyzed, 1)
#> List of 8
#> $ summary :List of 7
#> $ expression :List of 1
#> $ identifiersNotFound: int 1
#> $ pathwaysFound : int 79
#> $ pathways :'data.frame': 79 obs. of 19 variables:
#> $ resourceSummary :'data.frame': 3 obs. of 3 variables:
#> $ speciesSummary :'data.frame': 1 obs. of 5 variables:
#> $ warnings : list()
## 4 Note that in the summary element: (analyzed$summary)
### 4.a because we supplied a simple vector, the analysis type was: over-representation
### 4.b You need the token for other rba_reactome_analysis_* functions
## 5 Analysis results are in the pathways data frame (analyzed$pathways)
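For contrast with the 1-dimensional vector used above, here is a minimal sketch of a 2-dimensional input. The table, its column names, and the expression values are made-up placeholders, and input_shape() is a hypothetical helper that only illustrates the distinction. The exact table layout that rba_reactome_analysis() expects is described in its manual, so treat this as an assumption-laden illustration rather than a reference.

```r
# Hypothetical expression table: identifiers in the first column, followed by
# numeric columns. The values are arbitrary placeholders, not measurements.
expr_table <- data.frame(
  gene     = c("BRCA1", "CDK1", "PLK1", "RHOA"),
  sample_1 = c(1.2, 0.4, 2.1, 0.9),
  sample_2 = c(0.8, 1.5, 1.9, 1.1)
)

# Hypothetical helper: a plain vector has no dim attribute (1-dimensional),
# whereas a data frame or matrix has one (2-dimensional)
input_shape <- function(x) {
  if (is.null(dim(x))) "1-dimensional" else "2-dimensional"
}
input_shape(expr_table)

if (interactive()) {  # contacts the Reactome server, so run in a live session
  library(rbioapi)
  expr_analyzed <- rba_reactome_analysis(input = expr_table)
  # The summary element should report which analysis type was performed
  str(expr_analyzed$summary, 1)
}
```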
As mentioned, some of rba_reactome_analysis()’s arguments correspond to the wizard of the analyse gene list tool; other arguments correspond to the contents of the “Filter your results” tab in the results page.
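As a rough sketch of the filter-style arguments, the snippet below collects a few settings in a list and passes them along with the input. The argument names resource, include_disease, min and max are assumed to match the manual of rba_reactome_analysis(), and all values are arbitrary illustrations; please check them against the manual of your installed version.

```r
# Filter settings; names are assumed to match the manual of
# rba_reactome_analysis(), values are arbitrary illustrations
filter_args <- list(resource        = "UNIPROT",
                    p_value         = 0.05,
                    include_disease = FALSE,
                    min             = 10,
                    max             = 500)

if (interactive()) {  # contacts the Reactome server, so run in a live session
  library(rbioapi)
  genes <- c("BRCA1", "CDK1", "PLK1", "RHOA", "RHOB", "CDC42")
  filtered <- do.call(rba_reactome_analysis,
                      c(list(input = genes), filter_args))
  filtered$pathwaysFound
}
```

Keeping the settings in one list makes it easy to reuse the same filters across several inputs.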
Having the analysis’s token, you can retrieve the analysis results in many formats using rba_reactome_analysis_pdf() and rba_reactome_analysis_download():
# download a full pdf report
rba_reactome_analysis_pdf(token = analyzed$summary$token,
                          species = 9606)
# download the full results in JSON format
rba_reactome_analysis_download(token = analyzed$summary$token,
                               request = "results",
                               save_to = "reactome_results.json")
Your token is only guaranteed to be stored for 7 days. After that, you can upload the JSON file you downloaded using rba_reactome_analysis_download() and get a new token for it:
re_uploaded <- rba_reactome_analysis_import(input = "reactome_results.json")
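One way to keep track of the 7-day guarantee is to note the date on which the token was obtained. The sketch below uses a hypothetical helper, token_past_guarantee(), which is not part of rbioapi, and only re-imports when the previously saved file is present. Where the new token sits in the returned object is not assumed here; str() is used to find it.

```r
# Hypothetical helper (not part of rbioapi): TRUE once more than `keep_days`
# days have passed since the token was obtained
token_past_guarantee <- function(obtained_on, today = Sys.Date(), keep_days = 7) {
  as.numeric(as.Date(today) - as.Date(obtained_on)) > keep_days
}

token_past_guarantee("2024-03-01", today = "2024-03-30")
token_past_guarantee("2024-03-28", today = "2024-03-30")

results_file <- "reactome_results.json"
if (interactive() && file.exists(results_file)) {  # contacts the Reactome server
  library(rbioapi)
  re_uploaded <- rba_reactome_analysis_import(input = results_file)
  str(re_uploaded, 1)  # locate the new token in the returned object
}
```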
Please note: other services supported by rbioapi also provide over-representation analysis tools. See the vignette article “Do with rbioapi: Over-Representation (Enrichment) Analysis in R” (link to the documentation site) for an in-depth review.
Reactome content services
rbioapi functions that correspond to Reactome content services are those starting with rba_reactome_* but without the “_analysis” infix. These functions cover what you can do with objects in the Reactome knowledge-base. In simpler terms, most of them, though not all, correspond to what you can find in the Reactome Pathway Browser and search results (e.g. a pathway, a reaction, a physical entity, etc.).
Retrieve any object from Reactome knowledge-base
Using rba_reactome_query(), you can retrieve any object from the Reactome knowledge-base. In simpler terms, an object is roughly anything to which Reactome has assigned an ID. This can range from a person’s entry to proteins, reactions, pathways, species, and many more! You can explore Reactome’s data schema to learn about Reactome knowledge-base objects and their organization. Here are some examples. Note that you are not limited to one ID per query: you can supply a vector of IDs. The only limitation is that when you supply more than one ID, you cannot set enhanced = TRUE.
## 1 query a pathway Entry
pathway <- rba_reactome_query(ids = "R-HSA-109581", enhanced = TRUE)
## 2 As always we use str() to inspect the output's structure
str(pathway, 2)
#> List of 26
#> $ dbId : int 109581
#> $ displayName : chr "Apoptosis"
#> $ stId : chr "R-HSA-109581"
#> $ stIdVersion : chr "R-HSA-109581.6"
#> $ created :List of 5
#> ..$ dbId : int 109608
#> ..$ displayName: chr "Alnemri, E, Hengartner, Michael, Tschopp, Jürg, Tsujimoto, Yoshihide, Hardwick, JM, 2004-01-16"
#> ..$ dateTime : chr "2004-01-16 21:01:51"
#> ..$ className : chr "InstanceEdit"
#> ..$ schemaClass: chr "InstanceEdit"
#> $ modified :List of 6
#> ..$ dbId : int 10931649
#> ..$ displayName: chr "Wright, Adam, 2024-03-08"
#> ..$ dateTime : chr "2024-03-08 03:53:59"
#> ..$ note : chr "Inserted by org.reactome.orthoinference"
#> ..$ className : chr "InstanceEdit"
#> ..$ schemaClass: chr "InstanceEdit"
#> $ isInDisease : logi FALSE
#> $ isInferred : logi FALSE
#> $ name :List of 1
#> ..$ : chr "Apoptosis"
#> $ releaseDate : chr "2004-09-20"
#> $ speciesName : chr "Homo sapiens"
#> $ authored :List of 1
#> ..$ : int 109608
#> $ edited :List of 1
#> ..$ :List of 5
#> $ figure :List of 1
#> ..$ :List of 5
#> $ goBiologicalProcess:List of 9
#> ..$ dbId : int 2273
#> ..$ displayName : chr "apoptotic process"
#> ..$ accession : chr "0006915"
#> ..$ databaseName: chr "GO"
#> ..$ definition : chr "A programmed cell death process which begins when a cell receives an internal (e.g. DNA damage) or external sig"| __truncated__
#> ..$ name : chr "apoptotic process"
#> ..$ url : chr "https://www.ebi.ac.uk/QuickGO/term/GO:0006915"
#> ..$ className : chr "GO_BiologicalProcess"
#> ..$ schemaClass : chr "GO_BiologicalProcess"
#> $ literatureReference:List of 7
#> ..$ :List of 11
#> ..$ :List of 11
#> ..$ :List of 11
#> ..$ :List of 11
#> ..$ :List of 11
#> ..$ :List of 11
#> ..$ :List of 11
#> $ orthologousEvent :List of 14
#> ..$ :List of 15
#> ..$ :List of 15
#> ..$ :List of 15
#> ..$ :List of 15
#> ..$ :List of 15
#> ..$ :List of 15
#> ..$ :List of 15
#> ..$ :List of 15
#> ..$ :List of 15
#> ..$ :List of 15
#> ..$ :List of 15
#> ..$ :List of 15
#> ..$ :List of 15
#> ..$ :List of 15
#> $ reviewed :List of 1
#> ..$ :List of 5
#> $ species :List of 1
#> ..$ :List of 8
#> $ summation :List of 1
#> ..$ :List of 5
#> $ reviewStatus :List of 6
#> ..$ dbId : int 9821382
#> ..$ displayName: chr "five stars"
#> ..$ definition : chr "externally reviewed"
#> ..$ name :List of 1
#> ..$ className : chr "ReviewStatus"
#> ..$ schemaClass: chr "ReviewStatus"
#> $ hasDiagram : logi TRUE
#> $ hasEHLD : logi TRUE
#> $ hasEvent :List of 4
#> ..$ :List of 15
#> ..$ :List of 16
#> ..$ :List of 16
#> ..$ :List of 15
#> $ className : chr "Pathway"
#> $ schemaClass : chr "Pathway"
## 3 You can compare it with the webpage of R-HSA-109581 entry:
# https://reactome.org/content/detail/R-HSA-109581
## 1 query a protein Entry
protein <- rba_reactome_query(ids = 66247, enhanced = TRUE)
## 2 As always we use str() to inspect the output's structure
str(protein, 1)
#> List of 27
#> $ dbId : int 66247
#> $ displayName : chr "UniProt:P25942-1 CD40"
#> $ modified :List of 6
#> $ databaseName : chr "UniProt"
#> $ identifier : chr "P25942"
#> $ name :List of 1
#> $ otherIdentifier :List of 108
#> $ url : chr "https://purl.uniprot.org/uniprot/P25942-1"
#> $ crossReference :List of 29
#> $ referenceDatabase :List of 8
#> $ physicalEntity :List of 1
#> $ checksum : chr "BC8776EC2C4A5680"
#> $ comment :List of 1
#> $ description :List of 1
#> $ geneName :List of 2
#> $ isSequenceChanged : logi FALSE
#> $ keyword :List of 16
#> $ secondaryIdentifier:List of 8
#> $ sequenceLength : int 277
#> $ species : int 48887
#> $ chain :List of 2
#> $ referenceGene :List of 12
#> $ referenceTranscript:List of 4
#> $ variantIdentifier : chr "P25942-1"
#> $ isoformParent :List of 1
#> $ className : chr "ReferenceIsoform"
#> $ schemaClass : chr "ReferenceIsoform"
## 3 You can compare it with the webpage of R-HSA-202939 entry:
# https://reactome.org/content/detail/R-HSA-202939
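The multi-ID case mentioned earlier can be sketched as follows. Both IDs are taken from this article, and enhanced is derived from the number of IDs so that it is only switched on for a single-ID query. The shape of the returned object for several IDs is not assumed here, which is why str() is used to inspect it.

```r
ids <- c("R-HSA-109581", "R-HSA-202939")

# enhanced = TRUE is only possible when a single ID is supplied
use_enhanced <- length(ids) == 1

if (interactive()) {  # contacts the Reactome server, so run in a live session
  library(rbioapi)
  entries <- rba_reactome_query(ids = ids, enhanced = use_enhanced)
  str(entries, 1)
}
```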
Find Cross-Reference IDs in Reactome
As you can see, in the second example of rba_reactome_query() we used Reactome’s dbId 66247 to query the CD40 protein. How did we obtain that in the first place? You can use rba_reactome_xref() to map any cross-reference (external) ID to Reactome IDs.
## 1 We supply an HGNC symbol to find the corresponding database ID in Reactome
xref_protein <- rba_reactome_xref("CD40")
## 2 As always use str() to inspect the output's structure
str(xref_protein, 1)
#> List of 19
#> $ dbId : int 66247
#> $ displayName : chr "UniProt:P25942-1 CD40"
#> $ databaseName : chr "UniProt"
#> $ identifier : chr "P25942"
#> $ name :List of 1
#> $ otherIdentifier :List of 1
#> $ url : chr "https://purl.uniprot.org/uniprot/P25942-1"
#> $ checksum : chr "BC8776EC2C4A5680"
#> $ comment :List of 1
#> $ description :List of 1
#> $ geneName :List of 1
#> $ isSequenceChanged : logi FALSE
#> $ keyword :List of 1
#> $ secondaryIdentifier:List of 1
#> $ sequenceLength : int 277
#> $ chain :List of 1
#> $ variantIdentifier : chr "P25942-1"
#> $ className : chr "ReferenceIsoform"
#> $ schemaClass : chr "ReferenceIsoform"
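The two steps can be chained, so that the dbId never has to be copied by hand. query_by_xref() below is a hypothetical convenience wrapper, not an rbioapi function; it relies only on the dbId element shown in the output above.

```r
# Hypothetical convenience wrapper: look up the Reactome dbId of an external
# identifier, then retrieve the full entry for that dbId
query_by_xref <- function(external_id) {
  xref <- rbioapi::rba_reactome_xref(external_id)
  rbioapi::rba_reactome_query(ids = xref$dbId, enhanced = TRUE)
}

if (interactive()) {  # contacts the Reactome server, so run in a live session
  cd40 <- query_by_xref("CD40")
  cd40$displayName
}
```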
Map Cross-Reference IDs to Reactome
While we are on the topic of cross-references, here is another useful resource. Using rba_reactome_mapping(), you can find the Reactome pathways or reactions that include your external ID:
## 1 Again, consider CD40 protein:
xref_mapping <- rba_reactome_mapping(id = "CD40",
                                     resource = "hgnc",
                                     map_to = "pathways")
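Since the same external ID can be mapped to either pathways or reactions, a small loop retrieves both in one go. This is only a sketch: it assumes that "reactions" is accepted by map_to in the same way as "pathways", and it leaves the structure of the results to str().

```r
map_targets <- c("pathways", "reactions")

if (interactive()) {  # contacts the Reactome server, so run in a live session
  library(rbioapi)
  cd40_mappings <- lapply(map_targets, function(target) {
    rba_reactome_mapping(id = "CD40", resource = "hgnc", map_to = target)
  })
  names(cd40_mappings) <- map_targets
  str(cd40_mappings, 1)
}
```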
See also in Functions’ manuals
There are still more rbioapi Reactome content functions that were not covered in this vignette. Here is a brief overview; see the functions’ manuals for detailed guides and examples.
Retrieve Reactome Database information
- rba_reactome_version(): Return the current Reactome version.
- rba_reactome_diseases(): Retrieve a list of diseases annotated in Reactome.
- rba_reactome_species(): Retrieve a list of species annotated in Reactome.
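A brief sketch of these three calls, all used with their default arguments. Recording the retrieval date next to the database version is a suggestion added here for reproducibility, not something the functions do themselves.

```r
# Note when the information was retrieved, to store alongside the version
retrieved_on <- Sys.Date()

if (interactive()) {  # contacts the Reactome server, so run in a live session
  library(rbioapi)
  reactome_version <- rba_reactome_version()
  diseases <- rba_reactome_diseases()
  species <- rba_reactome_species()
  str(species, 1)
}
```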
Things you can do with Entities
- rba_reactome_complex_list(): Get a list of complexes that contain your molecule.
- rba_reactome_complex_subunits(): Get the list of subunits in your complex.
- rba_reactome_participant_of(): Get a list of Reactome sets and complexes in which your entity (event, molecule, reaction, pathway, etc.) is a participant.
Things you can do with Events
- rba_reactome_event_hierarchy(): Retrieve the full event hierarchy of a species.
How to Cite?
To cite Reactome (Please see https://reactome.org/cite):
- Marc Gillespie, Bijay Jassal, Ralf Stephan, Marija Milacic, Karen Rothfels, Andrea Senff-Ribeiro, Johannes Griss, Cristoffer Sevilla, Lisa Matthews, Chuqiao Gong, Chuan Deng, Thawfeek Varusai, Eliot Ragueneau, Yusra Haider, Bruce May, Veronica Shamovsky, Joel Weiser, Timothy Brunson, Nasim Sanati, Liam Beckman, Xiang Shao, Antonio Fabregat, Konstantinos Sidiropoulos, Julieth Murillo, Guilherme Viteri, Justin Cook, Solomon Shorser, Gary Bader, Emek Demir, Chris Sander, Robin Haw, Guanming Wu, Lincoln Stein, Henning Hermjakob, Peter D’Eustachio, The reactome pathway knowledgebase 2022, Nucleic Acids Research, 2021; gkab1028, https://doi.org/10.1093/nar/gkab1028
- Griss J, Viteri G, Sidiropoulos K, Nguyen V, Fabregat A, Hermjakob H. ReactomeGSA - Efficient Multi-Omics Comparative Pathway Analysis. Mol Cell Proteomics. 2020 Sep 9. doi: 10.1074/mcp. PubMed PMID: 32907876.
- Fabregat A, Korninger F, Viteri G, Sidiropoulos K, Marin-Garcia P, Ping P, Wu G, Stein L, D’Eustachio P, Hermjakob H. Reactome graph database: Efficient access to complex pathway data. PLoS Comput Biol. 2018 Jan 29;14(1):e1005968. doi: 10.1371/journal.pcbi.1005968. eCollection 2018 Jan. PubMed PMID: 29377902.
- Fabregat A, Sidiropoulos K, Viteri G, Marin-Garcia P, Ping P, Stein L, D’Eustachio P, Hermjakob H. Reactome diagram viewer: data structures and strategies to boost performance. Bioinformatics. 2018 Apr 1;34(7):1208-1214. doi: 10.1093/bioinformatics/btx752. PubMed PMID: 29186351.
- Fabregat A, Sidiropoulos K, Viteri G, Forner O, Marin-Garcia P, Arnau V, D’Eustachio P, Stein L, Hermjakob H. Reactome pathway analysis: a high-performance in-memory approach. BMC Bioinformatics. 2017 Mar 2;18(1):142. doi: 10.1186/s12859-017-1559-2. PubMed PMID: 28249561.
- Wu G, Haw R. Functional Interaction Network Construction and Analysis for Disease Discovery. Methods Mol Biol. 2017;1558:235-253. doi: 10.1007/978-1-4939-6783-4_11. PubMed PMID: 28150241.
To cite rbioapi:
- Moosa Rezwani, Ali Akbar Pourfathollah, Farshid Noorbakhsh, rbioapi: user-friendly R interface to biologic web services’ API, Bioinformatics, Volume 38, Issue 10, 15 May 2022, Pages 2952–2953, https://doi.org/10.1093/bioinformatics/btac172
Session info
#> R version 4.3.3 (2024-02-29)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] rbioapi_0.8.0
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.6.5 httr_1.4.7 cli_3.6.2 knitr_1.45
#> [5] rlang_1.1.3 xfun_0.43 purrr_1.0.2 textshaping_0.3.7
#> [9] jsonlite_1.8.8 DT_0.32 htmltools_0.5.8 ragg_1.3.0
#> [13] sass_0.4.9 rmarkdown_2.26 crosstalk_1.2.1 evaluate_0.23
#> [17] jquerylib_0.1.4 fastmap_1.1.1 yaml_2.3.8 lifecycle_1.0.4
#> [21] memoise_2.0.1 compiler_4.3.3 fs_1.6.3 htmlwidgets_1.6.4
#> [25] systemfonts_1.0.6 digest_0.6.35 R6_2.5.1 curl_5.2.1
#> [29] magrittr_2.0.3 bslib_0.7.0 tools_4.3.3 pkgdown_2.0.7
#> [33] cachem_1.0.8 desc_1.4.3