KEGG (Brite) pathways
Background
- UniProt does not provide good pathway annotation
- GO terms (obtained from UniProt) are sometimes difficult to work with because they are either too specific or too general
- GO terms have the (dis-)advantage of labelling a gene with many different functional classes
- what we often want instead is a simple overview of gene-pathway relationships
- simple here means that one gene is associated with only one or a few descriptive functions
- KEGG (Brite) is a manually curated (not homology-based) database that offers such information
- this example uses the Bioconductor package KEGGREST to retrieve pathways for the bacterium Bacillus subtilis.
Libraries and test data
- Install the KEGGREST package from Bioconductor
BiocManager::install("KEGGREST")
- load required libraries
suppressPackageStartupMessages({
  library(KEGGREST)
  library(tidyverse)
})
Retrieve pathways
Starting with organism ID
- KEGG uses organism IDs; for Bacillus subtilis it is bsu
- using this ID, we can retrieve gene-pathway relationships with a premade R function
- internally it uses the keggLink function to find pathways and keggList to retrieve human-readable pathway names
- it also trims some unnecessary text
- it can be used with an organism ID (example: bsu) or a gene ID (example: bsu:BSU00040)
source("../source/get_kegg_pathways.R")
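The sourced file get_kegg_pathways.R is not shown here. As a rough illustration of the steps listed above, the following is a minimal sketch of how such a function could be put together. The names tidy_kegg_links and sketch_kegg_pathways and the column kegg_pathway_id are hypothetical and not taken from the original script. The wrapper needs network access to KEGG, while the helper runs offline on toy input whose gene-pathway assignments are made up.

```r
# Hypothetical helper: turn keggLink()/keggList() style named vectors
# into a gene-pathway table. Assumes 'links' has gene IDs as names and
# pathway IDs as values, and 'pathway_names' has pathway IDs as names.
tidy_kegg_links <- function(links, pathway_names) {
  # strip an optional "path:" prefix so both vectors use the same key
  pathway_id <- sub("^path:", "", unname(links))
  names(pathway_names) <- sub("^path:", "", names(pathway_names))
  data.frame(
    # drop the organism tag, e.g. "bsu:BSU00040" becomes "BSU00040"
    locus_tag = sub("^[a-z]+:", "", names(links)),
    kegg_pathway_id = pathway_id,
    # trim the trailing " - <organism name>" part of the pathway name
    kegg_pathway = sub("^(.*) - .*$", "\\1", unname(pathway_names[pathway_id])),
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper (requires network access and the KEGGREST package).
sketch_kegg_pathways <- function(id) {
  organism <- sub(":.*$", "", id[1])  # "bsu:BSU00040" becomes "bsu"
  links <- KEGGREST::keggLink("pathway", id)
  pathway_names <- KEGGREST::keggList("pathway", organism)
  tidy_kegg_links(links, pathway_names)
}

# Offline toy input; the gene IDs and their assignments are invented.
toy_links <- c(
  "bsu:BSU00001" = "path:bsu00010",
  "bsu:BSU00001" = "path:bsu01100",
  "bsu:BSU00002" = "path:bsu00010"
)
toy_names <- c(
  "bsu00010" = "Glycolysis / Gluconeogenesis - Bacillus subtilis subsp. subtilis 168",
  "bsu01100" = "Metabolic pathways - Bacillus subtilis subsp. subtilis 168"
)
toy_result <- tidy_kegg_links(toy_links, toy_names)
```

Keeping the text processing in a separate helper makes it possible to check the ID and name trimming without contacting the KEGG server.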
- apply the function to retrieve pathways by organism
df_kegg <- get_kegg_pathways(id = "bsu")
head(df_kegg)
Starting with gene ID
- we can supply one or more gene IDs, but each needs the organism tag as a prefix (for example bsu:)
df_kegg_genes <- get_kegg_pathways(id = c("bsu:BSU00040", "bsu:BSU20340"))
head(df_kegg_genes)
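If locus tags are available without the organism tag, the prefix can be added before the lookup. The helper below is a small sketch of that step. The name add_org_prefix is hypothetical, and the check assumes that organism tags consist of lowercase letters followed by a colon.

```r
# Hypothetical helper: add the organism tag to bare locus tags and
# leave IDs that already carry a tag untouched.
add_org_prefix <- function(locus_tag, organism = "bsu") {
  has_prefix <- grepl("^[a-z]+:", locus_tag)
  ifelse(has_prefix, locus_tag, paste0(organism, ":", locus_tag))
}

gene_ids <- add_org_prefix(c("BSU00040", "bsu:BSU20340"))
gene_ids
```

The resulting vector can then be passed as the id argument in the same way as the hard-coded IDs above.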
Results
- overview of the most abundant pathways, ranked by the number of genes (locus_tag) assigned to each
- only the top 10 pathways are shown
df_summary <- df_kegg %>%
  group_by(kegg_pathway) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  slice(1:10)
head(df_summary)
df_summary %>%
  slice(10:1) %>%
  ggplot(aes(x = count, y = fct_inorder(kegg_pathway))) +
  geom_col()
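The ranking step can be tried on a small table without querying KEGG. The sketch below uses an invented toy data frame with the same two columns as df_kegg and the dplyr shorthand count() with sort = TRUE in place of group_by(), summarize() and arrange(). The object names toy and top_pathways are illustrative only.

```r
library(dplyr)

# Invented toy data with the same column names as df_kegg.
toy <- tibble(
  locus_tag = c("g1", "g1", "g2", "g3", "g3", "g3"),
  kegg_pathway = c("A", "B", "A", "A", "B", "C")
)

# Count genes per pathway, sort by decreasing count, keep at most 10 rows.
top_pathways <- toy %>%
  count(kegg_pathway, name = "count", sort = TRUE) %>%
  slice_head(n = 10)

top_pathways
```

Printing the object directly shows all rows, whereas head() returns only the first six by default.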

- how many genes are associated with multiple pathways?
- this analysis is the inverse of the previous one: it counts pathways per gene instead of genes per pathway
- 524 genes are associated with only 1 pathway, 241 with 2, and so on
# grouping by locus_tag counts the pathways assigned to each gene
df_summary <- df_kegg %>%
  group_by(locus_tag) %>%
  summarize(pathways_per_gene = n()) %>%
  count(pathways_per_gene)
df_summary
df_summary %>%
  ggplot(aes(x = pathways_per_gene, y = n)) +
  geom_line() +
  geom_point()
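A quick consistency check for this kind of summary is that the distribution must add up to the input table. The sketch below shows this on invented toy data, assuming each gene-pathway pair occurs only once. The names toy, per_gene, total_pairs and total_genes are illustrative.

```r
library(dplyr)

# Invented toy data with the same column names as df_kegg.
toy <- tibble(
  locus_tag = c("g1", "g1", "g2", "g3", "g3", "g3"),
  kegg_pathway = c("A", "B", "A", "A", "B", "C")
)

# Number of pathways per gene, then number of genes per pathway count.
per_gene <- toy %>%
  count(locus_tag, name = "pathways_per_gene") %>%
  count(pathways_per_gene, name = "n_genes")

# The weighted sum recovers the number of gene-pathway pairs,
# and the plain sum recovers the number of distinct genes.
total_pairs <- sum(per_gene$pathways_per_gene * per_gene$n_genes)
total_genes <- sum(per_gene$n_genes)

total_pairs == nrow(toy)
total_genes == n_distinct(toy$locus_tag)
```

If either comparison fails on real data, duplicated rows in the input table are a likely cause.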
