KEGG (Brite) pathways

Background

  • UniProt does not provide good pathway annotation
  • GO terms (obtained from UniProt) are sometimes difficult to work with because they can be either too specific or too general
  • GO terms have the (dis)advantage of labelling a gene with many different functional classes
  • what we often want instead is a simple overview of gene-pathway relationships
  • "simple" here means that each gene is associated with only one or a few descriptive functions
  • KEGG (Brite) is a manually curated (not homology-based) database that offers such information
  • this example uses the Bioconductor package KEGGREST to retrieve pathways for the bacterium Bacillus subtilis

Libraries and test data

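  • install the KEGGREST package from Bioconductor (BiocManager is installed first if it is missing)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("KEGGREST")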
  • load required libraries
suppressPackageStartupMessages({
  library(KEGGREST)
  library(tidyverse)
})

Retrieve pathways

Starting with organism ID

  • KEGG identifies organisms by short codes; for Bacillus subtilis the code is bsu
  • using this code, we can retrieve gene-pathway relationships with a premade R function, get_kegg_pathways()
  • internally it uses the keggLink function to find the pathways linked to each gene and keggList to retrieve human-readable pathway names
  • it also trims some unnecessary text from the output
  • it can be used with an organism ID (example: bsu) or with gene IDs (example: bsu:BSU00040)
source("../source/get_kegg_pathways.R")
  • apply the function to retrieve all gene-pathway relationships for the organism
df_kegg <- get_kegg_pathways(id = "bsu")
head(df_kegg)
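The contents of ../source/get_kegg_pathways.R are not shown here. As a rough, offline-testable sketch of the keggLink/keggList approach described above (not the author's actual function), the hypothetical helper build_pathway_table() below joins the two KEGGREST outputs and trims the organism name from pathway names. It assumes that keggLink() returns gene IDs as names and pathway IDs as values, that keggList() returns pathway IDs as names (with or without a path: prefix, which may depend on the KEGG API version), and that pathway names end in " - <organism name>"; the example inputs are made up.

```r
# Hypothetical sketch, NOT the author's get_kegg_pathways(). With network access
# the inputs would come from (assumed usage):
#   links         <- KEGGREST::keggLink("pathway", "bsu")
#   pathway_names <- KEGGREST::keggList("pathway", "bsu")
build_pathway_table <- function(links, pathway_names) {
  # drop the "path:" prefix, which may or may not be present (assumption)
  pathway_ids <- sub("^path:", "", unname(links))
  ids <- sub("^path:", "", names(pathway_names))
  # trim the trailing " - <organism name>" (assumed format); the greedy (.*)
  # makes the removed part start at the last " - " in each name
  clean_names <- setNames(sub("^(.*) - .*$", "\\1", pathway_names), ids)
  data.frame(
    # drop the organism prefix, e.g. "bsu:BSU00040" -> "BSU00040"
    locus_tag = sub("^[^:]+:", "", names(links)),
    pathway_id = pathway_ids,
    kegg_pathway = unname(clean_names[pathway_ids]),
    stringsAsFactors = FALSE
  )
}

# made-up input mimicking the (assumed) KEGGREST output format
links <- c(
  "bsu:BSU00040" = "path:bsu00010",
  "bsu:BSU00040" = "path:bsu02020",
  "bsu:BSU20340" = "path:bsu00010"
)
pathway_names <- c(
  "path:bsu00010" = "Glycolysis / Gluconeogenesis - Bacillus subtilis subsp. subtilis 168",
  "path:bsu02020" = "Two-component system - Bacillus subtilis subsp. subtilis 168"
)

df_sketch <- build_pathway_table(links, pathway_names)
df_sketch
```

A gene linked to several pathways simply appears in several rows, which is the long format the Results section below relies on.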

Starting with gene ID

  • we can supply one or more gene IDs, but each one must be prefixed with the organism code (e.g. bsu:)
df_kegg_genes <- get_kegg_pathways(id = c("bsu:BSU00040", "bsu:BSU20340"))
head(df_kegg_genes)
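If a gene list contains bare locus tags, a small helper can add the organism code before the IDs are passed to get_kegg_pathways(). The add_org_prefix() function below is hypothetical (it is not part of the sourced script) and assumes that any ID already containing a colon is correctly prefixed.

```r
# Hypothetical helper: prepend an organism code to bare locus tags so they
# match KEGG's "org:locus_tag" gene ID format
add_org_prefix <- function(ids, org = "bsu") {
  # leave IDs that already contain a colon (e.g. "bsu:BSU20340") unchanged
  ifelse(grepl(":", ids, fixed = TRUE), ids, paste0(org, ":", ids))
}

gene_ids <- add_org_prefix(c("BSU00040", "bsu:BSU20340"))
gene_ids
# [1] "bsu:BSU00040" "bsu:BSU20340"

# the prefixed IDs could then be used as before:
# df_kegg_genes <- get_kegg_pathways(id = gene_ids)
```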

Results

  • overview of the most abundant pathways, i.e. the pathways with the most genes (locus_tag)
  • only the top 10 pathways are shown
df_summary <- df_kegg %>%
  group_by(kegg_pathway) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  slice(1:10)

head(df_summary)
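The pipeline above counts rows per pathway, sorts them in decreasing order and keeps the first ten. The same logic can be sketched in base R on a small made-up table (the pathway assignments below are invented, not real B. subtilis data). One design note: slice(1:10) cuts ties at rank 10 arbitrarily, whereas dplyr::slice_max(count, n = 10) keeps all tied pathways by default.

```r
# made-up gene-pathway table with the same columns as df_kegg (assumption)
toy_kegg <- data.frame(
  locus_tag = c("g1", "g1", "g2", "g3", "g3", "g4"),
  kegg_pathway = c("Ribosome", "Metabolic pathways", "Metabolic pathways",
                   "Metabolic pathways", "ABC transporters", "Ribosome"),
  stringsAsFactors = FALSE
)

# count genes per pathway, sort by decreasing count, keep the n most frequent
top_pathways <- function(df, n = 10) {
  counts <- sort(table(df$kegg_pathway), decreasing = TRUE)
  head(data.frame(kegg_pathway = names(counts),
                  count = as.integer(counts),
                  stringsAsFactors = FALSE), n)
}

top_pathways(toy_kegg, n = 2)
```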
df_summary %>%
  slice(10:1) %>%
  ggplot(aes(x = count, y = fct_inorder(kegg_pathway))) +
  geom_col()
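The slice(10:1) step is there because ggplot2 draws the first level of a discrete y axis at the bottom: reversing the rows to ascending order before fct_inorder() puts the most frequent pathway at the top of the chart. The base-R sketch below reproduces that level ordering on made-up counts; with forcats, fct_reorder(kegg_pathway, count) should give the same ordering without reversing the rows.

```r
# made-up top-3 table, already sorted by decreasing count (as df_summary is)
top3 <- data.frame(
  kegg_pathway = c("Metabolic pathways", "Ribosome", "ABC transporters"),
  count = c(3L, 2L, 1L),
  stringsAsFactors = FALSE
)

# equivalent of slice(n:1) followed by fct_inorder(): reverse the rows, then use
# the resulting row order as factor level order (lowest count = first level = bottom bar)
top3_rev <- top3[nrow(top3):1, ]
top3_rev$kegg_pathway <- factor(top3_rev$kegg_pathway, levels = top3_rev$kegg_pathway)

# levels are now ascending by count, so the largest bar ends up on top
levels(top3_rev$kegg_pathway)
```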

  • how many genes are associated with multiple pathways?
  • this analysis is the inverse of the previous one: instead of counting genes per pathway, it counts pathways per gene
  • at the time of writing, 524 genes were associated with only 1 pathway, 241 with 2, and so on
df_summary <- df_kegg %>%
  group_by(locus_tag) %>%
  summarize(pathways_per_gene = n()) %>%
  count(pathways_per_gene)

df_summary
df_summary %>%
  ggplot(aes(x = pathways_per_gene, y = n)) +
  geom_line() +
  geom_point() +
  labs(x = "pathways per gene", y = "number of genes")
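The pathways-per-gene distribution can also be sketched in base R in two steps: count the rows per gene, then tabulate those counts. The toy locus tags below are invented; with the real data one would pass df_kegg$locus_tag instead.

```r
# made-up gene-pathway assignments: one entry per gene-pathway pair
toy_locus_tag <- c("g1", "g1", "g2", "g3", "g3", "g3", "g4")

# step 1: number of pathways per gene (g1 = 2, g2 = 1, g3 = 3, g4 = 1)
n_per_gene <- as.vector(table(toy_locus_tag))

# step 2: how many genes have 1, 2, 3, ... pathways
dist_tab <- table(n_per_gene)
distribution <- data.frame(
  pathways_per_gene = as.integer(names(dist_tab)),
  n = as.integer(dist_tab)
)
distribution
```

The n column sums to the number of genes that have at least one pathway, which is a quick sanity check on the real result.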
