installifnot <- function (packageName){
if (!(require(packageName, character.only=TRUE))) {
install.packages(packageName)
}else{
detach(paste ("package", packageName, sep=":"), character.only=TRUE)
}
}
bioCifnot <- function (packageName){
if (!(require(packageName, character.only=TRUE))) {
BiocManager::install(packageName)
}else{
detach(paste ("package", packageName, sep=":"), character.only=TRUE)
}
}
installifnot("knitr")
installifnot("xml2") # May yield problems if some libraries (xml2-config) not available in linux
installifnot("ggnewscale")
bioCifnot ("org.Hs.eg.db")
bioCifnot ("hgu133a.db")
bioCifnot ("GO.db")
bioCifnot ("annotate")
bioCifnot ("Rgraphviz")
bioCifnot ("GOstats")
bioCifnot ("clusterProfiler")
This document provides some examples on the analyses that can be perfomed on one or more gene lists to help gain biological insight on the results of a differential expression analysis. Overall these analyses are known as Pathway Analysis or, also, Functional Analysis.
Functional analysis can be performed in many different ways that lead to similar (or not-so-similar) results. Because there is not a universal acceptance of what is a complete, well done functional analysis some different approaches will be shown.
Functional analysis can be made, on a first approach on:
Most tools require that gene list consist of gene identifiers in some
standard notation such as Entrez
, ENSEMBL
or
other related to these.
These gene lists can be usually extracted from output tables provided by microarrays or RNA-seq data analysis tools.
The examples shown in the document use several gene lists obtained from a maicroarray analysi performed on data from a breast cancer study, but it can be easily extended to more lists or other studies.
We start by reading two files that contain the expression values
(expres_AvsB.csv2
) and the results
(Top_AvsB.csv2
) of a differential expression analysis
performed using microarrays.
The code and text for the analysis that, using these data, generated these tables, can be found at: https://github.com/ASPteaching/Ejemplo_de_Analisis_de_Microarrays_con_Bioconductor
The code below assumes the files have been stored in a subdirectory
of the current folder named datasets
.
inputDir="datasets"
topTabAvsB <- read.table (file.path(inputDir, "Top_AvsB.csv2"), head=T, sep=";", dec=",", row.names=1)
expresAvsB <- read.table (file.path(inputDir, "expres_AvsB.csv2"), head=T, sep=";", dec=",", row.names=1)
comparisonName <- "AvsB"
dim(topTabAvsB); head(topTabAvsB)
## [1] 6218 6
## logFC AveExpr t P.Value adj.P.Val B
## 204667_at -3.038344 8.651157 -14.361559 5.741537e-11 3.570088e-07 14.649763
## 215729_s_at 3.452290 6.137595 12.814617 3.438484e-10 1.069025e-06 13.150062
## 220192_x_at -3.016315 9.521883 -10.859041 4.336167e-09 7.020345e-06 10.929050
## 214451_at -5.665059 7.432823 -10.829695 4.516144e-09 7.020345e-06 10.892594
## 217528_at -5.622086 6.763101 -9.666279 2.429965e-08 3.021904e-05 9.364339
## 217284_x_at -4.313116 9.133307 -9.528468 2.994504e-08 3.103304e-05 9.172112
dim(expresAvsB); head(expresAvsB)
## [1] 6218 10
## GSM26878.CEL GSM26883.CEL GSM26887.CEL GSM26903.CEL GSM26910.CEL
## 204667_at 9.822291 9.514461 9.919010 9.600700 9.592063
## 215729_s_at 4.736669 4.761001 6.254886 4.820416 4.848366
## 220192_x_at 10.484295 10.914847 10.510820 11.509958 10.265186
## 214451_at 10.176920 10.060391 11.201140 10.888825 10.403751
## 217528_at 10.533946 10.036022 11.325578 8.053482 10.618710
## 217284_x_at 11.726573 9.740720 11.436035 12.818812 12.686506
## GSM26888.CEL GSM26889.CEL GSM26892.CEL GSM26898.CEL GSM26906.CEL
## 204667_at 6.483792 6.551345 7.001161 6.685062 6.535446
## 215729_s_at 8.265870 8.962679 8.304247 8.768668 8.381325
## 220192_x_at 7.823961 7.809752 7.522203 8.427221 7.020395
## 214451_at 4.818335 4.784154 4.975607 4.912010 4.915627
## 217528_at 4.580566 4.537627 4.518766 4.357199 4.463147
## 217284_x_at 7.274321 7.298390 7.490518 7.562369 7.217467
A given gene list contains useful information that can be extracted by querying databases.
Let us see how we can obtain information fom the probesets
in table (comparison) AvsB
.
myProbes <- rownames(expresAvsB)
head(myProbes)
## [1] "204667_at" "215729_s_at" "220192_x_at" "214451_at" "217528_at"
## [6] "217284_x_at"
We need to load the library (“package”) that contains specific annotations for the microarray type that was used in this study. It has to be noticed also that each row does not represent a gene, but a probeset, a sequence that has been designed to detect if a given gene is expressed. Microarrays contain multiple probesets for many genes and this is something that ha to be dealt with.
In order to do many analyses it is convenient to use a universally
accepted identifier such as Entrez
or ENSEMBL
.
For instance Bioconductor organism annotation packages rely on
Entrez
identifiers as main key for most mappings.
It is possible to easily find out which mappings are available for each ID.
library(hgu133a.db)
keytypes(hgu133a.db)
## [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
## [6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
## [11] "GENETYPE" "GO" "GOALL" "IPI" "MAP"
## [16] "OMIM" "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM"
## [21] "PMID" "PROBEID" "PROSITE" "REFSEQ" "SYMBOL"
## [26] "UCSCKG" "UNIPROT"
Annotation packages make it possible to annotate genes and in a similar manner other omics features. For example, we can obtain gene symbol, entrez ID and gene name with a single SQL instruction.
geneAnots <- AnnotationDbi::select(hgu133a.db, myProbes, c("SYMBOL", "ENTREZID", "GENENAME"))
head(geneAnots)
## PROBEID SYMBOL ENTREZID
## 1 204667_at FOXA1 3169
## 2 215729_s_at VGLL1 51442
## 3 220192_x_at SPDEF 25803
## 4 214451_at TFAP2B 7021
## 5 217528_at CLCA2 9635
## 6 217284_x_at SERHL2 253190
## GENENAME
## 1 forkhead box A1
## 2 vestigial like family member 1
## 3 SAM pointed domain containing ETS transcription factor
## 4 transcription factor AP-2 beta
## 5 chloride channel accessory 2
## 6 serine hydrolase like 2
Now we can provide a more informative list of differentially expressed genes in topTable
selected<- topTabAvsB[,"adj.P.Val"]<0.05 & topTabAvsB[,"logFC"] > 1
sum(selected)
## [1] 243
selectedTopTab <- topTabAvsB[selected,]
selectedProbes <- rownames(selectedTopTab)
selectedAnots <- AnnotationDbi::select(hgu133a.db, selectedProbes, c("SYMBOL", "ENTREZID", "GENENAME"))
selectedTopTab2 <- cbind(PROBEID=rownames(selectedTopTab), selectedTopTab)
# selectedInfo <- cbind(selectedAnots, selectedTopTab)
selectedInfo = merge(selectedAnots, selectedTopTab2, by="PROBEID")
write.csv2(selectedInfo, file="selectedTopTab_AvsB.csv2")
Pathway Analysis is an extensive field of research and application and the aim of this document is not to summarize it but simply to illustrate some applications with R.
See (https://github.com/ASPteaching/An-Introduction-to-Pathway-Analysis-with-R-and-Bioconductor/raw/main/Slides/Intro2PathwayEnrichmentAnalysis-SHORT.pdf) for an introduction to the topic.
The most used database in Pathway Analysis is the Gene Ontology,
which, as suggested by its name, is not a dabase but an ontology. The
GO.db
can be accessed using the same syntaxis as with gene
identifiers.
Bioconductor libraries allow for both: - Exploration of functional information on genes based on the Gene Ontology - Different types of Gene Enrichment and Pathway Analysis based on the GO or other pathway databases such as the KEGG, Reactome etc.
The code below shows some easy ways to retrieve GO information associated with genes
Start by loading the appropriate packages
require(GOstats)
require(GO.db)
require(hgu133a.db);
require(annotate) # Loads the required libraries.
Select the “top 5” genes from the list
probes <- rownames(expresAvsB)[1:5]
In the first versions of Bioconductor, identifiers were managed using
environments and functions such as get
or
mget
. While it still works, nowadays it has been
substituted by the use of the select
function which allows
for a muach easier retrieving of identifiers.
Select
returns a data.frame. If we need a character
vector we can obtain it using an ad-hoc function sucha as the
simplecharFromSelect
below.
charFromSelect <- function(df,posNames=1, posValues=2){
res <- df[,posValues]
names(res) <- df[,posNames]
return(res)
}
require(annotate)
geneIDs <- AnnotationDbi::select(hgu133a.db, probes, c("ENTREZID", "SYMBOL"))
entrezs <- charFromSelect(geneIDs, 1, 2)
simbols <- charFromSelect(geneIDs, 1, 3)
Given a set of gene identifiers its associated GO identifiers can be extracted from the microarrays (for probesets ids) or from the organism (for Entrez IDS) annotation package.
# % WANING
# The previous chunk can be substituted by the followink code chunk, shorter and more efficient.
library(hgu133a.db)
keytypes(hgu133a.db)
## [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
## [6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
## [11] "GENETYPE" "GO" "GOALL" "IPI" "MAP"
## [16] "OMIM" "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM"
## [21] "PMID" "PROBEID" "PROSITE" "REFSEQ" "SYMBOL"
## [26] "UCSCKG" "UNIPROT"
res <-AnnotationDbi::select(hgu133a.db, keys=probes, keytype = "PROBEID", columns = c("ENTREZID", "SYMBOL","ONTOLOGY"))
res1 <- AnnotationDbi::select(hgu133a.db, keys=probes, keytype = "PROBEID", columns = c("ENTREZID", "SYMBOL","GO"))
The resulting tables can be printed or saved.
print(head(res, n=10))
## PROBEID ENTREZID SYMBOL ONTOLOGY
## 1 204667_at 3169 FOXA1 BP
## 2 204667_at 3169 FOXA1 CC
## 3 204667_at 3169 FOXA1 MF
## 4 215729_s_at 51442 VGLL1 BP
## 5 215729_s_at 51442 VGLL1 CC
## 6 215729_s_at 51442 VGLL1 MF
## 7 220192_x_at 25803 SPDEF BP
## 8 220192_x_at 25803 SPDEF CC
## 9 220192_x_at 25803 SPDEF MF
## 10 214451_at 7021 TFAP2B BP
print(head(res1, n=10))
## PROBEID ENTREZID SYMBOL GO EVIDENCE ONTOLOGY
## 1 204667_at 3169 FOXA1 GO:0000122 IEA BP
## 2 204667_at 3169 FOXA1 GO:0000785 ISA CC
## 3 204667_at 3169 FOXA1 GO:0000976 IDA MF
## 4 204667_at 3169 FOXA1 GO:0000978 IBA MF
## 5 204667_at 3169 FOXA1 GO:0000981 IBA MF
## 6 204667_at 3169 FOXA1 GO:0000981 ISA MF
## 7 204667_at 3169 FOXA1 GO:0001228 ISS MF
## 8 204667_at 3169 FOXA1 GO:0001650 IDA CC
## 9 204667_at 3169 FOXA1 GO:0003677 IDA MF
## 10 204667_at 3169 FOXA1 GO:0003682 IEA MF
This process yield GO identifiers but its meaning is not clear. For
this we need to query the GO.db package
The We can query the
GO.db` package can be queried as any
anotation package.
library(GO.db)
keytypes(GO.db)
## [1] "DEFINITION" "GOID" "ONTOLOGY" "TERM"
columns(GO.db)
## [1] "DEFINITION" "GOID" "ONTOLOGY" "TERM"
Assume we are only interested in the first three GO identifiers. We
can obtain its name directly using the appropriate select
call. First we remove duplicates then we proceed to extract
information.
uniqueGOTerms <- unique(res1$GO)
selectedGOTerms <- uniqueGOTerms[1:5]
select(GO.db, selectedGOTerms, c("GOID", "ONTOLOGY","TERM" ))
## GOID ONTOLOGY
## 1 GO:0000122 BP
## 2 GO:0000785 CC
## 3 GO:0000976 MF
## 4 GO:0000978 MF
## 5 GO:0000981 MF
## TERM
## 1 negative regulation of transcription by RNA polymerase II
## 2 chromatin
## 3 transcription cis-regulatory region binding
## 4 RNA polymerase II cis-regulatory region sequence-specific DNA binding
## 5 DNA-binding transcription factor activity, RNA polymerase II-specific
There are two main types of enrichment analysis:
Over-Representation Analysis takes a list of differentially expressed genes and it searches for biological categories in which a number of genes appear with “unusually” high frequencies. That is it looks for genes appearing more often than they would be expected by chance in any category.
Gene Set Expression Analyses works with all genes and looks for differentially expressed gene sets (categories). That is it searches for categories that, without containing an unusually high number of differentially expressed genes, are associated with genes that are in the upper or lower parts of the list of genes ordered by some measure of intensity of difference, such as the “log-Fold Change”.
Over-Representation Analysis is applied on a “truncated” list of genes that one considers to be differentially expressed.
These are checked for enrichment versus a “Universe” gene list, usually, all the genes that have entered in the analysis
require(hgu133a.db)
topTab <- topTabAvsB
probesUniverse <- rownames(topTab)
# entrezUniverse = unlist(mget(rownames(topTab), hgu133aENTREZID, ifnotfound=NA))
entrezUniverse<- select(hgu133a.db, probesUniverse, "ENTREZID")
entrezUniverse <- entrezUniverse$ENTREZID
whichGenes<- topTab["adj.P.Val"]<0.05 & topTab["logFC"] > 1
sum(whichGenes)
## [1] 243
topGenes <- entrezUniverse[whichGenes]
# Remove possible duplicates
topGenes <- topGenes[!duplicated(topGenes)]
entrezUniverse <- entrezUniverse[!duplicated(entrezUniverse)]
Many packages can be used to do a Gene Enrichment Analysis. Each of them perfoms slightly different analysis but the underlying ideas are the same. Some of these packages are:
GOstats
topGO
gprofiler
clusterProfiler
We show examples on how to do it using GOstats
and
clusterProfiler
GOstats
In this section we use GOstats
which was one of the
first packages available in Bioconduictor to do Enrichment Analysis.
The idea for using it is pretty simple: We need to create special type of object called “Hyperparameter” which may be of class either:
GOHyperGParams
if we want to do the enrichment analysis
using the Gene Ontology as reference databases orKEGGHyperGParams
if we wish to do the analysis using
the KEGG database, orPFAMHyperGParams
if we wish to do the analysis using
the PFAM database.A good strategy is to use all three.
First, we create the hyper-parameters.
library(GOstats)
# This parameter has an "ontology" argument. It may be "BP", "MF" or "CC"
# Other arguments are taken by default. Check the help for more information.
GOparams = new("GOHyperGParams",
geneIds=topGenes, universeGeneIds=entrezUniverse,
annotation="hgu133a.db", ontology="BP",
pvalueCutoff=0.001)
# These hyperparameters do not have an "ontology" argument.
KEGGparams = new("KEGGHyperGParams",
geneIds=topGenes, universeGeneIds=entrezUniverse,
annotation="hgu133a.db",
pvalueCutoff=0.001)
PFAMparams = new("PFAMHyperGParams",
geneIds=topGenes, universeGeneIds=entrezUniverse,
annotation="hgu133a.db",
pvalueCutoff=0.001)
Next, we run the analyses:
GOhyper = hyperGTest(GOparams)
KEGGhyper = hyperGTest(KEGGparams)
# PFAMhyper = hyperGTest(PFAMparams)
We can extract information directly from the analysis results using
summary
:
dim(summary(GOhyper))
## [1] 6 7
head(summary(GOhyper))
## GOBPID Pvalue OddsRatio ExpCount Count Size
## 1 GO:0019370 0.0000917494 31.294444 0.2904762 4 7
## 2 GO:0046395 0.0004796855 3.282433 4.4816327 13 108
## 3 GO:0016054 0.0005736863 3.213594 4.5646259 13 110
## 4 GO:0072329 0.0007287196 4.193992 2.4897959 9 60
## 5 GO:0006691 0.0007580159 13.402381 0.4564626 4 11
## 6 GO:0045109 0.0008510094 8.401076 0.7884354 5 19
## Term
## 1 leukotriene biosynthetic process
## 2 carboxylic acid catabolic process
## 3 organic acid catabolic process
## 4 monocarboxylic acid catabolic process
## 5 leukotriene metabolic process
## 6 intermediate filament organization
Apart of this we can create an automated report per each result, or, with a litle more effort we can combine all outputs into a single one.
# Creamos un informe html con los resultados
comparison = "AvsB"
GOfilename =file.path(paste("GOResults.",comparison,".html", sep=""))
htmlReport(GOhyper, file = GOfilename, summary.args=list("htmlLinks"=TRUE))
The resulting file has the contents provided by the call to
summary
but with additional links.
clusterProfiler
GO enrichment analysis can be performed using enrichGO
function from clusterProfiler
package. We will perform the
analysis over the Biological Process (BP) GO category.
The Gene Universe, the set of all genes, can be specified as in
GOstats
or simply assumed to be all the human genes.
An aside comment about the format of identifiers: -
GOstats
expects entrez identifiers to be integer values
stored as characters. - clusterProfiler
requires
the same identifiers, but they have to be stored as
integers.
This means that, in order to reuse identifierswe need to recast them
into one or another direction. In this case, because
topGenes
is a character vector, to use them in Bioconductor
we need to convert them into integers using
as.integer(topGenes)
.
# In GOstats use the following code
#
# GOparams = new("GOHyperGParams",
# geneIds=topGenes, universeGeneIds=entrezUniverse,
# annotation="hgu133a.db", ontology="BP",
# pvalueCutoff=0.001)
library(clusterProfiler)
## Run GO enrichment analysis
ego <- enrichGO(gene = as.integer(topGenes), # selgenes,
# universe = entrezUniverse, # universe = all_genes,
keyType = "ENTREZID",
OrgDb = org.Hs.eg.db,
ont = "BP",
pAdjustMethod = "BH",
qvalueCutoff = 0.25,
readable = TRUE)
head(ego, n=10)
## ID Description GeneRatio BgRatio
## GO:0046395 GO:0046395 carboxylic acid catabolic process 13/244 238/18800
## GO:0016054 GO:0016054 organic acid catabolic process 13/244 242/18800
## GO:0072329 GO:0072329 monocarboxylic acid catabolic process 9/244 124/18800
## GO:0006635 GO:0006635 fatty acid beta-oxidation 7/244 76/18800
## pvalue p.adjust qvalue
## GO:0046395 1.475561e-05 0.02886389 0.02710290
## GO:0016054 1.760530e-05 0.02886389 0.02710290
## GO:0072329 3.500820e-05 0.03826396 0.03592947
## GO:0006635 5.809675e-05 0.04762481 0.04471921
## geneID
## GO:0046395 GLS/CRAT/ACAA2/ECHDC2/GCAT/ECI2/IRS2/AKR1A1/SCP2/CRABP1/ACOX2/KYAT3/ALDH5A1
## GO:0016054 GLS/CRAT/ACAA2/ECHDC2/GCAT/ECI2/IRS2/AKR1A1/SCP2/CRABP1/ACOX2/KYAT3/ALDH5A1
## GO:0072329 CRAT/ACAA2/ECHDC2/ECI2/IRS2/AKR1A1/SCP2/CRABP1/ACOX2
## GO:0006635 CRAT/ACAA2/ECHDC2/ECI2/IRS2/SCP2/ACOX2
## Count
## GO:0046395 13
## GO:0016054 13
## GO:0072329 9
## GO:0006635 7
# Output results from GO analysis to a table
ego_results <- data.frame(ego)
write.csv(ego_results, "clusterProfiler_ORAresults_UpGO.csv")
Dotplot of top 10 enriched terms
dotplot(ego, showCategory=10)
Visualization of GO terms in hierarchy
Enriched GO terms can be visualized as a directed acyclic graph (only for GO):
goplot(ego, showCategory=10)
Gene network for the top terms
cnetplot(ego)
Enrichment Map
Enriched terms can be grouped by some similarity measure (eg. overlap of genes between terms) to summarize the results.
## Enrichmap clusters the 50 most significant (by adj.P.Va) GO terms to visualize relationships between terms
library(enrichplot)
ego_sim <- pairwise_termsim(ego)
emapplot(ego_sim, cex_label_category=0.5)
Note:
For overrepresentation analysis based on Reactome
Pathways database one can use function
enrichPathway
from ReactomePA
package.
Challenge: perform an overrepresentation analysis of the top down-regulated genes with a logFC < -2 and adjusted p-value < 0.05 over the GO-BP ontology.
If, instead of relying on the gene lists we decided to use all the genes on the array and confront them to selected sets of genes we may use the Gene Set Enrichment Analysis approach.
The clusterProfiler
package implements the classical
GSEA method as introduced by Subramanian et alt (2005).
entrezIDs <- AnnotationDbi::select(hgu133a.db, rownames(topTabAvsB), c("ENTREZID"))
# entrezIDs <- charFromSelect(entrezIDs)
topTabAvsB2<- cbind( PROBEID= rownames(topTabAvsB), topTabAvsB)
geneList <- merge(topTabAvsB2, entrezIDs, by="PROBEID")
# sort by absolute logFC to remove duplicates with smallest absolute logFC
geneList <- geneList[order(abs(geneList$logFC), decreasing=T),]
geneList <- geneList[ !duplicated(geneList$ENTREZ), ] ### Keep highest
# re-order based on logFC to be GSEA ready
geneList <- geneList[order(geneList$logFC, decreasing=T),]
genesVector <- geneList$logFC
names(genesVector) <- geneList$ENTREZ
set.seed(123)
library(clusterProfiler)
gseResulti <- gseKEGG(geneList = genesVector,
organism = "hsa",
keyType = "kegg",
exponent = 1,
minGSSize = 10,maxGSSize = 500,
pvalueCutoff = 0.05,pAdjustMethod = "BH",
# nPerm = 10000, #augmentem permutacions a 10000
verbose = TRUE,
use_internal_data = FALSE,
seed = TRUE,
eps=0,
by = "fgsea"
)
# keggResultsList[[i]] <- gseResulti
library(kableExtra)
gsea.result <- setReadable(gseResulti, OrgDb = org.Hs.eg.db, keyType ="ENTREZID" )
gsea.result.df <- as.data.frame(gsea.result)
print(kable(gsea.result.df[,c("Description","setSize","NES","p.adjust")])%>% scroll_box(height = "500px"))
Description | setSize | NES | p.adjust | |
---|---|---|---|---|
hsa04110 | Cell cycle | 89 | 2.078872 | 0.0000981 |
hsa04146 | Peroxisome | 43 | -2.225763 | 0.0002159 |
hsa04061 | Viral protein interaction with cytokine and cytokine receptor | 42 | 2.164157 | 0.0002870 |
hsa05133 | Pertussis | 39 | 2.134750 | 0.0002870 |
hsa04064 | NF-kappa B signaling pathway | 56 | 2.069468 | 0.0005921 |
hsa05152 | Tuberculosis | 94 | 1.930153 | 0.0005921 |
hsa00280 | Valine, leucine and isoleucine degradation | 33 | -2.165861 | 0.0010073 |
hsa05323 | Rheumatoid arthritis | 54 | 1.988814 | 0.0010073 |
hsa00350 | Tyrosine metabolism | 14 | -2.172127 | 0.0019756 |
hsa04060 | Cytokine-cytokine receptor interaction | 87 | 1.853078 | 0.0019756 |
hsa05171 | Coronavirus disease - COVID-19 | 116 | 1.752238 | 0.0019756 |
hsa04145 | Phagosome | 95 | 1.784051 | 0.0041828 |
hsa05150 | Staphylococcus aureus infection | 41 | 1.914447 | 0.0045621 |
hsa04668 | TNF signaling pathway | 69 | 1.804001 | 0.0049851 |
hsa05164 | Influenza A | 89 | 1.761631 | 0.0054464 |
hsa04640 | Hematopoietic cell lineage | 40 | 1.896413 | 0.0055800 |
hsa03040 | Spliceosome | 84 | 1.765446 | 0.0055800 |
hsa04621 | NOD-like receptor signaling pathway | 89 | 1.752585 | 0.0055800 |
hsa04115 | p53 signaling pathway | 49 | 1.824305 | 0.0064536 |
hsa04062 | Chemokine signaling pathway | 93 | 1.756455 | 0.0101539 |
hsa04650 | Natural killer cell mediated cytotoxicity | 58 | 1.740036 | 0.0101539 |
hsa05166 | Human T-cell leukemia virus 1 infection | 137 | 1.641323 | 0.0101539 |
hsa05140 | Leishmaniasis | 42 | 1.863793 | 0.0112608 |
hsa04612 | Antigen processing and presentation | 44 | 1.822988 | 0.0136566 |
hsa00053 | Ascorbate and aldarate metabolism | 10 | -1.953422 | 0.0210111 |
hsa00982 | Drug metabolism - cytochrome P450 | 21 | -1.899255 | 0.0217437 |
hsa05332 | Graft-versus-host disease | 22 | 1.793265 | 0.0217437 |
hsa01240 | Biosynthesis of cofactors | 76 | -1.673501 | 0.0217437 |
hsa05416 | Viral myocarditis | 34 | 1.759287 | 0.0227068 |
hsa00040 | Pentose and glucuronate interconversions | 12 | -1.943107 | 0.0237546 |
hsa00410 | beta-Alanine metabolism | 18 | -1.915210 | 0.0254396 |
hsa00650 | Butanoate metabolism | 14 | -1.861185 | 0.0335622 |
hsa05206 | MicroRNAs in cancer | 107 | 1.586085 | 0.0335622 |
hsa04657 | IL-17 signaling pathway | 53 | 1.672150 | 0.0340398 |
hsa05320 | Autoimmune thyroid disease | 18 | 1.750445 | 0.0406686 |
hsa05417 | Lipid and atherosclerosis | 108 | 1.537359 | 0.0425294 |
hsa05330 | Allograft rejection | 19 | 1.715923 | 0.0450705 |
hsa04672 | Intestinal immune network for IgA production | 18 | 1.730816 | 0.0457146 |
hsa05167 | Kaposi sarcoma-associated herpesvirus infection | 112 | 1.533654 | 0.0457146 |
hsa00900 | Terpenoid backbone biosynthesis | 15 | -1.814110 | 0.0479823 |
library(ggplot2)
# for (i in 1:length(files)){
# cat("\nComparison: ", namesC[i],"\n")
cat("DOTPLOT\n")
# if(nrow(keggResultsList[[i]]) > 0){
if(nrow(gseResulti) > 0){
p<- dotplot(gseResulti, showCategory = 20, font.size = 15,
title =paste("Enriched Pathways\n", comparisonName ,
split=".sign") + facet_grid(.~.sign))
plot(p)
cat("\nENRICHMENT MAP\n")
em<- emapplot(gseResulti)
plot(em)
#guardem en pdf
pdf(file = paste0("KEGGplots.",comparisonName,".pdf"),
width = 14, height = 14)
print(p)
print(em)
dev.off()
}else{
cat("\nNo enriched terms found\n")
}