WGBSFlow/Methylator pipeline version XXX
Developed and maintained at the BiBs platform (Epigenetics and Cell Fate Center - UMR7216) by Olivier Kirsh, Elouan Bethuel and Magali Hennion.
Please send feedback to: bibs@parisepigenetics.com.
Don’t forget to read and sign our charter.
Remember to cite pipeline authors in your publication.
Methylator runs R scripts and uses : - MethylKit package to handle DNA methylation data.
- bsseq package to infere differentially methylated regions (DMRs) from bisulfite sequencing
Conda within a Singularity/Apptainer image handles R version and package dependencies for improved reproducibility. Refer to the ‘R session information’ section for version details.
# load packages
library(BiocParallel)
library(ggplot2)
library(dplyr)
library(tidyr)
library(GenomicFeatures)
library(annotatr)
library(FactoMineR)
library(factoextra)
library(bsseq)
library(yaml)
library(dmrseq)
library(plyranges)
library(annotatr)
library(GenomicRanges)
library(stringr)
library(methylKit)
Read analysis parameters from ‘config_wgbs.yaml’ or ‘config_nanopore.yaml’ and extract sample names and groups from the corresponding metadata file.
# load the config file
yaml.file <- yaml.load_file('config_ongoing_run.yaml')
# extract the information from the yaml file
ORG <- yaml.file$ASSEMBLY
DESTRAND<- yaml.file$DESTRAND
MINCOV <- yaml.file$MINCOV
COV.PERC <- yaml.file$COV.PERC
DMR_TYPE <- yaml.file$DMR_TYPE
MIN_CPG <- yaml.file$MIN_CPG
MAX_GAP <- yaml.file$MAX_GAP
CUTOFF <- yaml.file$CUTOFF
FDR <- yaml.file$FDR
UNITE <- yaml.file$UNITE
# extract the metadata
meta <- read.table(yaml.file$METAFILE, sep='\t', header=T)
knitr::kable(meta, "pipe")
| sample | group |
|---|---|
| SRR9016926_sub500000_chr19 | WT |
| SRR9016927_sub500000_chr19 | 1KO |
| SRR9016928_sub500000_chr19 | DKO |
| SRR11806587_sub500000_chr19 | WT |
| SRR11806588_sub500000_chr19 | 1KO |
| SRR11806589_sub500000_chr19 | DKO |
# set colors
full.color = "#de0000"
high.color = "#e17e65"
mid.color = "#c6c6c6"
low.color = "#7c6fac"
unmeth.color = "#0f208f"
hyper.color = "#de0000"
hypo.color = "#0f208f"
not.color = "#c6c6c6"
Methylator takes primary DNA methylation data and generates one RData environment with one MethylRaw object per sample.
The MethylRaw object stores the following for each CpG:
- Genomic coordinates (Chrom, Start, End, Strand)
- Coverage (number of reads)
- Number of methylated and unmethylated cytosines (for computing the methylation status)
It contains the total number of CpG, sample_id, genome assembly and methylation context (CpG).
The dmrsseq package manipulates bsseq objects to infer DMRs,
It is possible to read files produced by Bismark’s methylation extractor (.gz or .bz2 )
But the Methylkit package offers numerous functions for filtering and unifying samples.
So we prefer to start with a Methylkit object and convert it into a bsseq object.
## An object of type 'BSseq' with
## 6 methylation loci
## 4 samples
## has not been smoothed
## All assays are in-memory
Beta values can be interpreted as the proportion of methylation at a given locus.
Beta values are continuous variables between 0 (CpG site always found unmethylated in sample DNA) and 1 (CpG site always found methylated in sample DNA)
plotEmpiricalDistribution(bs, testCovariate="group")
plotEmpiricalDistribution(bs, testCovariate="sample")
bs_cov <- as.data.frame(getCoverage(bs, type = "Cov"))
colnames(bs_cov) <- meta_cond$group
bs_cov_rsh <- pivot_longer(bs_cov, cols = meta_cond$group, names_to = "samples", values_to = "CpG_cov")
plt_cov <- ggplot(data = bs_cov_rsh, aes(x = CpG_cov, color = samples)) +
geom_line(stat = "count", size = 1.5) +
coord_cartesian(xlim = c(0, 40)) +
ggtitle("CpG coverage by group") + theme_classic(base_size = 14)
plt_cov
bs_cov <- as.data.frame(getCoverage(bs, type = "Cov"))
colnames(bs_cov) <- meta_cond$sample
bs_cov_rsh <- pivot_longer(bs_cov, cols = meta_cond$sample, names_to = "samples", values_to = "CpG_cov")
plt_cov <- ggplot(data = bs_cov_rsh, aes(x = CpG_cov, color = samples)) +
geom_line(stat = "count", size = 1.5) +
coord_cartesian(xlim = c(0, 40)) +
scale_x_continuous(breaks = seq(0, 40, 5)) +
ggtitle("CpG coverage by sample") + theme_classic()
plt_cov
When DMRs are inferred and then displayed with dmrseq, annotations can be associated with them
(genes or CpG islands present in the region or nearby) using an annoTrack object.
To build annoTrack we used the “annotation” object produced with the ‘Annotatr.R’ script using the annotar package
annotatr
The annoTrack object looks like this:
annoTrack <- GenomicRanges::GRangesList(CpG = annotations , Genes = annotations[annotations$type == paste0(ORG,"_genes")], compress = FALSE)
annoTrack$CpG$type <- ifelse(annoTrack$CpG$type == paste0(ORG,"_cpg_islands") , "islands",
ifelse(annoTrack$CpG$type == paste0(ORG,"_cpg_shores") , "shores",
ifelse(annoTrack$CpG$type == paste0(ORG,"_cpg_shelves"), "shelves" , "inter")))
annoTrack$Genes$symbol <- annoTrack$Genes$gene_id
head(annoTrack)
## GRangesList object of length 2:
## $CpG
## GRanges object with 84476 ranges and 5 metadata columns:
## seqnames ranges strand | id tx_id
## <Rle> <IRanges> <Rle> | <character> <character>
## [1] chr19 3946193-3946269 - | introns:1 <NA>
## [2] chr19 3946448-3946603 - | introns:2 <NA>
## [3] chr19 3946722-3946794 - | introns:3 <NA>
## [4] chr19 3946903-3947037 - | introns:4 <NA>
## [5] chr19 3947231-3947467 - | introns:5 <NA>
## ... ... ... ... . ... ...
## [84472] chr19 5250832-5256331 - | gene:263 ENSMUSG00002076406.1
## [84473] chr19 6189762-6195261 + | gene:264 ENSMUSG00002076495.1
## [84474] chr19 5848614-5854113 - | gene:265 ENSMUSG00002076626.1
## [84475] chr19 5894566-5900065 - | gene:266 ENSMUSG00002076763.1
## [84476] chr19 5893966-5899465 - | gene:267 ENSMUSG00002076852.1
## gene_id symbol type
## <character> <logical> <character>
## [1] <NA> <NA> inter
## [2] <NA> <NA> inter
## [3] <NA> <NA> inter
## [4] <NA> <NA> inter
## [5] <NA> <NA> inter
## ... ... ... ...
## [84472] ENSMUSG00002076406.1 <NA> inter
## [84473] ENSMUSG00002076495.1 <NA> inter
## [84474] ENSMUSG00002076626.1 <NA> inter
## [84475] ENSMUSG00002076763.1 <NA> inter
## [84476] ENSMUSG00002076852.1 <NA> inter
## -------
## seqinfo: 26 sequences from mm39 genome
##
## $Genes
## GRanges object with 267 ranges and 5 metadata columns:
## seqnames ranges strand | id tx_id
## <Rle> <IRanges> <Rle> | <character> <character>
## [1] chr19 3946050-3957133 - | gene:1 ENSMUSG00000001750.17
## [2] chr19 5738770-5752893 + | gene:2 ENSMUSG00000004054.10
## [3] chr19 4320208-4333165 - | gene:3 ENSMUSG00000005986.17
## [4] chr19 4850597-4861662 - | gene:4 ENSMUSG00000006456.11
## [5] chr19 4911244-4927937 - | gene:5 ENSMUSG00000006457.5
## ... ... ... ... . ... ...
## [263] chr19 5251226-5251331 - | gene:263 ENSMUSG00002076406.1
## [264] chr19 6194762-6194881 + | gene:264 ENSMUSG00002076495.1
## [265] chr19 5849028-5849113 - | gene:265 ENSMUSG00002076626.1
## [266] chr19 5894944-5895065 - | gene:266 ENSMUSG00002076763.1
## [267] chr19 5894314-5894465 - | gene:267 ENSMUSG00002076852.1
## gene_id symbol type
## <character> <character> <character>
## [1] ENSMUSG00000001750.17 ENSMUSG00000001750.17 mm39_genes
## [2] ENSMUSG00000004054.10 ENSMUSG00000004054.10 mm39_genes
## [3] ENSMUSG00000005986.17 ENSMUSG00000005986.17 mm39_genes
## [4] ENSMUSG00000006456.11 ENSMUSG00000006456.11 mm39_genes
## [5] ENSMUSG00000006457.5 ENSMUSG00000006457.5 mm39_genes
## ... ... ... ...
## [263] ENSMUSG00002076406.1 ENSMUSG00002076406.1 mm39_genes
## [264] ENSMUSG00002076495.1 ENSMUSG00002076495.1 mm39_genes
## [265] ENSMUSG00002076626.1 ENSMUSG00002076626.1 mm39_genes
## [266] ENSMUSG00002076763.1 ENSMUSG00002076763.1 mm39_genes
## [267] ENSMUSG00002076852.1 ENSMUSG00002076852.1 mm39_genes
## -------
## seqinfo: 26 sequences from mm39 genome
According to the bioconductor drmseq vignette, dmrseq-vignette the default smoothing parameters are designed to focus on local DMRs.
In some applications, it is of interest to effectively ‘zoom out’ in order to detect larger (lower-resolution) methylation blocks on the order of hundreds of thousands to millions of bases.
To do so, you can set the block argument to true, which will only include candidate regions with at least blockSize basepairs (default = 5000) and redefine the default smoothing parameters.
register(MulticoreParam(32))
#register(SerialParam)
if(DMR_TYPE == "regions"){
dmr_obj <- try(dmrseq(bs=bs,
cutoff = CUTOFF,
testCovariate="group",
smooth = FALSE,
bpSpan = 1000,
maxGap = MAX_GAP,
minInSpan = 10,
maxGapSmooth = 5000,
maxPerms = 20,
minNumRegion = MIN_CPG))
}
if(DMR_TYPE == "blocks"){
dmr_obj <- try(dmrseq(bs= bs,
cutoff = CUTOFF,
testCovariate= "group",
block = TRUE,
minInSpan = 500,
bpSpan = 5e4,
maxGapSmooth = 1e6,
maxGap = 5e3,))
}
if (typeof(dmr_obj) == "character"){
print("No candidate regions, try to modifie parameters")
notempty.dmr = FALSE
} else{
notempty.dmr = TRUE
head(dmr_obj)
}
## GRanges object with 6 ranges and 7 metadata columns:
## seqnames ranges strand | L area beta stat
## <Rle> <IRanges> <Rle> | <integer> <numeric> <numeric> <numeric>
## [1] chr19 3389173-3409922 * | 115 83.2923 -1.82725 -54.7911
## [2] chr19 3680928-3711553 * | 202 119.5877 -1.48613 -52.9744
## [3] chr19 5418104-5437913 * | 165 109.2771 -1.68585 -51.7837
## [4] chr19 3712589-3733760 * | 152 97.6623 -1.61981 -50.0170
## [5] chr19 3413154-3431722 * | 137 91.6677 -1.69839 -46.9851
## [6] chr19 4281973-4302515 * | 88 64.4769 -1.84916 -45.6531
## pval qval index
## <numeric> <numeric> <IRanges>
## [1] 0.00160256 0.00192308 771-885
## [2] 0.00160256 0.00192308 2169-2370
## [3] 0.00160256 0.00192308 12662-12826
## [4] 0.00160256 0.00192308 2371-2522
## [5] 0.00160256 0.00192308 891-1027
## [6] 0.00160256 0.00192308 5995-6082
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
if( (notempty.dmr == TRUE) && (length(dmr_obj) == 0)){
notempty.dmr = FALSE
}
Select DMRs with a q-value (FDR) < 0.05
if(notempty.dmr){
sigdmr <- dmr_obj[dmr_obj$qval < FDR,]
if(length(sigdmr) > 0){
nb_sig <- length(sigdmr)
nb_nosig <- length(dmr_obj)-nb_sig
df_sig <- data.frame(length(dmr_obj), nb_sig, nb_nosig)
percentage_sig <- c(100, round(((nb_sig/length(dmr_obj))*100),2), round(((nb_nosig/length(dmr_obj))*100),2))
colnames(df_sig) <- c("all", 'Significant', "Non-significant")
df_sig <- tidyr::gather(df_sig, key = "Status", value = "count")
sig_bar <- ggplot(df_sig, aes(x = Status, y = count, fill = Status)) +
geom_bar(position = "dodge" , stat = "identity", alpha=0.5, colour = "black") +
labs(title = paste0("Number of DMRs significants, FDR: ", FDR), y = "Number of DMRs") +
scale_color_brewer(palette="Dark2") +
geom_text(aes(label = paste0(percentage_sig, "%" )), position = position_stack(vjust = 0.5), size = 5)
theme_minimal()+theme_classic()
}else{
print("No significant found")
sigdmr <- data.frame()
sig_bar <- data.frame()
}
}else{
print("No candidate, try to modifie parameters")
sigdmr <- data.frame()
sig_bar <- data.frame()
}
## List of 93
## $ line :List of 6
## ..$ colour : chr "black"
## ..$ size : num 0.5
## ..$ linetype : num 1
## ..$ lineend : chr "butt"
## ..$ arrow : logi FALSE
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_line" "element"
## $ rect :List of 5
## ..$ fill : chr "white"
## ..$ colour : chr "black"
## ..$ size : num 0.5
## ..$ linetype : num 1
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_rect" "element"
## $ text :List of 11
## ..$ family : chr ""
## ..$ face : chr "plain"
## ..$ colour : chr "black"
## ..$ size : num 11
## ..$ hjust : num 0.5
## ..$ vjust : num 0.5
## ..$ angle : num 0
## ..$ lineheight : num 0.9
## ..$ margin : 'margin' num [1:4] 0points 0points 0points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : logi FALSE
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ title : NULL
## $ aspect.ratio : NULL
## $ axis.title : NULL
## $ axis.title.x :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 1
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 2.75points 0points 0points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.title.x.top :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 0
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 0points 2.75points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.title.x.bottom : NULL
## $ axis.title.y :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 1
## ..$ angle : num 90
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 2.75points 0points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.title.y.left : NULL
## $ axis.title.y.right :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 0
## ..$ angle : num -90
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 0points 0points 2.75points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.text :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : chr "grey30"
## ..$ size : 'rel' num 0.8
## ..$ hjust : NULL
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : NULL
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.text.x :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 1
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 2.2points 0points 0points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.text.x.top :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 0
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 0points 2.2points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.text.x.bottom : NULL
## $ axis.text.y :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : num 1
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 2.2points 0points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.text.y.left : NULL
## $ axis.text.y.right :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : num 0
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 0points 0points 2.2points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.ticks :List of 6
## ..$ colour : chr "grey20"
## ..$ size : NULL
## ..$ linetype : NULL
## ..$ lineend : NULL
## ..$ arrow : logi FALSE
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_line" "element"
## $ axis.ticks.x : NULL
## $ axis.ticks.x.top : NULL
## $ axis.ticks.x.bottom : NULL
## $ axis.ticks.y : NULL
## $ axis.ticks.y.left : NULL
## $ axis.ticks.y.right : NULL
## $ axis.ticks.length : 'simpleUnit' num 2.75points
## ..- attr(*, "unit")= int 8
## $ axis.ticks.length.x : NULL
## $ axis.ticks.length.x.top : NULL
## $ axis.ticks.length.x.bottom: NULL
## $ axis.ticks.length.y : NULL
## $ axis.ticks.length.y.left : NULL
## $ axis.ticks.length.y.right : NULL
## $ axis.line :List of 6
## ..$ colour : chr "black"
## ..$ size : 'rel' num 1
## ..$ linetype : NULL
## ..$ lineend : NULL
## ..$ arrow : logi FALSE
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_line" "element"
## $ axis.line.x : NULL
## $ axis.line.x.top : NULL
## $ axis.line.x.bottom : NULL
## $ axis.line.y : NULL
## $ axis.line.y.left : NULL
## $ axis.line.y.right : NULL
## $ legend.background :List of 5
## ..$ fill : NULL
## ..$ colour : logi NA
## ..$ size : NULL
## ..$ linetype : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_rect" "element"
## $ legend.margin : 'margin' num [1:4] 5.5points 5.5points 5.5points 5.5points
## ..- attr(*, "unit")= int 8
## $ legend.spacing : 'simpleUnit' num 11points
## ..- attr(*, "unit")= int 8
## $ legend.spacing.x : NULL
## $ legend.spacing.y : NULL
## $ legend.key : list()
## ..- attr(*, "class")= chr [1:2] "element_blank" "element"
## $ legend.key.size : 'simpleUnit' num 1.2lines
## ..- attr(*, "unit")= int 3
## $ legend.key.height : NULL
## $ legend.key.width : NULL
## $ legend.text :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : 'rel' num 0.8
## ..$ hjust : NULL
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : NULL
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ legend.text.align : NULL
## $ legend.title :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : num 0
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : NULL
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ legend.title.align : NULL
## $ legend.position : chr "right"
## $ legend.direction : NULL
## $ legend.justification : chr "center"
## $ legend.box : NULL
## $ legend.box.just : NULL
## $ legend.box.margin : 'margin' num [1:4] 0cm 0cm 0cm 0cm
## ..- attr(*, "unit")= int 1
## $ legend.box.background : list()
## ..- attr(*, "class")= chr [1:2] "element_blank" "element"
## $ legend.box.spacing : 'simpleUnit' num 11points
## ..- attr(*, "unit")= int 8
## $ panel.background :List of 5
## ..$ fill : chr "white"
## ..$ colour : logi NA
## ..$ size : NULL
## ..$ linetype : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_rect" "element"
## $ panel.border : list()
## ..- attr(*, "class")= chr [1:2] "element_blank" "element"
## $ panel.spacing : 'simpleUnit' num 5.5points
## ..- attr(*, "unit")= int 8
## $ panel.spacing.x : NULL
## $ panel.spacing.y : NULL
## $ panel.grid :List of 6
## ..$ colour : chr "grey92"
## ..$ size : NULL
## ..$ linetype : NULL
## ..$ lineend : NULL
## ..$ arrow : logi FALSE
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_line" "element"
## $ panel.grid.major : list()
## ..- attr(*, "class")= chr [1:2] "element_blank" "element"
## $ panel.grid.minor : list()
## ..- attr(*, "class")= chr [1:2] "element_blank" "element"
## $ panel.grid.major.x : NULL
## $ panel.grid.major.y : NULL
## $ panel.grid.minor.x : NULL
## $ panel.grid.minor.y : NULL
## $ panel.ontop : logi FALSE
## $ plot.background :List of 5
## ..$ fill : NULL
## ..$ colour : chr "white"
## ..$ size : NULL
## ..$ linetype : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_rect" "element"
## $ plot.title :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : 'rel' num 1.2
## ..$ hjust : num 0
## ..$ vjust : num 1
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 0points 5.5points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ plot.title.position : chr "panel"
## $ plot.subtitle :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : num 0
## ..$ vjust : num 1
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 0points 5.5points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ plot.caption :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : 'rel' num 0.8
## ..$ hjust : num 1
## ..$ vjust : num 1
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 5.5points 0points 0points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ plot.caption.position : chr "panel"
## $ plot.tag :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : 'rel' num 1.2
## ..$ hjust : num 0.5
## ..$ vjust : num 0.5
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : NULL
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ plot.tag.position : chr "topleft"
## $ plot.margin : 'margin' num [1:4] 5.5points 5.5points 5.5points 5.5points
## ..- attr(*, "unit")= int 8
## $ strip.background :List of 5
## ..$ fill : chr "white"
## ..$ colour : chr "black"
## ..$ size : 'rel' num 2
## ..$ linetype : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_rect" "element"
## $ strip.background.x : NULL
## $ strip.background.y : NULL
## $ strip.placement : chr "inside"
## $ strip.text :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : chr "grey10"
## ..$ size : 'rel' num 0.8
## ..$ hjust : NULL
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 4.4points 4.4points 4.4points 4.4points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ strip.text.x : NULL
## $ strip.text.y :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : NULL
## ..$ angle : num -90
## ..$ lineheight : NULL
## ..$ margin : NULL
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ strip.switch.pad.grid : 'simpleUnit' num 2.75points
## ..- attr(*, "unit")= int 8
## $ strip.switch.pad.wrap : 'simpleUnit' num 2.75points
## ..- attr(*, "unit")= int 8
## $ strip.text.y.left :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : NULL
## ..$ angle : num 90
## ..$ lineheight : NULL
## ..$ margin : NULL
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## - attr(*, "class")= chr [1:2] "theme" "gg"
## - attr(*, "complete")= logi TRUE
## - attr(*, "validate")= logi TRUE
sig_bar
suppressWarnings(rm(df_sig))
suppressWarnings(rm(precentage_sig))
suppressWarnings(rm(nb_sig, nb_nosig))
if(notempty.dmr){
df_dmr <- as.data.frame(dmr_obj)
par(mfrow = c(1,2))
boxplot(df_dmr$L)
title("Number of CpGs by DMRs")
boxplot(df_dmr$width)
title("DMRs width")
}else{
print("No candidate, try to modifie parameters")
}
if(length(sigdmr) > 0){
df_dmr_sig <- as.data.frame(sigdmr)
par(mfrow = c(1,2))
boxplot(df_dmr_sig$L)
title("Number of CpGs by DMRs")
boxplot(df_dmr_sig$width)
title("DMRs width")
}else{
print("No significants candidate, try to modifie parameters")
}
if(notempty.dmr){
dmr_hyper <- sum(dmr_obj$stat > 0)
dmr_hypo <- sum(dmr_obj$stat < 0)
df_meth <- data.frame(dmr_hyper, dmr_hypo)
colnames(df_meth) <- c("hyper","hypo")
df_meth <- tidyr::gather(df_meth, key = "Methylation_status", value = "count")
perc_meth <- c(round(((dmr_hyper/length(dmr_obj))*100),2), round(((dmr_hypo/length(dmr_obj))*100),2))
plot_meth <- ggplot(df_meth, aes(x = Methylation_status, y = count, fill = Methylation_status)) +
geom_bar(position = "dodge" , stat = "identity", alpha=0.5, colour = "black") +
labs(title = "Number of hypo and hyper methylated DMRs ", y = "Number of DMRs") +
scale_color_brewer(palette="Dark2") +
geom_text(aes(label = paste0(perc_meth, "%" )), position = position_stack(vjust = 0.5), size = 5) +
theme_minimal() + theme_classic()
}else{
print("No candidate, try to modifie parameters")
plot_meth <- data.frame()
}
plot_meth
rm(df_meth)
rm(perc_meth)
rm(dmr_hyper, dmr_hypo)
if(length(sigdmr) > 0){
sigdmr_hyper <- sum(sigdmr$stat > 0)
sigdmr_hypo <- sum(sigdmr$stat < 0)
df_sigmeth <- data.frame(sigdmr_hyper, sigdmr_hypo)
colnames(df_sigmeth) <- c("hyper","hypo")
df_sigmeth <- tidyr::gather(df_sigmeth, key = "Methylation_status", value = "count")
perc_sigmeth <- c(round(((sigdmr_hyper/length(sigdmr))*100),2), round(((sigdmr_hypo/length(sigdmr))*100),2))
sig_meth <- ggplot(df_sigmeth, aes(x = Methylation_status, y = count, fill = Methylation_status)) +
geom_bar(position = "dodge" , stat = "identity", colour = "black") +
labs(title = "Number of hypo and hyper methylated significants DMRs ", y = "Number of DMRs") +
theme_bw(base_size = 14) +
scale_fill_manual("", values=c('Hyper' = hyper.color,
'Hypo' = hypo.color,
'None' = not.color)) +
geom_text(aes(label = paste0(perc_sigmeth, "%" )), position = position_stack(vjust = 0.5), size = 5)
}else{
sig_meth <- data.frame()
print("No significant candidate, try to modifie parameters")
}
sig_meth
suppressWarnings(rm(df_sigmeth))
suppressWarnings(rm(perc_sigmeth))
suppressWarnings(rm(sigdmr_hyper, sigdmr_hypo))
for each chromosome plot the number of DMRs hypo and hyper methylated
if(notempty.dmr){
df_dmr <- as.data.frame(dmr_obj)
df_dmr$seqnames <- as.vector(df_dmr$seqnames)
df_dmr$status <- ifelse(df_dmr$stat > 0, "hyper", "hypo")
df <- data.frame(
chr = df_dmr$seqnames,
status = df_dmr$status)
count_per_chromosome <- table(df$chr, df$status)
count_df <- as.data.frame.matrix(count_per_chromosome)
count_df$chr <- rownames(count_df)
# test si absence de hypo ou hyper
if(dim(count_df)[2] == 2){
if(colnames(count_df[1]) == "hypo"){
count_df$hyper <- rep(0, dim(count_df)[1])
}else{
count_df$hypo <- rep(0, dim(count_df)[1])
}
}
chr_others <- count_df[1,]
rownames(chr_others) <- "others"
chr_others$hypo <- chr_others$hyper <- 0
chr_others$chr <- "others"
delete_row <- 0
for(i in 1:length(count_df$chr)){
if((nchar(count_df$chr[i]) > 5) == TRUE){
chr_others$hypo <- (chr_others$hypo + count_df[i,]$hypo)
chr_others$hyper <- (chr_others$hyper + count_df[i,]$hyper)
delete_row <- c(delete_row, i)
}
}
if(is.na(delete_row[2]) == FALSE){ # test si il n'y a aucune colonne à remove (delete_row = 0), car sinon génère des bugs
count_df <- count_df[-delete_row,]
}
count_df<- rbind(count_df, chr_others)
count_df$chr <- str_sort(count_df$chr, numeric = TRUE)
count_df_long <- tidyr::gather(count_df, key = "status", value = "count", hypo, hyper)
hist_dmr_chr <- ggplot(count_df_long, aes(x = chr, y = count, fill = status)) +
geom_bar(position = "dodge" , stat = "identity", colour = "black") +
labs(title = "Number of DMR hypo and hyper methylated by chromosome",
x = "Chromosome",
y = "DMRs") +
theme_bw(base_size = 14) +
scale_fill_manual("", values=c('Hyper' = hyper.color,
'Hypo' = hypo.color,
'None' = not.color)) +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))
rm(df_dmr, df)
rm(delete_row)
rm(count_df, count_df_long)
rm(chr_others)
}else{
hist_dmr_chr <- data.frame()
print("No candidate, try to modifie parameters")
}
hist_dmr_chr
By default, DMRs are sorted by their q-value
if(length(sigdmr) > 0){
if(length(sigdmr) >= 10){
plotDMRs(bs, regions=sigdmr[1:10,], testCovariate="group", extend = 2000, annoTrack = annoTrack, horizLegend = TRUE)
}else{
plotDMRs(bs, regions=sigdmr[1:length(dmr_obj),], testCovariate="group", extend = 2000, annoTrack = annoTrack, horizLegend = TRUE)
}
}else{
print("Impossible to plot top ten significant DMRs, no candidate, try to modifie parameters")
}
Annotates significant DMRSs that intersect with genes. Use the Annotatr package. Warning: if several genes or promoters are associated with the same DMRS, create several lines in the CSV file, with the DMR annotated with the different genes. The CSV file is likely to be larger, so DMRs that are not associated with genes are not displayed.
if(length(sigdmr) > 0){
sigdmr_annot <- annotate_regions(regions = sigdmr, annotations = annotations, ignore.strand = TRUE, quiet = FALSE)
sigdmr_annot_genes <- sigdmr_annot[sigdmr_annot$annot$type == paste0(ORG, "_genes") | sigdmr_annot$annot$type == paste0(ORG, "_promoters") , ]
head(sigdmr_annot_genes)
}else{
print("No candidate, try to modifie parameters")
sigdmr_annot <- NULL
sigdmr_annot_genes <- NULL
}
## GRanges object with 6 ranges and 8 metadata columns:
## seqnames ranges strand | L area beta stat
## <Rle> <IRanges> <Rle> | <integer> <numeric> <numeric> <numeric>
## [1] chr19 3389173-3409922 * | 115 83.2923 -1.82725 -54.7911
## [2] chr19 3680928-3711553 * | 202 119.5877 -1.48613 -52.9744
## [3] chr19 5418104-5437913 * | 165 109.2771 -1.68585 -51.7837
## [4] chr19 5418104-5437913 * | 165 109.2771 -1.68585 -51.7837
## [5] chr19 5418104-5437913 * | 165 109.2771 -1.68585 -51.7837
## [6] chr19 5418104-5437913 * | 165 109.2771 -1.68585 -51.7837
## pval qval index annot
## <numeric> <numeric> <IRanges> <GRanges>
## [1] 0.00160256 0.00192308 771-885 chr19:3372334-3435733:+
## [2] 0.00160256 0.00192308 2169-2370 chr19:3634828-3736564:-
## [3] 0.00160256 0.00192308 12662-12826 chr19:5414769-5418769:*
## [4] 0.00160256 0.00192308 12662-12826 chr19:5425551-5429551:*
## [5] 0.00160256 0.00192308 12662-12826 chr19:5436364-5440364:*
## [6] 0.00160256 0.00192308 12662-12826 chr19:5416769-5421554:+
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
if(length(sigdmr) > 0){
list_annot_type <- NULL
list_nb_dmr_hypo <- NULL
list_nb_dmr_hyper <- NULL
for(type in names(table(sigdmr_annot$annot$type))){
dmr_type <- sigdmr_annot[sigdmr_annot$annot$type == type, ]
nb_dmr_hypo <- length(table(dmr_type[dmr_type$stat < 0,]))
nb_dmr_hyper <- length(table(dmr_type[dmr_type$stat > 0,]))
list_annot_type <- c(list_annot_type, type)
list_nb_dmr_hypo <- c(list_nb_dmr_hypo, nb_dmr_hypo)
list_nb_dmr_hyper <- c(list_nb_dmr_hyper, nb_dmr_hyper)
}
df_dmr_annot <- data.frame(list_annot_type, list_nb_dmr_hyper, list_nb_dmr_hypo)
colnames(df_dmr_annot) <- c("Annot_type", "Hyper", "Hypo")
df_dmr_annot_long <- pivot_longer(df_dmr_annot, cols = c(Hyper, Hypo), names_to = "Meth_status", values_to = "Nb_DMRs")
hist_dmr_annot <- ggplot(df_dmr_annot_long, aes(x = Annot_type, y = Nb_DMRs, fill = Meth_status)) +
geom_bar(position = "dodge" , stat = "identity", colour = "black") +
labs(title = "Number of DMR hypo et hyper methylated by annotations types",
x = "Annotations type",
y = "DMRs") +
guides(fill=guide_legend(title="Methylation Status")) +
theme_bw(base_size = 14) +
scale_fill_manual("", values=c('Hyper' = hyper.color,
'Hypo' = hypo.color,
'None' = not.color)) +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))
}else{
print("No candidate, try to modifie parameters")
hist_dmr_annot <- data.frame()
}
hist_dmr_annot
suppressWarnings(rm(list_annot_type))
suppressWarnings(rm(list_nb_dmr_hypo))
suppressWarnings(rm(list_nb_dmr_hyper))
suppressWarnings(rm(df_dmr_annot, df_dmr_annot_long))
Generate 3 CSV files, one for all DMRs, one for significant DMRs and one for significant DMRS with genes annotations
if(notempty.dmr){
write.csv(as.data.frame(dmr_obj), file= paste0(results_path, "DMR/", CONDI, "_DMRs.csv"))
}else{
print("Impossible to export results, no candidate DMR, try to modifie parameters")
}
if(length(sigdmr) > 0){
write.csv(as.data.frame(sigdmr), file= paste0(results_path, "DMR/", CONDI, "_significant_DMRs.csv"))
}else{
print("Impossible to export results, no significant DMR, try to modifie parameters")
}
if(length(sigdmr_annot_genes) > 0){
write.csv(as.data.frame(sigdmr_annot_genes), file= paste0(results_path, "DMR/", CONDI, "_significant_annot_DMRs.csv"))
}else{
print("Impossible to export results, no significant DMR with genes annotations, try to modifie parameters")
}
Saves the whole R session of the script (variables, objects …)
at the end of its execution. The RData file produced can then be opened in a local environment (if you have enough space)
to make modifications. For example to retouch figures.
save.image(file = rdata_out)
# R Session
sessionInfo()
## R version 4.1.3 (2022-03-10)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 11 (bullseye)
##
## Matrix products: default
## BLAS/LAPACK: /opt/conda/envs/wgbsflow/lib/libopenblasp-r0.3.21.so
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] methylKit_1.20.0 stringr_1.5.0
## [3] plyranges_1.14.0 dmrseq_1.14.0
## [5] yaml_2.3.5 bsseq_1.30.0
## [7] SummarizedExperiment_1.24.0 MatrixGenerics_1.6.0
## [9] matrixStats_0.63.0 factoextra_1.0.7
## [11] FactoMineR_2.6 annotatr_1.20.0
## [13] GenomicFeatures_1.46.1 AnnotationDbi_1.56.2
## [15] Biobase_2.54.0 GenomicRanges_1.46.1
## [17] GenomeInfoDb_1.30.1 IRanges_2.28.0
## [19] S4Vectors_0.32.4 BiocGenerics_0.40.0
## [21] tidyr_1.3.0 dplyr_1.0.10
## [23] ggplot2_3.3.6 BiocParallel_1.28.3
##
## loaded via a namespace (and not attached):
## [1] AnnotationHub_3.2.0 BiocFileCache_2.2.0
## [3] plyr_1.8.8 splines_4.1.3
## [5] digest_0.6.31 foreach_1.5.2
## [7] htmltools_0.5.4 fansi_1.0.4
## [9] magrittr_2.0.3 memoise_2.0.1
## [11] BSgenome_1.62.0 cluster_2.1.4
## [13] tzdb_0.3.0 limma_3.50.3
## [15] Biostrings_2.62.0 readr_2.1.2
## [17] R.utils_2.12.2 bdsmatrix_1.3-6
## [19] prettyunits_1.1.1 colorspace_2.1-0
## [21] blob_1.2.3 rappdirs_0.3.3
## [23] ggrepel_0.9.3 xfun_0.37
## [25] crayon_1.5.2 RCurl_1.98-1.10
## [27] jsonlite_1.8.4 iterators_1.0.14
## [29] glue_1.6.2 gtable_0.3.1
## [31] zlibbioc_1.40.0 emmeans_1.8.4-1
## [33] XVector_0.34.0 DelayedArray_0.20.0
## [35] Rhdf5lib_1.16.0 HDF5Array_1.22.1
## [37] scales_1.2.1 mvtnorm_1.1-3
## [39] rngtools_1.5.2 DBI_1.1.3
## [41] Rcpp_1.0.10 emdbook_1.3.12
## [43] xtable_1.8-4 progress_1.2.2
## [45] bumphunter_1.36.0 mclust_6.0.0
## [47] flashClust_1.01-2 bit_4.0.5
## [49] DT_0.27 htmlwidgets_1.6.1
## [51] httr_1.4.4 RColorBrewer_1.1-3
## [53] ellipsis_0.3.2 farver_2.1.1
## [55] R.methodsS3_1.8.2 pkgconfig_2.0.3
## [57] XML_3.99-0.11 multcompView_0.1-8
## [59] sass_0.4.5 dbplyr_2.3.0
## [61] locfit_1.5-9.7 utf8_1.2.3
## [63] labeling_0.4.2 tidyselect_1.2.0
## [65] rlang_1.0.6 reshape2_1.4.4
## [67] later_1.3.0 munsell_0.5.0
## [69] BiocVersion_3.14.0 tools_4.1.3
## [71] cachem_1.0.6 cli_3.6.0
## [73] generics_0.1.3 RSQLite_2.2.20
## [75] fastseg_1.40.0 evaluate_0.20
## [77] fastmap_1.1.0 outliers_0.15
## [79] knitr_1.42 bit64_4.0.5
## [81] purrr_1.0.1 KEGGREST_1.34.0
## [83] doRNG_1.8.6 nlme_3.1-162
## [85] sparseMatrixStats_1.6.0 mime_0.12
## [87] R.oo_1.25.0 leaps_3.1
## [89] xml2_1.3.3 biomaRt_2.50.0
## [91] compiler_4.1.3 filelock_1.0.2
## [93] curl_4.3.3 png_0.1-8
## [95] interactiveDisplayBase_1.32.0 tibble_3.1.8
## [97] bslib_0.4.2 stringi_1.7.12
## [99] highr_0.10 lattice_0.20-45
## [101] Matrix_1.5-3 permute_0.9-7
## [103] vctrs_0.5.2 pillar_1.8.1
## [105] lifecycle_1.0.3 rhdf5filters_1.6.0
## [107] BiocManager_1.30.19 jquerylib_0.1.4
## [109] estimability_1.4.1 data.table_1.14.2
## [111] bitops_1.0-7 qvalue_2.26.0
## [113] httpuv_1.6.8 rtracklayer_1.54.0
## [115] R6_2.5.1 BiocIO_1.4.0
## [117] promises_1.2.0.1 codetools_0.2-19
## [119] MASS_7.3-58.2 gtools_3.9.4
## [121] assertthat_0.2.1 rhdf5_2.38.1
## [123] rjson_0.2.21 withr_2.5.0
## [125] regioneR_1.26.0 GenomicAlignments_1.30.0
## [127] Rsamtools_2.10.0 GenomeInfoDbData_1.2.7
## [129] parallel_4.1.3 hms_1.1.2
## [131] grid_4.1.3 coda_0.19-4
## [133] rmarkdown_2.20 DelayedMatrixStats_1.16.0
## [135] bbmle_1.0.25 numDeriv_2016.8-1.1
## [137] scatterplot3d_0.3-42 shiny_1.7.4
## [139] restfulr_0.0.15