Introduction
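cfDNA (cell-free DNA, also called circulating free DNA) refers to DNA fragments that circulate in the bloodstream. Because these fragments are not enclosed in any cell, they are described as "cell-free" or "free". cfDNA comes from many sources: it is released when both normal cells and diseased cells (such as tumor cells) die and break down. cfDNA fragments are typically around 160 to 180 base pairs (bp) long, which matches the length of DNA protected by a nucleosome.
cfDNA research is important for non-invasive diagnosis, disease monitoring, early detection, and understanding physiological and pathological states. In oncology in particular, analyzing circulating tumor DNA (ctDNA), the part of cfDNA that comes from tumor cells, yields genetic information about the tumor that can guide cancer diagnosis, treatment selection, and monitoring of treatment response.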
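cfDNAPro main features
Data characterization: computes the overall, median, and modal fragment-size distributions, the peaks and valleys of the fragment-size profile, and the periodicity of its oscillations.
Data visualization: provides a set of functions to plot these results, from overall views down to single profiles, including metric plots, mode plots, and summary plots.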
demo
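1. Visualizing fragment lengths
Top panel: the x axis shows fragment length, from 30 bp to 500 bp, and the y axis shows the proportion of reads with that length. The line is not a smoothed curve; it simply connects the individual data points. Bottom panel: first count the reads with length less than or equal to 30 bp (say N) and normalize that count to a proportion; repeat this for every length (30 bp, 31 bp, …, 500 bp) and draw the results as a line plot. As in the non-cumulative plot, the line connects the data points rather than being smoothed.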
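Both panels come down to a per-length proportion and its running sum. A minimal base-R sketch of that arithmetic, assuming a hypothetical vector `insert_size` with one fragment length per read pair, and assuming that only reads between 30 bp and 500 bp count towards the denominator (the text does not say how cfDNAPro treats reads outside that range):

```r
# Hypothetical toy data: one fragment (insert) length in bp per read pair.
insert_size <- c(30, 31, 31, 32, 32, 32, 33, 35, 35, 40)

# Length axis used in the plots: 30 bp to 500 bp.
lengths <- 30:500

# Reads at each exact length. tabulate() counts the integers 1..nbins,
# and indexing with `lengths` keeps only the 30-500 bp window
# (assumption: reads outside the window are left out of the denominator).
counts <- tabulate(insert_size, nbins = max(lengths))[lengths]

# Top panel: proportion of reads at each length.
prop <- counts / sum(counts)

# Bottom panel: proportion of reads with length <= each value,
# i.e. the running sum of the per-length proportions.
cdf <- cumsum(prop)

head(data.frame(length = lengths, prop = prop, cdf = cdf))
```

Calling `plot(lengths, prop, type = "l")` joins the points with straight segments, which matches the unsmoothed lines described above.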
library(scales)
library(ggpubr)
library(ggplot2)
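library(dplyr)
library(cfDNAPro)
# Path to the input data (Picard insert-size metrics for all cohorts).
# Assumed here: the grouped example data bundled with cfDNAPro.
data_path <- examplePath("groups_picard")
# Define a list for the groups/cohorts.
grp_list <- list("cohort_1" = "cohort_1", "cohort_2" = "cohort_2",
                 "cohort_3" = "cohort_3", "cohort_4" = "cohort_4")
# Generate the plots for each cohort and store them in a list.
result <- sapply(grp_list, function(x) {
  callSize(path = data_path) %>%
    dplyr::filter(group == as.character(x)) %>%
    plotSingleGroup()
}, simplify = FALSE)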
# Console messages (printed once per cohort):
# setting default outfmt to df.
# setting default input_type to picard.
# Multiplex the plots in one figure.
suppressWarnings(
  multiplex <- ggarrange(
    result$cohort_1$prop_plot + theme(axis.title.x = element_blank()),
    result$cohort_4$prop_plot + theme(axis.title = element_blank()),
    result$cohort_1$cdf_plot,
    result$cohort_4$cdf_plot + theme(axis.title.y = element_blank()),
    labels = c("Cohort 1 (n=5)", "Cohort 4 (n=4)"),
    label.x = 0.2,
    ncol = 2,
    nrow = 2))
multiplex
2. Comparing fragment length distributions
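callMetrics computes the median fragment-size distribution of each group. Top panel: the median fragment-size distribution of each cohort, as proportions; the y axis shows the proportion of reads and the x axis the fragment size. As before, the lines connect the data points rather than being smoothed. Bottom panel: the cumulative distribution function (CDF) of the median; the y axis shows the cumulative proportion and the x axis again the fragment size. The curve rises step by step, reflecting how reads accumulate across fragment sizes.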
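The text does not spell out how the group median is formed. One plausible reading, assumed in this sketch and not taken from cfDNAPro's source, is the median of the per-sample proportions at each fragment length, followed by a running sum for the CDF panel. The three samples and five lengths below are hypothetical:

```r
# Hypothetical proportions for 3 samples over fragment lengths 100..104 bp
# (rows = samples, columns = lengths); each row sums to 1.
lengths <- 100:104
props <- rbind(
  s1 = c(0.10, 0.20, 0.40, 0.20, 0.10),
  s2 = c(0.20, 0.20, 0.30, 0.20, 0.10),
  s3 = c(0.10, 0.30, 0.30, 0.20, 0.10)
)
colnames(props) <- lengths

# Median proportion at each length across the samples of the group.
median_prop <- apply(props, 2, median)

# Cumulative curve of the median proportions (rises monotonically).
median_cdf <- cumsum(median_prop)

rbind(median_prop, median_cdf)
```

Because the medians are taken length by length, the median curve need not sum to exactly 1 (here it reaches 0.9); whether cfDNAPro rescales it is not stated in the original.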
# Set an order for those groups (i.e. the levels of factors).
order <- c("cohort_1", "cohort_2", "cohort_3", "cohort_4")
# Generate plots.
compare_grps <- callMetrics(data_path) %>% plotMetrics(order = order)
# Console message: setting default input_type to picard.
# Modify plots.
p1 <- compare_grps$median_prop_plot +
  ylim(c(0, 0.028)) +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_text(size = 12, face = "bold")) +
  theme(legend.position = c(0.7, 0.5),
        legend.text = element_text(size = 11),
        legend.title = element_blank())
p2 <- compare_grps$median_cdf_plot +
  scale_y_continuous(labels = scales::number_format(accuracy = 0.001)) +
  theme(axis.title = element_text(size = 12, face = "bold")) +
  theme(legend.position = c(0.7, 0.5),
        legend.text = element_text(size = 11),
        legend.title = element_blank())
# Finalize plots.
suppressWarnings(
  median_grps <- ggpubr::ggarrange(p1, p2, label.x = 0.3, ncol = 1, nrow = 2))
median_grps
3. Visualizing the modal length of DNA fragments
Bar chart: here the modal fragment size is the DNA fragment length that occurs most often in a sample.
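To make "most frequent length" concrete, the base-R sketch below takes, for each sample, the length with the highest count. The two samples are hypothetical, and the tie-breaking rule (the shorter length wins) is a choice made for this example, not documented cfDNAPro behavior:

```r
# Hypothetical fragment lengths (bp) for two samples.
samples <- list(
  sample_a = c(166, 167, 167, 167, 168, 334, 120),
  sample_b = c(165, 166, 166, 167, 166, 170)
)

# Modal fragment size: the most frequent length in each sample.
# table() sorts lengths ascending, so which.max() returns the first
# (shortest) length when several lengths share the top count.
modal_size <- sapply(samples, function(x) {
  counts <- table(x)
  as.integer(names(counts)[which.max(counts)])
})

modal_size
```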
# Set an order for your groups, it will affect the group order along x axis!
order <- c("cohort_1", "cohort_2", "cohort_3", "cohort_4")
# Generate mode bin chart.
mode_bin <- callMode(data_path) %>% plotMode(order = order, hline = c(167, 111, 81))
# setting default mincount as 0.
# setting default input_type to picard.
# Show the plot.
suppressWarnings(print(mode_bin))
Stacked bar chart: shows how the modal fragment lengths are distributed within each group.
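The partition logic described in the code comments that follow (the listed modes form one category, every other mode falls into "other") can be sketched in base R to show what the stacked bar heights represent. This is a hypothetical re-implementation for illustration only; the cohort labels, the modes, and the category label `"166_167"` are invented for this example:

```r
# Hypothetical modal sizes (bp) for samples in two cohorts.
modes <- data.frame(
  group = c("cohort_1", "cohort_1", "cohort_1", "cohort_2", "cohort_2"),
  mode  = c(166, 167, 160, 167, 170)
)

# One partition, analogous to mode_partition = list(c(166, 167)):
# modes in the partition share a label; all other modes become "other".
partition <- c(166, 167)
modes$mode_group <- ifelse(modes$mode %in% partition,
                           paste(partition, collapse = "_"),
                           "other")

# Fraction of samples in each mode category within each group
# (one stacked bar per group, one segment per category).
stack_tab <- prop.table(table(modes$group, modes$mode_group), margin = 1)
stack_tab
```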
# Set an order for your groups, it will affect the group order along x axis.
order <- c("cohort_1", "cohort_2", "cohort_3", "cohort_4")
# Generate mode stacked bar chart. You can specify how to stratify the modes
# with the mode_partition argument. If modes other than the ones you
# specified exist, an "other" group is added to the plot.
mode_stacked <- callMode(data_path) %>%
  plotModeSummary(order = order, mode_partition = list(c(166, 167)))
# Console message: setting default input_type to picard.
# Modify the plot using ggplot syntax.
mode_stacked <- mode_stacked + theme(legend.position = "top")
# Show the plot.
suppressWarnings(print(mode_stacked))
4. Comparing fragmentation oscillation patterns
Inter-peak distance: compares the 10 bp periodic oscillation pattern across cohorts by measuring the distances between neighboring peaks.
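Finding peaks and their spacing is a small computation. The sketch below uses an assumed, simplified definition (a peak is a length whose proportion is strictly higher than both neighbors) on a hypothetical profile with a built-in 10 bp oscillation over the same 50 to 135 bp window used in the code; cfDNAPro's own peak calling may differ:

```r
# Hypothetical profile over 50..135 bp: a flat baseline plus a
# cosine with a 10 bp period (maxima at multiples of 10 bp).
insert_size <- 50:135
prop <- 0.01 + 0.002 * cos(2 * pi * insert_size / 10)

# A peak is a length whose proportion is strictly greater than both
# neighbors; the first and last lengths of the window cannot be peaks.
n <- length(prop)
mid <- 2:(n - 1)
is_peak <- c(FALSE,
             prop[mid] > prop[mid - 1] & prop[mid] > prop[mid + 1],
             FALSE)
peaks <- insert_size[is_peak]

# Inter-peak distances: gaps between consecutive peaks.
inter_peak <- diff(peaks)

# Fraction of peak pairs at each distance.
prop.table(table(inter_peak))
```

On this idealized input every gap is exactly 10 bp; with real profiles the spacing varies, so the distances form a distribution that can be compared between cohorts.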
# Set an order for your groups, it will affect the group order.
order <- c("cohort_1", "cohort_2", "cohort_4", "cohort_3")
# Plot and modify inter-peak distances.
inter_peak_dist <- callPeakDistance(path = data_path, limit = c(50, 135)) %>%
  plotPeakDistance(order = order) +
  labs(y = "Fraction") +
  theme(axis.title = element_text(size = 12, face = "bold"),
        legend.title = element_blank(),
        legend.position = c(0.91, 0.5),
        legend.text = element_text(size = 11))
# setting the mincount to 0.
# setting the xlim to c(7,13).
# setting default outfmt to df.
# Setting default mincount to 0.
# setting default input_type to picard.
# Show the plot.
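suppressWarnings(print(inter_peak_dist))
Inter-valley distance: compared with the inter-peak plot above, the inter-valley plot focuses on the lengths where read counts drop rather than where they rise. The two charts track different features of the fragment-size profile: one uses the peaks (local maxima of frequency), the other the valleys (local minima of frequency).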
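The valley version flips the comparison. In this sketch a valley is assumed to be a length whose proportion is strictly lower than both neighbors; the input is a hypothetical profile with a built-in 10 bp cosine oscillation over 50 to 135 bp, and cfDNAPro's own valley calling may differ:

```r
# Hypothetical profile over 50..135 bp: a flat baseline plus a
# cosine with a 10 bp period (minima at 55, 65, ..., 135 bp).
insert_size <- 50:135
prop <- 0.01 + 0.002 * cos(2 * pi * insert_size / 10)

# A valley is a length whose proportion is strictly lower than both
# neighbors; the first and last lengths of the window cannot be valleys.
n <- length(prop)
mid <- 2:(n - 1)
is_valley <- c(FALSE,
               prop[mid] < prop[mid - 1] & prop[mid] < prop[mid + 1],
               FALSE)
valleys <- insert_size[is_valley]

# Inter-valley distances: gaps between consecutive valleys.
inter_valley <- diff(valleys)

valleys
inter_valley
```

The trough at 135 bp sits on the window edge and is therefore not counted, which is one reason the choice of `limit` can change the result slightly.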
# Set an order for your groups; it affects the group order.
order <- c("cohort_1", "cohort_2", "cohort_4", "cohort_3")
# Plot and modify inter-valley distances.
inter_valley_dist <- callValleyDistance(path = data_path, limit = c(50, 135)) %>%
  plotValleyDistance(order = order) +
  labs(y = "Fraction") +
  theme(axis.title = element_text(size = 12, face = "bold"),
        legend.title = element_blank(),
        legend.position = c(0.91, 0.5),
        legend.text = element_text(size = 11))
# setting the mincount to 0.
# setting the xlim to c(7,13).
# setting default outfmt to df.
# setting the mincount to 0.
# setting default input_type to picard.
# Show the plot.
suppressWarnings(print(inter_valley_dist))
5. Polishing plots with ggplot2
library(ggplot2)
library(dplyr)  # re-exports the %>% pipe used below
library(cfDNAPro)
# Set the path to the example sample.
exam_path <- examplePath("step6")
# Calculate peaks and valleys.
peaks <- callPeakDistance(path = exam_path)
# setting default limit to c(35,135).
# setting default outfmt to df.
# Setting default mincount to 0.
# setting default input_type to picard.
valleys <- callValleyDistance(path = exam_path)
# setting default limit to c(35,135).
# setting default outfmt to df.
# setting the mincount to 0.
# setting default input_type to picard.
# A line plot showing the fragmentation pattern of the example sample.
exam_plot_all <- callSize(path = exam_path) %>% plotSingleGroup(vline = NULL)
# setting default outfmt to df.
# setting default input_type to picard.
# Label peaks and valleys with dashed and solid lines.
exam_plot_prop <- exam_plot_all$prop_plot +
  coord_cartesian(xlim = c(90, 135), ylim = c(0, 0.0065)) +
  geom_vline(xintercept = peaks$insert_size, colour = "red", linetype = "dashed") +
  geom_vline(xintercept = valleys$insert_size, colour = "blue")
# Show the plot.
suppressWarnings(print(exam_plot_prop))
# Label peaks and valleys with dots.
exam_plot_prop_dot <- exam_plot_all$prop_plot +
  coord_cartesian(xlim = c(90, 135), ylim = c(0, 0.0065)) +
  geom_point(data = peaks, mapping = aes(x = insert_size, y = prop),
             color = "blue", alpha = 0.5, size = 3) +
  geom_point(data = valleys, mapping = aes(x = insert_size, y = prop),
             color = "red", alpha = 0.5, size = 3)
# Show the plot.
suppressWarnings(print(exam_plot_prop_dot))
Want to work with cfDNA? Take the first step of the analysis: data characterization.