域名和网站建设实训报告,企业app商城开发网站建设,wordpress设置静态页,直播网站建设观察数据 这是经公司使用fastp质控后的数据#xff0c;我们先挑选部分数据进行比对#xff0c;观察序列结构 为了准确性#xff0c;我们再次挑选另一批数据进行比对 可以看到#xff0c;所有序列都存在一个“GTGTCAAATACTTATTTTCCCCGCTGTA”的前导序列#xff0c;这可能…观察数据 这是经公司使用fastp质控后的数据我们先挑选部分数据进行比对观察序列结构 为了准确性我们再次挑选另一批数据进行比对 可以看到所有序列都存在一个“GTGTCAAATACTTATTTTCCCCGCTGTA”的前导序列这可能是接头序列之类的我们使用cutadapt工具将其去除。
nohup cutadapt -g GTGTCAAATACTTATTTTCCCCGCTGTA -o ps_fastpTrimmed.cutadapt.fq PS_fastpTrimmed.fastq 运行日志 运行结果 所有数据的该序列都被去除我们再来比对一下看看是否存在还需要修剪的序列。
比对
bwa mem -t 32 ../001.blastFORcgspy/human.ref.bwaindex/GCF_000001405.40_GRCh38.p14_genomic.fna ps_fastpTrimmed.cutadapt.fq ps_fastpTrimmed.cutadapt.sam 运行日志
[M::bwa_idx_load_from_disk] read 0 ALT contigs ✔ ⚙ 911 13:15:02
[M::process] read 5632178 sequences (320000099 bp)...
[M::process] read 5632870 sequences (320000113 bp)...
[M::mem_process_seqs] Processed 5632178 reads in 574.971 CPU sec, 19.185 real sec
[M::process] read 5630606 sequences (320000003 bp)...
[M::mem_process_seqs] Processed 5632870 reads in 552.172 CPU sec, 17.734 real sec
[M::process] read 5632256 sequences (320000078 bp)...
[M::mem_process_seqs] Processed 5630606 reads in 650.468 CPU sec, 20.646 real sec
[M::process] read 5631510 sequences (320000045 bp)...
[M::mem_process_seqs] Processed 5632256 reads in 641.476 CPU sec, 20.295 real sec
[M::process] read 5631304 sequences (320000059 bp)...
[M::mem_process_seqs] Processed 5631510 reads in 645.206 CPU sec, 20.259 real sec
[M::process] read 5631206 sequences (320000009 bp)...
[M::mem_process_seqs] Processed 5631304 reads in 591.624 CPU sec, 18.888 real sec
[M::process] read 3075445 sequences (174733356 bp)...
[M::mem_process_seqs] Processed 5631206 reads in 594.674 CPU sec, 18.734 real sec
[M::mem_process_seqs] Processed 3075445 reads in 384.189 CPU sec, 12.525 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 32 ../001.blastFORcgspy/human.ref.bwaindex/GCF_000001405.40_GRCh38.p14_genomic.fna ps_fastpTrimmed.cutadapt.fq
[main] Real time: 168.989 sec; CPU: 4645.033 sec
去除表头
awk !/SQ/ ps_fastpTrimmed.cutadapt.sam ps_fastpTrimmed.cutadapt.1.sam查看比对结果 提取文件的前六列
awk {print $1, $2, $3, $4, $5, $6} ps_fastpTrimmed.cutadapt.1.sam ps_fastpTrimmed.cutadapt.2.sam 去除没有匹配上的数据
awk $4 ! 0 ps_fastpTrimmed.cutadapt.2.sam ps_fastpTrimmed.cutadapt.3.sam 提取文件的2,3,4列由于文件太大不方便excel统计尝试一下
awk {print $2, $3, $4} ps_fastpTrimmed.cutadapt.3.sam ps_fastpTrimmed.cutadapt.4.sam 不行还是太大了文件超过了三千万行远超过了excel的处理能力寻找其他方法进行统计。 使用samtools统计每个位点覆盖到的reads数量。 首先使用samtools将未经处理的比对结果转换为bam文件
samtools view -bS input.sam output.bam使用samtools软件对bam文件进行索引 注意在使用samtools对bam文件进行索引之前必须对bam文件进行排序否则会报错
samtools sort ps_fastpTrimmed.cutadapt.bam -o ps_fastpTrimmed.cutadapt.sorted.bam samtools index ps_fastpTrimmed.cutadapt.sorted.bam 然后使用samtools depth命令统计每个位点的覆盖 reads 数
samtools depth alignment.bam coverage.txt