We tried to use fastp for de-duplication and found two issues. Looking forward to your reply.
1) One round of de-duplication is ineffective:
We ran level 1 de-duplication on the raw input and got "Duplication rate: 0.498141%". When we ran level 6 de-duplication on the same input, we got "Duplication rate: 0.312492%". However, when we ran a second round of de-duplication on the output of the first run, fastp still found duplicates, at rates below 0.1% (see the logs below).
2) Accuracy level issue:
We ran level 1 de-duplication first and then used its output as the input for de-duplication at different accuracy levels.
As you can see, level 1 + level 1 -> 0.00744113%, level 1 + level 3 -> 0.088817%, level 1 + level 6 -> 0.0237203%. The second-round rate does not change monotonically with the accuracy level, which doesn't make sense to us.
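As a sanity check on the numbers above, here is a minimal Python sketch that recomputes each rate from the read-pair counts in the logs below. It assumes the only reads dropped in these runs are duplicates, which is consistent with the logs (zero reads failed the quality, N, and length filters). The `RUNS` table and the function name are our own, not part of fastp. The level 6 run on the raw input is left out because we did not paste its log.

```python
# Read-pair counts copied from the "Read1 before filtering" and
# "Read1 after filtering" lines of each log in this issue.
RUNS = {
    "raw -> level 1": (15180846, 15105224),
    "level 1 -> level 1": (15105224, 15104100),
    "level 1 -> level 3": (15105224, 15091808),
    "level 1 -> level 6": (15105224, 15101641),
}


def removed_percent(before: int, after: int) -> float:
    """Percentage of read pairs dropped between input and output."""
    return 100.0 * (before - after) / before


for name, (before, after) in RUNS.items():
    pct = removed_percent(before, after)
    print(f"{name}: removed {before - after} pairs ({pct:.6g}%)")
```

The recomputed percentages agree with the "Duplication rate" lines in the logs, so in these runs the reported rate is the fraction of pairs that were removed. The three second-round runs removed 1124, 13416, and 3583 pairs from the same input.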
Read1 before filtering:
total reads: 15180846
total bases: 2277126900
Q20 bases: 2199749620(96.602%)
Q30 bases: 2075324182(91.1378%)
Read2 before filtering:
total reads: 15180846
total bases: 2277126900
Q20 bases: 2209710343(97.0394%)
Q30 bases: 2098006573(92.1339%)
Read1 after filtering:
total reads: 15105224
total bases: 2264474181
Q20 bases: 2187424528(96.5975%)
Q30 bases: 2063578514(91.1284%)
Read2 after filtering:
total reads: 15105224
total bases: 2264474181
Q20 bases: 2197319205(97.0344%)
Q30 bases: 2086050677(92.1208%)
Filtering result:
reads passed filter: 30361692
reads failed due to low quality: 0
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 623982
bases trimmed due to adapters: 2636132
Duplication rate: 0.498141%
Insert size peak (evaluated by paired-end reads): 226
JSON report: fastp.json
HTML report: fastp.html
/projects/f_lz332_1/software/fastp -i /projects/f_lz332_1/DataBase/MetaGenomeData/Li_FrontMicro_2021_COVID/0.rawdata/ERR5445742_1.fastq.gz -I /projects/f_lz332_1/DataBase/MetaGenomeData/Li_FrontMicro_2021_COVID/0.rawdata/ERR5445742_2.fastq.gz -o ERR5445742_l1R1.fastq.gz -O ERR5445742_l1R2.fastq.gz --dedup --dup_calc_accuracy 1 --thread 16
fastp v0.23.4, time used: 80 seconds
Read1 before filtering:
total reads: 15105224
total bases: 2264474181
Q20 bases: 2187424528(96.5975%)
Q30 bases: 2063578514(91.1284%)
Read2 before filtering:
total reads: 15105224
total bases: 2264474181
Q20 bases: 2197319205(97.0344%)
Q30 bases: 2086050677(92.1208%)
Read1 after filtering:
total reads: 15104100
total bases: 2264308814
Q20 bases: 2187263387(96.5974%)
Q30 bases: 2063424187(91.1282%)
Read2 after filtering:
total reads: 15104100
total bases: 2264308814
Q20 bases: 2197157837(97.0344%)
Q30 bases: 2085895844(92.1206%)
Filtering result:
reads passed filter: 30210448
reads failed due to low quality: 0
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 0
bases trimmed due to adapters: 0
Duplication rate: 0.00744113%
Insert size peak (evaluated by paired-end reads): 214
JSON report: fastp.json
HTML report: fastp.html
/projects/f_lz332_1/software/fastp -i ERR5445742_l1R1.fastq.gz -I ERR5445742_l1R2.fastq.gz -o ERR5445742_l1l1R1.fastq.gz -O ERR5445742_l1l1R2.fastq.gz --dedup --dup_calc_accuracy 1 --thread 16
fastp v0.23.4, time used: 79 seconds
Read1 before filtering:
total reads: 15105224
total bases: 2264474181
Q20 bases: 2187424528(96.5975%)
Q30 bases: 2063578514(91.1284%)
Read2 before filtering:
total reads: 15105224
total bases: 2264474181
Q20 bases: 2197319205(97.0344%)
Q30 bases: 2086050677(92.1208%)
Read1 after filtering:
total reads: 15091808
total bases: 2262463985
Q20 bases: 2185485043(96.5976%)
Q30 bases: 2061749882(91.1285%)
Read2 after filtering:
total reads: 15091808
total bases: 2262463985
Q20 bases: 2195369494(97.0345%)
Q30 bases: 2084200083(92.1208%)
Filtering result:
reads passed filter: 30210448
reads failed due to low quality: 0
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 0
bases trimmed due to adapters: 0
Duplication rate: 0.088817%
Insert size peak (evaluated by paired-end reads): 214
JSON report: fastp.json
HTML report: fastp.html
/projects/f_lz332_1/software/fastp -i ERR5445742_l1R1.fastq.gz -I ERR5445742_l1R2.fastq.gz -o ERR5445742_l1l3R1.fastq.gz -O ERR5445742_l1l3R2.fastq.gz --dedup --dup_calc_accuracy 3 --thread 16
fastp v0.23.4, time used: 80 seconds
Read1 before filtering:
total reads: 15105224
total bases: 2264474181
Q20 bases: 2187424528(96.5975%)
Q30 bases: 2063578514(91.1284%)
Read2 before filtering:
total reads: 15105224
total bases: 2264474181
Q20 bases: 2197319205(97.0344%)
Q30 bases: 2086050677(92.1208%)
Read1 after filtering:
total reads: 15101641
total bases: 2263938008
Q20 bases: 2186907014(96.5975%)
Q30 bases: 2063090824(91.1284%)
Read2 after filtering:
total reads: 15101641
total bases: 2263938008
Q20 bases: 2196799311(97.0344%)
Q30 bases: 2085557436(92.1208%)
Filtering result:
reads passed filter: 30210448
reads failed due to low quality: 0
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 0
bases trimmed due to adapters: 0
Duplication rate: 0.0237203%
Insert size peak (evaluated by paired-end reads): 214
JSON report: fastp.json
HTML report: fastp.html
/projects/f_lz332_1/software/fastp -i ERR5445742_l1R1.fastq.gz -I ERR5445742_l1R2.fastq.gz -o ERR5445742_l1l6R1.fastq.gz -O ERR5445742_l1l6R2.fastq.gz --dedup --dup_calc_accuracy 6 --thread 16
fastp v0.23.4, time used: 85 seconds
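For anyone who wants to reproduce the second-round runs, here is a small Python sketch that builds the three commands shown above. The helper name `second_pass_command` is hypothetical. One difference from our original commands: the sketch adds `--json` and `--html` with per-run file names, because every run above wrote to the default fastp.json and fastp.html, so each report overwrote the previous one. It also calls `fastp` from the PATH instead of our local install path.

```python
import shlex
from typing import List


def second_pass_command(sample: str, level: int, threads: int = 16) -> List[str]:
    """Build the fastp argv for a second de-duplication pass.

    Input files are the outputs of the first (level 1) pass.
    """
    tag = f"{sample}_l1l{level}"
    return [
        "fastp",
        "-i", f"{sample}_l1R1.fastq.gz",
        "-I", f"{sample}_l1R2.fastq.gz",
        "-o", f"{tag}R1.fastq.gz",
        "-O", f"{tag}R2.fastq.gz",
        "--dedup",
        "--dup_calc_accuracy", str(level),
        "--thread", str(threads),
        # Not in the original commands: keep one report per run.
        "--json", f"{tag}.json",
        "--html", f"{tag}.html",
    ]


for level in (1, 3, 6):
    # Print the commands; pass the list to subprocess.run to execute them.
    print(shlex.join(second_pass_command("ERR5445742", level)))
```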