Issue to run on real data #21

Open
Tcvalenzuela opened this issue Jan 10, 2024 · 6 comments

@Tcvalenzuela

Hi, thank you so much for this tool.

I'm writing again because I can't manage to make it run on my real data.
I created a conda env with Python 2.7. I noticed from previous issues that gawk is necessary, and it is installed. Nonetheless, the run still fails. Here are the packages in my conda env:


# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
_r-mutex                  1.0.1               anacondar_1    conda-forge
bedops                    2.4.41               h4ac6f70_1    bioconda
bedtools                  2.31.1               hf5e1c6e_0    bioconda
binutils_impl_linux-64    2.40                 hf600244_0    conda-forge
bwa                       0.7.17              he4a0461_11    bioconda
bwidget                   1.9.14               ha770c72_1    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.25.0               hd590300_0    conda-forge
ca-certificates           2023.11.17           hbcca054_0    conda-forge
cairo                     1.18.0               h3faef2a_0    conda-forge
certifi                   2019.11.28       py27h8c360ce_1    conda-forge
clustalw                  2.1                  h4ac6f70_9    bioconda
curl                      8.5.0                hca28451_0    conda-forge
expat                     2.5.0                hcb278e6_1    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 h77eed37_1    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
freetype                  2.12.1               h267a509_2    conda-forge
fribidi                   1.0.10               h36c2ea0_0    conda-forge
gatk                      3.8                 hdfd78af_11    bioconda
gawk                      5.3.0                ha916aea_0    conda-forge
gcc_impl_linux-64         13.2.0               h338b0a0_3    conda-forge
gettext                   0.21.1               h27087fc_0    conda-forge
gfortran_impl_linux-64    13.2.0               h76e1118_3    conda-forge
gmp                       6.3.0                h59595ed_0    conda-forge
graphite2                 1.3.13            h58526e2_1001    conda-forge
gxx_impl_linux-64         13.2.0               h338b0a0_3    conda-forge
harfbuzz                  8.3.0                h3d44ed6_0    conda-forge
htslib                    1.19                 h81da01d_0    bioconda
icu                       73.2                 h59595ed_0    conda-forge
kernel-headers_linux-64   2.6.32              he073ed8_16    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libblas                   3.9.0           20_linux64_openblas    conda-forge
libcurl                   8.5.0                hca28451_0    conda-forge
libdeflate                1.19                 hd590300_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-devel_linux-64     13.2.0             ha9c7c90_103    conda-forge
libgcc-ng                 13.2.0               h807b86a_3    conda-forge
libgfortran-ng            13.2.0               h69a702a_3    conda-forge
libgfortran5              13.2.0               ha4646dd_3    conda-forge
libglib                   2.78.3               h783c2da_0    conda-forge
libgomp                   13.2.0               h807b86a_3    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
liblapack                 3.9.0           20_linux64_openblas    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libopenblas               0.3.25          pthreads_h413a1c8_0    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libsanitizer              13.2.0               h7e041cc_3    conda-forge
libsqlite                 3.44.2               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-devel_linux-64  13.2.0             ha9c7c90_103    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_3    conda-forge
libtiff                   4.6.0                ha9c0a0a_2    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libwebp-base              1.3.2                hd590300_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
make                      4.3                  hd18ef5c_1    conda-forge
mpfr                      4.2.1                h9458935_0    conda-forge
ncurses                   6.4                  h59595ed_2    conda-forge
openjdk                   8.0.382              hd590300_0    conda-forge
openssl                   3.2.0                hd590300_1    conda-forge
pango                     1.50.14              ha41ecd1_2    conda-forge
pcre2                     10.42                hcad00b1_0    conda-forge
perl                      5.32.1          7_hd590300_perl5    conda-forge
pip                       20.1.1             pyh9f0ad1d_0    conda-forge
pixman                    0.43.0               h59595ed_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
python                    2.7.18               h42bf7aa_3  
python_abi                2.7                    1_cp27mu    conda-forge
r-base                    4.3.2                hb8ee39d_1    conda-forge
r-bitops                  1.0_7             r43h57805ef_2    conda-forge
r-catools                 1.18.2            r43ha503ecb_2    conda-forge
r-cli                     3.6.2             r43ha503ecb_0    conda-forge
r-colorspace              2.1_0             r43h57805ef_1    conda-forge
r-crayon                  1.5.2             r43hc72bb7e_2    conda-forge
r-ellipsis                0.3.2             r43h57805ef_2    conda-forge
r-fansi                   1.0.6             r43h57805ef_0    conda-forge
r-farver                  2.1.1             r43ha503ecb_2    conda-forge
r-ggplot2                 3.4.4             r43hc72bb7e_0    conda-forge
r-glue                    1.6.2             r43h57805ef_2    conda-forge
r-gplots                  3.1.3             r43hc72bb7e_2    conda-forge
r-gsalib                  2.2.1             r43hc72bb7e_2    conda-forge
r-gtable                  0.3.4             r43hc72bb7e_0    conda-forge
r-gtools                  3.9.5             r43h57805ef_0    conda-forge
r-isoband                 0.2.7             r43ha503ecb_2    conda-forge
r-kernsmooth              2.23_22           r43h13b3f57_0    conda-forge
r-labeling                0.4.3             r43hc72bb7e_0    conda-forge
r-lattice                 0.22_5            r43h57805ef_0    conda-forge
r-lifecycle               1.0.4             r43hc72bb7e_0    conda-forge
r-magrittr                2.0.3             r43h57805ef_2    conda-forge
r-mass                    7.3_60            r43h57805ef_1    conda-forge
r-matrix                  1.6_4             r43h316c678_0    conda-forge
r-mgcv                    1.9_1             r43h316c678_0    conda-forge
r-munsell                 0.5.0           r43hc72bb7e_1006    conda-forge
r-nlme                    3.1_164           r43h61816a4_0    conda-forge
r-pillar                  1.9.0             r43hc72bb7e_1    conda-forge
r-pkgconfig               2.0.3             r43hc72bb7e_3    conda-forge
r-plyr                    1.8.9             r43ha503ecb_0    conda-forge
r-r6                      2.5.1             r43hc72bb7e_2    conda-forge
r-rcolorbrewer            1.1_3             r43h785f33e_2    conda-forge
r-rcpp                    1.0.12            r43h7df8631_0    conda-forge
r-reshape                 0.8.9             r43hc72bb7e_2    conda-forge
r-rlang                   1.1.2             r43ha503ecb_0    conda-forge
r-scales                  1.3.0             r43hc72bb7e_0    conda-forge
r-tibble                  3.2.1             r43h57805ef_2    conda-forge
r-utf8                    1.2.4             r43h57805ef_0    conda-forge
r-vctrs                   0.6.5             r43ha503ecb_0    conda-forge
r-viridislite             0.4.2             r43hc72bb7e_1    conda-forge
r-withr                   2.5.2             r43hc72bb7e_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
samtools                  1.19                 h50ea8bc_0    bioconda
sed                       4.8                  he412f7d_0    conda-forge
setuptools                44.0.0                   py27_0    conda-forge
sqlite                    3.44.2               h2c6b66d_0    conda-forge
sysroot_linux-64          2.12                he073ed8_16    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
tktable                   2.10                 h0c5db8f_5    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.1.1                hd590300_0    conda-forge
xorg-libsm                1.2.4                h7391055_0    conda-forge
xorg-libx11               1.8.7                h8ee46fc_0    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxrender           0.9.11               hd590300_0    conda-forge
xorg-libxt                1.3.0                hd590300_1    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge
zstd                      1.5.5                hfc55251_0    conda-forge

The error is:


 cat RunTEMP2Clean_230957.err
[bam_sort_core] merging from 80 files and 40 in-memory blocks...
[bam_sort_core] merging from 0 files and 40 in-memory blocks...
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 85224 reads
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C" 
2: Setting LC_COLLATE failed, using "C" 
3: Setting LC_TIME failed, using "C" 
4: Setting LC_MESSAGES failed, using "C" 
5: Setting LC_MONETARY failed, using "C" 
6: Setting LC_PAPER failed, using "C" 
7: Setting LC_MEASUREMENT failed, using "C" 
[bam_sort_core] merging from 0 files and 40 in-memory blocks...
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 17768650 reads
pass1 - making usageList (570 chroms): 286 millis
pass2 - checking and writing primary data (338482 records, 6 fields): 1464 millis
gawk: cmd. line:1: (FILENAME=Awka_Nigeria_10.unpair.uniq.transposon.fixLTR.bed FNR=2) fatal: cannot redirect to `Awka_Nigeria_10.supportReads/seq_10543#LINE/I.-.bed': No such file or directory
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C" 
2: Setting LC_COLLATE failed, using "C" 
3: Setting LC_TIME failed, using "C" 
4: Setting LC_MESSAGES failed, using "C" 
5: Setting LC_MONETARY failed, using "C" 
6: Setting LC_PAPER failed, using "C" 
7: Setting LC_MEASUREMENT failed, using "C" 
[main_samview] region "seq_11258#Satellite" specifies an unknown reference name. Continue anyway.
[main_samview] region "seq_17149#Satelite" specifies an unknown reference name. Continue anyway.
/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_22908#LTR/Ty3.sam: No such file or directory
/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_8603#LTR/Ty3.sam: No such file or directory
/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_18651#LTR/Ty3.sam: No such file or directory
/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_11654#LTR/Ty3.sam: No such file or directory
/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_12632#LTR/Ty3.sam: No such file or directory
/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_9794#LTR/Ty3.sam: No such file or directory
/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_11404#LTR/Ty3.sam: No such file or directory
/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_2247#LTR/Ty3.sam: No such file or directory
/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_557#LTR/Ty3.sam: No such file or directory
/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_22490#LTR/Ty3.sam: No such file or directory
/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_11542#LTR/Ty3.sam: No such file or directory
/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_10597#LTR/Ty3.sam: No such file or directory
[main_samview] region "seq_3671#MITE" specifies an unknown reference name. Continue anyway.
[main_samview] region "seq_8560#MITE" specifies an unknown reference name. Continue anyway.
[main_samview] region "seq_13666#MITE" specifies an unknown reference name. Continue anyway.
[main_samview] region "seq_3486#MITE" specifies an unknown reference name. Continue anyway.
[main_samview] region "seq_22889#MITE" specifies an unknown reference name. Continue anyway.
[main_samview] region "seq_22681#MITE" specifies an unknown reference name. Continue anyway.
[main_samview] region "seq_3502#MITE" specifies an unknown reference name. Continue anyway.
[main_samview] region "seq_3199#MITE" specifies an unknown reference name. Continue anyway.
[main_samview] region "seq_863#MITE" specifies an unknown reference name. Continue anyway.
[main_samview] region "seq_3168#MITE" specifies an unknown reference name. Continue anyway.
[main_samview] region "seq_22348#MITE" specifies an unknown reference name. Continue anyway.
... Lots of them...
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C" 
2: Setting LC_COLLATE failed, using "C" 
3: Setting LC_TIME failed, using "C" 
4: Setting LC_MESSAGES failed, using "C" 
5: Setting LC_MONETARY failed, using "C" 
6: Setting LC_PAPER failed, using "C" 
7: Setting LC_MEASUREMENT failed, using "C" 
Error in `[.data.frame`(soma, , 3) : undefined columns selected
Calls: [ -> [.data.frame
Execution halted
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C" 
2: Setting LC_COLLATE failed, using "C" 
3: Setting LC_TIME failed, using "C" 
4: Setting LC_MESSAGES failed, using "C" 
5: Setting LC_MONETARY failed, using "C" 
6: Setting LC_PAPER failed, using "C" 
7: Setting LC_MEASUREMENT failed, using "C" 
Error in read.table(Args[6], header = F, row.names = NULL) : 
  no lines available in input
Execution halted
pass1 - making usageList (1 chroms): 7 millis
pass2 - checking and writing primary data (1 records, 6 fields): 7 millis

In the .out file I have the following (I took the liberty of adding more of the required software to the check):

Testing required softwares:
bwa: /beegfs/data/tcarrasco/Programs/Conda/envs/TEMP-P2.7/bin/bwa
samtools: /beegfs/data/tcarrasco/Programs/Conda/envs/TEMP-P2.7/bin/samtools
bedtools: /beegfs/data/tcarrasco/Programs/Conda/envs/TEMP-P2.7/bin/bedtools
Rscript: /beegfs/data/tcarrasco/Programs/Conda/envs/TEMP-P2.7/bin/Rscript
gawk: /beegfs/data/tcarrasco/Programs/Conda/envs/TEMP-P2.7/bin/gawk
bedops: /beegfs/data/tcarrasco/Programs/Conda/envs/TEMP-P2.7/bin/bedops
awk: /beegfs/data/tcarrasco/Programs/Conda/envs/TEMP-P2.7/bin/awk
------ Start pipeline ------
bam file not specified, map raw reads tp genome via bwa mem	Mon Jan  8 12:46:38 UTC 2024
transform sam to sorted bam and index it	Mon Jan  8 14:43:05 UTC 2024
get concordant-uniq-split reads	Mon Jan  8 15:27:21 UTC 2024
check fragment length	Mon Jan  8 15:33:08 UTC 2024
insert size set to 95 quantile: 498
WARNING: standard deviation of insert size is higher than 100 (120)
get mate seq of the uniq-unpaired	Mon Jan  8 15:33:15 UTC 2024
map paired split uniqMappers and unpaired uniqMappers to transposons	Mon Jan  8 16:12:39 UTC 2024
merge fragments in genome and transposon	Mon Jan  8 16:15:11 UTC 2024
merge support reads in the same direction within 498 - 150	Mon Jan  8 16:19:16 UTC 2024
merge support reads in different direction within 2 X 498 - 150	Mon Jan  8 16:19:18 UTC 2024
filter candidate insertions which overlap with the same transposon insertion or in high depth region	Mon Jan  8 16:19:18 UTC 2024
filter candidate insertions in high depth region	Mon Jan  8 16:19:44 UTC 2024
average read number for 200bp bins is 37.718, set read number cutoff to 188.59
Filtered insertion number: 1 - 0 (overlap rmsk) 0 (short insertion) - 0 (high depth) = 1
generate the overall distribution of transposon mapping reads, first map all reads to transposon	Mon Jan  8 16:20:16 UTC 2024
sam to bed and bedGraph, multiple mappers are divided by their map times	Mon Jan  8 16:20:16 UTC 2024
estimate de novo insertion number for each transposon using singleton reads	Mon Jan  8 16:35:41 UTC 2024
generate distribution figures for singleton supporting reads	Mon Jan  8 16:35:42 UTC 2024
filter unreliable singleton insertions, also filter 2p insertions overlapped with similar reference transposon copies	Mon Jan  8 16:35:45 UTC 2024
Calculate frequency of each transposon insertion	Mon Jan  8 16:35:45 UTC 2024
get TSD, remove redundant insertions and recalculate de novo insertion rate	Mon Jan  8 16:35:46 UTC 2024
calculate de novo insertion rate per genome	Mon Jan  8 16:35:46 UTC 2024
clean tmp files	Mon Jan  8 16:35:47 UTC 2024
Done, Congras!!!🍺🍺🍺
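The window sizes and the depth cutoff printed in this log can be reproduced with simple arithmetic, which helps when comparing runs. A minimal sketch, assuming that the 150 in "498 - 150" is the read length and that the high-depth cutoff is a fixed multiple of the average read count per 200 bp bin; the factor of 5 is inferred from the two runs in this thread (37.718 → 188.59 and 37.317 → 186.585), not taken from TEMP2's documentation. The helper names are hypothetical.

```python
# Hypothetical helpers that reproduce numbers printed in the TEMP2 log.
# Assumptions: 150 is the read length, and the high-depth cutoff is
# 5 x the average read count per 200 bp bin (factor inferred from the log).


def merge_windows(fragment_length, read_length):
    """Return (same-direction, different-direction) merge distances in bp."""
    same_direction = fragment_length - read_length
    different_direction = 2 * fragment_length - read_length
    return same_direction, different_direction


def depth_cutoff(mean_reads_per_bin, factor=5.0):
    """Read-number cutoff used to flag high-depth 200 bp bins (assumed factor)."""
    return mean_reads_per_bin * factor


if __name__ == "__main__":
    print(merge_windows(498, 150))
    print(round(depth_cutoff(37.718), 3))
    print(round(depth_cutoff(37.317), 3))
```

With the values from this log, the windows come out to 348 bp and 846 bp.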

And just to check, these are the files that were written:

-rw-r--r-- 1 tcarrasco gei 1,4K janv.  9 13:44 RunTEMP2.slurm
-rw-r--r-- 1 tcarrasco gei  14G janv.  9 16:07 Awka_Nigeria_10.sorted.bam
-rw-r--r-- 1 tcarrasco gei 4,0M janv.  9 16:27 Awka_Nigeria_10.sorted.bam.bai
-rw-r--r-- 1 tcarrasco gei  12M janv.  9 17:19 Awka_Nigeria_10.supportReadsUnfiltered.bb
drwxr-xr-x 2 tcarrasco gei    4 janv.  9 17:19 Awka_Nigeria_10.supportReads
-rw-r--r-- 1 tcarrasco gei    0 janv.  9 17:20 Awka_Nigeria_10.spike.bed
drwxr-xr-x 2 tcarrasco gei  824 janv.  9 17:35 Awka_Nigeria_10.transposonMapping
-rw-r--r-- 1 tcarrasco gei  64K janv.  9 17:35 RunTEMP2Clean_230957.err
-rw-r--r-- 1 tcarrasco gei  14K janv.  9 17:35 Awka_Nigeria_10.insertion.bb
-rw-r--r-- 1 tcarrasco gei  23K janv.  9 17:35 Awka_Nigeria_10.soma.summary.txt
-rw-r--r-- 1 tcarrasco gei  317 janv.  9 17:35 Awka_Nigeria_10.insertion.bed
drwxr-xr-x 4 tcarrasco gei   54 janv.  9 17:36 tmpTEMP2
-rw-r--r-- 1 tcarrasco gei 9,0K janv.  9 17:36 RunTEMP2Clean_230957.out

@tianxiongbb
Collaborator

tianxiongbb commented Jan 15, 2024 via email

@Tcvalenzuela
Author

Hi Tianxiong,

Thanks for your response; sadly, I still get the same error.

[bam_sort_core] merging from 80 files and 40 in-memory blocks...
[bam_sort_core] merging from 0 files and 40 in-memory blocks...
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 85224 reads
[bam_sort_core] merging from 0 files and 40 in-memory blocks...
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 17768650 reads
pass1 - making usageList (570 chroms): 210 millis
pass2 - checking and writing primary data (338482 records, 6 fields): 1100 millis
gawk: cmd. line:1: (FILENAME=Awka_Nigeria_10.unpair.uniq.transposon.fixLTR.bed FNR=2) fatal: cannot redirect to `Awka_Nigeria_10.supportReads/seq_10543_LINE/I.-.bed': No such file or directory
[bam_sort_core] merging from 0 files and 40 in-memory blocks...
/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_22908_LTR/Ty3.sam: No such file or directory
/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_8603_LTR/Ty3.sam: No such file or directory
/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_18651_LTR/Ty3.sam: No such file or directory
/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_11654_LTR/Ty3.sam: No such file or directory

It continues like that for a while; at the end of the error log it shows:

/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_12_DNA/Sola2.sam: No such file or directory
/beegfs/data/tcarrasco/Programs/TEMP2/bin/TEMP2_insertion.sh: line 293: Awka_Nigeria_10.transposonMapping/seq_129_LTR/Ty3.sam: No such file or directory
Error in read.table(Args[9], header = F, row.names = NULL) : 
  no lines available in input
Execution halted
pass1 - making usageList (1 chroms): 6 millis
pass2 - checking and writing primary data (1 records, 6 fields): 6 millis
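The path in the gawk error suggests why replacing "#" alone was not enough: the per-transposon output file is named after the TE, so a "/" left in the name turns part of it into a subdirectory that was never created. A small sketch, assuming the path shape shown in the error message (`<prefix>.supportReads/<name>.<strand>.bed`); the helper names are hypothetical. The same reasoning would plausibly apply to the `transposonMapping/<name>.sam` redirects reported for line 293 of TEMP2_insertion.sh.

```python
from pathlib import PurePosixPath


def support_bed_path(prefix, te_name, strand):
    """Build a per-TE path in the shape shown by the gawk error (hypothetical helper)."""
    return PurePosixPath(f"{prefix}.supportReads/{te_name}.{strand}.bed")


def lands_in_missing_subdir(prefix, te_name, strand):
    """True if te_name pushes the file below <prefix>.supportReads/ (it contains '/')."""
    path = support_bed_path(prefix, te_name, strand)
    return path.parent != PurePosixPath(f"{prefix}.supportReads")


if __name__ == "__main__":
    for name in ["seq_10543#LINE/I", "seq_10543_LINE/I", "seq_10543_LINE_I"]:
        path = support_bed_path("Awka_Nigeria_10", name, "-")
        print(name, "->", path.parent, lands_in_missing_subdir("Awka_Nigeria_10", name, "-"))
```

Both names from the two error logs still resolve to a parent directory below `Awka_Nigeria_10.supportReads`; only the fully underscored name stays directly inside it.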

Nonetheless, the .out file says that everything is OK:

Testing required softwares:
bwa: /beegfs/data/tcarrasco/Programs/Conda/envs/TEMP-P2.7/bin/bwa
samtools: /beegfs/data/tcarrasco/Programs/Conda/envs/TEMP-P2.7/bin/samtools
bedtools: /beegfs/data/tcarrasco/Programs/Conda/envs/TEMP-P2.7/bin/bedtools
Rscript: /beegfs/data/tcarrasco/Programs/Conda/envs/TEMP-P2.7/bin/Rscript
gawk: /beegfs/data/tcarrasco/Programs/Conda/envs/TEMP-P2.7/bin/gawk
bedops: /beegfs/data/tcarrasco/Programs/Conda/envs/TEMP-P2.7/bin/bedops
awk: /beegfs/data/tcarrasco/Programs/Conda/envs/TEMP-P2.7/bin/awk
------ Start pipeline ------
bam file not specified, map raw reads tp genome via bwa mem	lun. 15 janv. 2024 10:03:41 CET
transform sam to sorted bam and index it	lun. 15 janv. 2024 13:57:30 CET
get concordant-uniq-split reads	lun. 15 janv. 2024 14:25:38 CET
check fragment length	lun. 15 janv. 2024 14:49:20 CET
insert size set to 95 quantile: 498
WARNING: standard deviation of insert size is higher than 100 (120)
get mate seq of the uniq-unpaired	lun. 15 janv. 2024 14:49:24 CET
map paired split uniqMappers and unpaired uniqMappers to transposons	lun. 15 janv. 2024 15:37:37 CET
merge fragments in genome and transposon	lun. 15 janv. 2024 15:40:46 CET
merge support reads in the same direction within 498 - 150	lun. 15 janv. 2024 15:45:56 CET
merge support reads in different direction within 2 X 498 - 150	lun. 15 janv. 2024 15:45:58 CET
filter candidate insertions which overlap with the same transposon insertion or in high depth region	lun. 15 janv. 2024 15:45:58 CET
filter candidate insertions in high depth region	lun. 15 janv. 2024 15:46:18 CET
average read number for 200bp bins is 37.317, set read number cutoff to 186.585
Filtered insertion number: 1 - 0 (overlap rmsk) 0 (short insertion) - 0 (high depth) = 1
generate the overall distribution of transposon mapping reads, first map all reads to transposon	lun. 15 janv. 2024 15:46:28 CET
sam to bed and bedGraph, multiple mappers are divided by their map times	lun. 15 janv. 2024 16:08:19 CET
estimate de novo insertion number for each transposon using singleton reads	lun. 15 janv. 2024 16:26:02 CET
generate distribution figures for singleton supporting reads	lun. 15 janv. 2024 16:26:40 CET
filter unreliable singleton insertions, also filter 2p insertions overlapped with similar reference transposon copies	lun. 15 janv. 2024 16:26:43 CET
Calculate frequency of each transposon insertion	lun. 15 janv. 2024 16:26:43 CET
get TSD, remove redundant insertions and recalculate de novo insertion rate	lun. 15 janv. 2024 16:26:44 CET
calculate de novo insertion rate per genome	lun. 15 janv. 2024 16:26:44 CET
clean tmp files	lun. 15 janv. 2024 16:26:45 CET
Done, Congras!!!🍺🍺🍺

Do you have any idea what else I should modify?

Thank you so much

@Tcvalenzuela
Author

Something must be going wrong before the spike file is written. Here are the file sizes:

-rw-r--r-- 1 tcarrasco gei 1,4K janv. 16 10:00 RunTEMP2.slurm
-rw-r--r-- 1 tcarrasco gei  14G janv. 16 14:14 Awka_Nigeria_10.sorted.bam
-rw-r--r-- 1 tcarrasco gei 4,0M janv. 16 14:25 Awka_Nigeria_10.sorted.bam.bai
-rw-r--r-- 1 tcarrasco gei  12M janv. 16 15:45 Awka_Nigeria_10.supportReadsUnfiltered.bb
-rw-r--r-- 1 tcarrasco gei    0 janv. 16 15:46 Awka_Nigeria_10.spike.bed
-rw-r--r-- 1 tcarrasco gei  44K janv. 16 16:26 RunTEMP2Clean_237114.err
-rw-r--r-- 1 tcarrasco gei  27K janv. 16 16:26 Awka_Nigeria_10.soma.summary.txt
-rw-r--r-- 1 tcarrasco gei  14K janv. 16 16:26 Awka_Nigeria_10.insertion.bb
drwxr-xr-x 4 tcarrasco gei   54 janv. 16 16:26 tmpTEMP2
-rw-r--r-- 1 tcarrasco gei 9,0K janv. 16 16:26 RunTEMP2Clean_237114.out
-rw-r--r-- 1 tcarrasco gei  317 janv. 16 16:26 Awka_Nigeria_10.insertion.bed

@Tcvalenzuela
Author

And the temporary files:

ls  tmpTEMP2
total 104G
-rw-r--r-- 1 tcarrasco gei  96K janv. 16 13:57 Awka_Nigeria_10.bwamem.log
-rw-r--r-- 1 tcarrasco gei  21K janv. 16 14:25 Awka_Nigeria_10.tmp.header
-rw-r--r-- 1 tcarrasco gei  10M janv. 16 14:48 Awka_Nigeria_10.pair.uniq.split.bam
-rw-r--r-- 1 tcarrasco gei  29M janv. 16 14:49 Awka_Nigeria_10.pair.uniq.split.fastq
-rw-r--r-- 1 tcarrasco gei 3,2M janv. 16 14:49 Awka_Nigeria_10.pair.uniq.split.bed
-rw-r--r-- 1 tcarrasco gei   51 janv. 16 14:49 Awka_Nigeria_10.fragL
-rw-r--r-- 1 tcarrasco gei  30G janv. 16 15:02 Awka_Nigeria_10.unpair.sam
-rw-r--r-- 1 tcarrasco gei 2,4G janv. 16 15:22 Awka_Nigeria_10.unpair.uniq.1.fastq
-rw-r--r-- 1 tcarrasco gei 2,3G janv. 16 15:22 Awka_Nigeria_10.unpair.uniq.2.fastq
-rw-r--r-- 1 tcarrasco gei 838M janv. 16 15:37 Awka_Nigeria_10.unpair.uniq.bed
-rw-r--r-- 1 tcarrasco gei  12K janv. 16 15:37 Awka_Nigeria_10.tmp.te.size
-rw-r--r-- 1 tcarrasco gei 754K janv. 16 15:37 Awka_Nigeria_10.tmp.te.index.sa
-rw-r--r-- 1 tcarrasco gei 377K janv. 16 15:37 Awka_Nigeria_10.tmp.te.index.pac
-rw-r--r-- 1 tcarrasco gei 1,5M janv. 16 15:37 Awka_Nigeria_10.tmp.te.index.bwt
-rw-r--r-- 1 tcarrasco gei  20K janv. 16 15:37 Awka_Nigeria_10.tmp.te.index.ann
-rw-r--r-- 1 tcarrasco gei 114K janv. 16 15:37 Awka_Nigeria_10.tmp.te.index.amb
-rw-r--r-- 1 tcarrasco gei  32M janv. 16 15:37 Awka_Nigeria_10.pair.uniq.split.transposon.sam
-rw-r--r-- 1 tcarrasco gei 1,7G janv. 16 15:40 Awka_Nigeria_10.unpair.uniq.transposon.sam
-rw-r--r-- 1 tcarrasco gei 119K janv. 16 15:40 Awka_Nigeria_10.pair.uniq.split.transposon.bed
-rw-r--r-- 1 tcarrasco gei  40M janv. 16 15:45 Awka_Nigeria_10.unpair.uniq.transposon.bed
-rw-r--r-- 1 tcarrasco gei 119K janv. 16 15:45 Awka_Nigeria_10.pair.uniq.split.transposon.fixLTR.bed
-rw-r--r-- 1 tcarrasco gei  40M janv. 16 15:45 Awka_Nigeria_10.unpair.uniq.transposon.fixLTR.bed
-rw-r--r-- 1 tcarrasco gei  15K janv. 16 15:45 Awka_Nigeria_10.tmp.chr.size
drwxr-xr-x 2 tcarrasco gei    4 janv. 16 15:45 Awka_Nigeria_10.supportReads
-rw-r--r-- 1 tcarrasco gei  108 janv. 16 15:45 Awka_Nigeria_10.final.bed
-rw-r--r-- 1 tcarrasco gei 101M janv. 16 15:46 Awka_Nigeria_10.tmp.rmsk.bed
-rw-r--r-- 1 tcarrasco gei    0 janv. 16 15:46 Awka_Nigeria_10.removed.bed
-rw-r--r-- 1 tcarrasco gei    0 janv. 16 15:46 Awka_Nigeria_10.removed.1p1.bed
-rw-r--r-- 1 tcarrasco gei  43K janv. 16 15:46 Awka_Nigeria_10.tmp.random.bed
-rw-r--r-- 1 tcarrasco gei  31K janv. 16 15:46 Awka_Nigeria_10.TPregion.bed
-rw-r--r-- 1 tcarrasco gei  41M janv. 16 15:46 Awka_Nigeria_10.singleton.cov
-rw-r--r-- 1 tcarrasco gei  138 janv. 16 15:46 Awka_Nigeria_10.insertion.raw.bed
-rw-r--r-- 1 tcarrasco gei  63G janv. 16 16:08 Awka_Nigeria_10.transposon.sam
-rw-r--r-- 1 tcarrasco gei  96K janv. 16 16:08 Awka_Nigeria_10.transposon.bwamem.log
-rw-r--r-- 1 tcarrasco gei 3,8G janv. 16 16:15 Awka_Nigeria_10.transposon.bam
-rw-r--r-- 1 tcarrasco gei  40K janv. 16 16:19 Awka_Nigeria_10.transposon.bam.bai
-rw-r--r-- 1 tcarrasco gei  40K janv. 16 16:20 Awka_Nigeria_10.parafile
drwxr-xr-x 2 tcarrasco gei  824 janv. 16 16:25 Awka_Nigeria_10.transposonMapping
-rw-r--r-- 1 tcarrasco gei  40K janv. 16 16:25 Awka_Nigeria_10.parafile.completed
-rw-r--r-- 1 tcarrasco gei 8,8M janv. 16 16:25 Awka_Nigeria_10.transposon.sense.bdg
-rw-r--r-- 1 tcarrasco gei 8,8M janv. 16 16:25 Awka_Nigeria_10.transposon.anti.bdg
-rw-r--r-- 1 tcarrasco gei 284M janv. 16 16:26 Awka_Nigeria_10.transposon.bed
-rw-r--r-- 1 tcarrasco gei  21K janv. 16 16:26 Awka_Nigeria_10.soma.rate.bed
-rw-r--r-- 1 tcarrasco gei   25 janv. 16 16:26 Awka_Nigeria_10.singleton.sense.bdg
-rw-r--r-- 1 tcarrasco gei    0 janv. 16 16:26 Awka_Nigeria_10.singleton.anti.bdg
-rw-r--r-- 1 tcarrasco gei    0 janv. 16 16:26 Awka_Nigeria_10.2p.sense.bdg
-rw-r--r-- 1 tcarrasco gei    0 janv. 16 16:26 Awka_Nigeria_10.2p.anti.bdg
-rw-r--r-- 1 tcarrasco gei    0 janv. 16 16:26 Awka_Nigeria_10.1p1.sense.bdg
-rw-r--r-- 1 tcarrasco gei    0 janv. 16 16:26 Awka_Nigeria_10.1p1.anti.bdg
-rw-r--r-- 1 tcarrasco gei    0 janv. 16 16:26 Awka_Nigeria_10.tmp2
-rw-r--r-- 1 tcarrasco gei  114 janv. 16 16:26 Awka_Nigeria_10.tmp1
-rw-r--r-- 1 tcarrasco gei  114 janv. 16 16:26 Awka_Nigeria_10.insertion.filtered.bed
-rw-r--r-- 1 tcarrasco gei   29 janv. 16 16:26 Awka_Nigeria_10.tmp.bed
-rw-r--r-- 1 tcarrasco gei  21K janv. 16 16:26 Awka_Nigeria_10.tmp
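Listing files in write order and flagging the zero-byte ones is one way to narrow down the first step whose output came out empty, which is how the listings in this thread are being used. A minimal sketch using only the Python standard library; the function name is hypothetical, and an empty file is not necessarily an error in itself (a sample may simply have no reads of a given category).

```python
import os
import sys


def empty_files_by_mtime(directory):
    """Return (name, mtime) for zero-byte regular files in directory, oldest first."""
    empties = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()
                if info.st_size == 0:
                    empties.append((entry.name, info.st_mtime))
    return sorted(empties, key=lambda item: item[1])


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, _mtime in empty_files_by_mtime(target):
        print(name)
```

Running it on `tmpTEMP2` would list the empty `.bdg` and `.bed` files in the order they were written, so the earliest one points at the step to inspect first.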

@tianxiongbb
Collaborator

tianxiongbb commented Jan 17, 2024 via email

@Tcvalenzuela
Author

Dear Tianxiong Yu

Thank you so much for helping me debug this. I think we are making progress, but it's not quite there yet.
First, I'm sorry: I was unaware that the TE names needed a different structure from the RepeatModeler output, so that's good to know. I changed both "#" and "/" to "_", but I realized that, to be able to go back to the RepeatModeler (RM) naming standard, I would like to replace "#" and "/" with different symbols. Are any other symbols allowed?
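Since the failures above come from TE names being reused as file names, one way to keep the RepeatModeler names recoverable is to rename the FASTA headers and keep a lookup table, instead of relying on a reversible choice of symbols. A minimal sketch, assuming RepeatModeler-style headers such as `>seq_10543#LINE/I`; which characters TEMP2 actually accepts is not confirmed here. The unsafe set is an assumption: "/" (breaks the per-TE file paths), "#" (replaced in this thread), plus ":", "," and ";", which appear as separators in the aggregated name in the bedToBigBed error below. The same renaming would presumably also need to be applied to the name column of the RepeatMasker annotation BED so the two stay consistent. All function names are hypothetical.

```python
import re
import sys

# Assumed-unsafe characters in TE names (see the note above); adjust as needed.
UNSAFE_CHARS = re.compile(r"[#/:,;]")


def sanitize(name, replacement="_"):
    """Replace every assumed-unsafe character in a TE name."""
    return UNSAFE_CHARS.sub(replacement, name)


def rename_fasta(lines, replacement="_"):
    """Rename FASTA headers; return (new_lines, {new_name: original_name})."""
    mapping = {}
    new_lines = []
    for line in lines:
        if line.startswith(">"):
            original = line[1:].split()[0]  # keep the ID, drop any description
            new = sanitize(original, replacement)
            # Refuse to merge two different TEs into one sanitized name.
            if mapping.get(new, original) != original:
                raise ValueError(f"{original!r} and {mapping[new]!r} both map to {new!r}")
            mapping[new] = original
            new_lines.append(f">{new}\n")
        else:
            new_lines.append(line)
    return new_lines, mapping


def main(fasta_in, fasta_out, table_out):
    with open(fasta_in) as fh:
        new_lines, mapping = rename_fasta(fh)
    with open(fasta_out, "w") as fh:
        fh.writelines(new_lines)
    # Tab-separated lookup table: sanitized name -> original RepeatModeler name.
    with open(table_out, "w") as fh:
        for new, original in mapping.items():
            fh.write(f"{new}\t{original}\n")


if __name__ == "__main__" and len(sys.argv) == 4:
    main(*sys.argv[1:])
```

Usage (script and file names are placeholders): `python rename_te.py TE.fa TE.renamed.fa TE.names.tsv`. The TSV can then be used to map TEMP2 output back to the original names, whatever replacement character is chosen.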

There is a new error:

[bam_sort_core] merging from 80 files and 40 in-memory blocks...
[bam_sort_core] merging from 0 files and 40 in-memory blocks...
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 85224 reads
[bam_sort_core] merging from 0 files and 40 in-memory blocks...
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 17768650 reads
pass1 - making usageList (570 chroms): 240 millis
pass2 - checking and writing primary data (338482 records, 6 fields): 1003 millis
[bam_sort_core] merging from 0 files and 40 in-memory blocks...
[bam_sort_core] merging from 0 files and 40 in-memory blocks...
pass1 - making usageList (568 chroms): 88 millis
Error line 1111 of Awka_Nigeria_10.t: name [seq_12205_LINE_RTE-BovB:2:45:-,seq_12291_LINE_RTE-BovB:3070:3113:+,seq_13992_LINE_RTE-BovB:1:44:-,seq_16198_LINE_RTE-BovB:4095:4138:+,seq_7805_LINE_RTE-BovB:4:47:-,seq_12205_LINE_RTE-BovB:300:449:+,seq_12291_LINE_RTE-BovB:2666:2815:-,seq_13992_LINE_RTE-BovB:300:449:+,seq_16198_LINE_RTE-BovB:3691:3840:-,seq_7805_LINE_RTE-BovB:300:449:+;0.125;singleton] is too long (must not exceed 255 characters)
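This new failure appears to come from bedToBigBed (the preceding `pass1 - making usageList` line is its progress output): the BED name column here is a comma-separated list of TE hits, so long TE identifiers quickly exceed the limit. A quick way to see how many records are affected is to scan the fourth column of the BED file before conversion. A minimal sketch; the 255-character limit is taken from the error message, and the function name is hypothetical.

```python
import sys

MAX_NAME_LENGTH = 255  # limit quoted in the bedToBigBed error message


def long_bed_names(lines, max_length=MAX_NAME_LENGTH):
    """Return (line_number, name_length) for BED records whose 4th column is too long."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and len(fields[3]) > max_length:
            hits.append((number, len(fields[3])))
    return hits


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        for number, length in long_bed_names(fh):
            print(f"line {number}: name length {length}")
```

Shorter TE identifiers would shrink every entry in such a list, so the number of offending records should drop accordingly.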

Nonetheless, the good news is that this seems to be the only failure, because:

head Awka_Nigeria_10.soma.summary.txt
#transposonName	estimatedSomaticInsertionNumberPerGenome	95percentileSomaticInsertionNumberPerGenome	estimatedSomaticInsertionNumber	95percentileSomaticInsertionNumber	singletonReadsInTrueTransposonAnchorRegion	singletonReadsInFalseTransposonAnchorRegion	readsInTrueTransposonAnchorRegion	readsInFalseTransposonAnchorRegion	filterStatus
seq_9422_LTR_Copia	0.28357	0.06138	4.62	1	8	20	680	6249	imbalance
seq_9889_MITE	0.00000	0.00000	0	0	59	26	12583.5	3821	pass
seq_15257_DNA_Sola-2	0.64530	0.32736	10.5133	5.33333	18.3333	90.9167	3536.92	41136.2	pass
seq_9511_LINE_RTE-BovB	0.95886	0.27694	15.6219	4.51191	61.5119	213.633	6963.75	32420.2	pass
seq_22490_LTR_Ty3	0.42720	0.18414	6.96	3	13	44	1621	11806	pass
seq_22709_MITE	15.02452	13.66742	244.781	222.671	423.671	284.583	57349.5	91232.2	pass
seq_3736_MITE	0.00000	0.00000	0	0	38	16	7806	3161	pass
seq_22316_DNA	6.31490	5.26840	102.883	85.8333	206.833	244.5	24180.5	56877.2	pass
seq_508_LINE_LOA	2.22009	1.78614	36.17	29.1	55.1	123.667	4943.77	32298	pass

However, it concerns me that the .spike.bed file is empty. Here are the file sizes; is everything OK?

-rw-r--r-- 1 tcarrasco gei  14G janv. 17 18:41 Awka_Nigeria_10.sorted.bam
-rw-r--r-- 1 tcarrasco gei 4,0M janv. 17 18:57 Awka_Nigeria_10.sorted.bam.bai
-rw-r--r-- 1 tcarrasco gei  12M janv. 17 20:24 Awka_Nigeria_10.supportReadsUnfiltered.bb
-rw-r--r-- 1 tcarrasco gei    0 janv. 17 20:39 Awka_Nigeria_10.spike.bed
-rw-r--r-- 1 tcarrasco gei  12M janv. 17 21:34 Awka_Nigeria_10.supportingRead.dis.pdf
-rw-r--r-- 1 tcarrasco gei 1,1K janv. 17 21:37 RunTEMP2Clean_237643.err
-rw-r--r-- 1 tcarrasco gei  49K janv. 17 21:37 Awka_Nigeria_10.insertion.bb
-rw-r--r-- 1 tcarrasco gei  36K janv. 17 21:37 Awka_Nigeria_10.soma.summary.txt
-rw-r--r-- 1 tcarrasco gei  11M janv. 17 21:37 Awka_Nigeria_10.insertion.bed
drwxr-xr-x 4 tcarrasco gei   55 janv. 17 21:37 tmpTEMP2
-rw-r--r-- 1 tcarrasco gei 9,1K janv. 17 21:37 RunTEMP2Clean_237643.out

Again, thank you so much!
