Thanks for the wonderful tool. I recently ran Salsa2 with `-i 10`, and it terminated before reaching that number of iterations. The results were good, but several super-scaffolds that a 3D-DNA run had produced were missing.
Inspecting the code, I found the NG50 advancement test that decides when to break out of the loop. To see what more iterations might accomplish, I commented that test out and reran with `-i 30`, which created the 31 scaffolds_ITERATION_# AGP files.
I'm attaching stats_scaffolds_ITERATION.txt (I changed the suffix to .txt for uploading), a short awk script that printed the information below for each of the AGP files. The columns are file name, sum of scaffold lengths, number of scaffolds, and various N#/L# values.
An asterisk before a value means it is unchanged from the previous AGP file.
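For readers without the attachment, here is a minimal Python sketch of the same kind of per-file summary. It assumes standard tab-delimited AGP rows (object, object_beg, object_end, part_number, component_type, ...), takes each scaffold's length as its largest object_end (so gap rows are counted), and computes Nx/Lx by adding scaffolds from longest to shortest. All function names are hypothetical; this is not the attached awk script, and its totals may differ if that script excluded gaps.

```python
from typing import Dict, List, Sequence, Tuple

def scaffold_lengths(agp_text: str) -> List[int]:
    """One length per AGP object (scaffold): the largest object_end seen for it.
    Gap rows (N/U) are included, so lengths count gap bases (an assumption)."""
    ends: Dict[str, int] = {}
    for line in agp_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and AGP comment/header lines
        fields = line.split("\t")
        obj, obj_end = fields[0], int(fields[2])
        ends[obj] = max(ends.get(obj, 0), obj_end)
    return list(ends.values())

def nx_lx(lengths: List[int], x: int) -> Tuple[int, int]:
    """Nx and Lx: add scaffolds longest-first until the running sum reaches x% of the total."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running * 100 >= total * x:
            return length, count
    return 0, 0  # reached only for an empty list (or x > 100)

def agp_stats(agp_text: str, xs: Sequence[int] = (50, 90)) -> Dict[str, int]:
    """Total length, scaffold count, and Nx/Lx for each x in xs."""
    lengths = scaffold_lengths(agp_text)
    row = {"total": sum(lengths), "count": len(lengths)}
    for x in xs:
        row[f"N{x}"], row[f"L{x}"] = nx_lx(lengths, x)
    return row

def mark_repeats(rows: List[Dict[str, int]]) -> List[List[str]]:
    """Format each row's values, prefixing '*' when a value equals the previous row's."""
    out = []
    prev = None
    for row in rows:
        out.append([("*" if prev is not None and prev.get(k) == v else "") + str(v)
                    for k, v in row.items()])
        prev = row
    return out
```

Calling `agp_stats` on each scaffolds_ITERATION_# file in order and passing the list of results to `mark_repeats` yields rows in the same shape as the table described here.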
You can see that the N50 test flags iter_7 as the first repeat, but by number of scaffolds the first repeat is iter_10, where all the other values are also unchanged from iter_9.
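To make that comparison concrete, here is a small sketch of a hypothetical helper (not SALSA2 code) that reports the first iteration whose chosen statistics all match the previous iteration's. It assumes one dict of statistics per iteration, in order, so an N50-only check and a scaffold-count check can be run over the same rows.

```python
from typing import Dict, List, Optional, Sequence

def first_repeat(rows: List[Dict[str, int]], keys: Sequence[str]) -> Optional[int]:
    """Index of the first iteration whose values for every key in `keys`
    equal those of the iteration just before it; None if there is none."""
    for i in range(1, len(rows)):
        if all(rows[i][k] == rows[i - 1][k] for k in keys):
            return i
    return None
```

For example, `first_repeat(rows, ["N50"])` and `first_repeat(rows, ["count"])` would correspond to the two checks described above, and passing every column name compares whole rows.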
If we allow a one-iteration look-ahead, iter_11 gives us something new. iter_13 repeats iter_12, but again a one-iteration look-ahead gets us something new at iter_14. The first double repeat (i.e., three identical sets of values in a row) then starts at iter_16, with 8 identical sets of values in a row. That improves the result from iter_6 (140 scaffolds, N50 60,979,473, L50 16) to iter_16 (126 scaffolds, N50 79,034,342, L50 14).
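The look-ahead idea can be written as a patience-style stopping rule. This is a minimal sketch, assuming one stats dict per iteration in order; the function and parameter names are hypothetical, and it is not how SALSA2 currently decides to stop. It tolerates up to `patience` consecutive unchanged iterations, stops only when one more repeat occurs, and returns the iteration where the unchanged run began. With `patience=0` it stops at the first repeat and returns the iteration before it; with `patience=1` it requires three identical iterations in a row, as in the iter_16 case.

```python
from typing import Dict, List, Optional, Sequence

def settle_point(rows: List[Dict[str, int]], keys: Sequence[str],
                 patience: int = 1) -> Optional[int]:
    """Index of the iteration that starts the first run of patience + 2
    iterations with identical values for `keys`, or None if no such run exists."""
    run_start = 0  # first index of the current run of identical iterations
    for i in range(1, len(rows)):
        unchanged = all(rows[i][k] == rows[i - 1][k] for k in keys)
        if not unchanged:
            run_start = i  # something changed: a new run begins here
        elif i - run_start >= patience + 1:
            return run_start  # patience + 1 consecutive repeats: stop here
    return None
```

Keeping `keys` configurable lets the rule compare the full row (total length, scaffold count, all N#/L# values) rather than N50 alone, which is the distinction drawn between iter_7 and iter_10.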
Anyway, food for thought. Thanks again for a great tool.
--Jim Henderson, California Academy of Sciences
output: