
Job keeps failing due to time limit of 7 days enforced by cluster. #712

Open
JohnUrban opened this issue Oct 9, 2024 · 2 comments

Hi,

As always, thank you for your tools, innovation, and help on issues.

I've launched a HiFiasm job twice now. Both times it failed to complete in 7 days.

The first run did at least finish making the bin files and such, so the second run was able to start from there.

However, from what I can tell, the second run wrote no additional files and left no new landmarks for HiFiasm to start from.

So, I have requested a relaunch, but I am not hopeful it will finish. I think it will just be a repeat of the first re-launch.

Some details:

  • I have HiFi data, ONT Ultra-long reads, and Hi-C data.

Is there a stepwise set of commands I can run to help break this job up into smaller pieces that can each finish in 7 days or less?

Any guidance would be much appreciated.

Best,

John


JohnUrban commented Oct 16, 2024

Well, the job did not finish (again), as expected.

I am now trying to give it more memory and threads: instead of 32 threads, I am requesting up to 80.

Hopefully this can finish in 7 days. At the moment, the cluster says it won't even start for another 10 days though.

JohnUrban commented

Alright. So here is more background and my current solution.

Datasets:

  • two sets of HiFi reads, each ~250-300X (i.e. 500-600X total)
  • one set of ONT ultra-long reads, ~120X
  • Hi-C reads

Solution:

  • I down-sampled to ~100X of the longest HiFi reads, and optionally also to ~100X of the longest ONT reads.

With the down-sampled input, HiFiasm finished in about a day with 100 GB of memory allotted to it, compared to needing to allot 500-1500 GB and still not finishing within 7 days.
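
For concreteness, here is a rough sketch of the selection logic I mean (not anything built into HiFiasm): keep the longest reads until they sum to ~100X of an assumed genome size. It assumes a plain 4-line-per-record FASTQ; the genome size, coverage target, file names, and function names are all placeholders.

```python
#!/usr/bin/env python3
"""Down-sample a FASTQ to ~100X of the longest reads (illustrative sketch only)."""
import sys

GENOME_SIZE = 3_000_000_000   # assumed genome size in bp; adjust for your organism
TARGET_COV  = 100             # keep the longest reads totalling ~100X

def fastq_records(path):
    """Yield (header, seq, plus, qual) tuples from a 4-line-per-record FASTQ."""
    with open(path) as fh:
        while True:
            header = fh.readline().rstrip()
            if not header:
                return
            seq  = fh.readline().rstrip()
            plus = fh.readline().rstrip()
            qual = fh.readline().rstrip()
            yield header, seq, plus, qual

def main(in_fq, out_fq):
    # Pass 1: sort read lengths, longest first, and mark reads until ~100X is reached.
    lengths = [(len(seq), header) for header, seq, _, _ in fastq_records(in_fq)]
    lengths.sort(reverse=True)

    target_bases = GENOME_SIZE * TARGET_COV
    keep, total = set(), 0
    for read_len, header in lengths:
        if total >= target_bases:
            break
        keep.add(header)
        total += read_len

    # Pass 2: write only the selected (longest) reads.
    with open(out_fq, "w") as out:
        for header, seq, plus, qual in fastq_records(in_fq):
            if header in keep:
                out.write(f"{header}\n{seq}\n{plus}\n{qual}\n")

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

In effect this just picks a read-length cutoff; for gzipped input you would swap open for gzip.open.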

Any further discussion on down-sampling would be appreciated.

  • Is down-sampling to 100X of the longest reads a recommended practice?
  • Would a higher or lower coverage target than 100X be recommended for down-sampling?
  • Are there problems, other than memory and time, that arise when coverage is too high? For example, perhaps some errors begin to look like real variation?

As for how this insight could potentially improve the user experience for HiFiasm:

  • Could a down-sampling option be built into HiFiasm, telling it to use just 100X (or 100X of the longest reads)? Or even down-sampling by a minimum read length, i.e. reads > _ kb? (A rough sketch of such a length filter is below.)
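
On the read-length-minimum idea: a filter like that is even simpler than the coverage-target selection above, since it can stream the file in a single pass without sorting. This is only an illustration with placeholder values, not an existing HiFiasm option; the cutoff and file names are assumptions.

```python
import gzip
import sys

MIN_LEN = 50_000  # placeholder cutoff in bp ("reads > _ kb"); pick to suit your data

def filter_by_length(in_fq_gz, out_fq_gz, min_len=MIN_LEN):
    """Stream a gzipped FASTQ and keep only reads of at least min_len bases."""
    with gzip.open(in_fq_gz, "rt") as fin, gzip.open(out_fq_gz, "wt") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]  # header, seq, '+', quals
            if not record[0]:
                break
            if len(record[1].rstrip()) >= min_len:
                fout.writelines(record)

if __name__ == "__main__":
    filter_by_length(sys.argv[1], sys.argv[2])
```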
