Document commands to use if jobs are held.
ktlim authored and timj committed Aug 29, 2022
1 parent 057766d commit df45e96
Showing 1 changed file with 16 additions and 0 deletions.
16 changes: 16 additions & 0 deletions doc/lsst.ctrl.bps.htcondor/userguide.rst
@@ -159,6 +159,22 @@ initially run with 2 GB of memory and failed because of exceeding the limit,
result the entire workflow fails again due to other reasons, the job will ask
for 2 GB of memory during the first execution after the workflow is restarted.

If you did not set the ``memoryMultiplier`` option, this command:

.. code-block:: sh

   condor_q -hold $USER

will show any held jobs and the reason each one is held, such as running out of memory.

This command:

.. code-block:: sh

   condor_q $USER | awk '{if($6 == "H") print "condor_qedit",$1,"RequestMemory 4096; condor_release",$1}' | bash

will take any held jobs, change their requested memory size to 4096 MiB, and release them to be run again.
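
Because the one-liner pipes its output straight into ``bash``, it can be useful to review the generated commands first. The following is a minimal sketch of such a dry run, not part of HTCondor or the BPS plugin: the function name ``held_job_commands`` and its optional memory argument are hypothetical, and it assumes, as the original command does, that the job ID is the first column of the ``condor_q`` output and the job state is the sixth.

```sh
# Hypothetical helper: read condor_q-style lines on stdin and print the
# condor_qedit / condor_release commands instead of executing them.
# Assumes column 1 is the job ID and column 6 is the job state.
held_job_commands() {
    # $1 is the new RequestMemory value in MiB (defaults to 4096).
    awk -v mem="${1:-4096}" '$6 == "H" {
        print "condor_qedit", $1, "RequestMemory", mem
        print "condor_release", $1
    }'
}

# Review the commands first:
#   condor_q $USER | held_job_commands 4096
# Then, if they look right, run them:
#   condor_q $USER | held_job_commands 4096 | bash
```

Printing one command per line keeps the output easy to inspect, and passing the memory size as an argument avoids editing the ``awk`` program when a different limit is needed.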

.. _htc-plugin-troubleshooting:

Troubleshooting