
Analysis

Issues Addressed


Job failed immediately

If a workflow you start has a task that fails immediately and leads to workflow failure, be sure to check your input JSON files. Follow the instructions here, and check out an example WDL and inputs JSON file here, to ensure there are no errors in defining your input files.

For files hosted on an Azure Storage account that is connected to your Cromwell on Azure instance, the input path consists of three parts: the storage account name, the blob container name, and the file path with extension, in the following format:

/<storageaccountname>/<containername>/<blobName>

For example, a file in an "inputs" container in the storage account "msgenpublicdata" would have the path "/msgenpublicdata/inputs/chr21.read1.fq.gz".
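For illustration, an inputs JSON referencing a file by this convention might look like the following sketch; the workflow name "MyWorkflow" and input name "inputFastq" are hypothetical and should match the names in your WDL:

{
  "MyWorkflow.inputFastq": "/msgenpublicdata/inputs/chr21.read1.fq.gz"
}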

Another possibility is that you are trying to use a storage account that hasn't been mounted to your Cromwell on Azure instance, either by default during setup or by following these steps to mount a different storage account.

Check out these known issues and mitigations for more commonly seen issues caused by bugs we are actively tracking.

Check Azure Batch account quotas

If you are running a task in a workflow with a large CPU core requirement, check whether your Batch account has a sufficient resource quota. You can request a quota increase by following these instructions.

For other resource quotas, such as active jobs or pools, Cromwell on Azure keeps tasks queued until resources become available. This may lead to longer wait times for workflow completion.
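Assuming you have the Azure CLI installed and are logged in, one way to inspect your current quotas is a sketch like this; the account and resource group names are placeholders:

az batch account show \
    --name <batchaccountname> \
    --resource-group <resourcegroupname> \
    --query "{dedicatedCoreQuota: dedicatedCoreQuota, poolQuota: poolQuota, activeJobAndJobScheduleQuota: activeJobAndJobScheduleQuota}"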

Set up my own WDL

To get started, you can view this Hello World sample or an example WDL to convert FASTQ to UBAM, or follow these steps to convert an existing public WDL for other clouds to run on Azure. There are also links to ready-to-try WDLs for common workflows here.
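For a sense of what a WDL file looks like, here is a minimal hello-world-style sketch; it is illustrative only, not the linked sample, and the workflow, task, and output names are made up:

version 1.0

workflow HelloWorld {
  call WriteGreeting
}

task WriteGreeting {
  command {
    echo "Hello World"
  }
  runtime {
    docker: "ubuntu:20.04"
  }
  output {
    # capture the task's standard output as a file
    File greeting = stdout()
  }
}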

Instructions to write a WDL file for a pipeline from scratch are COMING SOON.

Check all tasks running for a workflow using the Batch account

Each task in a workflow starts an Azure Batch VM. To see currently active tasks, navigate in the Azure Portal to the Azure Batch account connected to Cromwell on Azure. Click on "Jobs", then search for the Cromwell workflowId to see all tasks associated with that workflow.

(Screenshot: the Batch account "Jobs" view in the Azure Portal)
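The same information is also available from the Azure CLI, assuming it is installed; this sketch first authenticates to the Batch account, with placeholder names:

az batch account login --name <batchaccountname> --resource-group <resourcegroupname> --shared-key-auth
az batch job list --query "[].id"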

Find which tasks failed in a workflow

Cosmos DB stores information about all tasks in a workflow. To monitor or debug any workflow, you may choose to query the database.

Navigate to your Cosmos DB instance in the Azure Portal. Click on the "Data Explorer" menu item, click on the "TES" container, and select "Items".

(Screenshot: a Cosmos DB SQL query in Data Explorer)

You can get all tasks in a workflow that have not completed successfully using either of the following queries, replacing workflowId with the id returned from Cromwell for your workflow:

SELECT * FROM c where startswith(c.description,"workflowId") AND c.state != "COMPLETE"

OR

SELECT * FROM c where startswith(c.id,"<first 9 characters of the workflowId>") AND c.state != "COMPLETE"

Make sure there are no Azure infrastructure errors

When working with Cromwell on Azure, you may run into issues with Azure Batch or Storage accounts; for instance, a file path cannot be found, or the WDL workflow fails for an unknown reason. For these scenarios, consider debugging or collecting more information using Application Insights.

Navigate to your Application Insights instance in the Azure Portal. Click on the "Logs (Analytics)" menu item under the "Monitoring" section to see all logs from Cromwell on Azure's TES backend.

(Screenshot: the Application Insights "Logs (Analytics)" view)

You can explore exceptions or logs to find the reason for a failure, and use time ranges or the Kusto Query Language to narrow your search.
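For example, a Kusto query along these lines pulls recent trace entries mentioning a workflow; the assumption that the workflowId appears in the TES trace messages is ours, so adjust the filter to match what you see in your logs:

traces
| where timestamp > ago(1d)
| where message contains "<workflowId>"
| order by timestamp desc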

Check Azure Storage Tier

Cromwell utilizes Blob storage containers and blobfuse to allow your data to be accessed and processed. The blob storage access tier can have a significant effect on your analysis time, particularly on your initial VM preparation. If you experience this, we recommend setting your access tier to "Hot" instead of "Cool". You can do this under the "Access Tier" setting in the "Configuration" menu in the Azure Portal. NOTE: this only affects users utilizing Gen2 storage accounts; all Gen1 "Standard" blobs have the "Hot" access tier by default.
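If you prefer the command line, the access tier can also be changed with the Azure CLI; this is a sketch with placeholder account and resource group names:

az storage account update \
    --name <storageaccountname> \
    --resource-group <resourcegroupname> \
    --access-tier Hot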
