Sandbox log improvements #214

Open
2 tasks
uniqueg opened this issue Apr 29, 2024 · 0 comments

Problem

The logging capabilities in both WES and TES currently do not provide good support for structured information, making it difficult for clients to interpret workflow and task run logs. It is not well defined what implementations should log, where, and how; only very few properties are required; support for external (GA4GH or third-party) schemas is missing; and the expected logging behavior differs between WES and TES perhaps more than it needs to.

Proposed solution

This issue is a sandbox for discussing, in a single place, improved logging capabilities provided by the WES and TES specs. For reference, relevant schemas are included in section "Additional context".

Improvements could consist of, for example (not exhaustive):

  • Streamlined, consistent log handling within and across WES and TES
  • Extended documentation of expected logging behavior
  • Support for (more) structured logs, including those of external schemas

The primary goal of the improvements is to make WES and TES logs more useful for clients. Increased maintainability of the specifications and better integration of WES and TES behavior are secondary goals.

Please open individual issues that address individual improvements, then link them in the task list below (more info on GitHub task lists).

In this way, we can keep track of all log-related issues and have a single place where we can consider how all proposed changes would impact one another.

Task list

Additional context

This section includes references to relevant schemas of the most recent WES (1.1.0) and TES (1.1) releases, including minimal examples, preliminary notes and obvious issues that caught my eye when compiling this information.

Relevant WES schemas

Log
  • Usage:
    • Schema for the run_log property in RunLog, which is the schema for the response of GET /runs/{run_id}
    • Alternative to TaskLog as array item schema for the deprecated task_logs property in RunLog, which is the schema for the response of GET /runs/{run_id}
    • Schema that the TaskLog schema inherits from
  • Schema:
    Log:
      title: Log
      type: object
      properties:
        name:
          type: string
          description: The task or workflow name
        cmd:
          type: array
          items:
            type: string
          description: The command line that was executed
        start_time:
          type: string
          description: When the command started executing, in ISO 8601 format "%Y-%m-%dT%H:%M:%SZ"
        end_time:
          type: string
          description: When the command stopped executing (completed, failed, or cancelled), in ISO 8601 format "%Y-%m-%dT%H:%M:%SZ"
        stdout:
          type: string
          description: A URL to retrieve standard output logs of the workflow run or task.  This URL may change between status requests, or may not be available until the task or workflow has finished execution.  Should be available using the same credentials used to access the WES endpoint.
        stderr:
          type: string
          description: A URL to retrieve standard error logs of the workflow run or task.  This URL may change between status requests, or may not be available until the task or workflow has finished execution.  Should be available using the same credentials used to access the WES endpoint.
        exit_code:
          type: integer
          description: Exit code of the program
          format: int32
        system_logs:
          type: array
          items:
            type: string
    
          description: |-
            System logs are any logs the system decides are relevant,
            which are not tied directly to a workflow.
            Content is implementation specific: format, size, etc.
    
            System logs may be collected here to provide convenient access.
    
            For example, the system may include an error message that caused
            a SYSTEM_ERROR state (e.g. disk is full), etc.
      description: Log and other info
  • Minimal example:
    {}
TaskLog
  • Usage:
    • Schema for the response of GET /runs/{run_id}/tasks/{task_id}
    • Array item schema for the task_logs property in TaskListResponse, which is itself the schema for the response of GET /runs/{run_id}/tasks
    • Alternative to Log as array item schema for the deprecated task_logs property in RunLog, which is the schema for the response of GET /runs/{run_id}
  • Schema:
    TaskLog:
      title: TaskLog
      allOf:
        - $ref: '#/components/schemas/Log'
        - type: object
          properties:
            id:
              type: string
              description: A unique identifier which may be used to reference the task
            system_logs:
              type: array
              items:
                type: string
    
              description: |-
                System logs are any logs the system decides are relevant,
                which are not tied directly to a task.
                Content is implementation specific: format, size, etc.
                
                System logs may be collected here to provide convenient access.
                
                For example, the system may include the name of the host
                where the task is executing, an error message that caused
                a SYSTEM_ERROR state (e.g. disk is full), etc.
            tes_uri:
              type: string
              description: An optional URL pointing to an extended task definition defined by a [TES api](https://github.com/ga4gh/task-execution-schemas)
      required:
        - id
        - name
      description: Runtime information for a given task
  • Minimal example:
    {
      "id": "some_id",
      "name": "some_name"
    }
  • Notes:
    • Extends Log schema with additional properties id, system_logs and tes_uri; unlike Log, which has no required properties at all, requires TaskLog-specific id property and name property inherited from Log
    • Definition of system_logs is redundant, because it is already inherited from Log where it is defined almost identically; the only difference is the wording in the description, which could easily be generalized
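
The two notes above can be checked mechanically. The following is a minimal sketch, assuming the property names and `required` lists exactly as quoted in the Log and TaskLog schemas in this issue; the helper names `redeclared` and `missing_required` are hypothetical and not part of WES.

```python
# Property names copied from the Log and TaskLog schemas quoted above.
LOG_PROPERTIES = {
    "name", "cmd", "start_time", "end_time",
    "stdout", "stderr", "exit_code", "system_logs",
}
LOG_REQUIRED = set()  # Log declares no required properties

# Properties that TaskLog declares itself, on top of what it inherits from Log.
TASKLOG_OWN_PROPERTIES = {"id", "system_logs", "tes_uri"}
TASKLOG_REQUIRED = {"id", "name"}


def redeclared(parent_properties, child_properties):
    """Return properties that the child schema declares again (hypothetical helper)."""
    return sorted(parent_properties & child_properties)


def missing_required(document, required):
    """Return required properties absent from a JSON object (hypothetical helper)."""
    return sorted(required - document.keys())


print(redeclared(LOG_PROPERTIES, TASKLOG_OWN_PROPERTIES))  # ['system_logs']
print(missing_required({}, LOG_REQUIRED))                  # []
print(missing_required({}, TASKLOG_REQUIRED))              # ['id', 'name']
```

This only compares property names; it does not validate types or descriptions, for which a full JSON Schema validator would be needed.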

Relevant TES schemas

tesTaskLog

  • Usage:
    • Array item schema for the logs property in tesTask, which is the schema for the body of POST /tasks and the response of GET /tasks/{id}
  • Schema:
    tesTaskLog:
      required:
      - logs
      - outputs
      type: object
      properties:
        logs:
          type: array
          description: Logs for each executor
          items:
            $ref: '#/components/schemas/tesExecutorLog'
        metadata:
          type: object
          additionalProperties:
            type: string
          description: Arbitrary logging metadata included by the implementation.
          example:
            host: worker-001
            slurmm_id: 123456
        start_time:
          type: string
          description: When the task started, in RFC 3339 format.
          example: 2020-10-02T10:00:00-05:00
        end_time:
          type: string
          description: When the task ended, in RFC 3339 format.
          example: 2020-10-02T11:00:00-05:00
        outputs:
          type: array
          description: |-
            Information about all output files. Directory outputs are
            flattened into separate items.
          items:
            $ref: '#/components/schemas/tesOutputFileLog'
        system_logs:
          type: array
          description: |-
            System logs are any logs the system decides are relevant,
            which are not tied directly to an Executor process.
            Content is implementation specific: format, size, etc.
    
            System logs may be collected here to provide convenient access.
    
            For example, the system may include the name of the host
            where the task is executing, an error message that caused
            a SYSTEM_ERROR state (e.g. disk is full), etc.
    
            System logs are only included in the FULL task view.
          items:
            type: string
      description: TaskLog describes logging information related to a Task.
  • Minimal example:
    {
      "logs": [],
      "outputs": []
    }
  • Notes:
    • Unlike WES, has metadata property to provide arbitrary task-level logging information as key-value pairs
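
As a small illustration of how the metadata property constrains its values: the schema above declares `additionalProperties: type: string`, so every value should be a string. The sketch below assumes the schema example has been loaded by a YAML parser, in which case the unquoted `123456` is read as an integer; the helper names `non_string_metadata` and `stringify_metadata` are hypothetical.

```python
def non_string_metadata(metadata):
    """Return the keys whose values are not strings (hypothetical helper)."""
    return sorted(key for key, value in metadata.items() if not isinstance(value, str))


def stringify_metadata(metadata):
    """Coerce all values to strings so the object matches the quoted schema."""
    return {key: str(value) for key, value in metadata.items()}


# The example from the tesTaskLog schema above, as a YAML parser would load it.
spec_example = {"host": "worker-001", "slurmm_id": 123456}

print(non_string_metadata(spec_example))                      # ['slurmm_id']
print(non_string_metadata(stringify_metadata(spec_example)))  # []
```

Whether implementations should coerce such values or reject them is not defined in the quoted schema; the coercion here is just one possible choice.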

tesExecutorLog

  • Usage:
    • Array item schema for the logs property in tesTaskLog, which is the array item schema for the logs property in tesTask, which is itself the schema for the body of POST /tasks and the response of GET /tasks/{id}
  • Schema:
    tesExecutorLog:
      required:
      - exit_code
      type: object
      properties:
        start_time:
          type: string
          description: Time the executor started, in RFC 3339 format.
          example: 2020-10-02T10:00:00-05:00
        end_time:
          type: string
          description: Time the executor ended, in RFC 3339 format.
          example: 2020-10-02T11:00:00-05:00
        stdout:
          type: string
          description: |-
            Stdout content.
    
            This is meant for convenience. No guarantees are made about the content.
            Implementations may chose different approaches: only the head, only the tail,
            a URL reference only, etc.
    
            In order to capture the full stdout client should set Executor.stdout
            to a container file path, and use Task.outputs to upload that file
            to permanent storage.
        stderr:
          type: string
          description: |-
            Stderr content.
    
            This is meant for convenience. No guarantees are made about the content.
            Implementations may chose different approaches: only the head, only the tail,
            a URL reference only, etc.
    
            In order to capture the full stderr client should set Executor.stderr
            to a container file path, and use Task.outputs to upload that file
            to permanent storage.
        exit_code:
          type: integer
          description: Exit code.
          format: int32
      description: ExecutorLog describes logging information related to an Executor.
  • Minimal example:
    {
      "exit_code": 0
    }
  • Notes:
    • More or less a subset of WES Log schema without system_logs (available upstream in tesTask.logs[]), name (available upstream in tesTask) and cmd (available as command property in tesTask.executors[]), and with exit_code being required
    • Handling of STDOUT and STDERR differs significantly from WES: full STDOUT and STDERR are expected to be provided via tesExecutor.stdout and tesExecutor.stderr (set container file paths), and tesTask.outputs (file upload)
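
To make the last note concrete, here is a rough sketch of a task body in which the client arranges for the full STDOUT and STDERR to be uploaded, following the approach described in the tesExecutorLog descriptions above. The property names (`executors`, `image`, `command`, `stdout`, `stderr`, `outputs`, `path`, `url`, `type`) reflect my understanding of the TES 1.1 tesTask, tesExecutor and tesOutput schemas, which are not quoted in this issue; the function name, container paths and storage URL are hypothetical.

```python
def task_with_captured_streams(image, command, log_dir, storage_prefix):
    """Build a task body that writes stdout/stderr to files and uploads them.

    Hypothetical helper: log_dir is a path inside the container and
    storage_prefix is a URL prefix in permanent storage.
    """
    stdout_path = f"{log_dir}/stdout.txt"
    stderr_path = f"{log_dir}/stderr.txt"
    return {
        "executors": [
            {
                "image": image,
                "command": command,
                # Executor.stdout / Executor.stderr: container file paths
                "stdout": stdout_path,
                "stderr": stderr_path,
            }
        ],
        # Task.outputs: upload the captured files to permanent storage
        "outputs": [
            {"path": stdout_path, "url": f"{storage_prefix}/stdout.txt", "type": "FILE"},
            {"path": stderr_path, "url": f"{storage_prefix}/stderr.txt", "type": "FILE"},
        ],
    }


task = task_with_captured_streams(
    image="alpine",
    command=["echo", "hello"],
    log_dir="/tmp/logs",                        # hypothetical container path
    storage_prefix="s3://example-bucket/logs",  # hypothetical storage location
)
print(task["executors"][0]["stdout"])  # /tmp/logs/stdout.txt
```

In WES, by contrast, the client does nothing up front and reads the stdout and stderr URLs from the Log object, which is the difference the note above points out.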

@patmagee @kellrott @vsmalladi @lbeckman314 @briandoconnor @dglazer
