Quilt's Workflow documentation mentions three benefits: consistency, completeness, and context. There should really be a fourth benefit, though it doesn't start with the letter c: freshness.
Let's say I have two tags: `author` and `code`. Today's Quilt Workflows don't support push-level completeness: they can check that the tags are present, but not that they were updated for the current push. Before I give an illustrative example, let me explain how my organization uses Quilt.
My institution uses Quilt for data and GitHub for code. When a user pushes to an existing Quilt package, we require the user to specify two tags:
- `author`: An email address. If someone has a question about the data captured in this package version, this is the person to contact.
- `code`: A link to a commit on GitHub. If someone wants to look at the code that produced the data, they can visit this link.
We actually require more than those two tags, but I'm omitting them for simplicity.
Now let me illustrate the problem based on how workflows are designed today.
Person A pushes a change to a package. They update the `author` tag to point to their email address. They update the `code` tag to point to their code.
The next day Person B pushes a change to the same package (after they "browse" the latest version, of course). Person B isn't aware of the company's package standards. They don't know to update the `author` or `code` tag. Person B pushes their change to Quilt. Quilt Workflows examines the push, says "looks great!", and lets it go through. Now the tags are wrong. It looks like Person A authored two versions. Person B doesn't show up anywhere. The data from the second version has nothing to do with the `code` tag that the data is now associated with.
Fast forward a year. Person C looks at the package and has a question about the second version. They look at the code tag and view the code. The code doesn't make any sense - how could it have produced this data? So they look at the author tag. It says Person A. So they contact Person A. Person A has no idea what Person C is talking about. Person A says their code doesn't do that and they have no idea where the data came from. Person A says Person C will have to send an email to the entire department to figure out who might have produced the data from the second version. That's painful.
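
To make the gap concrete, here is a minimal sketch of why a presence-only check lets Person B's push through. It is plain Python, not Quilt's actual validation code; `REQUIRED_TAGS`, `missing_tags`, the email address, and the commit URL are all hypothetical placeholders.

```python
# Hypothetical stand-in for a presence-only ("completeness") check.
# This is NOT Quilt's implementation, just an illustration of the gap.
REQUIRED_TAGS = ("author", "code")


def missing_tags(meta):
    """Return the required tags that are absent or empty in `meta`."""
    return [tag for tag in REQUIRED_TAGS if not meta.get(tag)]


# Person A's push: both tags are present and correct (placeholder values).
version_1 = {
    "author": "person.a@example.com",
    "code": "https://github.com/example/repo/commit/aaa111",
}

# Person B browses the latest version, adds data, and never touches the tags,
# so the new version simply carries Person A's metadata forward.
version_2 = dict(version_1)

# The presence-only check finds nothing missing, so the push is accepted,
# even though the tags are identical to the previous version's.
print(missing_tags(version_2))
print(version_2 == version_1)
```

Both tags are present, so nothing is flagged, yet none of them describe Person B's push.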
Example:

`s3://example/.quilt/workflows/config.yml` contents:

`s3://example/schemas/minimal.json` contents:

Person A

- Person A produces some data.
- Person A adds correct, up-to-date tags and pushes to the Quilt package.

Person B

- Person B adds some data.
- Person B completely ignores the tags and performs a push. Ideally they should get an error reminding them to update the tags. Today this code works fine.

Quilt's Workflow capability already supports throwing errors at the push level for messages (if the user forgets to add a message to their push they get an error reminding them). Ideally Quilt could support similar errors for tags per push too.
I don't have a strong preference about how or where the `config.yml` or `.json` should allow the user to specify which tags need to be updated as part of the push, as long as I can specify it somewhere.
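
As a rough illustration of the behavior I'm asking for, here is a minimal sketch of a per-push freshness check. Everything in it is an assumption for discussion: `StaleTagsError`, `stale_tags`, `check_freshness`, and the `must_change` list are hypothetical names, not existing Quilt APIs or config keys, and the metadata values are placeholders.

```python
class StaleTagsError(ValueError):
    """Raised when a push leaves tags unchanged that must be updated."""


def stale_tags(previous_meta, new_meta, must_change):
    """Return the tags in `must_change` whose values did not change."""
    return [
        tag for tag in must_change
        if new_meta.get(tag) == previous_meta.get(tag)
    ]


def check_freshness(previous_meta, new_meta, must_change):
    """Reject the push if any tag in `must_change` was carried over as-is."""
    stale = stale_tags(previous_meta, new_meta, must_change)
    if stale:
        raise StaleTagsError(
            "Update these tags before pushing: " + ", ".join(stale)
        )


# Hypothetical setting; where it would live (config.yml or the .json schema)
# is exactly the open question in this request.
must_change = ["author", "code"]

previous = {"author": "person.a@example.com", "code": "commit-aaa111"}
person_b_push = dict(previous)  # Person B never touched the tags

try:
    check_freshness(previous, person_b_push, must_change)
except StaleTagsError as err:
    print(err)
```

One design question this sketch leaves open: if the same person legitimately pushes twice in a row, their `author` value would not change, so a real implementation would probably need a way to confirm a tag explicitly rather than only comparing values.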