Placeholder for factfile knowing what software it needs #91
Comments
There's a discussion on something similar to this in #50. The summary of that ticket is essentially that we can point to the container the job should run in at the top level of the factfile (and download it and run the job there).
Hey @ninjabear - we debated the two approaches. Josh made a strong case in favor of de-coupling the factfile's dependencies from a specific Docker image - anyway I'll leave him to make the case in this ticket :-)
The current idea in play:
From a monitoring and management perspective this means that the server need only keep a record of container ids (possibly with a worker hostname for a cluster?) to know which jobs are running. What this setup also lets us achieve is that if we wanted to run Factotum inside a different service than Docker, we have the abstraction level already. This keeps Factotum very self-contained and lets all the manipulation be done at a higher level.
I don't think the container IDs are the critical thing - Factotum already has some great IDs for tracking jobs. I think the main argument is that by having an abstraction layer (factfile -> software dependencies -> Docker image), you don't end up with references to stale Docker images inside your factfiles. Also, as Josh says, it means the Factotum runner doesn't really care whether you are using Docker/Rocket/a well-configured host machine. Does that make sense, @ninjabear? Happy to hear a counter-arg.
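A minimal sketch of the abstraction layer described above (factfile -> software dependencies -> Docker image), to make the lookup concrete. Everything here is hypothetical: the catalogue, the image names and the `candidate_images` helper are illustrative assumptions, not part of Factotum.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical catalogue mapping an image name to the software it provides.
# Neither the image names nor the software tags come from Factotum itself.
IMAGE_CATALOGUE: Dict[str, FrozenSet[str]] = {
    "example/etl-runner": frozenset({"python3", "awscli"}),
    "example/full-toolbox": frozenset({"python3", "awscli", "jq"}),
}


def candidate_images(
    required: Iterable[str],
    catalogue: Dict[str, FrozenSet[str]],
) -> List[str]:
    """Return, in sorted order, every image that provides all required software."""
    needed = frozenset(required)
    return sorted(
        name for name, provided in catalogue.items() if needed <= provided
    )
```

With this catalogue, a factfile that asks for `python3` and `awscli` matches both images, so the runner would still need a rule for choosing between them; a factfile asking for software that no image provides gets an empty list.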
I think we're saying basically the same thing. The only difference to this and what I'm proposing is that:
This means we don't have to know what's in a container, or try to "best fit" a job to a container. A map of software to containers has to live somewhere; if we don't need it, there's no need to maintain it. This also removes the edge case where two containers are capable of running the same job (which would probably make things hard to debug).

Doing it this way is also the path to downtime-free continuous releases. Since the factfiles haven't changed, and the container is cached by Factotum prior to running, new releases can be pushed to the container repo and are available for the next run. For example, pushing to snowplow-canary would mean the canary jobs use the updated container on their next run.

We will need to add some additional contextual information to Factotum server regardless, so the container id (in either use case) would be a necessary addition for working out which container a job actually used.

Essentially we're just disconnecting the factfile and the environment, as they change for different reasons - a job is responsible for sequencing events, an environment is responsible for providing all the tools and bits you need.

Edit: and what this means is that if I accidentally ship something daft, I don't have to bump version numbers in n factfiles.
Thanks @ninjabear - super-interesting stuff. Can you suggest a syntax for this in the factfile?
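A rough sketch of the release flow described above, assuming the factfile only names an environment and the runner resolves that name to a concrete container id on each run. `FakeRegistry`, `RunRecord`, `run_job` and the container ids are hypothetical stand-ins, not Factotum or Docker APIs; only the `snowplow-canary` name comes from the discussion.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RunRecord:
    """What the server would keep per run: the job, the name it asked for, the id it got."""
    job: str
    environment: str
    container_id: str


class FakeRegistry:
    """Hypothetical in-memory registry: environment name -> current container id."""

    def __init__(self) -> None:
        self._current: Dict[str, str] = {}

    def push(self, environment: str, container_id: str) -> None:
        self._current[environment] = container_id

    def resolve(self, environment: str) -> str:
        return self._current[environment]


def run_job(job: str, environment: str, registry: FakeRegistry,
            history: List[RunRecord]) -> RunRecord:
    """Resolve the environment at run time and record which container was used."""
    record = RunRecord(job, environment, registry.resolve(environment))
    history.append(record)
    return record


registry = FakeRegistry()
history: List[RunRecord] = []

registry.push("snowplow-canary", "container-aaa")
first = run_job("canary-etl", "snowplow-canary", registry, history)

# A new release is pushed; the job definition itself is untouched.
registry.push("snowplow-canary", "container-bbb")
second = run_job("canary-etl", "snowplow-canary", registry, history)
```

Both runs ask for the same environment name, but the recorded container id differs, which is why the id is still worth storing alongside each run.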
Sure, I was thinking something like this:
where this is the section I've added:
And the "registry" is something like the Docker registries, though Bintray seems to support these too.
I like that! Maybe we make it "DOCKER" to make it more enum-y.
Should we also add an optional credentials portion to the block for private images you might want to pull down?
Nice idea, we could have templated vars.
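One way templated credentials could work, sketched in Python. The `$NAME` placeholder style (from Python's `string.Template`), the field names and the `render_credentials` helper are all assumptions for illustration; the thread does not settle on a syntax.

```python
from string import Template
from typing import Dict, Mapping


def render_credentials(block: Mapping[str, str],
                       variables: Mapping[str, str]) -> Dict[str, str]:
    """Fill $NAME / ${NAME} placeholders in a hypothetical credentials block.

    Raises KeyError if a placeholder has no matching variable, so a missing
    secret fails loudly instead of being passed through unrendered.
    """
    return {
        key: Template(value).substitute(variables)
        for key, value in block.items()
    }
```

For example, `render_credentials({"username": "$REGISTRY_USER"}, {"REGISTRY_USER": "alice"})` returns `{"username": "alice"}`, so the factfile itself never has to contain the secret.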
Something to the effect of software tags in the factfile that can then be used to look up a Docker image with the same software.