This repository has been archived by the owner on Aug 13, 2024. It is now read-only.

Placeholder for factfile knowing what software it needs #91

Open
jbeemster opened this issue Nov 24, 2016 · 10 comments

Comments

@jbeemster
Contributor

jbeemster commented Nov 24, 2016

Something to the effect of software tags in the factfile that can then be used to look up a Docker image with the same software.

@ninjabear
Contributor

There's a discussion on something similar to this in #50

The summary of that ticket is essentially we can point to the container the job should run in at the top-level of the factfile (and download it and run it there)

@alexanderdean
Contributor

Hey @ninjabear - we debated the two approaches. Josh made a strong case in favor of de-coupling the factfile's dependencies from a specific Docker image - anyway I'll leave him to make the case in this ticket :-)

@jbeemster
Contributor Author

The current idea in play:

  1. A Factfile need only know what software it needs to perform its job (in the form of comma-delimited tags)
  2. A Factotum server can use these tags to fetch a Docker image that has the matching software set
  3. Once a matching image has been found we can run the Factfile in a new Docker container

From a monitoring and management perspective this means that the server need only keep a record of container ids (possibly with a worker hostname for cluster?) to know which jobs are running.

This setup also means that if we wanted to run Factotum inside a service other than Docker, the abstraction layer is already in place.

This keeps Factotum very self-contained and lets all the manipulation be done at a higher level.
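
To make the tag-matching idea above concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the catalogue, the image names, and the "smallest image that covers the tags" rule are all hypothetical, and nothing here is an existing Factotum feature.

```python
# Hypothetical catalogue: image name -> set of software tags it provides.
# These names are invented for illustration only.
IMAGE_CATALOGUE = {
    "example/base": {"bash"},
    "example/etl": {"bash", "ruby", "awscli"},
    "example/full": {"bash", "ruby", "awscli", "python"},
}


def parse_tags(raw):
    """Split a comma-delimited tag string into a set of non-empty tags."""
    return {tag.strip() for tag in raw.split(",") if tag.strip()}


def find_image(raw_tags, catalogue=IMAGE_CATALOGUE):
    """Return an image whose software set covers the requested tags, or None.

    Assumed tie-break: prefer the image with the fewest tags, then the
    alphabetically first name, so the result is deterministic.
    """
    required = parse_tags(raw_tags)
    candidates = [
        name for name, provided in catalogue.items() if required <= provided
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda name: (len(catalogue[name]), name))


print(find_image("ruby, awscli"))  # example/etl
print(find_image("scala"))         # None
```

Note that more than one image can satisfy the same tags, so some tie-break rule like the one assumed here is needed to keep the choice repeatable.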

@alexanderdean
Contributor

alexanderdean commented Nov 24, 2016

To me I don't think the container IDs are the critical thing - Factotum already has some great IDs for tracking jobs.

I think the main argument is that by having an abstraction layer (factfile -> software dependencies -> Docker image), you don't end up with references to stale Docker images inside your factfiles.

Also as Josh says, it means Factotum runner doesn't really care whether you are using Docker/Rocket/a well-configured host machine.

Does it make sense @ninjabear ? Happy to hear a counter-arg.

@ninjabear
Contributor

ninjabear commented Nov 24, 2016

I think we're saying basically the same thing. The only difference between this and what I'm proposing is that:

  1. Rather than working out what's on a set of containers, we group that software together and give it a name and a location. For example "snowplow-stable" or "snowplow-canary". Jobs can be pegged to releases in the same way, e.g. "snowplow-83"

This means we don't have to know what's in a container, or try and "best fit" a job to a container. Having a map of software to containers has to live somewhere - if we don't need it there's no need to maintain it. This also removes the edge case that two containers are capable of running the same job (which would probably make things hard to debug)

Doing it this way is also the path to downtime-free continuous releases. Since the factfiles haven't changed, and the container is cached by factotum prior to running, new releases can be pushed to the container repo, available for the next run. For example, pushing to snowplow-canary would mean the canary jobs would be using the updated container on their next run. We will need to add some additional contextual information with factotum server regardless, so the container id (in either use case) would be a necessary addition for working out which container it actually used.

Essentially we're just disconnecting the factfile and the environment as they change for different reasons - a job is responsible for sequencing events, an environment is responsible for providing all the tools and bits you need.

Edit: and what this means is that if I accidentally ship something daft, I don't have to bump version numbers in n factfiles
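
As a rough sketch of the named-environment approach described above (Python, with hypothetical names throughout): the factfile only carries an environment name, the name is resolved to whatever image is currently published at the start of each run, and the resolved image id is recorded so we can tell afterwards which container a run actually used. The `published` mapping and `start_run` helper are assumptions for illustration, not part of Factotum.

```python
def start_run(job, environment, published, run_log):
    """Resolve an environment name to the currently published image id
    and record which image this run used."""
    image_id = published[environment]
    run_log.append({"job": job, "environment": environment, "image_id": image_id})
    return image_id


# Hypothetical registry state: environment name -> currently published image id.
published = {"snowplow-stable": "image-1", "snowplow-canary": "image-1"}
run_log = []

start_run("enrich", "snowplow-canary", published, run_log)

# A new release is pushed to the canary environment; the factfile is unchanged.
published["snowplow-canary"] = "image-2"

start_run("enrich", "snowplow-canary", published, run_log)

print([entry["image_id"] for entry in run_log])  # ['image-1', 'image-2']
```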

@alexanderdean
Contributor

Thanks @ninjabear - super-interesting stuff. Can you suggest a syntax for this in the factfile?

@ninjabear
Contributor

Sure, I was thinking something like this:

{
    "schema": "iglu:com.snowplowanalytics.factotum/factfile/jsonschema/1-0-0",
    "data": {
        "name": "echo order demo",
        "environment": {
            "registry_location": "xyz.com:5000",
            "image_name": "snowplow-canary",
            "type": "docker"
        },
        "tasks": [
            {
                "name": "echo alpha",
                "executor": "shell",
                "command": "echo",
                "arguments": [ "alpha" ],
                "dependsOn": [],
                "onResult": {
                    "terminateJobWithSuccess": [ 3 ],
                    "continueJob": [ 0 ]
                }
            },
            {
                "name": "echo beta",
                "executor": "shell",
                "command": "echo",
                "arguments": [ "beta" ],
                "dependsOn": [ "echo alpha" ],
                "onResult": {
                    "terminateJobWithSuccess": [ 3 ],
                    "continueJob": [ 0 ]
                }
            },
            {
                "name": "echo omega",
                "executor": "shell",
                "command": "echo",
                "arguments": [ "and omega!" ],
                "dependsOn": [ "echo beta" ],
                "onResult": {
                    "terminateJobWithSuccess": [ 3 ],
                    "continueJob": [ 0 ]
                }
            }
        ]
    }
}

where this is the section I've added:

	"environment": {
		"registry_location": "xyz.com:5000",
		"image_name": "snowplow-canary",
		"type": "docker"
	}

The "registry" is something like a Docker registry, though Bintray also seems to support these.
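
For illustration, here is a minimal Python sketch of how a runner might turn the proposed block into an image reference. It assumes the field names from the proposal above, treats the block as optional, and joins `registry_location` and `image_name` in the usual `host:port/name` form that Docker uses for images in a private registry. The `image_reference` helper is hypothetical and not part of Factotum.

```python
import json


def image_reference(factfile_text):
    """Return 'registry/image' for a factfile's environment block,
    or None when the factfile has no environment block."""
    data = json.loads(factfile_text)["data"]
    env = data.get("environment")
    if env is None:
        return None
    # Only the "docker" type from the proposal is handled in this sketch.
    if env["type"].lower() != "docker":
        raise ValueError("unsupported environment type: " + env["type"])
    return env["registry_location"] + "/" + env["image_name"]


EXAMPLE = """
{
    "schema": "iglu:com.snowplowanalytics.factotum/factfile/jsonschema/1-0-0",
    "data": {
        "name": "echo order demo",
        "environment": {
            "registry_location": "xyz.com:5000",
            "image_name": "snowplow-canary",
            "type": "docker"
        },
        "tasks": []
    }
}
"""

print(image_reference(EXAMPLE))  # xyz.com:5000/snowplow-canary
```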

@alexanderdean
Contributor

I like that! Maybe we make it "DOCKER" to make it more enum-y.

@jbeemster
Contributor Author

Should we also add an optional credentials portion to the block for private images you might want to suck down?

@alexanderdean
Contributor

Nice idea - we could have templated vars
