
feat: Provide optional lock to prevent concurrent pipeline execution #105

Open
tombriggsallego opened this issue Dec 13, 2022 · 4 comments

Comments

@tombriggsallego

Meltano Version

2.8.0

Python Version

3.8

Bug scope

CLI (options, error messages, logging, etc.)

Operating System

Linux Ubuntu

Description

If we run meltano run tap-something some-mapper target-something while that pipeline is already running, Meltano (correctly!) throws an "already running" error and exits. However, if we instead:

  1. run meltano run tap-something some-mapper target-something dbt-postgres:run,
  2. wait for the tap+mapper+target block to finish and the dbt-postgres portion to start, and then
  3. run meltano run tap-something some-mapper target-something dbt-postgres:run again

Meltano runs the entire pipeline again, which ultimately leaves multiple copies of the same dbt project running at once. :(

If it matters, we execute Meltano via cron. The tap/mapper/target portion usually takes only a few minutes, but dbt often takes 20+ minutes to run. We had planned to schedule the job every 15 minutes and let Meltano block concurrent runs whenever dbt ran long, but this behavior prevents that.

Code

No response

@tayloramurphy
Contributor

@aaronsteers, thoughts on how we could help out with this? It would likely just mean checking whether the same plugin:command is already executing, right?

@aaronsteers
Contributor

aaronsteers commented Dec 16, 2022

Feature-wise, we could declare a new plugin command property specifying that only one copy can run at a time. That limit would need to be per environment, so that prod would never be blocked by devtest, for instance. The challenge is that I don't know whether the way we log commands today would work the same way it does for EL. In theory, though, this could definitely work.
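A minimal sketch of the per-environment rule described above, assuming a hypothetical single-instance flag on a plugin command: runs are keyed by (environment, plugin, command), so a second `dbt-postgres:run` in prod is refused while the same command in devtest still starts. The class and method names are illustrative, not Meltano APIs, and this in-memory version only works within one process; separate cron-launched runs would need the state persisted somewhere shared.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

# Hypothetical key: (environment, plugin, command)
Key = Tuple[str, str, str]


@dataclass
class SingleInstanceRegistry:
    """Illustrative registry of running single-instance plugin commands."""

    running: Set[Key] = field(default_factory=set)

    def try_start(self, environment: str, plugin: str, command: str) -> bool:
        key = (environment, plugin, command)
        if key in self.running:
            return False  # same plugin:command already running in this environment
        self.running.add(key)
        return True

    def finish(self, environment: str, plugin: str, command: str) -> None:
        # Mark the command as no longer running so the next run may start.
        self.running.discard((environment, plugin, command))
```

Because the environment is part of the key, a long-running prod dbt job never blocks a devtest run of the same command.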

A second approach could be to create a dummy "command" that runs before and after the dbt execution. That dummy command would basically "take" a lock and subsequently "release" it. You'd probably want to build in a max age for the lock so it could self-heal, and you would probably also want an explicit command to "release" the lock in cases where you know its process is no longer running.
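One way to sketch the take/release lock with a max age, assuming the lock is a local file that every run can see (the function names and lock path are hypothetical, and this is not the implementation linked later in the thread). Atomic creation via `O_CREAT | O_EXCL` ensures only one run can take the lock; a lock file older than `max_age_seconds` is treated as stale and replaced, and `release_lock` doubles as the explicit manual release.

```python
import os
import time
from pathlib import Path


def acquire_lock(path: Path, max_age_seconds: float) -> bool:
    """Try to take the lock; return False if a fresh lock is already held."""
    for _ in range(2):  # at most one retry after clearing a stale lock
        try:
            # O_EXCL makes creation atomic: it fails if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                age = time.time() - path.stat().st_mtime
            except FileNotFoundError:
                continue  # released between our two calls; try again
            if age < max_age_seconds:
                return False  # lock is held and still fresh
            # Stale lock (e.g. left by a crashed run): self-heal by removing it.
            path.unlink(missing_ok=True)
            continue
        with os.fdopen(fd, "w") as f:
            f.write(f"{os.getpid()} {time.time()}\n")  # record owner for debugging
        return True
    return False


def release_lock(path: Path) -> None:
    """Release the lock; also usable as a manual 'force release'."""
    path.unlink(missing_ok=True)
```

In a pipeline, the "take" step would call `acquire_lock` and fail the run when it returns False, and the "release" step would call `release_lock` after dbt finishes. The max age should comfortably exceed the longest expected dbt run (20+ minutes in the case above) so that a slow run is not mistaken for a crashed one.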

A third option, and the one I like best, would be to build the second solution into the dbt-ext plugin itself and/or into the EDK, with the ability to use pre-hooks and post-hooks to do the same thing inline.

The challenge then would be where to store the lock artifact. That could be easy or hard depending on the deployment scenario.
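The inline pre-/post-hook variant of option 3 could look roughly like the sketch below. It assumes nothing about the actual EDK hook API: `lock_hooks`, `run_guarded`, and `AlreadyRunningError` are hypothetical names, and the `take`/`release` callables stand in for whatever lock backend is chosen, which is exactly the open storage question. The post-hook runs in a `finally` block, so the lock is released even when the wrapped command fails, but it is not released when the pre-hook refuses to start, because the lock then belongs to another run.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


class AlreadyRunningError(RuntimeError):
    """Raised by the pre-hook when another run already holds the lock."""


@contextmanager
def lock_hooks(take: Callable[[], bool], release: Callable[[], None]) -> Iterator[None]:
    # Pre-hook: take the lock or refuse to start.
    if not take():
        raise AlreadyRunningError("another run already holds the lock")
    try:
        yield
    finally:
        # Post-hook: release the lock even if the wrapped command raised.
        release()


def run_guarded(
    command: Callable[[], T], take: Callable[[], bool], release: Callable[[], None]
) -> T:
    """Run `command` (e.g. the dbt invocation) between the pre- and post-hooks."""
    with lock_hooks(take, release):
        return command()
```

Keeping the lock backend behind two callables means the same hooks could wrap a local file, a database row, or an object-store marker, depending on the deployment.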

@tombriggsallego
Author

tombriggsallego commented Jan 11, 2023

I built a version of @aaronsteers's option 2. It is available here. It's not pretty, but it seems to do the trick. I think option 3 is ultimately the ideal; adding two extra commands makes for an ugly pipeline command. :( Extending the EDK is beyond my capabilities at the moment, though. ;)


@WillDaSilva WillDaSilva transferred this issue from meltano/meltano May 11, 2023
@WillDaSilva WillDaSilva changed the title bug: Check for already running pipeline doesn't include transformers (or at least dbt...) feat: Provide optional lock to prevent concurrent pipeline execution May 11, 2023