Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability of Flyteadmin to force terminate an execution #5833

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

RRap0so
Copy link
Contributor

@RRap0so RRap0so commented Oct 10, 2024

Tracking issue

Related to #5832

Why are the changes needed?

Add a force flag on the server side of Flyte to be able to terminate executions if FlytePropeller is not running in the kubernetes cluster.

What changes were proposed in this pull request?

This adds the force bool into ExecutionTerminateRequest. This in turn will be passed to the k8s_executor. If it's a Forced termination, it will remove the finalizer from the execution CRD and delete.

If it's not forced it keeps doing just a deletion (like it was before)

How was this patch tested?

Unit tested.

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

TBD

Docs link

TBD

@RRap0so RRap0so changed the title Rafaelraposo/force abort impl (#4) Add the ability of Flyteadmin to force terminate an execution Oct 10, 2024
@RRap0so RRap0so marked this pull request as ready for review October 10, 2024 05:56
Copy link

codecov bot commented Oct 10, 2024

Codecov Report

Attention: Patch coverage is 57.14286% with 12 lines in your changes missing coverage. Please review.

Project coverage is 36.33%. Comparing base (197ae13) to head (2cc10ea).

Files with missing lines Patch % Lines
flyteadmin/pkg/workflowengine/impl/k8s_executor.go 66.66% 4 Missing and 1 partial ⚠️
flyteidl/gen/pb-go/flyteidl/admin/execution.pb.go 0.00% 4 Missing ⚠️
flytectl/cmd/delete/delete.go 50.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5833      +/-   ##
==========================================
+ Coverage   34.48%   36.33%   +1.84%     
==========================================
  Files        1138     1304     +166     
  Lines      102742   110152    +7410     
==========================================
+ Hits        35434    40026    +4592     
- Misses      63634    65956    +2322     
- Partials     3674     4170     +496     
Flag Coverage Δ
unittests-datacatalog 51.37% <ø> (ø)
unittests-flyteadmin 55.56% <68.75%> (-0.04%) ⬇️
unittests-flytecopilot 12.17% <ø> (ø)
unittests-flytectl 62.11% <62.50%> (?)
unittests-flyteidl 7.16% <0.00%> (-0.01%) ⬇️
unittests-flyteplugins 53.35% <ø> (ø)
unittests-flytepropeller 42.02% <ø> (ø)
unittests-flytestdlib 55.35% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

RRap0so and others added 3 commits October 11, 2024 07:47
* Refactor k8s executor abort

Signed-off-by: Rafael Raposo <[email protected]>
Signed-off-by: Rafael Raposo <[email protected]>
Signed-off-by: Rafael Raposo <[email protected]>
@wild-endeavor
Copy link
Contributor

@EngHabu and @hamersaw had some concerns around this I think? Finalizers are there to help ensure that any downstream resources are cleaned up properly right? Will this still happen properly if the finalizers are forcefully removed? What's the downside of doing this?

@pvditt
Copy link
Contributor

pvditt commented Oct 23, 2024

@wild-endeavor this is a follow up to #5788 that added finalizers to flyteworkflows on creation as opposed to waiting for propeller to pick them up and add finalizers. This was to handle a scenario where flyteworkflow resources were getting deleted prior to getting finalizers set causing issues with subworklow status getting stuck.

I'll take a look at this PR this evening.

PropagationPolicy: &deletePropagationBackground,
})

w, err := target.FlyteClient.FlyteworkflowV1alpha1().FlyteWorkflows(data.Namespace).Get(ctx, data.ExecutionID.GetName(), v1.GetOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you put this fetch under if data.Force so we don't do this fetch if we're just deleting without removing finalizers first

@@ -94,9 +95,35 @@ func (e K8sWorkflowExecutor) Abort(ctx context.Context, data interfaces.AbortDat
if err != nil {
return errors.NewFlyteAdminErrorf(codes.Internal, err.Error())
}
err = target.FlyteClient.FlyteworkflowV1alpha1().FlyteWorkflows(data.Namespace).Delete(ctx, data.ExecutionID.GetName(), v1.DeleteOptions{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still want to re-add this after the if data.force{... code block

@@ -30,9 +32,18 @@ func RemoteDeleteCommand() *cobra.Command {
Short: deleteCmdShort,
Long: deleteCmdLong,
}
var force bool
deleteCmd.PersistentFlags().BoolVarP(&force, "force", "f", false, "Force deletion without confirmation")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unfamiliar if there are any examples, but would be nice if just needed to include the force flag.

Command could be:

flytectl delete execution kxd1i72850 -d development -p flytesnacks --force

@hamersaw
Copy link
Contributor

hamersaw commented Oct 23, 2024

Finalizers are there to help ensure that any downstream resources are cleaned up properly right? Will this still happen properly if the finalizers are forcefully removed?

Only comment is that we need to ensure that all k8s resources that Flyte starts are cleaned up as part of removing finalizers and deleting the CR. IIUC this is meant to be used on already terminal Flyte workflows, but there are no safeguards here - a user could "force" delete an active execution and that may orphan Pods. Am I understanding this correctly?

@EngHabu
Copy link
Contributor

EngHabu commented Oct 23, 2024

I thought about this more, I think it's an ok change because the finalizer isn't what guarantees cleanup, it's the OwnerReference and the propagation delete policy. The finalizer is there to allow propeller to act on the delete request for the workflow CRD.
My issue though is that propeller adds finalizers on the created resources as well, if we are removing the finalizer because propeller isn't running anymore, the workflow CRD will still hangaround because pods (and other resources) will have finalizers...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants