
Flink Kubernetes Support #2

Open · wants to merge 13 commits into base: release-1.7
Conversation

esevastyanov

The current implementation of Kubernetes support targets a session cluster only.
For additional information, please see the README file.


## Task Manager
Task manager is a temporary essence and is created (and deleted) by a job manager for a particular slot.
No deployments/jobs/services are created for a task manager only pods.


"for a task manager, only pods" comma missing?

Example:
```
kubectl create -f jobmanager-deployment.yaml
kubectl create -f jobmanager-service.yaml
```

jobmanager-exposer-deployment.yaml?
Also, an immediate question comes up: how exactly does it expose it?

Author


That creates a deployment with one job manager and a service around it that exposes the job manager
(ClusterIP/NodePort/LoadBalancer/ExternalName):
https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types
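
For context, a minimal sketch of what such a service manifest could look like; the names, labels, and ports below are illustrative assumptions, not taken from this PR:

```yaml
# Hypothetical jobmanager-service.yaml sketch (names/ports assumed, not from the PR).
# A NodePort service exposing the job manager pods matched by the selector.
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager
spec:
  type: NodePort            # could also be ClusterIP/LoadBalancer/ExternalName
  selector:
    app: flink
    component: jobmanager
  ports:
    - name: ui
      port: 8081            # Flink web UI
    - name: rpc
      port: 6123            # job manager RPC
```

With `type: NodePort`, Kubernetes allocates a port on every node that forwards to the job manager; the other service types in the linked docs trade off cluster-internal vs. external reachability.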

TBD

## Kubernetes Resource Management
Resource management uses a default service account every pod contains. It should has admin privileges to be able


"should have"

package org.apache.flink.kubernetes.client;

/**
* represent a endpoint.


I wonder what endpoint?

void terminateClusterPod(ResourceID resourceID) throws KubernetesClientException;

/**
* stop cluster and clean up all resources, include services, auxiliary services and all running pods.


Some comments begin with a capital letter and some don't.

    public Collection<ResourceProfile> startNewWorker(ResourceProfile resourceProfile) {
        LOG.info("Starting a new worker.");
        try {
            nodeManagerClient.createClusterPod(resourceProfile);


So at a higher level we provide each worker with only one slot; does that strategy have a downside?

Author


For now, this is our baseline; we consciously do the same on Samza.
It's a reasonable solution because, in this case, different slot threads will not compete for CPU and memory (the task manager doesn't isolate these resources per slot), and recovery is easier. However, we will use the slot sharing feature and share slots between different Flink operators according to the pipeline logic, to get rid of the high network usage between task managers.
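
As a hedged illustration, the one-slot-per-task-manager approach described above corresponds to the following flink-conf.yaml setting (the key exists in Flink; whether this PR sets it this way is an assumption):

```yaml
# One slot per task manager: slot threads then never compete for CPU/memory
# within a task manager, since the task manager does not isolate these
# resources between slots.
taskmanager.numberOfTaskSlots: 1
```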

Author


As for a downside, since you asked: I would mention the absence of resource sharing. Under low job utilization, a task manager will simply sit idle without much load.
Also, in this case there will be no slot grouping, a feature that tends to reduce network traffic by allocating slots on a single task manager. However, we will use slot sharing instead.

2 participants