
Flink Kubernetes Support #2

Open · wants to merge 13 commits into base: release-1.7
Conversation

esevastyanov

The current implementation of Kubernetes support targets a session cluster only.
For additional information, please see the README file.


## Task Manager
Task manager is a temporary essence and is created (and deleted) by a job manager for a particular slot.
No deployments/jobs/services are created for a task manager only pods.


"for a task manager, only pods" comma missing?

Example:
```
kubectl create -f jobmanager-deployment.yaml
kubectl create -f jobmanager-service.yaml
```

jobmanager-exposer-deployment.yaml?
Also, an immediate question comes up: how exactly does it expose it?

Author


That creates a deployment with one job manager and a service around it that exposes the job manager
(ClusterIP/NodePort/LoadBalancer/ExternalName):
https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types
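
For context, a minimal sketch of what such a service manifest could look like; the names, labels, and ports below are illustrative assumptions, not taken from this PR:

```yaml
# Hypothetical jobmanager-service.yaml sketch (names/ports assumed, not from the PR).
# A NodePort service exposing the job manager pods matched by the selector.
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager
spec:
  type: NodePort            # could also be ClusterIP/LoadBalancer/ExternalName
  selector:
    app: flink
    component: jobmanager
  ports:
    - name: ui
      port: 8081            # Flink web UI
    - name: rpc
      port: 6123            # job manager RPC
```

With `type: NodePort`, Kubernetes allocates a port on every node that forwards to the job manager; the other service types in the linked docs trade off cluster-internal vs. external reachability.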

TBD

## Kubernetes Resource Management
Resource management uses a default service account every pod contains. It should has admin privileges to be able


"should have"

package org.apache.flink.kubernetes.client;

/**
* represent a endpoint.


I wonder what endpoint?

void terminateClusterPod(ResourceID resourceID) throws KubernetesClientException;

/**
* stop cluster and clean up all resources, include services, auxiliary services and all running pods.


Some comments begin with a capital letter and some don't.

    public Collection<ResourceProfile> startNewWorker(ResourceProfile resourceProfile) {
        LOG.info("Starting a new worker.");
        try {
            nodeManagerClient.createClusterPod(resourceProfile);


So at a higher level we provide each worker with only one slot; does that strategy have a downside?

Author


For now, this is our baseline; we consciously do the same on Samza.
It's a reasonable solution because, in this case, different slot threads will not compete for CPU and memory (the task manager doesn't isolate these resources per slot), and recovery is easier. However, we will use the slot sharing feature and share slots between different Flink operators according to the pipeline logic, to get rid of the high network usage between task managers.
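
As a hedged illustration, the one-slot-per-task-manager approach described above corresponds to the following flink-conf.yaml setting (the key exists in Flink; whether this PR sets it this way is an assumption):

```yaml
# One slot per task manager: slot threads then never compete for CPU/memory
# within a task manager, since the task manager does not isolate these
# resources between slots.
taskmanager.numberOfTaskSlots: 1
```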

Author


As for a downside, since you asked: I would mention the absence of resource sharing. Under low job utilization, a task manager will simply sit idle without much load.
Also, in this case there will be no slot grouping, a feature that tends to reduce network traffic by allocating slots on a single task manager. However, we will use slot sharing instead.

2 participants