balloons: add support for explicit CPUs #352
Conversation
This commit introduces a new field in the balloons configuration CR, enabling users to specify their CPU preferences for workloads. When a user explicitly requests a CPU set, the policy will attempt to allocate CPUs from the specified set. If the request cannot be satisfied, the policy will fall back to other available CPUs. If a user adds `preferIsolCpus` in addition to specifying preferred CPUs, the policy will prioritize allocating isolated CPUs over the preferred CPU set, in case these sets are different.

Signed-off-by: Feruzjon Muyassarov <[email protected]>
Is this a good idea, or something we both want and need? So far, we have been deliberate and careful not to allow explicitly setting CPUs by ID for workloads.
I was expecting this question 😄
So @askervin knows about this and is okay with this? I'll wait for his review then. I'd rather not open this Pandora's box if we can avoid it in any way.

Usually the user does not really want any specific CPU or cpuset by ID(s). Instead they want their workload to be topologically close to some memory controller or one or more PCI devices, and that's why they ask for the ability to set specific cpusets for workloads. However, it is almost always better to provide a way for the user/workload to describe what they need ("this workload needs to be close to a PCI device of a particular vendor and class" or "this workload needs to be close to that other workload") and then let the policy in use implement the necessary means of how that is achieved.

So, on UI-/UX-visible interfaces (such as workload annotations or policy configuration) we usually prefer things to be described in terms of what is needed, not in terms of how that could be achieved on some piece of hardware, especially since the latter might not be portable across cluster nodes. Of course, there are exceptions to every rule...
```diff
@@ -154,6 +154,10 @@ Balloons policy parameters:
    separate `cpu.classes` objects, see below.
  - `preferCloseToDevices`: prefer creating new balloons close to
    listed devices. List of strings
+ - `preferCpus`: prefer allocating CPUs from an explicitly defined
+   set of CPUs. Warning: when combining `preferIsolCpus` with `preferCpus`,
```
I think that you can remove the word "Warning" from this sentence. This is a good specification of the priorities of the two preferences.

This feature is definitely worth a warning, though, because it breaks the idea that the policy can decide which CPUs it puts in which balloon instance. Using this feature makes the policy configuration platform dependent. Things are very likely to go wrong if this is used for all nodes in a heterogeneous cluster.
@fmuyassarov, you can see excellent content for the warning in @klihub's comment: #352 (comment)
```diff
@@ -56,6 +56,9 @@ const (
 	// virtDevIsolatedCpus is the name of a virtual device close to
 	// host isolated CPUs.
 	virtDevIsolatedCpus = "isolated CPUs"
+	// virtDevPreferredCpus represents a virtual device that is
+	// associated with a user-preferred set of CPUs
+	virtDevPreferredCpus = "user preferred CPUs"
```
Here is a single static virtual device name, but there can be many balloon types that have different `preferCpus` parameter sets.

Maybe you should have two or three balloon types with `preferCpus` parameters in the e2e tests. Two of those could even have overlapping `preferCpus`. In addition, there should be at least one balloon type without this parameter. Then you can make sure that balloons of the type without the `preferCpus` parameter get CPUs that are not mentioned in any of the `preferCpus` sets of the other balloon types.

When thinking of this case, it sounds to me that you will need a bunch of new virtual devices, one for each balloon type with this parameter.
```diff
@@ -1202,6 +1206,9 @@ func (p *balloons) fillCloseToDevices(blnDefs []*BalloonDef) {
 		if blnDef.PreferIsolCpus {
 			blnDef.PreferCloseToDevices = append(blnDef.PreferCloseToDevices, virtDevIsolatedCpus)
 		}
+		if cpuset.MustParse(blnDef.PreferCpus).Size() != 0 {
```
There is no error handling for invalid `preferCpus`, and the value is parsed more than once. The opposite would be better.

Would you change this so that you parse each `preferCpus` only once and store the value in an internal attribute of the balloon type? If you did this as part of configuration validation, then you could give a nice error message on parse errors.
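To illustrate the suggested pattern, here is a rough, self-contained sketch — not the actual nri-resource-policy code. `parseCPUList`, `balloonDef`, and `validateConfig` below are simplified stand-ins for `cpuset.Parse` (from `k8s.io/utils/cpuset`), `BalloonDef`, and the policy's configuration validation step; the point is parsing each `preferCpus` exactly once and returning an error instead of panicking:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList is a hypothetical stand-in for cpuset.Parse: it turns a
// list like "0-2,7" into a set of CPU IDs and, unlike MustParse, returns
// an error instead of panicking on malformed input.
func parseCPUList(s string) (map[int]bool, error) {
	cpus := map[int]bool{}
	if strings.TrimSpace(s) == "" {
		return cpus, nil
	}
	for _, part := range strings.Split(s, ",") {
		bounds := strings.SplitN(part, "-", 2)
		lo, err := strconv.Atoi(strings.TrimSpace(bounds[0]))
		if err != nil {
			return nil, fmt.Errorf("invalid CPU list %q: %v", s, err)
		}
		hi := lo
		if len(bounds) == 2 {
			if hi, err = strconv.Atoi(strings.TrimSpace(bounds[1])); err != nil {
				return nil, fmt.Errorf("invalid CPU list %q: %v", s, err)
			}
		}
		for cpu := lo; cpu <= hi; cpu++ {
			cpus[cpu] = true
		}
	}
	return cpus, nil
}

// balloonDef mimics the relevant slice of BalloonDef: the raw string from
// the CR plus a parsed copy filled in exactly once during validation.
type balloonDef struct {
	PreferCpus   string
	preferredSet map[int]bool // parsed once during validation
}

// validateConfig parses every preferCpus value up front, so a typo in the
// configuration yields a readable error message instead of a crash.
func validateConfig(defs []*balloonDef) error {
	for _, def := range defs {
		set, err := parseCPUList(def.PreferCpus)
		if err != nil {
			return fmt.Errorf("balloon definition: %v", err)
		}
		def.preferredSet = set
	}
	return nil
}

func main() {
	good := []*balloonDef{{PreferCpus: "0-2,7"}}
	if err := validateConfig(good); err != nil {
		fmt.Println("unexpected error:", err)
	}
	fmt.Println(len(good[0].preferredSet)) // CPUs 0,1,2,7 -> prints 4

	bad := []*balloonDef{{PreferCpus: "0-x"}}
	fmt.Println(validateConfig(bad) != nil) // prints true
}
```

After this, later code paths (such as `fillCloseToDevices`) could check the pre-parsed set's size instead of calling `MustParse` again.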
`MustParse` doesn't return an error, so I guess moving the caller to validateConfig won't change anything in terms of error handling. But I think we can use `cpuset.Parse()` similarly to this?
...and I'm okay with moving the MustParse/Parse call to `validateConfig()`.
But probably we should all agree first if this PR is something that can be beneficial and merged. Because otherwise I will waste my & your time trying to fix something that we don't want to take in.
> `MustParse` doesn't return an error, so I guess moving the caller to validateConfig won't change anything in terms of error handling. But I think we can use `cpuset.Parse()` similarly to this?
Yes, exactly. We definitely do not want to crash on a typo in a config.
> But probably we should all agree first if this PR is something that can be beneficial and merged. Because otherwise I will waste my & your time trying to fix something that we don't want to take in.
I agree. I think this PR is a very good start for this feature, thank you for writing it! Let's continue if @klihub and @kad show green(ish) light for this. I think you can put this PR into the draft state.
Would you proceed with preferCoreType (or preferPCores/preferECores... feel free to decide what would be the most suitable parameter name) instead? You should be able to test it in our e2e vm by adding environment variables to balloons policy container: look for OVERRIDE_SYS_CORE_CPUS and OVERRIDE_SYS_ATOM_CPUS from the code. Those variables enable faking that there are P and E cores in the system.
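As a rough sketch of the e2e tip above — the variable names come from the code as mentioned, but the CPU ranges here are made-up example values, and how the variables reach the policy container depends on the test setup:

```shell
# Hypothetical ranges: pretend CPUs 0-3 are P-cores (core) and
# CPUs 4-7 are E-cores (atom) in the e2e VM.
export OVERRIDE_SYS_CORE_CPUS="0-3"
export OVERRIDE_SYS_ATOM_CPUS="4-7"
```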
> Would you proceed with preferCoreType (or preferPCores/preferECores... feel free to decide what would be the most suitable parameter name) instead? You should be able to test it in our e2e vm by adding environment variables to balloons policy container: look for OVERRIDE_SYS_CORE_CPUS and OVERRIDE_SYS_ATOM_CPUS from the code. Those variables enable faking that there are P and E cores in the system.
Yes, I will proceed with that in the meantime.
Based on the requests and wishes seen so far regarding CPU affinity, I'm leaning towards handing the gun to the user with all the big fat warnings on it (many picked from your inputs above): use this only in case of emergency, you will lose all the nice features, the warranty is void if this seal is broken, not for production use.

The reason is that I'm sure that also in the future there will be special cases where balloons simply is not smart enough out of the box. There could be a special process or feature or whatnot running/watching/tracing/enabled exactly on a certain core, for instance. Or it might be that (once again) balloons is not smart enough to fulfill a requirement because the precedence of its preferences is not configurable, or something similar. And because there is no easy alternative for handling such special cases, it would be sad if the affinity of a single difficult workload would prevent using the balloons policy (and naturally the topology-aware policy, too). For these purposes I'd like to enable users to have this parameter up their sleeve if everything else fails and there is no time to wait until balloons has proper support for their request.

But to be honest, I'm not leaning in this direction without hesitation. I see it as a risk if anyone builds a production system that uses this option. I see it also as a risk that we might not get a feature request if someone can hack around their problem by using this feature. These are big risks. Maybe we could add to the list of warnings: "never use this parameter without telling the resource policy developers what they should implement in this policy so that you could stop using this parameter".

If you see better, I think we can postpone this feature and let @fmuyassarov work on P/E cores, which is definitely worth supporting. And adding that support immediately removes one reason to use preferCpus. :D
@askervin If you tell me that we need this, and that we can paint it with big enough "dontcha shoot yourself in the foot" warnings, then I trust your judgement and am fine with it, too.
Let's "sleep over" that configuration option. I agree that giving an exact cpuset to the user might be usable in some rare scenarios; as well, it would lead to many corner cases: e.g. multiple balloons that have overlapping cpusets, or some CPUs being allocated to other balloons in the normal policy way if this balloon can be shrunk to 0, etc.
I am totally on board with that, if that is good enough for @askervin.

Let's mark this as a draft, while we sleep/sit on it.
Hi, I have a use case where we limit certain non-k8s workloads on our clusters (systemd services like kube-api, kube-scheduler and other "infra" services) to only use a certain CPU set. The goal is to have "infra" and "workload" completely separated using balloons, and I don't see a way to create that separation without this feature.

Would this be something the balloons policy is able to do? Or do you have an idea for a less dangerous feature that would allow us to do this? (I don't mind coding it, but I don't have any other idea on how to solve it.) Optionally, would an "excludeCPUs" option work, listing CPUs to exclude from balloon scheduling for host workloads that are not pods?

tl;dr - we have CPU cores where we don't want workloads running (generally we would prefer allowing certain workloads to run and disallowing others).

P.S.
@NoamNakash |
@NoamNakash, sorry about the delay. After quite a few discussions we ended up with a decision not to implement this feature, at least not at this point. I will elaborate on this in a separate comment. For your particular use case, that is, separating non-k8s processes onto dedicated CPUs, there is an option called
Do you still think it works for you?
We have decided not to implement this feature for now. The reason is that preferring explicitly listed CPUs invites users to misuse this policy in non-scalable and error-prone ways that will lead to suboptimal CPU affinity. This option also breaks our design principle by turning the policy configuration from descriptive (what kind of CPUs are needed) into imperative (which CPUs are needed).

This said, we acknowledge that the policy will never offer all possible options for telling what kind of CPUs are needed. If you find an option missing, we invite you to share your use case and possibly propose options that would help solve it. Until the missing option is integrated, or if you wish to use the balloons or the topology-aware policy to manage your Kubernetes workloads except for a few special workloads that you want to run on explicitly listed CPUs, we advise you to use the following workaround.
|
Introduce a new parameter (`preferCpus`) in the Balloons policy for giving preferred CPU sets for each balloon type. When combining `preferIsolCpus` with `preferCpus`, priority will be given to isolated CPUs over the user-specified set of CPUs if these are two different sets.
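For context, a balloon type using the proposed field might have been configured roughly like this. This is a hypothetical sketch of the CR fragment — the `balloonTypes`/`name` structure follows the existing balloons configuration, while the `preferCpus` value is an invented example:

```yaml
balloonTypes:
  - name: preferred-cpus-balloon
    # Hypothetical example: prefer allocating from CPUs 2-5; the policy
    # falls back to other available CPUs if this set cannot satisfy
    # the request.
    preferCpus: "2-5"
    # When set together with preferCpus, isolated CPUs take priority
    # over the preferred set if the two sets differ.
    preferIsolCpus: true
```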