
Docker memory_hard_limit bypasses quotas #9924

Closed
henrikjohansen opened this issue Jan 29, 2021 · 5 comments

Comments

henrikjohansen commented Jan 29, 2021

Nomad version

Nomad v1.0.2+ent (8b533db)

Issue

It seems that quota accounting is done during job submission with respect to the resources declared in the resources stanza. Quota limits for memory can thus be bypassed by the job operator using the memory_hard_limit task config option :(

PoC

This quota limits the default namespace to 4096MB of memory:

$ nomad quota status default-quota 
Name        = default-quota
Description = Limit the shared default namespace
Limits      = 1

Quota Limits
Region  CPU Usage  Memory Usage  Network Usage
global  0 / 2500   0 / 4096      0 / inf

This jobspec declares a limit of 256MB of memory and sets memory_hard_limit to 8192MB, twice the namespace's memory quota.

$ cat example.nomad
job "example" {
  datacenters = ["dc1"]
  namespace = "default"

  group "cache" {
    network {
      port "db" {
        to = 6379
      }
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"
        memory_hard_limit = 8192

        ports = ["db"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

After you plan and run the job, the quota looks like this:

$ nomad quota status default-quota 
Name        = default-quota
Description = Limit the shared default namespace
Limits      = 1

Quota Limits
Region  CPU Usage   Memory Usage  Network Usage
global  500 / 2500  256 / 4096    0 / inf

This is not desired behavior - at least not for us. Yes, we can block jobs from setting memory_hard_limit using Sentinel policies, but we have use cases where this option is needed (and you need to realize this bypass is possible in the first place).

In reality, memory_hard_limit should count towards quota consumption just like the ordinary resources declaration.

henrikjohansen changed the title from "Tasks using the Docker driver can easily escaping quota limits for memory?" to "Tasks using the Docker driver can easily escape quota limits for memory?" Jan 29, 2021
tgross (Member) commented Jan 29, 2021

Hi @henrikjohansen! This looks like a general misfeature of how we handle the memory_hard_limit case in the scheduler. The scheduler is fairly ignorant about fields that are specific to a task driver because of the way task drivers are implemented as plugins. We're in the middle of doing some design work for an upcoming oversubscription feature (#606, finally!) and I'll make sure we cover this use case when we discuss that.

Thanks for opening this issue -- feedback like this on ENT features is hugely valuable!

(cc'ing @mikenomitch as a heads up)

tgross changed the title from "Tasks using the Docker driver can easily escape quota limits for memory?" to "Docker memory_hard_limit bypasses quotas" Jan 29, 2021
henrikjohansen (Author) commented

Hi @tgross. There are at least 3 issues here, I think? 🤔

  • you can escape your quota because memory_hard_limit is not contributing to your quota consumption.
  • memory_hard_limit is not considered by the scheduler as I can schedule jobs with a hard limit of many times the amount of memory present on our largest nodes.
  • it should be possible to disallow the use of memory_hard_limit without having to write Sentinel policies.

schmichael (Member) commented

The resources.memory_max parameter added as part of Memory Oversubscription in Nomad 1.1 supersedes Docker's memory_hard_limit parameter and is taken into account by quotas.
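
For reference, a minimal sketch of the PoC task rewritten to use memory_max instead of memory_hard_limit (values are illustrative; assumes Nomad 1.1+ with memory oversubscription enabled on the scheduler):

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"
        ports = ["db"]
      }

      resources {
        cpu        = 500
        memory     = 256   # reserved memory, counted against the quota
        memory_max = 1024  # hard limit the task may burst up to
      }
    }

Memory oversubscription has to be turned on first, e.g. via nomad operator scheduler set-config -memory-oversubscription=true.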

it should be possible to disallow the use of memory_hard_limit without having to write Sentinel policies.

Unfortunately this is still not possible, and it's a bug we intend to fix. A config option will be added to disable memory_hard_limit. We will default to leaving it enabled for a time, but we encourage anyone relying on it to enable it explicitly, as we intend to eventually default to disabling memory_hard_limit in favor of the global memory_max parameter.

schmichael removed their assignment Jul 8, 2021
schmichael (Member) commented

My plan right now is to remove Docker's memory_hard_limit outright, since it is superseded by resources.memory_max. I hate breaking backward compatibility, but I think short-term migration pain is better than long-term maintenance of conflicting parameters and potentially a config flag to manipulate them.

Roadmap

  • Nomad 1.2 - If a jobspec has Docker's memory_hard_limit set, return a Warning that it will be removed and link to memory_max
  • Nomad 1.3 - Remove memory_hard_limit entirely, ignore it on existing jobs and deny jobs trying to register with it.

The ugly part is that since this is in driver config, we don't normally validate it on the server. We'll need to add a special case to peek into the driver config for this particular field.

Feedback welcome! I'd love to "rip off the bandaid" with this one as it were and not add more features when we could just remove the deprecated one.
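
To get a rough idea of which jobs would be affected ahead of the removal, something like this sketch could work (assuming the driver config keys show up verbatim in the registered job's JSON and that jq is available):

$ nomad job inspect example | jq '.. | .memory_hard_limit? // empty'

For the PoC job above this should print 8192.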

henrikjohansen (Author) commented

@schmichael As of 1.9.x, the memory_hard_limit option is still part of the Docker driver and has now had 3 additional years of potential production exposure, making deprecation even harder.

I am going to close this issue - any ENT customer running into this also has Sentinel available as a makeshift band-aid.
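
For anyone reaching for that band-aid, a rough (untested) Sentinel sketch that rejects any task setting memory_hard_limit in its Docker config might look like this; it assumes the Sentinel job object exposes the raw task config map as task.config:

# Reject jobs where any task sets memory_hard_limit in its driver config.
no_memory_hard_limit = rule {
    all job.task_groups as tg {
        all tg.tasks as task {
            "memory_hard_limit" not in keys(task.config else {})
        }
    }
}

main = rule { no_memory_hard_limit }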
