Slow image pulls when pulling 4-5 images with no SOCI index from nvcr.io
#988
-
I described the issue here, cross-posting in this discussion for visibility. I am on I suspect that this is related to #931, as the snapshotter is trying to resolve a layer using the API path for manifests.
{
"error": "cannot unpack the layer: cannot fetch layer: size of descriptor is 0; unable to resolve: unable to resolve ref (nvcr.io/nvidia/k8s/container-toolkit@sha256:745cad9a8a1e0a0d92738687a85b5a314d324dfca7c2dc6f2b2111508f6fbec9): Head \"https://nvcr.io/v2/nvidia/k8s/container-toolkit/manifests/sha256:745cad9a8a1e0a0d92738687a85b5a314d324dfca7c2dc6f2b2111508f6fbec9\": HEAD https://nvcr.io/v2/nvidia/k8s/container-toolkit/manifests/sha256:745cad9a8a1e0a0d92738687a85b5a314d324dfca7c2dc6f2b2111508f6fbec9 giving up after 3 attempt(s)",
"key": "k8s.io/260/extract-45204369-6hIj sha256:a177c22b4e0d76a18351a1a31c666de1643a68f2a3b4c6408762ffef8e5318cc",
"level": "warning",
"msg": "failed to prepare snapshot; deferring to container runtime",
"parent": "k8s.io/259/sha256:63c5d3862f93c51ccb88bbc83cdc6a515e90e7d375631f0bee85b2f01b5cf715",
"time": "2023-12-08T14:43:25.786365548Z"
}
In this case, when a GPU node comes up on EKS, we run 4-5 containers with images from
I updated the retry configuration to:
[http]
MinWaitMsec=15
MaxRetries=2
I do not have concrete numbers on hand right now, but I could see a noticeable improvement. Do you have any suggestions? Can I express something like "If there is no SOCI index present, immediately defer all the layers to the container runtime" via the configuration? Let me know if you need any more information. Thanks a lot. Excited to integrate this in our EKS cluster!
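To get concrete numbers for comparison, one option is to tally the deferral warnings per image reference before and after a configuration change. The following is a minimal sketch, not part of the snapshotter itself. It assumes the snapshotter logs are available as one JSON object per line with the same `msg` and `error` fields as the entry above. The function names `count_deferrals` and `extract_ref` are hypothetical helpers introduced here for illustration.

```python
import json
import re
from collections import Counter

# Message text taken verbatim from the log entry quoted above.
DEFER_MSG = "failed to prepare snapshot; deferring to container runtime"

# Pulls the image reference out of the "unable to resolve ref (...)" part
# of the error string; this pattern is based only on the sample entry.
REF_PATTERN = re.compile(r"unable to resolve ref \(([^)]+)\)")


def extract_ref(error_text):
    """Return the image reference from an error string, or 'unknown'."""
    match = REF_PATTERN.search(error_text)
    return match.group(1) if match else "unknown"


def count_deferrals(lines):
    """Count deferral warnings per image reference.

    `lines` is any iterable of strings, assumed to hold one JSON log
    entry per line. Lines that are not JSON objects are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if not isinstance(entry, dict):
            continue
        if entry.get("msg") != DEFER_MSG:
            continue
        counts[extract_ref(entry.get("error", ""))] += 1
    return counts
```

Passing an open log file to `count_deferrals` and printing `counts.most_common()` would show which image references trigger the most deferrals, which can then be compared across retry settings.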
Replies: 1 comment 2 replies
-
Hi, sorry for the lack of update here. It does look like the same issue as #931. Can you update to v0.5.0 and give it another try?