Description of problem:
Customer has /var/log mounted as a separate partition. When that partition fills up, any container scheduled to the node gets stuck in ContainerCreating status because there is no space left on the disk.
* Even with /var/log full, the node still reports "Ready" status, so new pods continually get stuck instead of being scheduled to another node.
* Even if the pod outputs nothing and does not need any log space, it still gets stuck.

Version-Release number of selected component (if applicable):
3.11.x

How reproducible:
Always.

Steps to Reproduce:
1. Mount /var/log as a separate partition on any OpenShift cluster node.
2. Fill up all of the partition's space.
3. Try to create a pod in the cluster: it gets stuck in ContainerCreating status instead of being scheduled to another node. The error is:
   "Failed create pod sandbox: mkdir /var/log/pods/<uid>: no space left on device"

Actual results:
* The cluster cannot create pods on the node; they are stuck in ContainerCreating. This requires manual intervention by the cluster administrator.
* The node is still in "Ready" state.

Expected results:
* Pods should be created, since they don't really rely on /var/log to work.
* If kubelet requires /var/log space for creating pods, it should monitor it.
Example of contents from /var/log/pods:

# ls -lha /var/log/pods/bbf67ae8-a722-11e9-ae46-525400575a3e/app-cli/
total 0
drwxr-xr-x. 2 root root  19 Jul 15 13:05 .
drwxr-xr-x. 3 root root  21 Jul 15 13:05 ..
lrwxrwxrwx. 1 root root 165 Jul 15 13:05 0.log -> /var/lib/docker/containers/4c6e1da83b270d21197ff04be743b02de71327603657113d0dfd0781931ad50d/4c6e1da83b270d21197ff04be743b02de71327603657113d0dfd0781931ad50d-json.log
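To illustrate the kind of monitoring asked for under "Expected results", here is a minimal sketch of an external free-space check on the /var/log partition. It is not kubelet behavior and not an OpenShift API; the function name log_partition_low_on_space and the 5% threshold are assumptions made for this example. It only looks at free blocks, so it would not catch inode exhaustion, which can produce the same "no space left on device" error.

```python
import shutil


def log_partition_low_on_space(path="/var/log", min_free_fraction=0.05):
    """Return True when the filesystem holding `path` is low on free space.

    Hypothetical helper: the name and the default 5% threshold are
    illustrative assumptions, not values taken from kubelet or OpenShift.
    """
    usage = shutil.disk_usage(path)  # named tuple: (total, used, free) in bytes
    if usage.total == 0:
        # Pseudo-filesystems can report a zero size; treat them as not low.
        return False
    return usage.free / usage.total < min_free_fraction
```

A check like this could be run periodically on each node (for example from cron or a monitoring agent), so that an administrator is alerted, or the node is cordoned, before pods start getting stuck in ContainerCreating.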
It appears that a previous RFE for this same request was closed by OCP Product Management. I think this one will go the same way unless there's a clear demonstrable impact to a large number of customers. Also, looking at the case, it looks like a possible workaround was suggested to the customer for now. I think we should evaluate whether this is an appropriate feature for 4.x (i.e., is 4.x still impacted by this?). And, if so, move this to Jira for tracking in the Node team's feature backlog. Seth, thoughts on this in 4.x?
Closing as dup of closed RFE. It isn't that we wouldn't like to do something about this, but we can't implement this in 3.x z-streams, and for 4.x it isn't an issue because custom partition layouts are not supported in RHCOS.

*** This bug has been marked as a duplicate of bug 1574866 ***