Description of problem:
Observed on the starter-ca-central-1 cluster. Our pre-upgrade diagnostics reported disk pressure on 10 compute nodes. On further investigation, all 10 nodes had their container storage fully consumed by a single rogue pod (tropospheric) that continuously writes core dumps to the container storage volume.
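For anyone who wants to check a node by hand, here is a minimal sketch of a disk-usage check. It is not the diagnostic tooling that flagged these nodes. The function names and the 85% threshold are made up for illustration, and the path mentioned in the note below is only CRI-O's default storage root, which may differ on a given node.

```python
import shutil


def usage_percent(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    total, used, _free = shutil.disk_usage(path)
    return 100.0 * used / total


def is_under_pressure(path, threshold_percent=85.0):
    """Return True when usage is at or above the threshold.

    The 85% default is an arbitrary illustrative value, not the
    threshold used by the kubelet or by our diagnostics.
    """
    return usage_percent(path) >= threshold_percent
```

On a node, something like `is_under_pressure("/var/lib/containers/storage")` would report on the filesystem backing container storage, assuming the default storage root is in use.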
Version-Release number of selected component (if applicable):
openshift v3.11.44
cri-o://1.11.9
How reproducible:
100%
Steps to Reproduce:
1. Create a project with a pod that continuously writes to the container storage volume.
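A minimal sketch of the kind of workload that triggers this, assuming it runs inside a container and writes to a directory on the container's writable layer. This is not the code of the pod we observed. The function name, the `core.dump` file name, and the default sizes are hypothetical. The `max_bytes` cap is there so the sketch can be tried safely; passing `None` removes the cap and writes until the disk is full.

```python
import os


def fill_storage(directory, chunk_bytes=1024 * 1024, max_bytes=10 * 1024 * 1024):
    """Append fixed-size chunks to a file in `directory`.

    Stops once at least `max_bytes` have been written, or when a write
    fails (for example, when the disk is full). Returns the number of
    bytes written. Pass max_bytes=None to write until the disk fills.
    """
    target = os.path.join(directory, "core.dump")  # hypothetical file name
    chunk = b"\0" * chunk_bytes
    written = 0
    # Unbuffered so that a failed write is reported immediately.
    with open(target, "ab", buffering=0) as f:
        while max_bytes is None or written < max_bytes:
            try:
                written += f.write(chunk)
            except OSError:
                break
    return written
```

With the default cap this writes 10 MiB and stops. The behaviour reported here corresponds to the uncapped case, where nothing bounds the container's use of the shared volume.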
Actual results:
Eventually, this project will consume the entire disk and ultimately cripple the container runtime.
Expected results:
Ideally, a container would not be able to consume all of the shared resources that the container runtime is providing.
Additional info:
Sending to Installer to evaluate a change for 3.x.
For 4.x, I opened https://jira.coreos.com/browse/NODE-163 to verify we can set `overlay.size` in /etc/containers/storage via the ContainerRuntimeConfigs CRD.