Description of problem:

In an OCP 3.7.42 cluster, the eviction manager is detecting DiskPressure and logging the following messages:

I0417 04:11:39.799428   51434 helpers.go:819] eviction manager: observations: signal=allocatableNodeFs.available, available: 580589153, capacity: 4086Mi
I0417 04:11:49.867444   51434 helpers.go:819] eviction manager: observations: signal=allocatableNodeFs.available, available: 580589153, capacity: 4086Mi
I0417 04:11:59.938085   51434 helpers.go:819] eviction manager: observations: signal=allocatableNodeFs.available, available: -1566894495, capacity: 4086Mi
I0417 04:11:59.938098   51434 helpers.go:833] eviction manager: thresholds - ignoring grace period: threshold [signal=allocatableNodeFs.available, quantity=0] observed -1566894495
I0417 04:11:59.938103   51434 helpers.go:833] eviction manager: thresholds - reclaim not satisfied: threshold [signal=allocatableNodeFs.available, quantity=0] observed -1566894495
I0417 04:11:59.938107   51434 helpers.go:833] eviction manager: thresholds - updated stats: threshold [signal=allocatableNodeFs.available, quantity=0] observed -1566894495
I0417 04:11:59.938131   51434 helpers.go:833] eviction manager: thresholds - grace periods satisified: threshold [signal=allocatableNodeFs.available, quantity=0] observed -1566894495
I0417 04:12:30.214136   51434 helpers.go:819] eviction manager: observations: signal=allocatableNodeFs.available, available: -1566644639, capacity: 4086Mi

As can be seen, the available disk space is reported as a negative figure, even though there is plenty of free disk space on the node. After running some tests, this appears to be an issue specific to the overlay2 storage driver, since we were not able to reproduce it using devicemapper.
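For illustration only, the negative figure is consistent with a simple subtraction going below zero. This is a hypothesis about the arithmetic, not the kubelet's actual code path: if the signal is derived as nodefs allocatable capacity minus measured usage, and with overlay2 the usage figure comes from the larger dedicated /var/lib/docker partition, the result can be negative. The usage value below is hypothetical, back-derived from the observed log value:

```shell
# Illustrative arithmetic only -- NOT the kubelet's actual implementation.
# Assumption: allocatableNodeFs.available = capacity - usage, where usage
# is measured on the bigger dedicated /var/lib/docker partition.

capacity=$((4086 * 1024 * 1024))   # 4086Mi nodefs capacity, from the logs
usage=5851376031                   # hypothetical usage on the larger partition
available=$((capacity - usage))

echo "available=${available}"      # negative, matching the observed value
```

Since usage exceeds capacity, the reported value goes negative and the zero-quantity threshold fires immediately.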
Version-Release number of selected component (if applicable):
OCP 3.7.42

How reproducible:
The issue is reproducible on OCP 3.7.x nodes that have a dedicated /var/lib/docker partition configured with the overlay2 storage driver and with a size larger than /var/lib/origin/openshift.local.volumes/.

Steps to Reproduce:
- Configure the atomic-openshift-node service with loglevel 5.
- Configure docker to use json-file as the log-driver on the OCP nodes:
  # /etc/sysconfig/docker
  OPTIONS=' --selinux-enabled --log-driver=json-file --signature-verification=False'
- Create a very verbose container (e.g. a container with a yes entrypoint command) and scale it to a number equal to or higher than the number of schedulable nodes:
  oc run --image=registry.access.redhat.com/rhel:latest yes yes
- Monitor the atomic-openshift-node output on a node running one of the "yes" pods:
  journalctl -f -u atomic-openshift-node | egrep -i "allocatableNodeFs.available"

Some time later, a log trace similar to the following should appear, showing a negative value for allocatableNodeFs.available:

eviction manager: thresholds - ignoring grace period: threshold [signal=allocatableNodeFs.available, quantity=0] observed -2164894492

Once this trace is shown, the node reports a DiskPressure condition when described with the oc command.

Actual results:
allocatableNodeFs.available reports negative available disk space, while there is plenty of space.

Expected results:
Disk space is detected properly.

Additional info:
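As a quick way to spot the condition in captured logs, the observed value can be extracted from an eviction-manager trace. The log line below is hardcoded from the example above so the snippet is self-contained; in practice the input would come from journalctl:

```shell
# Extract the "observed" value from an eviction-manager threshold log line.
# The sample line is hardcoded here so the snippet runs without a cluster.
line='eviction manager: thresholds - ignoring grace period: threshold [signal=allocatableNodeFs.available, quantity=0] observed -2164894492'

observed=$(printf '%s\n' "$line" | grep -oE 'observed -?[0-9]+' | awk '{print $2}')
echo "observed=${observed}"

# A negative value here means the node is about to report DiskPressure.
if [ "$observed" -lt 0 ]; then
    echo "negative allocatableNodeFs.available detected"
fi
```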
Verified on v3.7.51: SignalAllocatableNodeFsAvailable is removed, and the other eviction signal values are correct.

May 28 21:43:20 ip-172-18-15-142 atomic-openshift-node: I0528 21:43:20.735927    7155 eviction_manager.go:221] eviction manager: synchronize housekeeping
May 28 21:43:20 ip-172-18-15-142 atomic-openshift-node: I0528 21:43:20.758393    7155 helpers.go:766] eviction manager: observations: signal=imagefs.inodesFree, available: 15846373, capacity: 15510Ki, time: 2018-05-28 21:43:09.775816185 -0400 EDT
May 28 21:43:20 ip-172-18-15-142 atomic-openshift-node: I0528 21:43:20.758440    7155 helpers.go:768] eviction manager: observations: signal=allocatableMemory.available, available: 16163488Ki, capacity: 16265888Ki
May 28 21:43:20 ip-172-18-15-142 atomic-openshift-node: I0528 21:43:20.758448    7155 helpers.go:766] eviction manager: observations: signal=memory.available, available: 15644552Ki, capacity: 16265888Ki, time: 2018-05-28 21:43:09.775816185 -0400 EDT
May 28 21:43:20 ip-172-18-15-142 atomic-openshift-node: I0528 21:43:20.758455    7155 helpers.go:766] eviction manager: observations: signal=nodefs.available, available: 29676084Ki, capacity: 31010Mi, time: 2018-05-28 21:43:09.775816185 -0400 EDT
May 28 21:43:20 ip-172-18-15-142 atomic-openshift-node: I0528 21:43:20.758462    7155 helpers.go:766] eviction manager: observations: signal=nodefs.inodesFree, available: 15846373, capacity: 15510Ki, time: 2018-05-28 21:43:09.775816185 -0400 EDT
May 28 21:43:20 ip-172-18-15-142 atomic-openshift-node: I0528 21:43:20.758469    7155 helpers.go:766] eviction manager: observations: signal=imagefs.available, available: 29676084Ki, capacity: 31010Mi, time: 2018-05-28 21:43:09.775816185 -0400 EDT
May 28 21:43:20 ip-172-18-15-142 atomic-openshift-node: I0528 21:43:20.758485    7155 eviction_manager.go:323] eviction manager: no resources are starved
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1798