Description of problem:
The node log keeps repeating the following messages:

Jun 06 11:21:50 netdev28 atomic-openshift-node[24976]: I0606 11:21:50.696665 24976 fsHandler.go:131] du and find on following dirs took 1.898312093s: [ /var/lib/docker/containers/f01a2988c639323417c2acdf7c07511cfde49241ae52935c159c0542c404c916]
Jun 06 11:22:29 netdev28 atomic-openshift-node[24976]: I0606 11:22:29.166540 24976 fsHandler.go:131] du and find on following dirs took 2.525366746s: [ /var/lib/docker/containers/f58933fe3229c70ae51f4600b031cf7ad2c951210f4dc961f440b39fa464d970]

Version-Release number of selected component (if applicable):
Development build from latest origin

How reproducible:
The messages won't stop.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
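A quick way to gauge this by hand is to time the same kind of du and find runs on one of the directories from the log (path copied from the message above; the exact flags cadvisor passes may differ):

# time du -s /var/lib/docker/containers/f01a2988c639323417c2acdf7c07511cfde49241ae52935c159c0542c404c916
# time find /var/lib/docker/containers/f01a2988c639323417c2acdf7c07511cfde49241ae52935c159c0542c404c916

If these take a second or more, the log message is simply reporting real filesystem latency on those directories.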
Can you provide a dump of the PV, PVC, and the pods in use?
No PV/PVC configured.

# oc get po
NAME                      READY     STATUS    RESTARTS   AGE
docker-registry-4-2hdmp   1/1       Running   0          1h
hello-rc-c9m05            1/1       Running   0          4d

hello-rc is "hello openshift!"
This occurs because the node is running low on resources (https://github.com/kubernetes/kubernetes/issues/42164), which can easily happen because of https://bugzilla.redhat.com/show_bug.cgi?id=1459252. So I would say https://bugzilla.redhat.com/show_bug.cgi?id=1459252 is the root cause and this is just a symptom.
This is a Dell R730: 24 physical CPUs, 256G of memory, 10GbE networking. Which resource is running short?

top:
Tasks: 505 total,   2 running, 503 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.8 us,  1.3 sy,  0.0 ni, 95.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 26386145+total, 20777251+free, 37240100 used, 18848852 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 21836817+avail Mem
Is this trying to delete something on disk? If so, where/what is it?
No, it's cadvisor keeping track of filesystem stats and taking too long for some reason. It's out of scope for storage; I think this is a metrics issue.
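One way to see whether something under those container directories is piling up (which would make du and find slow) is to count the entries and look at the biggest ones; the path below is taken from the log in the description:

# find /var/lib/docker/containers/f58933fe3229c70ae51f4600b031cf7ad2c951210f4dc961f440b39fa464d970 | wc -l
# du -a /var/lib/docker/containers/f58933fe3229c70ae51f4600b031cf7ad2c951210f4dc961f440b39fa464d970 | sort -n | tail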
Bouncing this to Solly on the kube team to debug further.
Spoke with Solly Ross. The problem was caused by the go v1.8.1 build: files/directories were created and not cleaned up. Based on this, the message is correct. So, not a bug.
This was fixed in cadvisor https://github.com/google/cadvisor/pull/1766 for OCP 3.7+