Bug 1459265

Summary: journalctl on node repeats: du and find on following dirs took
Product: OpenShift Container Platform
Reporter: Phil Cameron <pcameron>
Component: Node
Assignee: Seth Jennings <sjenning>
Status: CLOSED CURRENTRELEASE
QA Contact: Xiaoli Tian <xtian>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.5.0
CC: aos-bugs, aos-storage-staff, decarr, eparis, fshaikh, gblomqui, jokerman, mmccomas, schoudha
Target Milestone: ---
Keywords: Reopened
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-18 14:09:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Phil Cameron 2017-06-06 16:47:15 UTC
Description of problem:
The node log repeats the following messages:
Jun 06 11:21:50 netdev28 atomic-openshift-node[24976]: I0606 11:21:50.696665   24976 fsHandler.go:131] du and find on following dirs took 1.898312093s: [ /var/lib/docker/containers/f01a2988c639323417c2acdf7c07511cfde49241ae52935c159c0542c404c916]
Jun 06 11:22:29 netdev28 atomic-openshift-node[24976]: I0606 11:22:29.166540   24976 fsHandler.go:131] du and find on following dirs took 2.525366746s: [ /var/lib/docker/containers/f58933fe3229c70ae51f4600b031cf7ad2c951210f4dc961f440b39fa464d970]


Version-Release number of selected component (if applicable):
Development build from latest origin

How reproducible:
won't stop

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 1 Bradley Childs 2017-06-06 17:01:15 UTC
Can you provide dump of PV, PVC and PODs in use?

Comment 2 Phil Cameron 2017-06-06 17:09:10 UTC
No PV/PVC configured.

# oc get po
NAME                      READY     STATUS        RESTARTS   AGE
docker-registry-4-2hdmp   1/1       Running       0          1h
hello-rc-c9m05            1/1       Running       0          4d


hello-rc is "hello openshift!"

Comment 3 Matthew Wong 2017-06-06 17:47:57 UTC
This occurs because the node is running low on resources (https://github.com/kubernetes/kubernetes/issues/42164), which can easily happen because of https://bugzilla.redhat.com/show_bug.cgi?id=1459252. I would say https://bugzilla.redhat.com/show_bug.cgi?id=1459252 is the root cause and this is just a symptom.

Comment 4 Phil Cameron 2017-06-06 18:22:40 UTC
This is a Dell R730 with 24 physical CPUs, 256G memory, and 10Ge networking. Which resource is running short?

top:
Tasks: 505 total,   2 running, 503 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.8 us,  1.3 sy,  0.0 ni, 95.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 26386145+total, 20777251+free, 37240100 used, 18848852 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 21836817+avail Mem

Comment 5 Phil Cameron 2017-06-06 18:30:28 UTC
Is this trying to delete something on disk? If so where/what is it?

Comment 6 Matthew Wong 2017-06-06 18:56:40 UTC
No, it's cadvisor keeping track of filesystem stats and taking too long for some reason. It's out of scope for storage; I think this is a metrics issue.

Comment 7 Eric Paris 2017-06-06 19:15:53 UTC
Bouncing this to Solly on the kube team to debug further.

Comment 8 Phil Cameron 2017-06-06 19:55:31 UTC
Spoke with Solly Ross. The problem was caused by a go v1.8.1 build; files/directories were created and not cleaned up. Based on this, the message is correct, so this is not a bug.

Comment 13 Seth Jennings 2019-06-18 14:09:37 UTC
This was fixed in cadvisor (https://github.com/google/cadvisor/pull/1766) for OCP 3.7+.