Bug 1949612

Summary: Install with 1.21 Kubelet is spamming logs with failed to get stats failed command 'du'
Product: OpenShift Container Platform
Reporter: Ryan Phillips <rphillips>
Component: Node
Assignee: Joel Smith <joelsmith>
Node sub component: Kubelet
QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: aos-bugs, piqin, sgrunert, spasquie
Version: 4.8
Target Milestone: ---
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2021-07-27 23:00:48 UTC

Description Ryan Phillips 2021-04-14 16:32:20 UTC
Description of problem:
Using https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.8.0-0.ci/release/4.8.0-0.ci-2021-04-14-143618 and the image registry.ci.openshift.org/ocp/release:4.8.0-0.ci-2021-04-14-143618, we are seeing lots of log lines stating:

"Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/d451e557-9ff8-4cc6-91d3-6cfc21e7ea4f/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-tckq7"

Running the du command as root from the command line works.
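The failing invocation in the log line can be reproduced with a short Go sketch. It assumes `nice` and GNU `du` are on PATH; `diskUsage` and `parseDuBytes` are hypothetical names, and the wording of the wrapped error is copied from the log message rather than from kubelet source. With a path that does not exist, `du` exits non-zero, which surfaces as the same `exit status 1` seen above.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// parseDuBytes extracts the byte count from `du -s -B 1` output,
// which has the form "<bytes>\t<path>\n".
func parseDuBytes(out string) (int64, error) {
	fields := strings.Fields(out)
	if len(fields) == 0 {
		return 0, fmt.Errorf("empty du output")
	}
	return strconv.ParseInt(fields[0], 10, 64)
}

// diskUsage runs the same command line shown in the kubelet log and wraps
// any failure in a similarly worded error (hypothetical helper).
func diskUsage(path string) (int64, error) {
	out, err := exec.Command("nice", "-n", "19", "du", "-x", "-s", "-B", "1", path).CombinedOutput()
	if err != nil {
		return 0, fmt.Errorf("failed command 'du' ($ nice -n 19 du -x -s -B 1) on path %s with error %v", path, err)
	}
	return parseDuBytes(string(out))
}

func main() {
	// A path that does not exist makes du fail with a non-zero exit status.
	_, err := diskUsage("/var/lib/kubelet/pods/does-not-exist/etc-hosts")
	fmt.Println(err)
}
```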

Comment 1 Sascha Grunert 2021-05-03 11:05:15 UTC
Investigation: If a container already mounts /etc/hosts, like the node-resolver DaemonSet does in OpenShift, then the kubelet will not mount /var/lib/kubelet/pods/[ID]/etc-hosts:

https://github.com/kubernetes/kubernetes/blob/dfc91819b78e7dbf56194f50eff4c19c9fecd01b/pkg/kubelet/kubelet_pods.go#L156-L157
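The condition described here can be approximated with a small Go sketch. `VolumeMount` and `shouldMountEtcHosts` are hypothetical simplifications, not the kubelet's actual types or functions, and any other conditions the real code checks are omitted; the sketch only models "skip the kubelet-managed hosts file when a container already mounts /etc/hosts". When this returns false for a pod, the `du` failures on `/var/lib/kubelet/pods/[ID]/etc-hosts` are consistent with nothing being there to measure.

```go
package main

import "fmt"

// etcHostsPath is the in-container path whose presence in a container's
// volume mounts stops the kubelet from managing /etc/hosts.
const etcHostsPath = "/etc/hosts"

// VolumeMount is a hypothetical, pared-down stand-in for the Kubernetes
// core/v1 VolumeMount type; only MountPath matters for this check.
type VolumeMount struct {
	Name      string
	MountPath string
}

// shouldMountEtcHosts returns false as soon as any mount already
// targets /etc/hosts (hypothetical simplification of the linked check).
func shouldMountEtcHosts(mounts []VolumeMount) bool {
	for _, m := range mounts {
		if m.MountPath == etcHostsPath {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical mount list for a node-resolver-style container that
	// mounts the host's /etc/hosts itself.
	nodeResolver := []VolumeMount{{Name: "hosts-file", MountPath: "/etc/hosts"}}
	fmt.Println(shouldMountEtcHosts(nodeResolver)) // false
	fmt.Println(shouldMountEtcHosts(nil))          // true
}
```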

We could either skip the error handled here:

https://github.com/kubernetes/kubernetes/blob/dfc91819b78e7dbf56194f50eff4c19c9fecd01b/pkg/kubelet/stats/cadvisor_stats_provider.go#L149-L152

Or fall back to the host's actual /etc/hosts path when collecting the stats.

Comment 3 Miciah Dashiel Butler Masters 2021-05-04 21:09:52 UTC
*** Bug 1953292 has been marked as a duplicate of this bug. ***

Comment 4 Sunil Choudhary 2021-05-05 08:59:55 UTC
Checked on 4.8.0-0.nightly-2021-05-05-030749, I still see the errors.
The PR was merged 12 hours ago; I will wait for a newer build and check again.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-05-030749   True        False         22m     Cluster version is 4.8.0-0.nightly-2021-05-05-030749

...
May 05 08:56:15 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:15.129413    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:15 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:15.201122    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5b10d60d-3f33-4ed4-88aa-9140f7c97021/etc-hosts with error exit status 1" pod="default/ip-10-0-130-16us-east-2computeinternal-debug"
May 05 08:56:20 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:20.773678    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:20 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:20.793201    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:21 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:21.075247    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5b10d60d-3f33-4ed4-88aa-9140f7c97021/etc-hosts with error exit status 1" pod="default/ip-10-0-130-16us-east-2computeinternal-debug"
May 05 08:56:21 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:21.103136    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5b10d60d-3f33-4ed4-88aa-9140f7c97021/etc-hosts with error exit status 1" pod="default/ip-10-0-130-16us-east-2computeinternal-debug"
May 05 08:56:25 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:25.388302    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:35 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:35.517480    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:36 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:36.351683    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:36 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:36.447237    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
...

Comment 5 Sunil Choudhary 2021-05-06 06:10:24 UTC
I still see a large number of failed du commands for etc-hosts.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-06-032413   True        False         51m     Cluster version is 4.8.0-0.nightly-2021-05-06-032413

...
May 06 06:07:32 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:32.543776    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
May 06 06:07:32 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:32.563866    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:38 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:38.128978    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
May 06 06:07:38 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:38.131811    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:47 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:47.083374    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:47 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:47.152245    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
May 06 06:07:47 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:47.157470    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
May 06 06:07:47 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:47.168141    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:48 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:48.174082    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:48 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:48.215605    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
...
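Because the same few pods appear over and over in these excerpts, tallying the errors per pod can make the pattern easier to see than scrolling the journal. A minimal Go sketch, assuming the log text has already been captured as a string (for example, from the node's journal); `countEtcHostsErrors` is a hypothetical name, and the sample lines are abbreviated from the excerpts above.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// marker is the quoted message that identifies the etc-hosts stats failures.
const marker = `"Unable to fetch pod etc hosts stats"`

// podField captures the pod="namespace/name" pair in each structured log line.
var podField = regexp.MustCompile(`pod="([^"]+)"`)

// countEtcHostsErrors tallies etc-hosts stats failures per pod.
func countEtcHostsErrors(journal string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(journal))
	for sc.Scan() {
		line := sc.Text()
		if !strings.Contains(line, marker) {
			continue
		}
		if m := podField.FindStringSubmatch(line); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	sample := `E0506 06:07:32.543776 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="..." pod="openshift-marketplace/redhat-marketplace-jh2m9"
E0506 06:07:32.563866 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="..." pod="openshift-dns/node-resolver-bdb27"
E0506 06:07:38.131811 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="..." pod="openshift-dns/node-resolver-bdb27"`

	counts := countEtcHostsErrors(sample)
	pods := make([]string, 0, len(counts))
	for p := range counts {
		pods = append(pods, p)
	}
	sort.Strings(pods)
	for _, p := range pods {
		fmt.Printf("%s\t%d\n", p, counts[p])
	}
	// Prints:
	// openshift-dns/node-resolver-bdb27	2
	// openshift-marketplace/redhat-marketplace-jh2m9	1
}
```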

Comment 7 Sunil Choudhary 2021-06-10 09:07:14 UTC
Checked on 4.8.0-0.nightly-2021-06-10-000903; I no longer see the failed du command messages.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-10-000903   True        False         172m    Cluster version is 4.8.0-0.nightly-2021-06-10-000903

Comment 10 errata-xmlrpc 2021-07-27 23:00:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438