Description of problem:

Using https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.8.0-0.ci/release/4.8.0-0.ci-2021-04-14-143618 and the image registry.ci.openshift.org/ocp/release:4.8.0-0.ci-2021-04-14-143618, we are seeing lots of log entries stating:

"Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/d451e557-9ff8-4cc6-91d3-6cfc21e7ea4f/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-tckq7"

Running the du command as root from the command line works.
Investigation:

If a container already mounts /etc/hosts, as the node-resolver DaemonSet does in OpenShift, then the kubelet will not mount /var/lib/kubelet/pods/[ID]/etc-hosts:
https://github.com/kubernetes/kubernetes/blob/dfc91819b78e7dbf56194f50eff4c19c9fecd01b/pkg/kubelet/kubelet_pods.go#L156-L157

We could either skip the error handled here:
https://github.com/kubernetes/kubernetes/blob/dfc91819b78e7dbf56194f50eff4c19c9fecd01b/pkg/kubelet/stats/cadvisor_stats_provider.go#L149-L152

Or fall back to the host's actual /etc/hosts path when collecting the stats.
*** Bug 1953292 has been marked as a duplicate of this bug. ***
Checked on 4.8.0-0.nightly-2021-05-05-030749; I still see the errors. The PR was merged 12 hours ago, so I will wait for a newer build and check again.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-05-030749   True        False         22m     Cluster version is 4.8.0-0.nightly-2021-05-05-030749

...
May 05 08:56:15 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:15.129413 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:15 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:15.201122 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5b10d60d-3f33-4ed4-88aa-9140f7c97021/etc-hosts with error exit status 1" pod="default/ip-10-0-130-16us-east-2computeinternal-debug"
May 05 08:56:20 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:20.773678 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:20 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:20.793201 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:21 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:21.075247 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5b10d60d-3f33-4ed4-88aa-9140f7c97021/etc-hosts with error exit status 1" pod="default/ip-10-0-130-16us-east-2computeinternal-debug"
May 05 08:56:21 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:21.103136 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5b10d60d-3f33-4ed4-88aa-9140f7c97021/etc-hosts with error exit status 1" pod="default/ip-10-0-130-16us-east-2computeinternal-debug"
May 05 08:56:25 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:25.388302 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:35 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:35.517480 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:36 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:36.351683 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:36 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:36.447237 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
...
I still see a large number of failed du commands for etc-hosts.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-06-032413   True        False         51m     Cluster version is 4.8.0-0.nightly-2021-05-06-032413

...
May 06 06:07:32 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:32.543776 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
May 06 06:07:32 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:32.563866 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:38 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:38.128978 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
May 06 06:07:38 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:38.131811 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:47 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:47.083374 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:47 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:47.152245 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
May 06 06:07:47 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:47.157470 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
May 06 06:07:47 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:47.168141 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:48 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:48.174082 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:48 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:48.215605 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
...
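When checking a build, it can help to see which pods the message is reported for rather than scanning the journal by eye. The following is a rough Go sketch, assuming the kubelet journal is fed in on stdin, one entry per line; countByPod and podRe are names made up for this example and are not part of any OpenShift tooling.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strings"
)

// podRe captures the value of the pod="namespace/name" field in a log entry.
var podRe = regexp.MustCompile(`pod="([^"]+)"`)

// countByPod is a hypothetical helper that tallies, per pod, the entries
// containing the "Unable to fetch pod etc hosts stats" message.
func countByPod(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		if !strings.Contains(line, "Unable to fetch pod etc hosts stats") {
			continue // unrelated log entry
		}
		if m := podRe.FindStringSubmatch(line); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	// Journal entries can be long; raise the per-line limit to 1 MiB.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	// Map iteration order is not deterministic, so output order may vary.
	for pod, n := range countByPod(lines) {
		fmt.Printf("%d\t%s\n", n, pod)
	}
}
```

An empty result on a given build would match the "no longer see" outcome; a non-empty one shows which pods still trigger the message.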
Checked on 4.8.0-0.nightly-2021-06-10-000903; I no longer see the failed du command messages.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-10-000903   True        False         172m    Cluster version is 4.8.0-0.nightly-2021-06-10-000903
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438