Bug 1949612
Summary: | Install with 1.21 Kubelet is spamming logs with failed to get stats failed command 'du' | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Ryan Phillips <rphillips> |
Component: | Node | Assignee: | Joel Smith <joelsmith> |
Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | aos-bugs, piqin, sgrunert, spasquie |
Version: | 4.8 | ||
Target Milestone: | --- | ||
Target Release: | 4.8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | 2021-07-27 23:00:48 UTC | Type: | Bug |
Description
Ryan Phillips
2021-04-14 16:32:20 UTC
Investigation:

If a container already mounts /etc/hosts, as the node-resolver DaemonSet does in OpenShift, then the kubelet will not mount /var/lib/kubelet/pods/[ID]/etc-hosts:
https://github.com/kubernetes/kubernetes/blob/dfc91819b78e7dbf56194f50eff4c19c9fecd01b/pkg/kubelet/kubelet_pods.go#L156-L157

We could either skip the error handled here:
https://github.com/kubernetes/kubernetes/blob/dfc91819b78e7dbf56194f50eff4c19c9fecd01b/pkg/kubelet/stats/cadvisor_stats_provider.go#L149-L152

or fall back to the host's actual /etc/hosts path when collecting the stats.

*** Bug 1953292 has been marked as a duplicate of this bug. ***

Checked on 4.8.0-0.nightly-2021-05-05-030749; I still see the errors. I see the PR was merged 12 hours ago, so I will wait for a newer build and check again.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-05-030749   True        False         22m     Cluster version is 4.8.0-0.nightly-2021-05-05-030749

...
May 05 08:56:15 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:15.129413 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:15 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:15.201122 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5b10d60d-3f33-4ed4-88aa-9140f7c97021/etc-hosts with error exit status 1" pod="default/ip-10-0-130-16us-east-2computeinternal-debug"
May 05 08:56:20 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:20.773678 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:20 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:20.793201 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:21 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:21.075247 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5b10d60d-3f33-4ed4-88aa-9140f7c97021/etc-hosts with error exit status 1" pod="default/ip-10-0-130-16us-east-2computeinternal-debug"
May 05 08:56:21 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:21.103136 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5b10d60d-3f33-4ed4-88aa-9140f7c97021/etc-hosts with error exit status 1" pod="default/ip-10-0-130-16us-east-2computeinternal-debug"
May 05 08:56:25 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:25.388302 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:35 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:35.517480 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:36 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:36.351683 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:36 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:36.447237 1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
...

I still see a large number of failed du commands for etc-hosts.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-06-032413   True        False         51m     Cluster version is 4.8.0-0.nightly-2021-05-06-032413

...
May 06 06:07:32 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:32.543776 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
May 06 06:07:32 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:32.563866 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:38 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:38.128978 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
May 06 06:07:38 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:38.131811 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:47 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:47.083374 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:47 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:47.152245 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
May 06 06:07:47 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:47.157470 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
May 06 06:07:47 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:47.168141 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:48 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:48.174082 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:48 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:48.215605 1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
...

Checked on 4.8.0-0.nightly-2021-06-10-000903; I no longer see the failed du command messages.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-10-000903   True        False         172m    Cluster version is 4.8.0-0.nightly-2021-06-10-000903

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438