Bug 1949612 - Install with 1.21 Kubelet is spamming logs with failed to get stats failed command 'du'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Joel Smith
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Duplicates: 1953292
Depends On:
Blocks:
Reported: 2021-04-14 16:32 UTC by Ryan Phillips
Modified: 2021-07-27 23:01 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:00:48 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift kubernetes pull 729 | 0 | None | open | Bug 1949612: UPSTREAM: 101708: Fix log spam for du failure on pod etc-hosts metrics | 2021-05-04 15:56:08 UTC
Red Hat Bugzilla | 1953292 | 1 | high | CLOSED | A lot of error msg "Unable to fetch pod etc hosts stats" found in kubelet log | 2022-08-04 22:39:37 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 23:01:03 UTC

Description Ryan Phillips 2021-04-14 16:32:20 UTC
Description of problem:
Using https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.8.0-0.ci/release/4.8.0-0.ci-2021-04-14-143618 and the image registry.ci.openshift.org/ocp/release:4.8.0-0.ci-2021-04-14-143618, we are seeing lots of log messages stating:

"Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/d451e557-9ff8-4cc6-91d3-6cfc21e7ea4f/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-tckq7"

Running the same du command as root from the command line works.


Comment 1 Sascha Grunert 2021-05-03 11:05:15 UTC
Investigation: If a container already mounts /etc/hosts, like the node-resolver DaemonSet does in OpenShift, then the kubelet will not mount /var/lib/kubelet/pods/[ID]/etc-hosts:

https://github.com/kubernetes/kubernetes/blob/dfc91819b78e7dbf56194f50eff4c19c9fecd01b/pkg/kubelet/kubelet_pods.go#L156-L157

We could either skip the error that is logged here:

https://github.com/kubernetes/kubernetes/blob/dfc91819b78e7dbf56194f50eff4c19c9fecd01b/pkg/kubelet/stats/cadvisor_stats_provider.go#L149-L152

Or fall back to the host's actual /etc/hosts path when collecting the stats.

Comment 3 Miciah Dashiel Butler Masters 2021-05-04 21:09:52 UTC
*** Bug 1953292 has been marked as a duplicate of this bug. ***

Comment 4 Sunil Choudhary 2021-05-05 08:59:55 UTC
Checked on 4.8.0-0.nightly-2021-05-05-030749; I still see the errors.
The PR was merged 12 hours ago, so I will wait for a newer build and check again.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-05-030749   True        False         22m     Cluster version is 4.8.0-0.nightly-2021-05-05-030749

...
May 05 08:56:15 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:15.129413    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:15 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:15.201122    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5b10d60d-3f33-4ed4-88aa-9140f7c97021/etc-hosts with error exit status 1" pod="default/ip-10-0-130-16us-east-2computeinternal-debug"
May 05 08:56:20 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:20.773678    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:20 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:20.793201    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:21 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:21.075247    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5b10d60d-3f33-4ed4-88aa-9140f7c97021/etc-hosts with error exit status 1" pod="default/ip-10-0-130-16us-east-2computeinternal-debug"
May 05 08:56:21 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:21.103136    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5b10d60d-3f33-4ed4-88aa-9140f7c97021/etc-hosts with error exit status 1" pod="default/ip-10-0-130-16us-east-2computeinternal-debug"
May 05 08:56:25 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:25.388302    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:35 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:35.517480    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:36 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:36.351683    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
May 05 08:56:36 ip-10-0-130-16 hyperkube[1393]: E0505 08:56:36.447237    1393 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/514749c2-7eee-4c22-8746-8962b5d3d01b/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-m8lhm"
...

Comment 5 Sunil Choudhary 2021-05-06 06:10:24 UTC
I still see a large number of failed du commands for etc-hosts.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-06-032413   True        False         51m     Cluster version is 4.8.0-0.nightly-2021-05-06-032413

...
May 06 06:07:32 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:32.543776    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
May 06 06:07:32 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:32.563866    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:38 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:38.128978    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
May 06 06:07:38 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:38.131811    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:47 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:47.083374    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:47 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:47.152245    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
May 06 06:07:47 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:47.157470    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
May 06 06:07:47 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:47.168141    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:48 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:48.174082    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/8ea64dcd-24db-4776-9afe-bf1e30725de8/etc-hosts with error exit status 1" pod="openshift-dns/node-resolver-bdb27"
May 06 06:07:48 ip-10-0-130-110 hyperkube[1390]: E0506 06:07:48.215605    1390 cadvisor_stats_provider.go:151] "Unable to fetch pod etc hosts stats" err="failed to get stats failed command 'du' ($ nice -n 19 du -x -s -B 1) on path /var/lib/kubelet/pods/5bae7667-40b7-49ea-b9a1-c561cdd19851/etc-hosts with error exit status 1" pod="openshift-marketplace/redhat-marketplace-jh2m9"
...

Comment 7 Sunil Choudhary 2021-06-10 09:07:14 UTC
Checked on 4.8.0-0.nightly-2021-06-10-000903; I no longer see the failed du command messages.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-10-000903   True        False         172m    Cluster version is 4.8.0-0.nightly-2021-06-10-000903

Comment 10 errata-xmlrpc 2021-07-27 23:00:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

