Description of problem:
openshift node logs thousands of "du: cannot access" errors

Version-Release number of selected component (if applicable):
atomic-openshift-node-3.2.0.44-1.git.0.a4463d9.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Run pods on an openshift node with the above version (also noted on earlier versions)

Actual results:
The error below is logged roughly 6000 times/day on a low-utilization node.

Expected results:
This error should be handled.

Additional info:
Example of a complete log entry:

Jul 15 14:11:52 ip-172-31-3-96.ec2.internal atomic-openshift-node[7724]: E0715 14:11:52.590669 7724 fsHandler.go:106] failed to collect filesystem stats - du command failed on /var/lib/docker/containers/cc8af30222fa33248476635981347a56bb10e7014cd8546f75fe2d6fcb301740 with output stdout: , stderr: du: cannot access ‘/var/lib/docker/containers/cc8af30222fa33248476635981347a56bb10e7014cd8546f75fe2d6fcb301740’: No such file or directory
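For illustration, here is a minimal Go sketch of the kind of handling the expected result asks for: the container directory vanished between the housekeeping listing and the du run, which is normal for a deleted container and should be skipped rather than logged as an error. This is not the actual cAdvisor fsHandler code; the function name, du flags, and output parsing are my own assumptions.

// dusketch.go - a hypothetical illustration, not the cAdvisor source.
// The race: housekeeping lists a container directory, the container is
// deleted, then du runs on the now-missing path and fails with ENOENT.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// diskUsage returns the size of dir in bytes. ok is false when the
// directory no longer exists, which is expected for deleted containers
// and should be skipped silently instead of logged as an error.
func diskUsage(dir string) (size uint64, ok bool, err error) {
	out, err := exec.Command("du", "-s", "-B1", dir).Output()
	if err != nil {
		// du exits non-zero if the path disappeared between the
		// container listing and this call; confirm and swallow it.
		if _, statErr := os.Stat(dir); os.IsNotExist(statErr) {
			return 0, false, nil
		}
		return 0, false, fmt.Errorf("du failed on %s: %v", dir, err)
	}
	fields := strings.Fields(string(out)) // du -s prints "SIZE\tPATH"
	if len(fields) == 0 {
		return 0, false, fmt.Errorf("unexpected du output: %q", out)
	}
	size, err = strconv.ParseUint(fields[0], 10, 64)
	return size, err == nil, err
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: dusketch <container-dir>")
		os.Exit(1)
	}
	size, ok, err := diskUsage(os.Args[1])
	switch {
	case err != nil:
		fmt.Fprintln(os.Stderr, err)
	case !ok:
		fmt.Println("directory gone; container was deleted, nothing to report")
	default:
		fmt.Printf("%d bytes\n", size)
	}
}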
I believe this is fixed in 3.3.
Tested on OpenShift 3.3. This error still occurs, but much less often: one of my nodes logged only two related entries.

openshift v3.3.0.6
kubernetes v1.3.0+57fb9ac
etcd 2.3.0+git

Jul 18 03:51:49 ip-172-18-10-95 atomic-openshift-node: E0718 03:51:49.519355 17412 fsHandler.go:106] failed to collect filesystem stats - du command failed on /rootfs/var/lib/docker/containers/a5febc665c196470ec125d7354103b707604476ad8c31255d54a53cdb9352b41 with output stdout: , stderr: du: cannot access '/rootfs/var/lib/docker/containers/a5febc665c196470ec125d7354103b707604476ad8c31255d54a53cdb9352b41': No such file or directory

Jul 18 04:48:48 ip-172-18-10-95 atomic-openshift-node: E0718 04:48:48.245490 17412 fsHandler.go:106] failed to collect filesystem stats - du command failed on /rootfs/var/lib/docker/containers/27d314be763347414adef76078fc057cc0490aa6a2e3b818df6deef883d46c41 with output stdout: , stderr: du: cannot access '/rootfs/var/lib/docker/containers/27d314be763347414adef76078fc057cc0490aa6a2e3b818df6deef883d46c41': No such file or directory
Could you please paste the output of `sudo docker info`? Also, do you know what those 2 containers are - what were they running, did OpenShift start them, etc?
(In reply to Andy Goldstein from comment #3)
> Could you please paste the output of `sudo docker info`?
>
> Also, do you know what those 2 containers are - what were they running, did
> OpenShift start them, etc?

Those containers were created by OpenShift, like the one below. I created 10 pods with the same image and then deleted them; only then did I hit this error.

Jul 19 00:53:53 ip-172-18-4-118 atomic-openshift-node: E0719 00:53:53.669042 8978 fsHandler.go:106] failed to collect filesystem stats - du command failed on /var/lib/docker/containers/319d3d8950094b7c1d7147ca74e6663b839a3484752e0e154b1dbff71f6c6e25 with output stdout: , stderr: du: cannot access ‘/var/lib/docker/containers/319d3d8950094b7c1d7147ca74e6663b839a3484752e0e154b1dbff71f6c6e25’: No such file or directory

[root@ip-172-18-4-118 containers]# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 5
Server Version: 1.10.3
Storage Driver: devicemapper
 Pool Name: docker-202:2-67110041-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 1.692 GB
 Data Space Total: 107.4 GB
 Data Space Available: 22.64 GB
 Metadata Space Used: 2.13 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.145 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.107-RHEL7 (2016-06-09)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: null host bridge
 Authorization: rhel-push-plugin
Kernel Version: 3.10.0-327.22.2.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.2 (Maipo)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 1
Total Memory: 3.518 GiB
Name: ip-172-18-4-118.ec2.internal
ID: FHHR:XIJ3:XQV7:U5I2:VMF5:PNF6:OJYZ:MV33:PR4T:T3U6:SQ74:4WJV
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Registries: registry.qe.openshift.com (insecure), registry.access.redhat.com (secure), docker.io (secure)
Sometimes when deleting all the pods in a project, I also hit a device-busy error.

1. Delete the pods in the project:

[root@ip-172-18-1-64 ~]# oc delete `oc get pod -n dma -o name` -n dma
pod "besteffort-3iap2" deleted
pod "besteffort-95ct6" deleted
pod "besteffort-9n99g" deleted
pod "besteffort-bpzy0" deleted
pod "besteffort-hdxs8" deleted
pod "besteffort-jkyrd" deleted
pod "besteffort-lqmga" deleted
pod "besteffort-n72uy" deleted
pod "besteffort-qc3xu" deleted
pod "besteffort-ttt3j" deleted

2. Check the node logs:

Jul 19 03:40:57 ip-172-18-4-118 atomic-openshift-node: I0719 03:40:57.451528 35151 kubelet.go:2043] Failed to remove orphaned pod "c134c5fe-4d6c-11e6-b1cc-0e227273c3bd" dir; err: remove /var/lib/origin/openshift.local.volumes/pods/c134c5fe-4d6c-11e6-b1cc-0e227273c3bd/volumes/kubernetes.io~secret/deployer-token-0hr8m: device or resource busy

Full log: http://pastebin.test.redhat.com/393492
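For reference, "device or resource busy" on a directory removal usually means the path (or a child of it) is still a mount point; for pods that is typically the secret volume's tmpfs that was never unmounted. Below is a hypothetical diagnostic sketch (not kubelet code; the program name and argument convention are made up) that scans /proc/mounts for lingering mounts under a given path:

// mountcheck.go - a hypothetical diagnostic, not part of the kubelet.
// EBUSY on rmdir usually means the path (or a child) is still mounted;
// for pods this is typically the secret tmpfs.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: mountcheck <pod-volumes-dir>")
		os.Exit(1)
	}
	f, err := os.Open("/proc/mounts")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	found := false
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		// /proc/mounts columns: device mountpoint fstype options dump pass
		fields := strings.Fields(sc.Text())
		if len(fields) >= 3 && strings.HasPrefix(fields[1], os.Args[1]) {
			fmt.Printf("still mounted: %s (%s)\n", fields[1], fields[2])
			found = true
		}
	}
	if !found {
		fmt.Println("no lingering mounts; EBUSY came from something else")
	}
}

Running it against /var/lib/origin/openshift.local.volumes/pods/c134c5fe-4d6c-11e6-b1cc-0e227273c3bd/volumes on the node would show whether the deployer-token tmpfs is still mounted, which would explain why the removal keeps failing with EBUSY.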
DeShuai, looking for some clarity here. Does the du error message continue indefinitely for a particular pod or does it go away within a short period of time?
I am pretty sure this is related to the orphaning of docker cgroups; see bug 1328913. The suspected cause of the orphaning is the DM error mentioned in comment #5.

This can be confirmed using the following method. Given the error described:

Jul 19 00:53:53 ip-172-18-4-118 atomic-openshift-node: E0719 00:53:53.669042 8978 fsHandler.go:106] failed to collect filesystem stats - du command failed on /var/lib/docker/containers/319d3d8950094b7c1d7147ca74e6663b839a3484752e0e154b1dbff71f6c6e25 with output stdout: , stderr: du: cannot access ‘/var/lib/docker/containers/319d3d8950094b7c1d7147ca74e6663b839a3484752e0e154b1dbff71f6c6e25’: No such file or directory

If the path /sys/fs/cgroup/cpu,cpuacct/system.slice/319d3d8950094b7c1d7147ca74e6663b839a3484752e0e154b1dbff71f6c6e25 exists, then it is the same problem. Additional confirmation is if /sys/fs/cgroup/cpu,cpuacct/system.slice/319d3d8950094b7c1d7147ca74e6663b839a3484752e0e154b1dbff71f6c6e25/tasks is empty (no tasks in the cgroup).
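For convenience, the check above can be scripted. A throwaway Go helper (hypothetical; not part of OpenShift or cAdvisor, and the name is mine) that automates both confirmations for a given container ID:

// cgcheck.go - a throwaway helper automating the orphaned-cgroup
// confirmation described above; not part of any shipped component.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: cgcheck <container-id>")
		os.Exit(1)
	}
	dir := filepath.Join("/sys/fs/cgroup/cpu,cpuacct/system.slice", os.Args[1])

	if _, err := os.Stat(dir); os.IsNotExist(err) {
		fmt.Println("cgroup directory is gone - not the orphaned-cgroup case")
		return
	}
	fmt.Println("cgroup directory still exists:", dir)

	// An empty tasks file means no process is left in the cgroup, which
	// is the additional confirmation mentioned above.
	tasks, err := os.ReadFile(filepath.Join(dir, "tasks"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if strings.TrimSpace(string(tasks)) == "" {
		fmt.Println("tasks file is empty - orphaned cgroup confirmed")
	} else {
		fmt.Println("cgroup still has tasks:", strings.TrimSpace(string(tasks)))
	}
}

For example, `cgcheck 319d3d8950094b7c1d7147ca74e6663b839a3484752e0e154b1dbff71f6c6e25` run on the affected node would confirm or rule out the orphaned-cgroup condition for the container above.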
Still blocked on https://bugzilla.redhat.com/show_bug.cgi?id=1367141
Trail has gone cold on this. Docker 1.10 is old. No associated customer issue. Closing.