Description of problem:
For an unknown reason there are a large number of stale container volumes; the node process keeps running `du` commands and volume_stat_caculator panics while consuming 100% CPU. Restarting the node process seems to relieve the symptoms temporarily.

Logs will be attached.

* du logs
atomic-openshift-node: I0217 fsHandler.go:116] `du` on following dirs took 1.047866932s: [ /var/lib/docker/containers/51eb2d3df53ee04711af3c688b7b8f05fff59793283d6372cbda26f90d55015b]

* panic logs
atomic-openshift-node: I0220 operation_executor.go:824] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/57b508f8-f5e6-11e6-8d55-0050568348cc-default-token-d6lb3" (OuterVolumeSpecName: "default-token-d6lb3") pod "57b508f8-f5e6-11e6-8d55-0050568348cc" (UID: "57b508f8-f5e6-11e6-8d55-0050568348cc"). InnerVolumeSpecName "default-token-d6lb3". PluginName "kubernetes.io/secret", VolumeGidValue ""
atomic-openshift-node: E0220 19:32:52.565299 102547 runtime.go:52] Recovered from panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.0988966/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:58
atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.0988966/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:51
atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.0988966/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:41
atomic-openshift-node: /usr/lib/golang/src/runtime/asm_amd64.s:472
atomic-openshift-node: /usr/lib/golang/src/runtime/panic.go:443
atomic-openshift-node: /usr/lib/golang/src/runtime/panic.go:62
atomic-openshift-node: /usr/lib/golang/src/runtime/sigpanic_unix.go:24
atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.0988966/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/server/stats/volume_stat_caculator.go:98
atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.0988966/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/server/stats/volume_stat_caculator.go:63
atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.0988966/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:88
atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.0988966/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:89
atomic-openshift-node: /usr/lib/golang/src/runtime/asm_amd64.s:1998

* stale volumes (a stand-alone version of this check is sketched under Additional info below)
# df -h | grep openshift.local.volumes | grep tmpfs | wc -l
6329
# df -h | grep openshift.local.volumes | grep tmpfs | head -1
tmpfs 32G 12K 32G 1% /var/lib/origin/openshift.local.volumes/pods/ca4300a4-d2af-11e6-8a27-0050568379a2/volumes/kubernetes.io~secret/default-token-6jo4n

Version-Release number of selected component (if applicable):
atomic-openshift-3.3.1.7-1.git.0.0988966.el7.x86_64

How reproducible:
Not yet known. It occurred once in a customer environment; no change had been performed at the time the issue started.

Steps to Reproduce:
1.
2.
3.
Actual results:
A large number of stale container volumes accumulate; the node process keeps running `du` commands and volume_stat_caculator panics while consuming 100% CPU.

Expected results:
No stale volumes.

Additional info:
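For diagnosis, here is a minimal Go sketch of the stale-mount check from the description (the equivalent of `df -h | grep openshift.local.volumes | grep tmpfs | wc -l`), parsing /proc/mounts directly. It is an illustrative stand-alone tool, not part of OpenShift:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// /proc/mounts lists one mount per line:
	// device mountpoint fstype options dump pass
	f, err := os.Open("/proc/mounts")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	count := 0
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 3 {
			continue
		}
		// Count tmpfs mounts left under openshift.local.volumes,
		// e.g. kubernetes.io~secret token volumes.
		if fields[2] == "tmpfs" && strings.Contains(fields[1], "openshift.local.volumes") {
			count++
		}
	}
	fmt.Printf("tmpfs mounts under openshift.local.volumes: %d\n", count)
}

On the affected node the equivalent df pipeline reported 6329 such mounts.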
The panics are fixed in OCP 3.4 and higher by the following commit, which I will backport to OSE 3.3:
https://github.com/openshift/origin/commit/4f830f3
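For context, a minimal sketch of the defensive pattern such a fix typically applies: check the error (and nil result) from a volume's metrics lookup before dereferencing it, and skip volumes that have already been torn down. Everything here (VolumeMetrics, getMetrics, the volume names) is a hypothetical stand-in, not the kubelet's actual API; see the linked commit for the real change.

package main

import (
	"errors"
	"fmt"
)

// VolumeMetrics is a hypothetical stand-in for a volume's usage stats.
type VolumeMetrics struct {
	Used      int64
	Available int64
}

// getMetrics simulates stat collection racing with volume teardown, as in
// the UnmountVolume.TearDown log line above (hypothetical helper).
func getMetrics(name string) (*VolumeMetrics, error) {
	if name == "default-token-d6lb3" {
		return nil, errors.New("volume already unmounted")
	}
	return &VolumeMetrics{Used: 12 << 10, Available: 32 << 30}, nil
}

func main() {
	for _, name := range []string{"default-token-d6lb3", "data"} {
		metric, err := getMetrics(name)
		if err != nil || metric == nil {
			// Without this guard, metric.Used below is exactly the kind of
			// nil pointer dereference seen in the stack trace.
			fmt.Printf("skipping volume %q: %v\n", name, err)
			continue
		}
		fmt.Printf("volume %q: used=%d available=%d\n", name, metric.Used, metric.Available)
	}
}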
This has been merged into OCP and is available in OCP v3.3.1.16 or newer.
Verified on OCP v3.3.1.16; this panic no longer occurs on the node.

[root@ip-172-18-12-128 ~]# openshift version
openshift v3.3.1.16
kubernetes v1.3.0+52492b4
etcd 2.3.0+git
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0512