Bug 1425301 - [3.3] Node 100% CPU usage with many stale container volumes and volume_stat_caculator panics
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.3.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 3.3.1
Assigned To: Seth Jennings
QA Contact: DeShuai Ma
Docs Contact:
Depends On:
Blocks:
Reported: 2017-02-21 02:00 EST by Takayoshi Kimura
Modified: 2017-03-15 16:03 EDT
CC: 8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Fixes an issue where the OpenShift node logged a panic caused by a nil pointer dereference during volume teardown.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-15 16:03:06 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments


External Trackers
Tracker ID: Red Hat Product Errata RHBA-2017:0512
Priority: normal
Status: SHIPPED_LIVE
Summary: OpenShift Container Platform 3.4.1.10, 3.3.1.17, and 3.2.1.28 bug fix update
Last Updated: 2017-03-15 20:01:17 EDT

Description Takayoshi Kimura 2017-02-21 02:00:08 EST
Description of problem:

For an unknown reason, a large number of stale container volumes accumulate on the node. The node process continuously runs `du` commands against them, volume_stat_caculator panics, and the node process consumes 100% CPU.

Restarting the node process seems to alleviate the issue.

Logs will be attached.

* du logs

atomic-openshift-node: I0217 fsHandler.go:116] `du` on following dirs took 1.047866932s: [ /var/lib/docker/containers/51eb2d3df53ee04711af3c688b7b8f05fff59793283d6372cbda26f90d55015b]
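
For context, a minimal Go sketch of what this log line reflects (illustrative only, not the actual cAdvisor fsHandler.go source vendored into the node binary): container directory usage is gathered by shelling out to `du`, with a warning logged when the walk is slow. With thousands of stale volume directories, these walks run back to back and keep the CPU busy.

package main

import (
	"log"
	"os/exec"
	"time"
)

// duDir shells out to `du -s` for one directory and logs a warning when
// the walk is slow, mirroring the fsHandler log line above. Thousands of
// stale directories mean thousands of these walks per stats cycle.
func duDir(dir string) {
	start := time.Now()
	if err := exec.Command("du", "-s", dir).Run(); err != nil {
		log.Printf("du on %s failed: %v", dir, err)
		return
	}
	if d := time.Since(start); d > time.Second {
		log.Printf("`du` on following dirs took %v: [ %s]", d, dir)
	}
}

func main() {
	duDir("/var/lib/docker/containers")
}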

* panic logs

atomic-openshift-node: I0220 operation_executor.go:824] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/57b508f8-f5e6-11e6-8d55-0050568348cc-default-token-d6lb3" (OuterVolumeSpecName: "default-token-d6lb3") pod "57b508f8-f5e6-11e6-8d55-0050568348cc" (UID: "57b508f8-f5e6-11e6-8d55-0050568348cc"). InnerVolumeSpecName "default-token-d6lb3". PluginName "kubernetes.io/secret", VolumeGidValue ""
atomic-openshift-node: E0220 19:32:52.565299  102547 runtime.go:52] Recovered from panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.0988966/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:58
atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.0988966/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:51
atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.0988966/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/runtime/runtime.go:41
atomic-openshift-node: /usr/lib/golang/src/runtime/asm_amd64.s:472
atomic-openshift-node: /usr/lib/golang/src/runtime/panic.go:443
atomic-openshift-node: /usr/lib/golang/src/runtime/panic.go:62
atomic-openshift-node: /usr/lib/golang/src/runtime/sigpanic_unix.go:24
atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.0988966/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/server/stats/volume_stat_caculator.go:98
atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.0988966/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/server/stats/volume_stat_caculator.go:63
atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.0988966/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:88
atomic-openshift-node: /builddir/build/BUILD/atomic-openshift-git-0.0988966/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:89
atomic-openshift-node: /usr/lib/golang/src/runtime/asm_amd64.s:1998
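
To make the trace concrete, below is a minimal Go sketch of the failure pattern (all types and names are hypothetical stand-ins, not the actual kubelet source): the periodic stats loop reads size fields from a metrics result, and a volume torn down mid-cycle leaves those fields nil, so the read panics. A guard that skips volumes whose metrics call fails avoids the panic.

package main

import "fmt"

// All names below are hypothetical stand-ins for the kubelet types, used
// only to illustrate the nil-dereference pattern in the trace above.

type Quantity struct{ bytes int64 }

// Value panics when called on a nil *Quantity; this is the
// "invalid memory address or nil pointer dereference" in the log.
func (q *Quantity) Value() int64 { return q.bytes }

type Metrics struct{ Used, Capacity *Quantity }

type Volume interface {
	GetMetrics() (*Metrics, error)
}

// tornDownVolume models a volume unmounted between listing and stat'ing:
// the metrics call fails and the size fields stay nil.
type tornDownVolume struct{}

func (tornDownVolume) GetMetrics() (*Metrics, error) {
	return &Metrics{}, fmt.Errorf("volume path no longer exists")
}

func calcAndStoreStats(vols []Volume) {
	for _, v := range vols {
		m, err := v.GetMetrics()
		if err != nil {
			continue // guard: without this, m.Used.Value() below panics
		}
		fmt.Println(m.Used.Value(), m.Capacity.Value())
	}
}

func main() {
	calcAndStoreStats([]Volume{tornDownVolume{}})
}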

* stale volumes

# df -h | grep openshift.local.volumes | grep tmpfs | wc -l
6329
# df -h | grep openshift.local.volumes | grep tmpfs | head -1
tmpfs                                   32G   12K   32G   1% /var/lib/origin/openshift.local.volumes/pods/ca4300a4-d2af-11e6-8a27-0050568379a2/volumes/kubernetes.io~secret/default-token-6jo4n
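
The same count can be taken programmatically; a small diagnostic sketch (assuming the default volume root path shown above) that mirrors the df pipeline by reading /proc/mounts:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Counts tmpfs mounts under the OpenShift volume root by scanning
// /proc/mounts (fields: device mountpoint fstype options dump pass).
// A count in the thousands on a node running few pods points at
// stale volumes.
func main() {
	f, err := os.Open("/proc/mounts")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	count := 0
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 3 && fields[2] == "tmpfs" &&
			strings.Contains(fields[1], "openshift.local.volumes") {
			count++
		}
	}
	fmt.Println("tmpfs volume mounts:", count)
}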

Version-Release number of selected component (if applicable):

atomic-openshift-3.3.1.7-1.git.0.0988966.el7.x86_64

How reproducible:

Not yet known. Observed once in a customer environment; no changes had been made at the time the issue started.

Steps to Reproduce:
1.
2.
3.

Actual results:

For an unknown reason, a large number of stale container volumes accumulate on the node. The node process continuously runs `du` commands against them, volume_stat_caculator panics, and the node process consumes 100% CPU.

Expected results:

No stale volumes accumulate, and the node does not panic.

Additional info:
Comment 12 Seth Jennings 2017-02-21 13:20:11 EST
The panics are fixed in OCP 3.4 and higher by this:
https://github.com/openshift/origin/commit/4f830f3

I will backport to OSE 3.3.
Comment 18 Troy Dawson 2017-02-24 15:36:11 EST
This has been merged into OCP and is in OCP v3.3.1.16 or newer.
Comment 20 DeShuai Ma 2017-03-03 02:46:10 EST
Verified on OCP v3.3.1.16; the panic no longer occurs on the node.
[root@ip-172-18-12-128 ~]# openshift version
openshift v3.3.1.16
kubernetes v1.3.0+52492b4
etcd 2.3.0+git
Comment 22 errata-xmlrpc 2017-03-15 16:03:06 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0512
