Bug 1512717

Summary: Corrupt Sandbox Checkpoint Files
Product: OpenShift Container Platform
Reporter: Eric Paris <eparis>
Component: Node
Assignee: Seth Jennings <sjenning>
Status: CLOSED ERRATA
QA Contact: DeShuai Ma <dma>
Severity: high
Docs Contact:
Priority: high
Version: 3.7.0
CC: abhgupta, amurdaca, aos-bugs, bbennett, bmeng, danw, decarr, dma, eparis, haowang, hongli, jokerman, jupierce, mfojtik, mmccomas, rpenta, wjiang, xiaocwan, xiuwang, yasun, yinzhou, yufchang
Target Milestone: ---
Keywords: OnlineStarter
Target Release: 3.7.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1509799
Environment:
Last Closed: 2018-04-05 09:30:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1509799, 1540606, 1540608
Bug Blocks:

Comment 2 Eric Paris 2017-11-13 23:13:02 UTC
We saw a pod with 44 of these corrupt sandbox checkpoints completely break; it seemed to simply stop making forward progress. I ran:

oc get --raw /debug/pprof/profile --server=https://172.31.71.195:10250 > profile

It hung for about an hour, and top showed 'openshift' using 100% CPU. I deleted the 'bad' sandboxes and restarted the node, and now the node seems largely OK.
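The report does not say what the corrupt checkpoint files looked like or how the linked PRs handle them. As a purely illustrative sketch, assuming a hypothetical JSON checkpoint layout (the `sandboxCheckpoint` struct, its fields, and the FNV-1a checksum over namespace and name are all assumptions, not the actual dockershim on-disk format), the Go code below shows one way to classify a checkpoint file as corrupt: either it fails to parse, or its stored checksum does not match the recomputed one. A caller could then skip or remove such a file instead of retrying it indefinitely.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// sandboxCheckpoint is a HYPOTHETICAL checkpoint layout used only for
// illustration; it is not the real dockershim checkpoint format.
type sandboxCheckpoint struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
	Checksum  uint64 `json:"checksum"`
}

// checksumOf hashes the identifying fields with FNV-1a (an assumed scheme).
// A zero byte separates the fields so "ab"+"c" and "a"+"bc" hash differently.
func checksumOf(c sandboxCheckpoint) uint64 {
	h := fnv.New64a()
	h.Write([]byte(c.Namespace))
	h.Write([]byte{0})
	h.Write([]byte(c.Name))
	return h.Sum64()
}

// encode serializes a checkpoint with its checksum filled in.
func encode(name, namespace string) ([]byte, error) {
	c := sandboxCheckpoint{Name: name, Namespace: namespace}
	c.Checksum = checksumOf(c)
	return json.Marshal(c)
}

// isCorrupt reports whether raw checkpoint bytes either fail to parse
// (e.g. a truncated write) or carry a checksum that does not match.
func isCorrupt(raw []byte) bool {
	var c sandboxCheckpoint
	if err := json.Unmarshal(raw, &c); err != nil {
		return true
	}
	return c.Checksum != checksumOf(c)
}

func main() {
	good, err := encode("web-1", "default")
	if err != nil {
		panic(err)
	}
	// Simulate a partially written file by cutting the bytes in half.
	truncated := good[:len(good)/2]

	fmt.Println(isCorrupt(good))
	fmt.Println(isCorrupt(truncated))
}
```

Running it prints `false` for the intact checkpoint and `true` for the truncated copy. Detecting corruption at read time is one way a node could avoid the manual cleanup of 'bad' sandboxes described above; whether the upstream fix works this way is not stated in this report.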

Comment 3 Seth Jennings 2017-11-13 23:13:34 UTC
Upstream PR:
https://github.com/kubernetes/kubernetes/pull/55641

Comment 4 Seth Jennings 2017-11-14 15:10:23 UTC
Origin PR:
https://github.com/openshift/origin/pull/17302

Comment 6 weiwei jiang 2018-01-25 07:26:28 UTC
Checked with 
# openshift version 
openshift v3.7.26
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8

I could not reproduce this issue, so I am marking it as verified.

Comment 10 errata-xmlrpc 2018-04-05 09:30:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0636