Bug 1412087
Summary: Hit panic and segment error in atomic-openshift-node log

Product: OpenShift Container Platform
Component: Networking
Version: 3.4.0
Target Release: 3.4.z
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Zhang Cheng <chezhang>
Assignee: Ben Bennett <bbennett>
QA Contact: Zhang Cheng <chezhang>
CC: agoldste, aos-bugs, chezhang, dma, ghuang, mifiedle, weliang, wmeng, xtian
Type: Bug
Bug Blocks: 1415282 (view as bug list)
Last Closed: 2017-01-31 20:19:54 UTC

Doc Type: Bug Fix
Doc Text:
Cause: When the admission controller that adds security contexts is disabled, the node can crash.
Consequence: The node crashes while trying to process a security context that is not present.
Fix: Check that the pointer is defined before dereferencing it.
Result: The node doesn't crash.
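The "Cause" in the Doc Text refers to running the master with the admission plugin that injects security contexts turned off. As a hedged illustration only (the plugin name shown is an assumption, and the exact master-config syntax varies across OCP 3.x releases), disabling an admission plugin looked roughly like this in the master configuration:

```yaml
# master-config.yaml (fragment) -- illustrative sketch, not a recommendation.
# With the plugin that populates pod/container SecurityContext fields disabled,
# SecurityContext stays nil, which the node then dereferenced and panicked on.
admissionConfig:
  pluginConfig:
    SecurityContextConstraint:   # plugin name is an assumption for this sketch
      configuration:
        apiVersion: v1
        kind: DefaultAdmissionConfig
        disable: true
```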
Comment 2
Zhang Cheng
2017-01-11 08:56:35 UTC
Created attachment 1239383 [details]
openshift-sdn-debug.tgz
The node runs on an EC2 t2.large instance. We hit this bug in one of our environments and don't know how to reproduce it.
PR https://github.com/openshift/origin/pull/12446 resolves the symptom of the problem, but investigation is ongoing to determine whether there is a deeper problem where the SecurityContext should be set but is not. I'm seeing a few pods that are missing the openshift.io/scc annotation, and their containers' SecurityContext fields are all nil when they shouldn't be. When you were using this cluster, were you just doing normal operations (oc create, oc run, etc.), or was there anything out of the ordinary? Did you ever change the admission control configuration in the master config file?

Hi Zhang Cheng, I am trying to reproduce your bug locally. In my testing env, OCP 3.4.0.39 can work with docker 1.10 but not with docker 1.12 (master and node are stuck in the NotReady state). Could you let me know which steps you took to make OCP 3.4.30 work with 1.12?

Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/6255e656cf97021047086d925192ca506b81ffae

Add a nil check to Container.SecurityContext

We were sometimes panicking when we dereferenced a nil pointer while looking at Container.SecurityContext, which is defined as optional. This fix adds a check that it is not nil before dereferencing. Fixes bug 1412087 (https://bugzilla.redhat.com/show_bug.cgi?id=1412087)

The PR https://github.com/openshift/origin/pull/12446 prevents the crash when the admission controller is disabled. Fortunately, disabling the admission controller that adds the security contexts is unlikely to be desired at a customer site, so this is not a release blocker for 3.4.0 and will be fixed in 3.4.1 and 3.3.x.

@Zhang, by default docker 1.10 is installed when installing OCP 3.4 using Flexy. Even after I rebuilt my ec2 instances based on your previous build https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/AOS_V3_Installation/job/Launch%20Environment%20Flexy/9570/, the docker version was still 1.10. When I removed docker 1.10 and reinstalled docker 1.12, neither master nor node could come up. I know this bug is fixed, but I'm just curious how you upgraded docker to 1.12 in OCP 3.4.

@Gan Huang, thanks for your kind reply.

@Weibin, please refer to Gan Huang's comments.

Passed and verified on OCP 3.4.1.0; test steps follow my comments above.

Test env:
OCP v3.4.1.0
kubernetes v1.4.0+776c994

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0218