Bug 1612006

| Summary: | node daemon in SIGSEGV loop | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Borja Aranda <farandac> |
| Component: | Storage | Assignee: | Tomas Smetana <tsmetana> |
| Status: | CLOSED ERRATA | QA Contact: | Liang Xia <lxia> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | CC: | aos-bugs, aos-storage-staff, bbennett, bchilds, lxia, mmariyan, tsmetana, wehe |
| Version: | 3.9.0 | | |
| Target Milestone: | --- | | |
| Target Release: | 3.9.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-12-13 19:27:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description

Borja Aranda 2018-08-03 08:47:18 UTC
There's an apparent bug in the reconciler's reconstructVolume method that might have caused this: https://github.com/openshift/ose/blob/enterprise-3.9/vendor/k8s.io/kubernetes/pkg/kubelet/volumemanager/reconciler/reconciler.go#L487

It's been fixed in 3.10; however, since it is a simple one-line change, I would consider backporting it to 3.9.

The process has changed: we should backport to Origin: https://github.com/openshift/origin/pull/20707

Tried with the OCP version below:

```
$ oc version
oc v3.9.55
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-lxia-master-etcd-1:8443
openshift v3.9.55
kubernetes v1.9.1+a0ce1bc657
```

The steps used to verify:

* Set up an OCP cluster with 2 nodes.
* Prepare some pods using Gluster volumes and make sure they are running.
* Drain one of the nodes.
* Restart the node service on the drained node.
* Wait some time (20 seconds in my case), then mark the node schedulable.
* Check the nodes/pods.

Went through the drain node / restart node service / mark node schedulable / check nodes and pods cycle 30+ times and did not see any broken nodes/pods. Moving bug to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748
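The verification steps above can be sketched as a command sequence. This assumes an RPM-based OCP 3.9 install, where the node service unit is `atomic-openshift-node`; the node name is a placeholder.

```shell
# Placeholder node name -- substitute one of the cluster's nodes.
NODE=node-1.example.com

# Drain the node (evicts the pods using the Gluster volumes).
oc adm drain "$NODE" --ignore-daemonsets --delete-local-data

# Restart the node service on the drained node (OCP 3.9 unit name).
ssh "$NODE" systemctl restart atomic-openshift-node

# Wait some time (20 seconds per the report), then mark the node schedulable.
sleep 20
oc adm uncordon "$NODE"

# Check that no nodes or pods are broken.
oc get nodes
oc get pods -o wide --all-namespaces
```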
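The report does not quote the one-line fix itself, but the failure mode it describes (a kubelet crash loop from a nil-pointer dereference during volume reconstruction) is the classic Go pattern of using a value returned alongside an error without checking the error first. The sketch below is a hypothetical illustration of that pattern, not the actual code from reconciler.go; the `mounter`, `newMounter`, and `reconstruct` names are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// mounter is a stand-in for a volume plugin's mounter object (hypothetical).
type mounter struct{ path string }

// newMounter fails when the plugin cannot reconstruct the volume,
// returning a nil mounter together with an error.
func newMounter(ok bool) (*mounter, error) {
	if !ok {
		return nil, errors.New("plugin cannot create mounter")
	}
	return &mounter{path: "/var/lib/kubelet/pods/volume"}, nil
}

// reconstruct shows the fixed shape: without the early return on err,
// m.path would dereference a nil pointer and crash the process (SIGSEGV),
// and since the kubelet is restarted by its service manager, it would
// crash again on the same on-disk state -- a SIGSEGV loop.
func reconstruct(ok bool) (string, error) {
	m, err := newMounter(ok)
	if err != nil {
		return "", err // the one-line guard whose absence causes the crash
	}
	return m.path, nil
}

func main() {
	if _, err := reconstruct(false); err != nil {
		fmt.Println("reconstruction skipped:", err)
	}
	p, _ := reconstruct(true)
	fmt.Println("reconstructed:", p)
}
```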