Bug 1591805
Summary: | etcd pod stuck in CrashLoopBackOff after upgrade - port 2380 already in use | ||||||
---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Mike Fiedler <mifiedle> | ||||
Component: | Cluster Version Operator | Assignee: | Scott Dodson <sdodson> | ||||
Status: | CLOSED ERRATA | QA Contact: | liujia <jiajliu> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 3.10.0 | CC: | aos-bugs, jiajliu, jokerman, jupierce, mifiedle, mmccomas, sdodson, wmeng | ||||
Target Milestone: | --- | Flags: | jiajliu: needinfo- | ||||
Target Release: | 3.10.z | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | No Doc Update | |||||
Doc Text: | | Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2018-07-30 20:22:32 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: | |||||||
Description
Mike Fiedler
2018-06-15 15:28:38 UTC
I've seen this once before and it was because etcd was still running on the host as a systemd service.

Not sure how frequently this happens. I've only seen it once. Re-running the upgrade in the same configuration did not reproduce this. It did hit https://bugzilla.redhat.com/show_bug.cgi?id=1591752 which is already provisionally targeted for 3.10.0.

Agree with leaving in 3.10.z for now. We'll be running upgrade tests through code freeze.

I've seen this happen and Justin Pierce has run into it when doing a 3.10.x to 3.10.x+1 upgrade in starter environments. Something is starting etcd on the host. When tracing through our code I noticed that we delete /etc/systemd/system/etcd.service, which effectively unmasks the service. I think we should stop doing that.

https://github.com/openshift/openshift-ansible/pull/9115

Met the same issue as comment 4. openshift-ansible-3.10.18-1.git.314.cfe4f91.el7.noarch.rpm

Upgrade succeeded for RPM install (container runtime docker-1.13.1).

Follow-up fix from Mike: https://github.com/openshift/openshift-ansible/pull/9246

Fix is in openshift-ansible-3.10.21-1.

Verified on openshift-ansible-3.10.21-1.git.0.6446011.el7.noarch.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2018:2263
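
For anyone triaging a similar failure, the masking remark in the thread can be checked directly on a host. systemd treats a unit as masked when its file under /etc/systemd/system is a symlink to /dev/null, so deleting that file removes the mask. The following is a minimal diagnostic sketch, not part of openshift-ansible: the helper names `is_masked` and `port_in_use` are hypothetical, and the bind-based port check is only an approximation of "something else is listening on 2380".

```python
import errno
import os
import socket


def is_masked(unit_path):
    """Return True if unit_path is a symlink that resolves to /dev/null.

    This is how `systemctl mask` marks a unit; if the symlink is deleted,
    the unit is no longer masked.
    """
    return (
        os.path.islink(unit_path)
        and os.path.realpath(unit_path) == os.path.realpath(os.devnull)
    )


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding host:port fails because the address is in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # other errors (e.g. permissions) are not answered here
    return False
```

On an affected master, `is_masked("/etc/systemd/system/etcd.service")` returning False together with `port_in_use(2380)` returning True would be consistent with the host etcd service having been started alongside the etcd pod, though it does not prove which process holds the port.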