Bug 1393187
Summary: | etcd cluster is unavailable or misconfigured during upgrade | ||||||
---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Anping Li <anli> | ||||
Component: | Cluster Version Operator | Assignee: | Scott Dodson <sdodson> | ||||
Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 3.4.0 | CC: | aos-bugs, boris.ruppert, jiajliu, jokerman, mmccomas | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: |
Previously, the upgrade playbook would inadvertently upgrade etcd when it should not have. If this triggered an upgrade to etcd3, the upgrade would fail because etcd became unavailable. etcd is no longer upgraded when it is not necessary, so upgrades proceed successfully.
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2017-01-18 12:51:02 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Anping Li
2016-11-09 04:18:34 UTC
The same error: client: etcd cluster is unavailable or misconfigured

Description of problem:
When upgrading the current 3.3 to the latest 3.3 with the 3.4 quick installer, the upgrade fails on [restart master] with:

cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.Group: client: etcd cluster is unavailable or misconfigured

The master service status is "activating", but the service does not work.

# oc get node
The connection to the server 192.168.2.184:8443 was refused - did you specify the right host or port?

Restarting the master service manually also fails.

Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.4.17-1.git.0.4698b0c.el7.noarch
openshift-ansible-playbooks-3.4.17-1.git.0.4698b0c.el7.noarch
openshift-ansible-playbooks-3.4.17-1.git.0.4698b0c.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install OCP 3.3 with the 3.3 quick installer.
2. Run the upgrade with the 3.4 quick installer:
# atomic-openshift-installer -d -c /tmp/installer.cfg.yml upgrade
This tool will help you upgrade your existing OpenShift installation.
Currently running: openshift-enterprise 3.3
(1) Update to latest 3.3
(2) Upgrade to next release: 3.4
Choose an option from above:
3. Choose 1.
4. The installer goes on to run the 3.3 upgrade playbook:
installer - DEBUG - Going to subprocess out to ansible now with these args: ansible-playbook --inventory-file=/tmp/hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade.yml

Actual results:
The 3.3 minor upgrade fails.

RUNNING HANDLER [restart master] ***********************************************
fatal: [openshift-151.lab.eng.nay.redhat.com]: FAILED! => { "changed": false, "failed": true }
MSG:
Unable to restart service atomic-openshift-master: Job for atomic-openshift-master.service failed because a timeout was exceeded. See "systemctl status atomic-openshift-master.service" and "journalctl -xe" for details.
Just a comment: etcd had been upgraded to etcd3 before the error appeared.

Hit this issue two times so far, but not always!

Yeah, I'm pretty sure this is happening because the backup step currently upgrades etcd when we don't really intend to do so. We're only installing it for backup purposes on embedded etcd environments where it wouldn't already be installed. The reason you're seeing it sometimes but not others is likely because you've got some hosts with RHEL 7.3 GA repos but some with 7.3.1 repos. In RHEL 7.3.1, etcd3 now obsoletes etcd, so it would be seen as an upgrade.

So for now, stop upgrading etcd when backing up etcd.
https://github.com/openshift/openshift-ansible/pull/2773

Also, the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1382634 stopped upgrading etcd during backups for non-embedded installs too. You can see those changes here, and I believe they would've fixed the issue of upgrading from 3.2 to 3.3 as well.
https://github.com/openshift/openshift-ansible/pull/2745

Given that I believe that issue is already fixed in 3.3, I'm moving this to 3.4 and providing the fix in comment 5.

Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/bd120d5cc460fa0c0d42c388dda00c6f15ee76cd
Don't upgrade etcd on backup operations
Fixes Bug 1393187
Fixes BZ1393187

*** Bug 1391935 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:0066
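
The cause described in the comments (a backup step that installs etcd, and a repo in which etcd3 obsoletes etcd) can be illustrated with a small toy model. This is only a sketch: it is not the code from the linked pull requests, and it does not reproduce the real package manager's resolver. The `Package` class, the `install_for_backup` function, the `upgrade_existing` flag, and the placeholder versions "2.x" and "3.x" are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    version: str              # placeholder strings, not real release numbers
    obsoletes: tuple = ()     # names of packages this one replaces


def install_for_backup(installed, repo, name="etcd", upgrade_existing=False):
    """Toy model of the backup step's package install.

    installed: dict mapping package name -> Package on the host
    repo: list of Package objects available to the host
    upgrade_existing: True models a step that may upgrade an installed
        package; False models a step that leaves it alone.
    Returns a new dict and never mutates `installed`.
    """
    result = dict(installed)
    if name in result and not upgrade_existing:
        return result  # already installed: leave the host untouched

    # A package that obsoletes `name` replaces it in this simplified model.
    replacement = next((p for p in repo if name in p.obsoletes), None)
    if replacement is not None:
        result.pop(name, None)
        result[replacement.name] = replacement
        return result

    candidate = next((p for p in repo if p.name == name), None)
    if candidate is not None:
        result[name] = candidate
    return result


# Hypothetical hosts and repos mirroring the two cases from the comments.
host = {"etcd": Package("etcd", "2.x")}
repo_7_3_ga = [Package("etcd", "2.x")]
repo_7_3_1 = [Package("etcd3", "3.x", obsoletes=("etcd",))]

print(sorted(install_for_backup(host, repo_7_3_ga, upgrade_existing=True)))
print(sorted(install_for_backup(host, repo_7_3_1, upgrade_existing=True)))
print(sorted(install_for_backup(host, repo_7_3_1, upgrade_existing=False)))
```

In this model, only the combination of an upgrading step and a repo where etcd3 obsoletes etcd replaces the installed package, which matches the "sometimes but not always" behavior reported above. A step that leaves an installed etcd alone gives the same result with either repo.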