Bug 1578482
| Summary: | OCP 3.10: etcd scaleup on CRI-O HA cluster fails with "dict object has no attribute etcd_ip" error | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Walid A. <wabouham> |
| Component: | Installer | Assignee: | Russell Teague <rteague> |
| Status: | CLOSED ERRATA | QA Contact: | Gaoyun Pei <gpei> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.10.0 | CC: | andreas.kunkel, aos-bugs, dmoessne, gpei, jkaur, jmalde, jokerman, mifiedle, mjahangi, mmccomas, rteague, sdodson, vlaad, wabouham, wmeng |
| Target Milestone: | --- | | |
| Target Release: | 3.10.z | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | aos-scalability-310 | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | During etcd scaleup, facts about the etcd cluster are required in order to add new hosts. The necessary tasks have been added to ensure those facts are set before configuring new hosts, allowing the scaleup to complete as expected. | | |
| Story Points: | --- | | |
| Clone Of: | | | |
| : | 1628201 (view as bug list) | Environment: | |
| Last Closed: | 2018-11-11 16:39:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1628201 | | |
Description
Walid A.
2018-05-15 16:55:09 UTC
*** Bug 1582230 has been marked as a duplicate of this bug. ***

*** Bug 1587882 has been marked as a duplicate of this bug. ***

The attached case is not specific to CRI-O, but I believe it has the same root cause. The only customer case attached to this indicates that they have a workaround, or at least that it is not a blocker for them. I don't think Urgent is the appropriate severity for this BZ.

A fix is proposed for 1628201 and will be backported to 3.10 once it is verified. Proposed: https://github.com/openshift/openshift-ansible/pull/10167 (release-3.10)

Tested with openshift-ansible-3.10.51-1.git.0.44a646c.el7.noarch.rpm:

1) New etcd collocated with a master

When scaling up an etcd member on existing master hosts, just like the scenario described above, it works well. New etcd members are added, running as static pods, and the new etcd URL is added to etcdClientInfo.urls on all masters.

2) New etcd not collocated with a master

When trying to scale up an etcd member on new hosts, the scale-up playbook fails on the first new etcd host:

```
TASK [etcd : Verify cluster is healthy] ****************************************
...
FAILED - RETRYING: Verify cluster is healthy (1 retries left).
fatal: [ec2-54-211-178-50.compute-1.amazonaws.com]: FAILED! => {"attempts": 30, "changed": false, "cmd": "/usr/local/bin/master-exec etcd etcd etcdctl --cert-file /etc/etcd/peer.crt --key-file /etc/etcd/peer.key --ca-file /etc/etcd/ca.crt --endpoints https://ip-172-18-6-78.ec2.internal:2379 cluster-health", "msg": "[Errno 2] No such file or directory", "rc": 2}
```

The new etcd member was installed as rpm etcd; it does not have the static-pod master scripts. Attached the full ansible log below.

Running into the same issue as Gaoyun, with 3 standalone etcds and 3 masters. Taking down one etcd and scaling up with a new etcd fails at the same step with the same error (No such file or directory).
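The "No such file or directory" error above comes from invoking `/usr/local/bin/master-exec`, which only exists on hosts where etcd runs as a static pod alongside a master; an rpm-installed standalone etcd has no such wrapper. A minimal sketch of the distinction (the endpoint and certificate paths are copied from the failed task output; the branching logic is an illustration, not the playbook's actual code):

```shell
#!/bin/sh
# Sketch: pick the etcd health-check invocation based on how etcd was installed.
# Endpoint and cert paths are taken from the bug's failed-task output.
ENDPOINT="https://ip-172-18-6-78.ec2.internal:2379"
CERT_ARGS="--cert-file /etc/etcd/peer.crt --key-file /etc/etcd/peer.key --ca-file /etc/etcd/ca.crt"

if [ -x /usr/local/bin/master-exec ]; then
    # etcd runs as a static pod (collocated with a master): exec inside the pod
    CMD="/usr/local/bin/master-exec etcd etcd etcdctl $CERT_ARGS --endpoints $ENDPOINT cluster-health"
else
    # rpm-installed standalone etcd: master-exec does not exist here, which is
    # exactly why the playbook fails with "[Errno 2] No such file or directory"
    CMD="etcdctl $CERT_ARGS --endpoints $ENDPOINT cluster-health"
fi

# Print the command that would be run rather than executing it
echo "$CMD"
```

On the standalone rpm hosts from the failing scenario, the first branch is never taken, so any task that hard-codes the `master-exec` wrapper cannot succeed there.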
Was running into the etcd_ip error until I started specifying "openshift_version" and "openshift_image_tag" in addition to "openshift_release" in the vars section of my inventory.

openshift-ansible-3.10.53-1

Verified this bug with openshift-ansible-3.10.53-1.git.0.ba2c2ec.el7.noarch.rpm. New etcd members could be added successfully in the following scenarios:

* New etcd collocated with masters
* New etcd not collocated with masters

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2709
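The workaround reported above (pinning openshift_version and openshift_image_tag alongside openshift_release) would look roughly like the inventory fragment below. This is a hedged sketch: the `[OSEv3:vars]` group is the standard openshift-ansible inventory section, but the exact version values are illustrative and not taken from this report.

```ini
[OSEv3:vars]
# Workaround noted in the comments: pin all three version variables explicitly
# instead of relying on openshift_release alone. Values below are examples only.
openshift_release=v3.10
openshift_version=3.10.53
openshift_image_tag=v3.10.53
```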