Description of problem:
When running the openshift-ansible/playbooks/openshift-etcd/scaleup.yml playbook to add 2 etcds to an existing OCP 3.10.0-0.38.0 HA CRI-O cluster (rpm install), we get the following error:

TASK [etcd : Add new etcd members to cluster] ***************************************************************************************************
task path: /root/openshift-ansible/roles/etcd/tasks/add_new_member.yml:5
fatal: [<new_etcd_hostname_fqdn>]: FAILED! => {
    "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'etcd_ip'\n\nThe error appears to have been in '/root/openshift-ansible/roles/etcd/tasks/add_new_member.yml': line 5, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Add new etcd members to cluster\n ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'dict object' has no attribute 'etcd_ip'"
}

The scaleup.yml playbook was executed from a cloned openshift-ansible repo, on one of the master/etcd hosts. This is on AWS EC2; the HA cluster has 1 load balancer, 3 master/etcd nodes, 2 infra nodes, 4 compute nodes, and 2 newly added master nodes, which were added successfully with the openshift-master/scaleup.yml playbook. The 2 etcds I am trying to add are on the 2 newly added master node instances.

Version-Release number of selected component (if applicable):

~/openshift-ansible # git log --oneline -1
df9f5ed Merge pull request #8299 from mgleung/upgrade-calico-master

# openshift version
openshift v3.10.0-0.38.0
kubernetes v1.10.0+b81c8f8
etcd 3.2.16

Container Runtime Version: cri-o://1.10.1

# rpm -qva | grep cri-o
cri-o-1.10.1-2.git728df92.el7.x86_64

# rpm -q openshift-ansible
openshift-ansible-3.10.0-0.38.0.git.7.848b045.el7.noarch

# rpm -q ansible
ansible-2.4.3.0-1.el7ae.noarch

# ansible --version
ansible 2.4.3.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May 4 2018, 09:38:16) [GCC 4.8.5 20150623 (Red Hat 4.8.5-34)]

How reproducible:
Always

Steps to Reproduce:
1. Install an OCP 3.10.0-0.38.0 HA CRI-O (rpm install) cluster (1 load balancer, 3 master/etcd nodes, 2 infra nodes, 4 compute nodes) in AWS EC2 with openshift-ansible and the attached inventory. Run:
   ansible-playbook -i inv openshift-ansible/playbooks/prerequisites.yml
   ansible-playbook -i inv openshift-ansible/playbooks/deploy_cluster.yml
2. Create two new instances in AWS EC2.
3. Update the inventory to add two master nodes (attached "master_scaleup_inventory").
4. On one master/etcd host, run:
   ansible-playbook -i master_scaleup_inv openshift-ansible/playbooks/prerequisites.yml
5. ansible-playbook -i master_scaleup_inv openshift-ansible/playbooks/openshift-master/scaleup.yml
6. Verify the cluster is up and the 2 newly added masters are running as static pods: oc get pods -n kube-system
7. Create an etcd_scaleup inventory with the 2 newly added master hostnames.
8. Run: ansible-playbook -i etcd_scaleup_inv openshift-ansible/playbooks/openshift-etcd/scaleup.yml

I have also tried re-running the prerequisites.yml playbook before the openshift-etcd/scaleup.yml playbook and got the same error in the same task.

Actual results:
Error in task: TASK [etcd : Add new etcd members to cluster] (see "Description of problem" above)

Expected results:
Playbook completes successfully and the 2 added etcds run as static pods on the two newly added master nodes.

Additional info:
Inventory files and logs from ansible-playbook with the -vvv flag are in the next comment.
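For context, here is a minimal sketch of the pattern behind this error, assuming (as the traceback suggests) that the task in roles/etcd/tasks/add_new_member.yml templates the new member's etcd_ip out of hostvars. The task name is taken from the error above; the exact command and variable names are illustrative, not the verbatim role code:

- name: Add new etcd members to cluster
  # Jinja2 raises AnsibleUndefinedVariable ("'dict object' has no attribute
  # 'etcd_ip'") here when the etcd_ip fact was never populated for the new
  # host during the init plays.
  command: >
    etcdctl member add {{ etcd_hostname }}
    {{ etcd_peer_url_scheme }}://{{ hostvars[inventory_hostname].etcd_ip }}:{{ etcd_peer_port }}
  delegate_to: "{{ etcd_ca_host }}"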
*** Bug 1582230 has been marked as a duplicate of this bug. ***
*** Bug 1587882 has been marked as a duplicate of this bug. ***
The attached case is not specific to CRI-O, but I believe it's the same root cause. The only customer case attached to this bug indicates that they have a workaround, or at least that it's not a blocker for them. I don't think Urgent is the appropriate severity for this BZ.
master: https://github.com/openshift/openshift-ansible/pull/10002
release-3.10: https://github.com/openshift/openshift-ansible/pull/10016
A fix is proposed in bug 1628201, which will be backported to 3.10 once it is verified.
Proposed: https://github.com/openshift/openshift-ansible/pull/10167 (release-3.10)
Tested with openshift-ansible-3.10.51-1.git.0.44a646c.el7.noarch.rpm.

1) New etcd collocated with a master
When scaling up an etcd member on existing master hosts, just like the scenario in the Description, it worked well: new etcd members were added and ran as static pods, and the new etcd URLs were added to etcdClientInfo.urls on all masters.

2) New etcd not collocated with a master
When trying to scale up an etcd member on a new host, the scale-up playbook failed on the 1st new etcd:

TASK [etcd : Verify cluster is healthy] ****************************************
...
FAILED - RETRYING: Verify cluster is healthy (1 retries left).
fatal: [ec2-54-211-178-50.compute-1.amazonaws.com]: FAILED! => {"attempts": 30, "changed": false, "cmd": "/usr/local/bin/master-exec etcd etcd etcdctl --cert-file /etc/etcd/peer.crt --key-file /etc/etcd/peer.key --ca-file /etc/etcd/ca.crt --endpoints https://ip-172-18-6-78.ec2.internal:2379 cluster-health", "msg": "[Errno 2] No such file or directory", "rc": 2}

The new etcd member was installed as rpm etcd, so it does not have the static-pod master scripts. The full ansible log is attached below.
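For comparison, the same health check on an rpm-installed etcd host would have to call etcdctl directly, since the /usr/local/bin/master-exec wrapper only exists where the control plane runs as static pods. A sketch of such a task, reusing the certificate paths, endpoint, and retry count from the log above (the until condition assumes etcdctl v2's "cluster is healthy" output):

- name: Verify cluster is healthy (rpm-installed etcd host)
  # Same etcdctl arguments as the failing command, minus the master-exec
  # wrapper that is missing on rpm-installed hosts.
  command: >
    etcdctl --cert-file /etc/etcd/peer.crt
    --key-file /etc/etcd/peer.key
    --ca-file /etc/etcd/ca.crt
    --endpoints https://ip-172-18-6-78.ec2.internal:2379 cluster-health
  register: etcd_health
  until: "'cluster is healthy' in etcd_health.stdout"
  retries: 30
  delay: 10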
New 3.10 PR: https://github.com/openshift/openshift-ansible/pull/10255
Running into the same issue as Gaoyun, with 3 standalone etcds and 3 masters: taking down one etcd and scaling up with a new one fails at the same step with the same error (No such file or directory). I was running into the etcd_ip error until I started specifying "openshift_version" and "openshift_image_tag" in addition to "openshift_release" in the vars section of my inventory.
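In other words, the workaround for the etcd_ip error is to pin all three version variables in the [OSEv3:vars] section of the inventory. The values below are illustrative for a 3.10 cluster, not copied from the reporter's inventory; match them to the installed version:

[OSEv3:vars]
# Setting openshift_version and openshift_image_tag alongside
# openshift_release reportedly avoids the undefined etcd_ip error.
openshift_release=v3.10
openshift_version=3.10.51
openshift_image_tag=v3.10.51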
openshift-ansible-3.10.53-1
Verified this bug with openshift-ansible-3.10.53-1.git.0.ba2c2ec.el7.noarch.rpm. New etcd members could be added successfully in the following scenarios:
* New etcd collocated with masters
* New etcd not collocated with masters
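For reference, a spot check of the scaled-up membership can mirror the master-exec invocation from the earlier log, this time listing members; a sketch run on a master with static-pod etcd, where the endpoint is illustrative (any master's etcd client URL works):

- name: List etcd members after scale-up
  # member list should show all etcd members, including the newly added ones.
  command: >
    /usr/local/bin/master-exec etcd etcd etcdctl
    --cert-file /etc/etcd/peer.crt
    --key-file /etc/etcd/peer.key
    --ca-file /etc/etcd/ca.crt
    --endpoints https://{{ inventory_hostname }}:2379 member list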
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2709