Description of problem:
In a disconnected UPI environment, adding an RHEL worker node to an existing cluster fails in the scale-up playbook with the following error:

--------
TASK [openshift_node : Wait for bootstrap endpoint to show up] *******************************************************************************************
FAILED - RETRYING: Wait for bootstrap endpoint to show up (60 retries left).
FAILED - RETRYING: Wait for bootstrap endpoint to show up (59 retries left).
FAILED - RETRYING: Wait for bootstrap endpoint to show up (58 retries left).
FAILED - RETRYING: Wait for bootstrap endpoint to show up (57 retries left).
---
FAILED - RETRYING: Wait for bootstrap endpoint to show up (1 retries left).
fatal: [worker2.xy.com]: FAILED! => {"attempts": 60, "changed": false, "content": "", "elapsed": 0, "msg": "Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "https://api.<cluster_name>.<domain>.com:22623/config/worker"}
------

[+] https://github.com/openshift/openshift-ansible/blob/release-4.2/roles/openshift_node/tasks/config.yml#L42

This environment has two haproxy-based load balancers (external and internal). Per the installation docs, the Machine Config Server port (22623) is not defined on the external LB:
https://docs.openshift.com/container-platform/4.2/installing/installing_bare_metal/installing-bare-metal.html

Actual results:
The scale-up playbook polls the bootstrap endpoint on port 22623 via "https://api.<cluster_name>.<domain>.com:22623/config/worker", i.e. through the external LB. Since the documentation[1] states that port 22623 does not need to be opened on the external LB, the connection is refused and the playbook fails.

Expected results:
Either the worker node should be added without opening port 22623 on the external LB, or the documentation should state clearly which ports are required on the external LB.
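For reference, the failing task in the linked config.yml is essentially a `uri` poll against the Machine Config Server endpoint. The sketch below is a simplified approximation of that task, not the exact openshift-ansible source; the variable name `bootstrap_endpoint` is illustrative:

----------
# Simplified sketch of the "Wait for bootstrap endpoint" task
# (see roles/openshift_node/tasks/config.yml in openshift-ansible).
- name: Wait for bootstrap endpoint to show up
  uri:
    # Resolves to https://api.<cluster_name>.<domain>.com:22623/config/worker,
    # which goes through the external LB in this environment.
    url: "{{ bootstrap_endpoint }}"
    validate_certs: false
  register: result
  until: result.status == 200
  retries: 60
  delay: 10
----------

Because the external LB has no listener on 22623, every retry ends in "Connection refused" until the 60 attempts are exhausted.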
[1] https://docs.openshift.com/container-platform/4.2/installing/installing_bare_metal/installing-bare-metal.html

Additional Info:

** As a workaround, we changed the external haproxy configuration and added port 22623 to the frontend section, mirroring the internal LB configuration. This raises the question: what purpose does it serve to have two LBs with identical configurations? Could we not have just one LB with two NICs?

** Also, based on the Bugzilla below, we did not recommend enabling firewalld on RHEL worker nodes. However, that BZ is closed with an errata stating the issue is fixed in openshift-ansible-4.2.0-201908142219.git.188.7254b39.el7. Has worker node scaling been tested with firewalld enabled?

[+] https://github.com/openshift/openshift-ansible/blob/release-4.2/roles/openshift_node/tasks/config.yml#L19
----------
# The base OS RHEL with the "Minimal" installation option
# enables the firewalld service by default, which denies the unexpected 10250 port.
# Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1740439 <---- closed with an errata
- name: Disable firewalld service
  systemd:
    name: "firewalld.service"
    enabled: false
  register: service_status
  failed_when:
  - service_status is failed
  - not ('Could not find the requested service' in service_status.msg)
----------
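The workaround on the external LB can be sketched as a haproxy stanza like the one below. This is a hypothetical excerpt, not the actual configuration from this environment; the backend server names and hostnames are illustrative and must match whatever the internal LB already uses for 22623:

----------
# Hypothetical haproxy excerpt: expose the Machine Config Server
# on the external LB, mirroring the internal LB configuration.
frontend machine-config-server
    bind *:22623
    mode tcp
    default_backend machine-config-server

backend machine-config-server
    mode tcp
    balance source
    server bootstrap bootstrap.<cluster_name>.<domain>.com:22623 check
    server master0 master0.<cluster_name>.<domain>.com:22623 check
    server master1 master1.<cluster_name>.<domain>.com:22623 check
    server master2 master2.<cluster_name>.<domain>.com:22623 check
----------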
Setting to current development branch (4.4). For fixes, if any, required/requested for prior versions, clones of this BZ will be created targeting those z-streams.
Hello, I am facing this issue with openshift-ansible 4.4.0 on an RHOS IPI-provisioned cluster. I can see that the target version has changed to 4.5. Is there any easier workaround than modifying the external load balancer?
This bug was fixed in https://github.com/openshift/openshift-ansible/pull/12099
Verified this bug with openshift-ansible-4.5.0-202003062301.git.0.dc37bae.el7.noarch.rpm. The scale-up playbook now uses the internal API LB address to fetch bootstrap.ign, so port 22623 no longer needs to be opened on the external LB.

TASK [openshift_node : Wait for bootstrap endpoint to show up] *****************
...
"redirected": false, "status": 200, "url": "https://api-int.gpei-45g.qe.gcp.devcluster.openshift.com:22623/config/worker"}
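The change is visible in the endpoint URL of the verification run: the playbook now derives the Machine Config Server address from the internal API hostname (api-int) rather than the external one (api). A hedged sketch of the resulting task, with an illustrative variable name for the cluster domain:

----------
# Sketch of the fixed behavior: poll the MCS through the internal LB.
# "cluster_domain" is an illustrative variable name, not necessarily
# the exact one used by openshift-ansible.
- name: Wait for bootstrap endpoint to show up
  uri:
    url: "https://api-int.{{ cluster_domain }}:22623/config/worker"
    validate_certs: false
  register: result
  until: result.status == 200
  retries: 60
  delay: 10
----------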
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409