This bug was initially created as a copy of Bug #1804083

I am copying this bug because:

Description of problem:
When trying to scale up a RHEL worker on an existing UPI on OSP cluster, the Ansible procedure gets stuck in the retry loop of TASK [openshift_node : Wait for bootstrap endpoint to show up].

My scenario:
1. I create the RHEL worker on the same subnet as the RHCOS workers.
2. I also create a floating IP for the RHEL worker, so that the worker can be reached over SSH from outside (it is used as a Jenkins slave).

After I add new security group rules for the external_network subnet range (which the RHEL worker's floating IP belongs to), openshift-ansible works again.

Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:

Steps to Reproduce:
1. Set up a UPI on OSP cluster according to https://github.com/openshift/installer/blob/release-4.4/docs/user/openstack/install_upi.md
2. Scale up a RHEL worker according to https://github.com/openshift/openshift-ansible/blob/release-4.4/README.md
3.

Actual results:

TASK [openshift_node : Wait for bootstrap endpoint to show up] *****************
Tuesday 18 February 2020  10:56:10 +0800 (0:00:00.406)       0:03:09.745 ******
FAILED - RETRYING: Wait for bootstrap endpoint to show up (60 retries left).
FAILED - RETRYING: Wait for bootstrap endpoint to show up (59 retries left).
...
FAILED - RETRYING: Wait for bootstrap endpoint to show up (2 retries left).
FAILED - RETRYING: Wait for bootstrap endpoint to show up (1 retries left).
fatal: [wjuos442181-5sf8q-rhel-0.wjuos442181.qe.devcluster.openshift.com]: FAILED! => {"attempts": 60, "changed": false, "content": "", "elapsed": 30, "msg": "Status code was -1 and not [200]: Request failed: <urlopen error timed out>", "redirected": false, "status": -1, "url": "https://api.wjuos442181.qe.devcluster.openshift.com:22623/config/worker"}

PLAY RECAP *********************************************************************
localhost                  : ok=1    changed=1    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0
wjuos442181-5sf8q-rhel-0.wjuos442181.qe.devcluster.openshift.com : ok=15   changed=9    unreachable=0    failed=1    skipped=2    rescued=0    ignored=0

Tuesday 18 February 2020  11:37:05 +0800 (0:40:54.601)       0:44:04.346 ******
===============================================================================
openshift_node : Wait for bootstrap endpoint to show up -------------- 2454.60s
openshift_node : Install openshift support packages ------------------- 121.96s
openshift_node : Install openshift packages ---------------------------- 60.98s
openshift_node : Get cluster nodes -------------------------------------- 1.20s
openshift_node : Setting sebool container_manage_cgroup ----------------- 1.13s
openshift_node : Enable the CRI-O service ------------------------------- 0.75s
openshift_node : Get kubernetes server version -------------------------- 0.63s
openshift_node : Enable IP Forwarding ----------------------------------- 0.43s
openshift_node : Enable persistent storage on journal ------------------- 0.43s
openshift_node : Create temp directory ---------------------------------- 0.41s
openshift_node : Disable swap ------------------------------------------- 0.40s
openshift_node : Get cluster version ------------------------------------ 0.36s
openshift_node : Disable firewalld service ------------------------------ 0.32s
openshift_node : include_tasks ------------------------------------------ 0.12s
openshift_node : Fail if new_workers group contains active nodes -------- 0.08s
openshift_node : Set fact l_kubernetes_version -------------------------- 0.08s
openshift_node : include_tasks ------------------------------------------ 0.08s
openshift_node : Set fact l_cluster_version ----------------------------- 0.07s
openshift_node : Override kubernetes version when running CI ------------ 0.07s
openshift_node : Override cluster version when running CI --------------- 0.07s

Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
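For reference, the workaround amounts to roughly the following sketch. The playbook polls the Machine Config Server on port 22623 of the API endpoint, so the security group protecting that endpoint must accept traffic from the subnet the RHEL worker's floating IP belongs to. The security group names and the CIDR below are placeholders for this environment, not the cluster's actual values:

  # Allow the RHEL worker (reaching the cluster via its floating IP on the
  # external network) to fetch the worker config from the Machine Config
  # Server on port 22623. "master-sg"/"worker-sg" and 192.0.2.0/24 are
  # placeholders; substitute the cluster's security groups and the
  # external_network subnet range.
  openstack security group rule create master-sg \
      --protocol tcp --dst-port 22623 \
      --remote-ip 192.0.2.0/24

  # The node also needs to reach the Kubernetes API on 6443 from the
  # same range for the join to complete.
  openstack security group rule create master-sg \
      --protocol tcp --dst-port 6443 \
      --remote-ip 192.0.2.0/24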
weiwei, is this system still available? Also, to recap what you did to the system: did you add a security rule to allow the floating IP of the RHEL worker, or did you somehow point the RHEL worker to the internal DNS? There are a couple of ways of solving this problem and I want to document the one you tested. Thanks.
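To illustrate the second option mentioned above (a sketch only, not something verified on this system): if the RHEL worker resolves the API endpoint to its address on the cluster's internal subnet instead of the floating IP, the bootstrap request never leaves the internal network and no extra security rule is needed. <api-internal-ip> below is a placeholder for that internal address:

  # /etc/hosts on the RHEL worker -- <api-internal-ip> is a placeholder
  # for the address the API endpoint has on the cluster's internal subnet.
  <api-internal-ip>  api.wjuos442181.qe.devcluster.openshift.com
  <api-internal-ip>  api-int.wjuos442181.qe.devcluster.openshift.com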
*** This bug has been marked as a duplicate of bug 1804083 ***