This bug was initially created as a copy of Bug #1804083

I am copying this bug because:

Description of problem:
When trying to scale up a RHEL worker on an existing UPI on OSP cluster, the Ansible procedure gets stuck in the retry loop of TASK [openshift_node : Wait for bootstrap endpoint to show up].

My scenario:
1. I create the RHEL worker on the same subnet as the RHCOS workers.
2. I also create a floating IP for the RHEL worker, so that the worker can be reached over SSH from outside (it is used as a Jenkins slave).

After I add new security group rules for the external_network subnet range (which the RHEL worker's floating IP belongs to), openshift-ansible works again.

Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:

Steps to Reproduce:
1. Set up a UPI on OSP cluster according to https://github.com/openshift/installer/blob/release-4.4/docs/user/openstack/install_upi.md
2. Scale up a RHEL worker according to https://github.com/openshift/openshift-ansible/blob/release-4.4/README.md
3.

Actual results:

TASK [openshift_node : Wait for bootstrap endpoint to show up] *****************
Tuesday 18 February 2020  10:56:10 +0800 (0:00:00.406)       0:03:09.745 ******
FAILED - RETRYING: Wait for bootstrap endpoint to show up (60 retries left).
FAILED - RETRYING: Wait for bootstrap endpoint to show up (59 retries left).
...
FAILED - RETRYING: Wait for bootstrap endpoint to show up (2 retries left).
FAILED - RETRYING: Wait for bootstrap endpoint to show up (1 retries left).
fatal: [wjuos442181-5sf8q-rhel-0.wjuos442181.qe.devcluster.openshift.com]: FAILED! => {"attempts": 60, "changed": false, "content": "", "elapsed": 30, "msg": "Status code was -1 and not [200]: Request failed: <urlopen error timed out>", "redirected": false, "status": -1, "url": "https://api.wjuos442181.qe.devcluster.openshift.com:22623/config/worker"}

PLAY RECAP *********************************************************************
localhost                  : ok=1    changed=1    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0
wjuos442181-5sf8q-rhel-0.wjuos442181.qe.devcluster.openshift.com : ok=15   changed=9    unreachable=0    failed=1    skipped=2    rescued=0    ignored=0

Tuesday 18 February 2020  11:37:05 +0800 (0:40:54.601)       0:44:04.346 ******
===============================================================================
openshift_node : Wait for bootstrap endpoint to show up -------------- 2454.60s
openshift_node : Install openshift support packages ------------------- 121.96s
openshift_node : Install openshift packages ---------------------------- 60.98s
openshift_node : Get cluster nodes -------------------------------------- 1.20s
openshift_node : Setting sebool container_manage_cgroup ----------------- 1.13s
openshift_node : Enable the CRI-O service ------------------------------- 0.75s
openshift_node : Get kubernetes server version -------------------------- 0.63s
openshift_node : Enable IP Forwarding ----------------------------------- 0.43s
openshift_node : Enable persistent storage on journal ------------------- 0.43s
openshift_node : Create temp directory ---------------------------------- 0.41s
openshift_node : Disable swap ------------------------------------------- 0.40s
openshift_node : Get cluster version ------------------------------------ 0.36s
openshift_node : Disable firewalld service ------------------------------ 0.32s
openshift_node : include_tasks ------------------------------------------ 0.12s
openshift_node : Fail if new_workers group contains active nodes -------- 0.08s
openshift_node : Set fact l_kubernetes_version -------------------------- 0.08s
openshift_node : include_tasks ------------------------------------------ 0.08s
openshift_node : Set fact l_cluster_version ----------------------------- 0.07s
openshift_node : Override kubernetes version when running CI ------------ 0.07s
openshift_node : Override cluster version when running CI --------------- 0.07s

Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
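For reference, the workaround amounts to roughly the following sketch. The playbook polls the Machine Config Server on port 22623 of the API endpoint, so the security group protecting that endpoint must accept traffic from the subnet the RHEL worker's floating IP belongs to. The security group names and the CIDR below are placeholders for this environment, not the cluster's actual values:

  # Allow the RHEL worker (reaching the cluster via its floating IP on the
  # external network) to fetch the worker config from the Machine Config
  # Server on port 22623. "master-sg"/"worker-sg" and 192.0.2.0/24 are
  # placeholders; substitute the cluster's security groups and the
  # external_network subnet range.
  openstack security group rule create master-sg \
      --protocol tcp --dst-port 22623 \
      --remote-ip 192.0.2.0/24

  # The node also needs to reach the Kubernetes API on 6443 from the
  # same range for the join to complete.
  openstack security group rule create master-sg \
      --protocol tcp --dst-port 6443 \
      --remote-ip 192.0.2.0/24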
weiwei, is this system still available? Also, to recap what you did to the system: did you add a security rule to allow the floating IP of the RHEL worker, or did you somehow point the RHEL worker to the internal DNS? There are a couple of ways of solving this problem and I want to document the one you tested. Thanks.
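To illustrate the second option mentioned above (a sketch only, not something verified on this system): if the RHEL worker resolves the API endpoint to its address on the cluster's internal subnet instead of the floating IP, the bootstrap request never leaves the internal network and no extra security rule is needed. <api-internal-ip> below is a placeholder for that internal address:

  # /etc/hosts on the RHEL worker -- <api-internal-ip> is a placeholder
  # for the address the API endpoint has on the cluster's internal subnet.
  <api-internal-ip>  api.wjuos442181.qe.devcluster.openshift.com
  <api-internal-ip>  api-int.wjuos442181.qe.devcluster.openshift.com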
*** This bug has been marked as a duplicate of bug 1804083 ***