Bug 1848945 - [OpenShift on OpenStack] Routes cannot be accessed once the ingress port VIP is on a RHEL worker
Summary: [OpenShift on OpenStack] Routes cannot be accessed once the ingress port VIP is on a RHEL worker
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: low
Target Milestone: ---
Target Release: 4.6.0
Assignee: Eric Duen
QA Contact: weiwei jiang
URL:
Whiteboard:
Duplicates: 1855055
Depends On:
Blocks:
 
Reported: 2020-06-19 10:33 UTC by weiwei jiang
Modified: 2021-02-22 13:54 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-22 13:54:32 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift installer pull 4205 0 None closed Bug 1848945: OpenStack - Documentation for adding worker nodes using ansible 2021-02-18 21:04:21 UTC
Red Hat Product Errata RHBA-2021:0510 0 None None None 2021-02-22 13:54:49 UTC

Description weiwei jiang 2020-06-19 10:33:28 UTC
Description of problem:

Install UPI on OSP and scale up with RHEL workers.
No routes are accessible when the ingress port VIP lands on a RHEL worker.

When the ingress port VIP is on an RHCOS worker, routes work fine.

Version-Release number of the following components:
4.4.0-0.nightly-2020-06-18-212632


How reproducible:
Always

Steps to Reproduce:
1. Install UPI on OSP
2. Scale up with a RHEL worker
3. Roll out a new deployment of the router in openshift-ingress
4. Make sure the ingress port VIP is on the RHEL worker (a helper for locating the VIP is sketched after these steps):
$ oc debug nodes/wj44uos619a-jlxxg-rhel-0 -- chroot /host ip addr show eth0
Starting pod/wj44uos619a-jlxxg-rhel-0-debug ...
To use host binaries, run `chroot /host`
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:b8:33:93 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.79/18 brd 192.168.63.255 scope global noprefixroute dynamic eth0
       valid_lft 84370sec preferred_lft 84370sec
    inet 192.168.0.7/18 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:feb8:3393/64 scope link 
       valid_lft forever preferred_lft forever
5. Try to access the web console or other routes
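
For step 4, a minimal sketch for locating which worker currently holds the ingress VIP (it assumes the VIP is 192.168.0.7 and shows up as a secondary address on eth0, as in the output above; adjust both for your cluster):

# find the worker node that currently carries the ingress VIP
INGRESS_VIP=192.168.0.7
for node in $(oc get nodes -l node-role.kubernetes.io/worker -o name); do
  if oc debug "$node" -- chroot /host ip -4 addr show eth0 2>/dev/null | grep -q "inet $INGRESS_VIP/"; then
    echo "ingress VIP $INGRESS_VIP is on ${node#node/}"
  fi
done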

Actual results:
5. $ curl https://console-openshift-console.apps.wj44uos619a.qe.devcluster.openshift.com/ -v -k
*   Trying 10.0.97.74:443...
* TCP_NODELAY set
* connect to 10.0.97.74 port 443 failed: No route to host
* Failed to connect to console-openshift-console.apps.wj44uos619a.qe.devcluster.openshift.com port 443: No route to host
* Closing connection 0
curl: (7) Failed to connect to console-openshift-console.apps.wj44uos619a.qe.devcluster.openshift.com port 443: No route to host


Expected results:
All routes should be accessible 

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 zhaozhanqi 2020-07-09 07:08:17 UTC
*** Bug 1855055 has been marked as a duplicate of this bug. ***

Comment 3 Hongan Li 2020-07-09 08:16:45 UTC
The workaround is to reschedule the router pod onto an RHCOS node so that the ingress VIP migrates to the RHCOS node.

To reschedule the router pod, delete the router pod running on the RHEL worker.
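
A minimal sketch of that workaround (the pod name is a placeholder; pick the router pod that `-o wide` shows on the RHEL worker):

$ oc -n openshift-ingress get pods -o wide
$ oc -n openshift-ingress delete pod <router-pod-on-rhel-worker>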

Comment 4 Hongan Li 2020-07-09 09:32:16 UTC
To avoid the router pod being scheduled onto a RHEL worker during an upgrade, another, more robust workaround is to add the label "node.openshift.io/os_id: rhcos" to the ingresscontroller's node selector before upgrading.

$ oc -n openshift-ingress-operator edit ingresscontroller/default -o yaml
spec:
  nodePlacement:
    nodeSelector:
      matchLabels:
        kubernetes.io/os: linux
        node-role.kubernetes.io/worker: ""
        node.openshift.io/os_id: rhcos
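
The same node selector can also be applied non-interactively; a sketch using `oc patch` against the default ingresscontroller, assuming you want exactly the matchLabels shown above:

$ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge \
    -p '{"spec":{"nodePlacement":{"nodeSelector":{"matchLabels":{"kubernetes.io/os":"linux","node-role.kubernetes.io/worker":"","node.openshift.io/os_id":"rhcos"}}}}}'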

Comment 7 Martin André 2020-07-09 13:47:26 UTC
Potentially a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1804083? You need to ensure that the RHEL nodes are able to access the cluster's API.
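
A quick way to check that from the RHEL worker (a sketch; the API hostname is assumed from the standard api.<cluster>.<base domain> naming, so substitute your own). Any HTTP response, even 403, means the node can reach the API, while "No route to host" points to the same connectivity problem:

$ oc debug nodes/wj44uos619a-jlxxg-rhel-0 -- chroot /host curl -k https://api.wj44uos619a.qe.devcluster.openshift.com:6443/version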

Comment 20 Adolfo Duarte 2020-09-10 14:21:14 UTC
Lowering the priority to low; this is not a blocker for 4.6.

Comment 30 errata-xmlrpc 2021-02-22 13:54:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.18 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0510

