Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1936443

Summary: Hive based OCP IPI baremetal installation fails to connect to API VIP port 22623
Product: OpenShift Container Platform
Component: Installer
Installer sub component: OpenShift on Bare Metal IPI
Reporter: Ulrich Schlueter <uschlute>
Assignee: Steven Hardy <shardy>
QA Contact: Amit Ugol <augol>
Status: CLOSED ERRATA
Severity: high
Priority: urgent
CC: bschmaus, dguthrie, kiran, lars, ngupta, ohochman, pablo.iranzo, rbartal, sasha, shardy, trwest, tuado
Version: 4.8
Keywords: Triaged
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Clones: 1939417
Last Closed: 2021-07-27 22:51:42 UTC
Type: Bug
Bug Blocks: 1831748, 1935163, 1939417, 1940275

Description Ulrich Schlueter 2021-03-08 14:09:29 UTC

Version:
4.7.0

$ openshift-install version
4.7.0

Platform:

baremetal 

Please specify:
IPI

What happened?

Running the installer from hive in a private lab setup fails with
"Error: could not fetch data from user_data_url: GET https://XXXXXXXXXX:22623/config/master giving up after 5 attempts" (one error per master), where XXXXXXXXXX is the API VIP configured in the install-config.yaml file. The URL is accessible from the provisioning host, the bootstrap VM, and other places, but not from pods in the hive cluster. It looks like the iptables rules from https://github.com/openshift/ovn-kubernetes/blob/master/go-controller/pkg/cni/OCP_HACKS.go are blocking the access.



What did you expect to happen?

Connections to external ports should work, or an alternative connection path should be in place.

How to reproduce it (as minimally and precisely as possible)?

4.7.0 baremetal installation
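
To check the symptom from inside a pod on the hive cluster, a small TCP probe against the API VIP on port 22623 can help separate an active reject from a silent drop or routing problem. The following is a minimal sketch, not part of the installer or hive; the function name `probe_tcp` and the three result labels are made up for illustration. It assumes that a REJECT rule answering with icmp-port-unreachable typically surfaces to the client as "connection refused".

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a single TCP connect attempt.

    Returns "open" if the handshake completes, "refused" if the
    connection is actively rejected, and "unreachable" for timeouts,
    routing errors, or name resolution failures.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Must be caught before OSError, which is its base class
        return "refused"
    except OSError:
        # Covers socket timeouts, "no route to host", and DNS errors
        return "unreachable"
```

Calling `probe_tcp("<API VIP>", 22623)` from the provisioning host and again from a pod should show whether the two locations behave differently, which is what this report describes.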

Comment 2 Steven Hardy 2021-03-08 14:46:18 UTC
I suspect this is related to changes made during 4.7 to collect the MCS rendered config via terraform, the aim being to pass the full configuration via the Ironic config drive (so that common network configurations like bond+vlans become possible).

https://github.com/openshift/installer/pull/4427

However in the hive case it's likely there are firewall rules that prevent access to the MCS port, and this is probably blocking access to the MCS on the bootstrap VM.

I proposed a revert (https://github.com/openshift/installer/pull/4722), since the full-ignition approach didn't work out for workers, so it ended up being only a partial solution to the bond+vlan requirement (we're looking into alternatives).

Comment 5 Lars Kellogg-Stedman 2021-03-09 03:46:00 UTC
> However in the hive case it's likely there are firewall rules that prevent access to the MCS port, and this is probably blocking access to the MCS on the bootstrap VM.

In particular, there are rules both on the host *and* in the pod namespace that block access to port 22623. E.g. with a shell in a pod at PID 2451918, we see:

  [root@os-ctrl-2 ~]# nsenter -t 2451918 -n iptables -S
  -P INPUT ACCEPT
  -P FORWARD ACCEPT
  -P OUTPUT ACCEPT
  -A FORWARD -p tcp -m tcp --dport 22623 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
  -A FORWARD -p tcp -m tcp --dport 22624 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
  [...]
  -A OUTPUT -p tcp -m tcp --dport 22623 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
  -A OUTPUT -p tcp -m tcp --dport 22624 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
  [...]
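
To look for these rules across several pods or nodes, the `iptables -S` output can be filtered with a short script. The sketch below only parses text in the format shown above; the function name `rejected_ports` and the `SAMPLE` constant are hypothetical, and the default port list (22623, 22624) is taken from the rules in this comment.

```python
SAMPLE = """\
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-A FORWARD -p tcp -m tcp --dport 22623 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -p tcp -m tcp --dport 22624 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT -p tcp -m tcp --dport 22623 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT -p tcp -m tcp --dport 22624 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
"""


def _value_after(tokens, flag):
    """Return the token following `flag`, or None if it is absent."""
    if flag in tokens:
        i = tokens.index(flag)
        if i + 1 < len(tokens):
            return tokens[i + 1]
    return None


def rejected_ports(rules_text, ports=(22623, 22624)):
    """Return (chain, port) pairs for REJECT rules on the given dports."""
    hits = []
    for line in rules_text.splitlines():
        tokens = line.split()
        # Only "-A <chain> ..." lines are appended rules; skip policies etc.
        if len(tokens) < 2 or tokens[0] != "-A":
            continue
        dport = _value_after(tokens, "--dport")
        target = _value_after(tokens, "-j")
        if target == "REJECT" and dport and dport.isdigit() and int(dport) in ports:
            hits.append((tokens[1], int(dport)))
    return hits


print(rejected_ports(SAMPLE))
```

In practice the input would be the captured output of the `nsenter ... iptables -S` command shown above instead of `SAMPLE`.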

Comment 6 Pablo Iranzo Gómez 2021-03-10 11:17:00 UTC
Will this be backported to 4.7 or only 4.8+ ?

Comment 13 Steven Hardy 2021-03-16 11:28:18 UTC
(In reply to Pablo Iranzo Gómez from comment #6)
> Will this be backported to 4.7 or only 4.8+ ?

I am planning to backport it to 4.7. I'll clone this bug and propose the backport manually, since the automated cherry-pick on https://github.com/openshift/installer/pull/4722 failed.

Comment 14 Amit Ugol 2021-03-22 11:00:49 UTC
*** Bug 1932799 has been marked as a duplicate of this bug. ***

Comment 17 Alexander Chuzhoy 2021-05-11 14:14:23 UTC
Verified:

Was able to deploy a spoke cluster with hive.

Hub cluster version:
4.8.0-0.nightly-2021-04-30-201824

Hive version:
hive-operator.v1.1.2

Comment 25 errata-xmlrpc 2021-07-27 22:51:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2
bug fix and security update), and where to find the updated files, follow the
link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

Comment 26 Red Hat Bugzilla 2023-09-15 01:33:10 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days