Bug 1936443 - Hive based OCP IPI baremetal installation fails to connect to API VIP port 22623
Summary: Hive based OCP IPI baremetal installation fails to connect to API VIP port 22623
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Steven Hardy
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks: dit 1935163 1939417 1940275
Reported: 2021-03-08 14:09 UTC by Ulrich Schlueter
Modified: 2023-09-15 01:33 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1939417 (view as bug list)
Environment:
Last Closed: 2021-07-27 22:51:42 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift installer pull 4722 0 None closed Bug 1936443: Revert "baremetal: send full ignition to masters" 2021-05-07 08:41:41 UTC
Red Hat Knowledge Base (Solution) 5871571 0 None None None 2021-03-10 07:22:30 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:52:19 UTC

Description Ulrich Schlueter 2021-03-08 14:09:29 UTC
Version:
4.7.0

$ openshift-install version
4.7.0

Platform:

baremetal 

Please specify:
IPI

What happened?

Running the installer from hive in a private lab setup fails with
"Error: could not fetch data from user_data_url: GET https://XXXXXXXXXX:22623/config/master giving up after 5 attempts" (one error per master), where XXXXXXXXXX is the API VIP configured in the install-config.yaml file. The URL is accessible from the provisioning host, the bootstrap VM and other places, but not from pods in the hive cluster. It looks like the iptables rules set up in https://github.com/openshift/ovn-kubernetes/blob/master/go-controller/pkg/cni/OCP_HACKS.go are blocking the access.
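
For anyone wanting to repeat the reachability check from different locations, a minimal sketch follows. It assumes curl is available wherever it is run; the function names mcs_url and probe_mcs and the address 192.0.2.10 are placeholders for illustration, not part of the installer or hive. It only reports whether the endpoint answers at all, not whether the returned config is usable.

```shell
# Build the URL the installer tries to fetch (port 22623, /config/<role>).
# $1: API VIP (placeholder value), $2: role, defaults to "master".
mcs_url() {
  printf 'https://%s:22623/config/%s\n' "$1" "${2:-master}"
}

# Request the URL once and print the HTTP status code. curl prints 000
# when no HTTP response was received (e.g. the connection was rejected).
# --insecure is used because this is only a reachability check.
probe_mcs() {
  curl --insecure --silent --output /dev/null --max-time 10 \
    --write-out '%{http_code}\n' "$(mcs_url "$1" "${2:-master}")"
}
```

Running `probe_mcs 192.0.2.10` (with the real API VIP substituted) once from the provisioning host and once from a pod in the hive cluster should make the difference visible.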



What did you expect to happen?

Connections to external ports should work, or an alternative way of reaching the URL should be in place.

How to reproduce it (as minimally and precisely as possible)?

4.7.0 baremetal installation

Comment 2 Steven Hardy 2021-03-08 14:46:18 UTC
I suspect this is related to changes made during 4.7 to collect the MCS rendered config via terraform, the aim being to pass the full configuration via the Ironic config drive (so that common network configurations like bond+vlans become possible).

https://github.com/openshift/installer/pull/4427

However in the hive case it's likely there are firewall rules that prevent access to the MCS port, and this is probably blocking access to the MCS on the bootstrap VM.

I proposed a revert (https://github.com/openshift/installer/pull/4722), since the full-ignition approach didn't work out for workers and so ended up being only a partial solution to the bond+vlan requirement (we're looking into alternatives).

Comment 5 Lars Kellogg-Stedman 2021-03-09 03:46:00 UTC
> However in the hive case it's likely there are firewall rules that prevent access to the MCS port, and this is probably blocking access to the MCS on the bootstrap VM.

In particular, there are rules both on the host *and* in the pod namespace that block access to port 22623. E.g. with a shell in a pod at PID 2451918, we see:

  [root@os-ctrl-2 ~]# nsenter -t 2451918 -n iptables -S
  -P INPUT ACCEPT
  -P FORWARD ACCEPT
  -P OUTPUT ACCEPT
  -A FORWARD -p tcp -m tcp --dport 22623 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
  -A FORWARD -p tcp -m tcp --dport 22624 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
  [...]
  -A OUTPUT -p tcp -m tcp --dport 22623 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
  -A OUTPUT -p tcp -m tcp --dport 22624 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
  [...]
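
To pick only the relevant rules out of a long dump, a small filter along these lines may help. This is a sketch only: find_port_rejects is a hypothetical helper name, and it assumes rules in `iptables -S` format arriving on standard input.

```shell
# Print only the REJECT rules that match the given destination port.
# Reads `iptables -S` style output on stdin; $1 is the port number.
find_port_rejects() {
  port="$1"
  # First grep: rules naming exactly this --dport (not a longer number).
  # Second grep: keep only those whose target is REJECT.
  grep -E -- "--dport ${port}( |\$)" | grep -- '-j REJECT'
}
```

For example, using the PID from the session above: `nsenter -t 2451918 -n iptables -S | find_port_rejects 22623`.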

Comment 6 Pablo Iranzo Gómez 2021-03-10 11:17:00 UTC
Will this be backported to 4.7 or only 4.8+ ?

Comment 13 Steven Hardy 2021-03-16 11:28:18 UTC
(In reply to Pablo Iranzo Gómez from comment #6)
> Will this be backported to 4.7 or only 4.8+ ?

I am planning to backport it to 4.7. I'll clone this bug and propose the backport manually, since the automated cherry-pick on https://github.com/openshift/installer/pull/4722 failed.

Comment 14 Amit Ugol 2021-03-22 11:00:49 UTC
*** Bug 1932799 has been marked as a duplicate of this bug. ***

Comment 17 Alexander Chuzhoy 2021-05-11 14:14:23 UTC
Verified:

Was able to deploy a spoke cluster with hive.

Hub cluster version:
4.8.0-0.nightly-2021-04-30-201824

Hive version:
hive-operator.v1.1.2

Comment 25 errata-xmlrpc 2021-07-27 22:51:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

Comment 26 Red Hat Bugzilla 2023-09-15 01:33:10 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days

