Thanks for opening a bug report!
Before hitting the button, please fill in as much of the template below as you can.
If you leave out information, it's harder to help you.
Be ready for follow-up questions, and please respond in a timely manner.
If we can't reproduce a bug we might close your issue.
If we're wrong, PLEASE feel free to reopen it and explain why.
$ openshift-install version
Running the installer from hive in a private lab setup fails with
"Error: could not fetch data from user_data_url: GET https://XXXXXXXXXX:22623/config/master giving up after 5 attempts" (one error per master), where XXXXXXXXXX is the API VIP configured in the install-config.yaml file. The URL is accessible from the provisioning host, the bootstrap VM, and other places, but not from pods in the hive cluster. It looks like the iptables rules from https://github.com/openshift/ovn-kubernetes/blob/master/go-controller/pkg/cni/OCP_HACKS.go are blocking access.
What did you expect to happen?
Connections to external ports to work, or some other connection mechanism to be in place.
How to reproduce it (as minimally and precisely as possible)?
A 4.7.0 baremetal installation driven from hive.
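A minimal sketch of the reproduction, assuming a standard baremetal install-config.yaml with the API VIP set (the assets directory is a placeholder); hive runs the equivalent of this inside an install pod on the hub cluster:

$ openshift-install create cluster --dir /path/to/assets --log-level debug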
I suspect this is related to changes made during 4.7 to collect the MCS-rendered config via terraform, the aim being to pass the full configuration via the Ironic config drive (so that common network configurations like bond+VLANs become possible).
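In effect, the terraform step now performs the equivalent of the following fetch from wherever the installer runs (placeholder VIP; terraform uses its own HTTP client rather than curl):

$ curl -k https://<api-vip>:22623/config/master -o master.ign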
However, in the hive case it's likely there are firewall rules that prevent access to the MCS port, and this is probably blocking access to the MCS on the bootstrap VM.
I proposed a revert (see https://github.com/openshift/installer/pull/4722), since the full-ignition approach didn't work out for workers; it ended up being only a partial solution to the bond+VLAN requirement (we're looking into alternatives).
> However, in the hive case it's likely there are firewall rules that prevent access to the MCS port, and this is probably blocking access to the MCS on the bootstrap VM.
In particular, there are rules both on the host *and* in the pod namespace that block access to port 22623. For example, entering the network namespace of a pod whose container runs as PID 2451918, we see:
[root@os-ctrl-2 ~]# nsenter -t 2451918 -n iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-A FORWARD -p tcp -m tcp --dport 22623 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -p tcp -m tcp --dport 22624 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT -p tcp -m tcp --dport 22623 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT -p tcp -m tcp --dport 22624 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
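The host-side rules can be confirmed directly on the node, without nsenter (the grep is just a convenience filter):

[root@os-ctrl-2 ~]# iptables -S | grep 2262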
Will this be backported to 4.7, or only 4.8+?
(In reply to Pablo Iranzo Gómez from comment #6)
> Will this be backported to 4.7, or only 4.8+?
I am planning to backport it to 4.7. I'll clone this bug and propose the backport manually, since the automated cherry-pick on https://github.com/openshift/installer/pull/4722 failed.
*** Bug 1932799 has been marked as a duplicate of this bug. ***
Was able to deploy a spoke cluster with hive.
Hub cluster version:
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days.