Created attachment 1855639 [details]
must-gather log

Description of problem:

Please also refer to the related bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=2040933.

We are performing an OCP 4.9.17 installation on KVM. The network on the OCP nodes does not have a default gateway configured. Static routes are added to each node so that it can reach the outside world, for example quay.io to pull the images needed for installation.

When standing up the control plane, the master nodes fail to come online. As the journal log for one of the master nodes shows, the configure-ovs.sh script fails while looking for a default gateway route:

Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1298]: + '[' 12 -lt 12 ']'
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1298]: + extra_bridge_file=/etc/ovnk/extra_bridge
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1298]: + '[' '' '!=' br-ex ']'
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1298]: + '[' -f /etc/ovnk/extra_bridge ']'
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1298]: + convert_to_bridge '' br-ex phys0
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1298]: + iface=
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1298]: + bridge_name=br-ex
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1298]: + port_name=phys0
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1298]: + ovs_port=ovs-port-br-ex
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1298]: + ovs_interface=ovs-if-br-ex
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1298]: + default_port_name=ovs-port-phys0
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1298]: + bridge_interface_name=ovs-if-phys0
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1298]: + '[' '' = br-ex ']'
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1298]: + '[' -z '' ']'
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1298]: + echo 'ERROR: Unable to find default gateway interface'
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1298]: ERROR: Unable to find default gateway interface
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1298]: + exit 1
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com systemd[1]: ovs-configuration.service: Main process exited, code=exited, status=1/FAILURE
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com systemd[1]: ovs-configuration.service: Failed with result 'exit-code'.
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com systemd[1]: Failed to start Configures OVS with proper host networking configuration.
Jan 26 20:05:23 master-0.pok-106.ocptest.pok.stglabs.ibm.com systemd[1]: ovs-configuration.service: Consumed 355ms CPU time

This is a single-NIC setup with no default gateway, which should be a valid network configuration. However, configure-ovs.sh appears to require at least one default gateway route.

Version-Release number of selected component (if applicable):
OCP 4.9.17
RHCOS 4.9.0

How reproducible:
Consistently reproducible.

Steps to Reproduce:
1. Perform an OCP 4.9.17 installation.
2. Start the bootstrap and master nodes with no default gateway defined.
3. The master nodes fail when running the configure-ovs.sh script, because it depends on the default gateway to pick the correct interface.

Actual results:
The bootstrap and master (control plane) nodes boot, but the master nodes fail to come online and do not report any status.

Expected results:
All bootstrap, master (control plane), and worker (compute) nodes should install the RHCOS build successfully and become Ready.

Additional info:
Journal logs from the bootstrap-0 and master-0 nodes are attached. A must-gather log is also attached.
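For readers who want to check a node before installation, the failure mode in the trace above (an empty interface name followed by "Unable to find default gateway interface") can be approximated with a short shell sketch. This is not the actual configure-ovs.sh logic; the function name default_route_iface is hypothetical, and the sketch simply assumes the interface is taken from the first "default" entry in `ip route show`-style output.

```shell
#!/bin/bash
# Hypothetical helper (not taken from configure-ovs.sh): read routing-table
# text in `ip route show` format on stdin and print the interface named after
# "dev" on the first default route. Returns non-zero if no such route exists.
default_route_iface() {
  awk '
    $1 == "default" {
      for (i = 2; i < NF; i++) {
        if ($i == "dev") { print $(i + 1); found = 1; exit }
      }
    }
    END { exit !found }
  '
}

# Example check, mirroring the error message seen in the journal:
#   if ! iface=$(ip route show | default_route_iface); then
#     echo "ERROR: Unable to find default gateway interface" >&2
#   fi
```

On a node that has only static routes, as in this report, the helper prints nothing and returns a non-zero status, which corresponds to the empty iface value in the trace.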
Created attachment 1855647 [details] bootstrap-0 journal log
Created attachment 1855648 [details] master-0 journal log
Do the docs specify that the default gateway can be omitted? Is this a valid configuration?
Hi Prashanth, I have verified with our STSM that having no default gateway set at all is a valid network configuration. This issue was originally found during an OCP 4.8.14 installation on zVM in his environment, and I was able to replicate it in ours on KVM. There are two instances of "default gateway" mentioned within the OCP install documentation: https://docs.openshift.com/container-platform/4.9/installing/installing_ibm_z/installing-ibm-z.html#installation-user-infra-machines-routing-bonding_installing-ibm-z Currently, a default gateway is not mentioned there as a requirement.
Moving this bug to the Doc component per conversation from: https://coreos.slack.com/archives/C0138QKKYTU/p1643286051024000 Re-assigning to Silke for future documentation update. Hi Prashanth, Muhammad, and Phil, please feel free to include any information or requirements that Silke needs to update the documentation.
Created a PR to update the docs https://github.com/openshift/openshift-docs/pull/41848/
Please check whether the fix is applicable from 4.6.z+ versions onwards. QA ack required
Making comment 8 un-private, as Silke does not have access to view private comments.
Docs PR is merged and this applies to OCP 4.10 and later only.