Description of problem:
Unable to create a MachineSet when the nodes to create are in a different subnet.

Version-Release number of selected component (if applicable):
4.5

How reproducible:
Always

Steps to Reproduce:
1. Configure two VLANs and subnets (for now): Masters and Workers.
2. The Masters and Workers subnets/VLANs have *no* firewall rules.
3. The two VLANs are configured in RHV-M. The OpenShift 4.5 cluster was successfully installed in the Masters VLAN.
4. To assign workers to the Workers VLAN, create a MachineSet whose providerSpec contains the following (a fuller sketch follows this report):

   os_disk:
     size_gb: 120
   template_name: my-template
   network_interfaces:
     - vnic_profile_id: "acc8cbbe-bf66-4a8f-a237-0893a009c00a"

5. New VMs are successfully created in RHV, with the NIC set to the new Workers vnic_profile_id. The VM starts and the configuration of the new node begins.
6. The issue starts when keepalived is started on the new VM. Because this node resides in the Workers subnet, which is a different subnet from the Masters subnet, keepalived exits with the error: "Failed due to No interface nor address found for the given VIPs".

Actual results:
oc get nodes does not list the new worker nodes.

Expected results:
The nodes are successfully created, and it is possible to list them with oc get nodes.
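For reference, here is a minimal sketch of the full MachineSet manifest the fragment in step 4 belongs to. Only the os_disk, template_name and network_interfaces fields come from this report; the metadata, labels, replica count, and the providerSpec apiVersion/kind are assumptions and may differ by release:

   apiVersion: machine.openshift.io/v1beta1
   kind: MachineSet
   metadata:
     name: my-cluster-workers-vlan          # hypothetical name
     namespace: openshift-machine-api
   spec:
     replicas: 1
     selector:
       matchLabels:
         machine.openshift.io/cluster-api-machineset: my-cluster-workers-vlan
     template:
       metadata:
         labels:
           machine.openshift.io/cluster-api-machineset: my-cluster-workers-vlan
       spec:
         providerSpec:
           value:
             apiVersion: ovirtproviderconfig.machine.openshift.io/v1beta1
             kind: OvirtMachineProviderSpec
             # fields from the report above:
             os_disk:
               size_gb: 120
             template_name: my-template
             network_interfaces:
               - vnic_profile_id: "acc8cbbe-bf66-4a8f-a237-0893a009c00a"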
Looking at the code in https://github.com/openshift/machine-config-operator/blob/3add4b96e379d34aa953c8bde82bf0c61054be6d/pkg/operator/bootstrap.go#L256, the keepalived.conf template is stored at the filename "ovirt/static-pod-resources/keepalived/keepalived.conf.tmpl", so I suppose keepalived runs as a static pod.

If it is a static pod, I can't see how the feature "[ovirt] support network interfaces in machine spec", as implemented in https://bugzilla.redhat.com/show_bug.cgi?id=1830852, is supposed to work: if the worker node is created in a different subnet, keepalived will fail on startup and the node won't come up. But maybe I am missing something.
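To make the "static pod" point concrete: a static pod manifest is rendered onto the host and picked up directly by the kubelet on every node, so keepalived is started on workers as well, regardless of their subnet. A rough, hypothetical sketch of what such a manifest might look like (this is not the actual rendered file, and the paths and names are assumptions):

   # e.g. dropped under /etc/kubernetes/manifests/ so the kubelet runs it directly
   apiVersion: v1
   kind: Pod
   metadata:
     name: keepalived
   spec:
     hostNetwork: true                       # keepalived must see the node's real interfaces
     containers:
     - name: keepalived
       image: keepalived-image:placeholder   # hypothetical; the real image is injected at render time
       securityContext:
         privileged: true                    # needed to add/remove the VIP address on an interface
       volumeMounts:
       - name: conf
         mountPath: /etc/keepalived
     volumes:
     - name: conf
       hostPath:
         path: /etc/keepalived               # where the rendered keepalived.conf would live (assumption)

Since the rendered keepalived.conf lists the VIPs, keepalived on a worker in the Workers subnet finds no local interface in the VIPs' subnet and exits with the error quoted above.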
Thank you for submitting this bug report. I'm investigating whether this scenario is actually supported. It may be that we need to be able to move the VIP from the masters to the worker nodes, or we might have to run two separate sets of keepalived instances for the two VIPs. I'll get back to you as soon as I have an answer.
I may not have been very clear: I suppose the problem here is that the VIP for the Ingress Controller is potentially assigned to any of the worker nodes, hence the requirement to run keepalived on all worker nodes. I don't see why we would want the VIP for the API server on the worker nodes, though.
*** This bug has been marked as a duplicate of bug 1894539 ***