Bug 1905134 - Unable to create a MachineSet when the nodes to create are in a different subnet
Summary: Unable to create a MachineSet when the nodes to create are in a different subnet
Keywords:
Status: CLOSED DUPLICATE of bug 1894539
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.5
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Antonio Murdaca
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-12-07 15:58 UTC by Emmanuel Kasper
Modified: 2020-12-11 09:04 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-10 17:25:49 UTC
Target Upstream Version:
Embargoed:



Description Emmanuel Kasper 2020-12-07 15:58:18 UTC
Description of problem:
Unable to create a MachineSet when the nodes to create are in a different subnet

Version-Release number of selected component (if applicable): 4.5


How reproducible: always


Steps to Reproduce:
1. Configure 2 VLANs and subnets (for now): Masters, Workers.
2. The Masters and Workers subnets/VLANs have *no* firewall rules.
3. The two VLANs are configured in RHV-M. The OpenShift 4.5 cluster was successfully installed in the Masters VLAN.
4. To assign workers to the Workers VLAN, create a MachineSet like this:
          os_disk:
            size_gb: 120
          template_name: my-template
          network_interfaces:
            - vnic_profile_id: "acc8cbbe-bf66-4a8f-a237-0893a009c00a"
5. New VMs are successfully created in RHV, with the NIC configured to the new Workers vnic_profile_id. The VMs start and the configuration of the new nodes begins.
6. However, the issue starts when keepalived attempts to start on the new VM. Because the new node resides in the Workers subnet, which is different from the Masters subnet, keepalived exits with an error: "Failed due to No interface nor address found for the given VIPs"
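The keepalived error in step 6 can be illustrated in isolation: at startup, keepalived looks for a local interface whose subnet contains each configured VIP, and exits when none is found. A minimal sketch of that check, using Python's ipaddress module and hypothetical addresses (the actual VIPs and subnets of this cluster are not in the report):

```python
import ipaddress

def find_vip_interface(vip, interface_networks):
    """Return the name of the first interface whose subnet contains the
    VIP, or None. Mirrors the check keepalived performs at startup: if
    no local interface holds an address in the VIP's subnet, it exits
    with "No interface nor address found for the given VIPs".
    """
    vip_addr = ipaddress.ip_address(vip)
    for name, network in interface_networks.items():
        if vip_addr in ipaddress.ip_network(network):
            return name
    return None

# Hypothetical addressing: the cluster VIP lives in the Masters subnet,
# while the new worker only has a NIC in the Workers subnet.
masters_vip = "10.0.1.5"
master_ifaces = {"ens3": "10.0.1.0/24"}   # Masters subnet
worker_ifaces = {"ens3": "10.0.2.0/24"}   # Workers subnet only

print(find_vip_interface(masters_vip, master_ifaces))  # ens3
print(find_vip_interface(masters_vip, worker_ifaces))  # None -> keepalived fails
```

On a master the VIP matches the local subnet and keepalived starts; on the new worker no interface matches, which is exactly the failure observed in step 6.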

Actual results:
oc get nodes does not list the new worker nodes

Expected results:
Nodes are successfully created, and it is possible to list them with oc get nodes

Comment 1 Emmanuel Kasper 2020-12-07 16:17:05 UTC
Looking at the code in https://github.com/openshift/machine-config-operator/blob/3add4b96e379d34aa953c8bde82bf0c61054be6d/pkg/operator/bootstrap.go#L256
the keepalived.conf template is stored at
filename: "ovirt/static-pod-resources/keepalived/keepalived.conf.tmpl",
so I suppose keepalived runs as a static pod

Now, if this is a static pod, I can't see how the feature "[ovirt] support network interfaces in machine spec", as implemented in https://bugzilla.redhat.com/show_bug.cgi?id=1830852,
should work: if the worker node is created in a different subnet, keepalived will fail on startup, and the node won't come up.
But maybe I am missing something.

Comment 2 Janos Bonic 2020-12-07 16:40:25 UTC
Thank you for submitting this bug report. I'm investigating whether this scenario is actually supported. It may be the case that we need to be able to move the VIP from the master to the worker nodes, or we might have to run two different sets of keepalived instances for the two VIPs. I'll get back to you as soon as I have an answer.

Comment 3 Emmanuel Kasper 2020-12-07 16:51:20 UTC
I might not have been very clear, but I suppose the problem here is that the VIP for the Ingress Controller is potentially assigned to any worker node, hence the requirement to have keepalived on all worker nodes.
Indeed, I don't see why we would want to have the VIP for the API server on the worker nodes.

Comment 4 Emmanuel Kasper 2020-12-10 17:25:49 UTC

*** This bug has been marked as a duplicate of bug 1894539 ***

