2033862 – MachineSet is not scaling up due to an OpenStack error trying to create multiple ports with the same MAC address

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cloud Compute
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	medium
Target Milestone:	---
Target Release:	4.11.0
Assignee:	Martin André
QA Contact:	Itzik Brown
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2050064

Reported:	2021-12-18 06:49 UTC by Vincent Lours
Modified:	2022-08-10 10:41 UTC
CC List:	12 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: A bug in Cisco ACI's neutron implementation, present in RHOSP16, causes the query for subnets belonging to a given network to return unexpected results. Consequence: The OpenStack cluster-api-provider could potentially try to provision instances with duplicated ports on the same subnet, leading to a failed provisioning. Fix: Add additional filtering in the OpenStack cluster-api-provider to ensure there is no more than one port per subnet. Result: It is now possible to deploy OCP on RHOSP16 with Cisco ACI.
Clone Of:
Environment:
Last Closed:	2022-08-10 10:40:43 UTC
Target Upstream Version:
Embargoed:
Flags:	vlours: needinfo-

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-api-provider-openstack pull 218	0	None	open	Bug 2033862: Ensure subnets belong to the queried network	2022-02-02 14:44:58 UTC
Red Hat Product Errata	RHSA-2022:5069	0	None	None	None	2022-08-10 10:41:00 UTC
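
The fix summarized in the Doc Text and in the linked pull request title (ensure subnets belong to the queried network, and allow no more than one port per subnet) can be illustrated with a minimal sketch. This is not the code from the pull request: the provider is a separate codebase, and the `Subnet` type, the `subnets_for_network` helper, and the sample IDs below are all hypothetical names chosen for illustration. The sketch assumes only that a subnet query may return entries from other networks or repeated entries, as the Doc Text describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subnet:
    """Hypothetical, simplified stand-in for a Neutron subnet record."""
    id: str
    network_id: str


def subnets_for_network(subnets, network_id):
    """Keep only subnets that belong to network_id, each at most once."""
    seen = set()
    result = []
    for subnet in subnets:
        if subnet.network_id != network_id:
            # Drop unexpected results that belong to a different network.
            continue
        if subnet.id in seen:
            # Drop repeats so that only one port is planned per subnet.
            continue
        seen.add(subnet.id)
        result.append(subnet)
    return result


# Simulated query result containing a foreign subnet and a duplicate entry.
returned = [
    Subnet("subnet-a", "net-1"),
    Subnet("subnet-b", "net-2"),
    Subnet("subnet-a", "net-1"),
]
filtered = subnets_for_network(returned, "net-1")
print([s.id for s in filtered])
```

Filtering on the client side, rather than trusting the query result, means a misbehaving backend cannot cause two ports to be requested on the same subnet.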

Description Vincent Lours 2021-12-18 06:49:41 UTC

Description of problem:
As described in BZ 1936511 (https://bugzilla.redhat.com/show_bug.cgi?id=1936511), a customer was facing an issue when trying to deploy new VMs in OpenStack.

Version-Release number of selected component (if applicable):
RHOCP 4.8.20

How reproducible:
It seems reproducible in OpenStack environments.

Actual results:
Provisioning a new VM fails.

Expected results:
The MachineSet should be able to create the desired VM using flags.

Additional info:
This is related to BZ 1936511, which was closed as a duplicate of BZ 1955969 (https://bugzilla.redhat.com/show_bug.cgi?id=1955969).
The patch was included in RHOCP 4.8.3, so the issue should already be fixed.

Would it be possible to confirm that the fix has not been reverted and that it was correctly implemented?

Comment 2 Martin André 2021-12-20 16:09:39 UTC

Hi, while I can't say the problem isn't in CAPO, I do not believe the patch at https://github.com/openshift/cluster-api-provider-openstack/pull/181 is at fault - there must be another issue at play.

I can see from the attached customer case that the issue started appearing after a migration from OSP13 to OSP16. That likely means they also switched from OVS to OVN for the OpenStack networking, and it's possible they're hitting an OVN bug (such as https://bugzilla.redhat.com/show_bug.cgi?id=1947823). It's also possible that other OpenShift overlays could be causing this issue. I remember a similar issue with Cisco ACI (https://bugzilla.redhat.com/show_bug.cgi?id=2002295), which I believe this customer is using.

We would need more info to help us debug this. Could you provide us with a must-gather?

Comment 4 Vincent Lours 2021-12-21 01:16:29 UTC

Hi Martin,

Thank you for sharing the information.

The customer has updated the case, saying that the workaround provided in the KCS is not suitable for an IPI install.
Based on your last comment, I will request additional information from the customer.

As the Must-gather is available from the case, would it be possible to get someone assigned to this BZ?

Comment 5 Martin André 2021-12-21 09:55:59 UTC

Could you also provide the problematic MachineSet? The 4.8 must-gather I was looking at only included the `hub-2m8kz-worker-0` machineset that seems to work as expected where replicas == availableReplicas. I can also see machines from this machineset would in theory only be attached to 1 subnet, assuming the filter returns only one match.

Comment 29 Itzik Brown 2022-02-24 14:48:41 UTC

Since we don't have the specific setup, the only way I could verify was to scale up a worker and make sure it becomes ready.

Used:
OCP 4.11.0-0.nightly-2022-02-23-185405 
RHOS-16.2-RHEL-8-20211129.n.1

Comment 33 errata-xmlrpc 2022-08-10 10:40:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
