Bug 2033862
Summary: | MachineSet is not scaling up due to an OpenStack error trying to create multiple ports with the same MAC address | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Vincent Lours <vlours> |
Component: | Cloud Compute | Assignee: | Martin André <m.andre> |
Cloud Compute sub component: | OpenStack Provider | QA Contact: | Itzik Brown <itbrown> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | high | CC: | aos-bugs, enothen, igarciam, itbrown, kurathod, ltamagno, m.andre, mbooth, mfedosin, openshift-bugs-escalate, pprinett, ssonigra |
Version: | 4.8 | Keywords: | Triaged |
Target Milestone: | --- | Flags: | vlours: needinfo- |
Target Release: | 4.11.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: A bug in Cisco ACI's neutron implementation, present in RHOSP16, causes the query for subnets belonging to a given network to return unexpected results.
Consequence: The OpenStack cluster-api-provider could potentially try to provision instances with duplicated ports on the same subnet, leading to a failed provisioning.
Fix: Add additional filtering in the OpenStack cluster-api-provider to ensure there is no more than one port per subnet.
Result: It is now possible to deploy OCP on RHOSP16 with Cisco ACI.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2022-08-10 10:40:43 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 2050064 |
Description
Vincent Lours
2021-12-18 06:49:41 UTC
Hi, while I can't say the problem isn't in CAPO, I do not believe the patch at https://github.com/openshift/cluster-api-provider-openstack/pull/181 is at fault; there must be another issue at play. I can see from the attached customer case that the issue started appearing after a migration from OSP13 to OSP16. That likely means they also switched from OVS to OVN for the OpenStack networking, and it's possible they're hitting an OVN bug (such as https://bugzilla.redhat.com/show_bug.cgi?id=1947823). It's also possible that other OpenShift overlays could be causing this issue. I remember a similar issue with Cisco ACI (https://bugzilla.redhat.com/show_bug.cgi?id=2002295), which I believe this customer is using. We would need more info to help us debug. Could you provide us with a must-gather?

Hi Martin, Thank you for sharing the information. The customer has updated the case saying that the workaround provided in the KCS is not suitable for an IPI install. Based on your last comment, I will request additional information from the customer. As the must-gather is available from the case, would it be possible to get someone assigned to this BZ?

Could you also provide the problematic MachineSet? The 4.8 must-gather I was looking at only included the `hub-2m8kz-worker-0` machineset, which seems to work as expected, with replicas == availableReplicas. I can also see that machines from this machineset would in theory only be attached to 1 subnet, assuming the filter returns only one match.

Since we don't have the specific setup, the only way I could verify is to scale a worker and make sure it becomes ready.
Used:
OCP 4.11.0-0.nightly-2022-02-23-185405
RHOS-16.2-RHEL-8-20211129.n.1

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:5069
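
For readers unfamiliar with the fix summarized in the Doc Text ("ensure there is no more than one port per subnet"), the idea can be illustrated with a minimal Go sketch. This is not the provider's actual code: the `PortSpec` type and the `dedupePortsBySubnet` function are hypothetical names introduced here for illustration, and the sketch assumes that duplicates are identified by subnet ID alone.

```go
package main

import "fmt"

// PortSpec is a hypothetical, simplified stand-in for a port request.
// The real provider types are not shown in this bug report.
type PortSpec struct {
	NetworkID string
	SubnetID  string
}

// dedupePortsBySubnet keeps the first port requested for each subnet ID and
// drops any later port on the same subnet, preserving the input order.
func dedupePortsBySubnet(ports []PortSpec) []PortSpec {
	seen := make(map[string]bool, len(ports))
	result := make([]PortSpec, 0, len(ports))
	for _, p := range ports {
		if seen[p.SubnetID] {
			continue // a port on this subnet was already kept
		}
		seen[p.SubnetID] = true
		result = append(result, p)
	}
	return result
}

func main() {
	// Assumed input: a subnet query that returned the same subnet twice.
	ports := []PortSpec{
		{NetworkID: "net-a", SubnetID: "subnet-1"},
		{NetworkID: "net-a", SubnetID: "subnet-1"},
		{NetworkID: "net-a", SubnetID: "subnet-2"},
	}
	for _, p := range dedupePortsBySubnet(ports) {
		fmt.Println(p.NetworkID, p.SubnetID)
	}
}
```

Filtering on the caller's side like this guards against a backend that returns unexpected subnet matches, which is the situation the Doc Text attributes to the Cisco ACI neutron implementation.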