Bug 1943599

Summary: CAPO Port Re-Use logic lacks basic sanity checks, and prevents multiple NICs from being created on the same network
Product: OpenShift Container Platform Reporter: egarcia
Component: Installer Assignee: Matthew Booth <mbooth>
Installer sub component: OpenShift on OpenStack QA Contact: Jon Uriarte <juriarte>
Status: CLOSED DUPLICATE Docs Contact:
Severity: low    
Priority: low CC: m.andre, mbooth, pprinett
Version: 4.8 Keywords: Reopened
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-12-02 15:13:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description egarcia 2021-03-26 14:49:26 UTC
https://github.com/openshift/cluster-api-provider-openstack/commit/42ae205f8c9046f6b95490d39fb531221bddd274#diff-74426b0daa2349d4dffb9633a1a3ef807c0136fc398d0275213a4734c336f067R290-R313

CAPO has no good way to prevent other asynchronous processes, or other CAPO threads, from taking the same ports. Two callers trying to claim the same port could race, which would result in a failed deployment. Instead, CAPO should focus on creating and destroying the resources it needs for a given machine.

Comment 1 egarcia 2021-03-26 14:53:25 UTC
ACKed by Kuryr, not a blocker for them.

Comment 3 egarcia 2021-03-29 15:15:12 UTC
Two new pieces of info:

1. CAPO and CAPI are completely sequential.
2. The pattern of looking up a resource by name and using it if found appears throughout the library, and is still present upstream.

I think the chance of a race is very low. It might be better to leave this for now, and to engage with upstream to figure this out if we want to change it.

Comment 4 egarcia 2021-04-01 15:32:59 UTC
Re-use logic caused the same port to be attached as an interface twice in a customer's system:

Duplicate entry 'fa:16:3e:3c:a7:ff/9291d07d-69d3-4f61-a6a3-e105dd5663e0-0' for key 'uniq_virtual_interfaces0address0deleted so Failed to allocate the network(s)

Comment 6 egarcia 2021-04-07 16:00:10 UTC
Steps to reproduce: Define a machine spec with 2 subnets that are in the same network.
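
A minimal Go sketch of how this could produce the duplicate attachment, assuming ports are looked up by a name derived from the machine and the network only. That naming scheme is an assumption made for illustration, and fakeNeutron, getOrCreatePort, portsForMachine and duplicatePorts are hypothetical helpers, not CAPO or gophercloud APIs.

```go
package main

import "fmt"

// Port is a simplified, hypothetical stand-in for a Neutron port.
type Port struct {
	ID        string
	Name      string
	NetworkID string
}

// Subnet is a simplified, hypothetical stand-in for a Neutron subnet.
type Subnet struct {
	ID        string
	NetworkID string
}

// fakeNeutron is a hypothetical in-memory stand-in for the networking API.
type fakeNeutron struct {
	ports []Port
}

// getOrCreatePort mimics look-up-by-name re-use: it returns an existing port
// with the same name on the same network, and only creates one if none exists.
func (n *fakeNeutron) getOrCreatePort(name, networkID string) Port {
	for _, p := range n.ports {
		if p.Name == name && p.NetworkID == networkID {
			return p
		}
	}
	p := Port{
		ID:        fmt.Sprintf("port-%d", len(n.ports)+1),
		Name:      name,
		NetworkID: networkID,
	}
	n.ports = append(n.ports, p)
	return p
}

// portsForMachine resolves one port per requested subnet. The port name here
// depends only on the machine name (an assumption made for this sketch), so
// two subnets in one network resolve to the same port.
func portsForMachine(n *fakeNeutron, machine string, subnets []Subnet) []string {
	var ids []string
	for _, s := range subnets {
		p := n.getOrCreatePort(machine, s.NetworkID)
		ids = append(ids, p.ID)
	}
	return ids
}

// duplicatePorts returns each port ID that appears more than once in ids.
func duplicatePorts(ids []string) []string {
	count := map[string]int{}
	var dups []string
	for _, id := range ids {
		count[id]++
		if count[id] == 2 {
			dups = append(dups, id)
		}
	}
	return dups
}

func main() {
	n := &fakeNeutron{}
	subnets := []Subnet{
		{ID: "subnet-a", NetworkID: "net-1"},
		{ID: "subnet-b", NetworkID: "net-1"},
	}
	ids := portsForMachine(n, "worker-0", subnets)
	fmt.Println("ports to attach:", ids)
	fmt.Println("duplicates:", duplicatePorts(ids))
}
```

A check like duplicatePorts, run before the server create request, is one example of the kind of basic sanity check the summary asks for. Including the subnet in the port name would be another way to keep the two lookups apart.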

Comment 8 egarcia 2021-04-07 16:01:26 UTC
Issue has been filed upstream as well: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/834

Comment 10 egarcia 2021-06-07 13:55:06 UTC
The naming duplication issues have been handled downstream by a separate patch. I will link it when I find it. This is not a blocker for 4.8.

Comment 13 ShiftStack Bugwatcher 2021-11-25 16:11:27 UTC
Removing the Triaged keyword because:
* the target release value is missing
* the QE automation assessment (flag qe_test_coverage) is missing

Comment 14 Matthew Booth 2021-12-02 15:13:08 UTC

*** This bug has been marked as a duplicate of bug 1955969 ***