Bug 1943599

Summary:	CAPO Port Re-Use logic lacks basic sanity checks, and prevents multiple NICs from being created on the same network
Product:	OpenShift Container Platform	Reporter:	egarcia
Component:	Installer	Assignee:	Matthew Booth <mbooth>
Installer sub component:	OpenShift on OpenStack	QA Contact:	Jon Uriarte <juriarte>
Status:	CLOSED DUPLICATE	Docs Contact:
Severity:	low
Priority:	low	CC:	m.andre, mbooth, pprinett
Version:	4.8	Keywords:	Reopened
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-12-02 15:13:08 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description egarcia 2021-03-26 14:49:26 UTC

https://github.com/openshift/cluster-api-provider-openstack/commit/42ae205f8c9046f6b95490d39fb531221bddd274#diff-74426b0daa2349d4dffb9633a1a3ef807c0136fc398d0275213a4734c336f067R290-R313

There isn't a good way for CAPO to lock other async processes or other CAPO threads out of taking the same ports. Two claimants racing for the same port could result in a failed deployment. Instead, CAPO should focus on creating and destroying the resources it needs for a given machine.
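
A minimal sketch of the race described above. It uses a hypothetical in-memory `FakePortAPI`, not the real Neutron client or the CAPO code, and the names `reuse_or_create`, `demo_race` and `demo_owned` are made up for illustration. The interleaving is written out by hand instead of using threads so that the outcome is deterministic.

```python
class FakePortAPI:
    """Hypothetical stand-in for a port service; it has no locking at all."""

    def __init__(self):
        self.ports = {}      # port_id -> {"name": ..., "owner": ...}
        self._next_id = 0

    def find_by_name(self, name):
        """Return the IDs of all ports with the given name."""
        return [pid for pid, p in self.ports.items() if p["name"] == name]

    def create(self, name):
        pid = f"port-{self._next_id}"
        self._next_id += 1
        self.ports[pid] = {"name": name, "owner": None}
        return pid

    def claim(self, pid, owner):
        # Last writer wins: nothing stops a second claimant.
        self.ports[pid]["owner"] = owner


def reuse_or_create(api, name):
    """Check-then-act: look the port up by name, create it only if missing."""
    found = api.find_by_name(name)
    return found[0] if found else api.create(name)


def demo_race():
    api = FakePortAPI()
    api.create("shared-port")  # a pre-existing port that both workers will find
    # Both workers finish their lookup before either one claims the port.
    a = reuse_or_create(api, "shared-port")
    b = reuse_or_create(api, "shared-port")
    api.claim(a, "worker-a")
    api.claim(b, "worker-b")
    return a, b, api.ports[a]["owner"]


def demo_owned():
    api = FakePortAPI()
    api.create("shared-port")
    # Each worker creates its own port and never looks one up by name.
    a = api.create("worker-a-port")
    b = api.create("worker-b-port")
    return a, b
```

In `demo_race` both workers end up holding the same port ID and the first claim is silently overwritten. In `demo_owned` each worker only touches the port it created, which is the create-and-destroy approach suggested above.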

Comment 1 egarcia 2021-03-26 14:53:25 UTC

ACKed by Kuryr, not a blocker for them.

Comment 2 egarcia 2021-03-26 16:45:07 UTC

https://github.com/openshift/cluster-api-provider-openstack/pull/170

Comment 3 egarcia 2021-03-29 15:15:12 UTC

2 new pieces of info:

1. CAPO and CAPI are completely sequential
2. The pattern of looking up a resource by name and using it if found is used throughout the library, and even now in upstream

I think that the chance of a race is very low. It might be better to leave this for now, and engage with upstream to figure this out if we want to change it.

Comment 4 egarcia 2021-04-01 15:32:59 UTC

Re-use logic caused the same port to be attached as an interface twice in a customer's system:

Duplicate entry 'fa:16:3e:3c:a7:ff/9291d07d-69d3-4f61-a6a3-e105dd5663e0-0' for key 'uniq_virtual_interfaces0address0deleted so Failed to allocate the network(s)

Comment 6 egarcia 2021-04-07 16:00:10 UTC

Steps to reproduce: Define a machine spec with 2 subnets that are in the same network.
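
A rough sketch of how name-based reuse could produce the duplicate attachment in this scenario, together with the kind of sanity check the summary asks for. This is not the CAPO implementation. It assumes that the port name is derived from the machine name and the network ID, so that two subnets in the same network map to the same name. `port_name`, `resolve_ports` and `check_unique` are hypothetical helpers.

```python
def port_name(machine, network_id):
    # Assumed naming scheme: one name per (machine, network), not per subnet.
    return f"{machine}-{network_id}"


def resolve_ports(machine, subnets, existing):
    """Return one port ID per subnet, reusing any port whose name already exists.

    `subnets` is a list of (subnet_id, network_id) pairs. `existing` maps
    port name -> port ID and is updated in place.
    """
    port_ids = []
    for _subnet_id, network_id in subnets:
        name = port_name(machine, network_id)
        if name not in existing:
            existing[name] = f"port-{len(existing)}"
        port_ids.append(existing[name])
    return port_ids


def check_unique(port_ids):
    """Sanity check: refuse to attach the same port more than once."""
    seen = set()
    for pid in port_ids:
        if pid in seen:
            raise ValueError(f"port {pid} would be attached twice")
        seen.add(pid)
    return port_ids
```

With two subnets in the same network, `resolve_ports` returns the same port ID twice and `check_unique` rejects the list before it would reach the server create call. With subnets in different networks the IDs differ and the check passes.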

Comment 8 egarcia 2021-04-07 16:01:26 UTC

Issue has been filed upstream as well: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/834

Comment 10 egarcia 2021-06-07 13:55:06 UTC

The naming duplication issues have been handled downstream by a separate patch. I will link it when I find it. This is not a blocker for 4.8.

Comment 13 ShiftStack Bugwatcher 2021-11-25 16:11:27 UTC

Removing the Triaged keyword because:
* the target release value is missing
* the QE automation assessment (flag qe_test_coverage) is missing

Comment 14 Matthew Booth 2021-12-02 15:13:08 UTC


*** This bug has been marked as a duplicate of bug 1955969 ***