
Bug 1943599 - CAPO Port Re-Use logic lacks basic sanity checks, and prevents multiple NICs from being created on the same network

Summary: CAPO Port Re-Use logic lacks basic sanity checks, and prevents multiple NICs ...

Keywords:
Status:	CLOSED DUPLICATE of bug 1955969
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	---
Target Release:	---
Assignee:	Matthew Booth
QA Contact:	Jon Uriarte
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-03-26 14:49 UTC by egarcia
Modified:	2021-12-02 15:13 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-12-02 15:13:08 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-api-provider-openstack pull 175	0	None	closed	Bug 1948546: Port create bugs	2021-05-25 14:32:08 UTC

Description egarcia 2021-03-26 14:49:26 UTC

https://github.com/openshift/cluster-api-provider-openstack/commit/42ae205f8c9046f6b95490d39fb531221bddd274#diff-74426b0daa2349d4dffb9633a1a3ef807c0136fc398d0275213a4734c336f067R290-R313

There isn't a good way for CAPO to stop other asynchronous processes, or other CAPO threads, from claiming the same ports. This could lead to a race condition in which two callers try to claim the same port, resulting in a failed deployment. Instead, CAPO should focus on creating and destroying only the resources it needs for a given machine.

Comment 1 egarcia 2021-03-26 14:53:25 UTC

ACKed by Kuryr; not a blocker for them.

Comment 2 egarcia 2021-03-26 16:45:07 UTC

https://github.com/openshift/cluster-api-provider-openstack/pull/170

Comment 3 egarcia 2021-03-29 15:15:12 UTC

Two new pieces of information:

1. CAPO and CAPI are completely sequential.
2. The pattern of looking up a resource by name and reusing it if found is used throughout the library, including in current upstream code.

I think the chance of a race is very low. It might be better to leave this for now and engage with upstream to work it out if we want to change it.

Comment 4 egarcia 2021-04-01 15:32:59 UTC

The re-use logic caused the same port to be attached twice as an interface on a customer's system:

Duplicate entry 'fa:16:3e:3c:a7:ff/9291d07d-69d3-4f61-a6a3-e105dd5663e0-0' for key 'uniq_virtual_interfaces0address0deleted so Failed to allocate the network(s)
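A cheap guard against this failure is to reject a NIC list that contains the same port ID twice before asking Nova to boot the server. The Go sketch below is a hypothetical helper (`validateUniquePorts` is not from the CAPO code base). It assumes the provider has already resolved one port ID per requested NIC, in NIC order:

```go
package main

import "fmt"

// validateUniquePorts is a HYPOTHETICAL pre-boot sanity check. It rejects a
// list of port IDs (one per NIC, in order) that names the same port twice.
func validateUniquePorts(portIDs []string) error {
	seen := make(map[string]int, len(portIDs)) // port ID -> first NIC index
	for i, id := range portIDs {
		if j, ok := seen[id]; ok {
			return fmt.Errorf("NIC %d reuses port %s already assigned to NIC %d", i, id, j)
		}
		seen[id] = i
	}
	return nil
}

func main() {
	fmt.Println(validateUniquePorts([]string{"p1", "p2"})) // <nil>
	fmt.Println(validateUniquePorts([]string{"p1", "p1"})) // NIC 1 reuses port p1 already assigned to NIC 0
}
```

Failing fast here gives a clear, machine-level error instead of a database constraint violation surfacing from the compute service.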

Comment 6 egarcia 2021-04-07 16:00:10 UTC

Steps to reproduce: Define a machine spec with two subnets that are in the same network.
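This reproduction is consistent with a name collision. If a port's name were derived only from the machine and the network, two NICs on different subnets of the same network would get the same name, and the second lookup-by-name would return the port just created for the first. The Go sketch below assumes that naming scheme purely for illustration. `NIC`, `portNameByNetwork`, `portNameByIndex`, and `getOrCreate` are hypothetical and not taken from the CAPO source, and the map stands in for Neutron:

```go
package main

import "fmt"

// NIC is a HYPOTHETICAL description of one requested interface.
type NIC struct {
	NetworkID string
	SubnetID  string
}

// portNameByNetwork is a HYPOTHETICAL naming scheme keyed only on the network.
func portNameByNetwork(machine string, _ int, nic NIC) string {
	return machine + "-" + nic.NetworkID
}

// portNameByIndex also includes the NIC's position, so two NICs on the same
// network get distinct names.
func portNameByIndex(machine string, i int, nic NIC) string {
	return fmt.Sprintf("%s-%d-%s", machine, i, nic.NetworkID)
}

// getOrCreate mimics "look up by name, reuse if found, else create" against an
// in-memory name -> port ID map and returns the port ID chosen for each NIC.
func getOrCreate(machine string, nics []NIC, name func(string, int, NIC) string) []string {
	byName := map[string]string{}
	ids := make([]string, 0, len(nics))
	for i, nic := range nics {
		n := name(machine, i, nic)
		id, ok := byName[n]
		if !ok {
			id = fmt.Sprintf("port-%d", len(byName)) // "create" a new port
			byName[n] = id
		}
		ids = append(ids, id)
	}
	return ids
}

func main() {
	nics := []NIC{{NetworkID: "net-a", SubnetID: "sub-1"}, {NetworkID: "net-a", SubnetID: "sub-2"}}
	fmt.Println(getOrCreate("worker-0", nics, portNameByNetwork)) // [port-0 port-0]
	fmt.Println(getOrCreate("worker-0", nics, portNameByIndex))   // [port-0 port-1]
}
```

Including the NIC index (or the subnet ID) in the name is one way to keep names unique per NIC. Tagging each port with its owning machine and checking that tag on reuse is another.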

Comment 8 egarcia 2021-04-07 16:01:26 UTC

Issue has been filed upstream as well: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/834

Comment 10 egarcia 2021-06-07 13:55:06 UTC

The naming duplication issues have been handled downstream by a separate patch. I will link it when I find it. This is not a blocker for 4.8.

Comment 13 ShiftStack Bugwatcher 2021-11-25 16:11:27 UTC

Removing the Triaged keyword because:
* the target release value is missing
* the QE automation assessment (flag qe_test_coverage) is missing

Comment 14 Matthew Booth 2021-12-02 15:13:08 UTC


*** This bug has been marked as a duplicate of bug 1955969 ***
