2083389 – SRIOV stuck in sriovnetwork state draining


Summary: SRIOV stuck in sriovnetwork state draining

Keywords:
Status:	CLOSED DUPLICATE of bug 2095210
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.8
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Sebastian Scheinkman
QA Contact:	zhaozhanqi
Docs Contact:
URL:
Whiteboard:
Depends On:	2095210
Blocks:

Reported:	2022-05-09 21:46 UTC by John Coleman
Modified:	2022-06-30 10:17 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-06-30 10:17:30 UTC
Target Upstream Version:
Embargoed:


Comment 2 Balazs Nemeth 2022-05-12 11:06:18 UTC

The reason why we have the drain lock is explained here: https://bugzilla.redhat.com/show_bug.cgi?id=1960103


If you still managed to get the cluster into a bad state (e.g. by rebooting at some intermediate state), we need a reproducer that specifies exactly how to get into that state. I presume this is some kind of race condition. Only then can we try to make the code more robust against that edge case.

Comment 8 milti leonard 2022-05-19 17:07:00 UTC

@balazs I believe John shared the reproduction steps earlier. Can you please give an update on next steps, or any thoughts on what the issue here might be?

Comment 10 Balazs Nemeth 2022-05-19 17:42:43 UTC

@miltimilti

a) We currently do not know _why_ we get into this state. If we can avoid it in the first place, we would never have the problem.
b) I agree that there is a way to get into this state, and once we are in it, we are stuck. That part we should be able to patch up.

My next step will be to provide a custom image that fixes b).
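The stuck state described in b) can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not the SR-IOV operator's code or its eventual fix. It assumes one cluster-wide drain lock, a per-node state that persists across restarts (standing in for the "draining" state named in the title; the `Idle` value is an assumption), and the rule that a node only drains when no other node is marked `Draining`. All names (`Cluster`, `reconcile`, `simulate`, `worker-a`, `worker-b`) are hypothetical. If a node is interrupted after being marked `Draining` and its lock is lost, every other node waits on it forever; clearing a `Draining` mark that is no longer backed by the lock is one possible way to recover.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

IDLE = "Idle"          # assumed name for the resting state
DRAINING = "Draining"  # the state the node is reported stuck in


@dataclass
class Cluster:
    # Hypothetical per-node state that survives daemon restarts.
    node_state: Dict[str, str] = field(default_factory=dict)
    # Node currently holding the single drain lock, if any.
    lock_holder: Optional[str] = None

    def try_acquire(self, node: str) -> bool:
        if self.lock_holder in (None, node):
            self.lock_holder = node
            return True
        return False

    def release(self, node: str) -> None:
        if self.lock_holder == node:
            self.lock_holder = None


def other_node_draining(cluster: Cluster, node: str) -> bool:
    return any(s == DRAINING for n, s in cluster.node_state.items() if n != node)


def reconcile(cluster: Cluster, node: str, needs_drain: bool, recover_stale: bool) -> bool:
    """One reconcile pass for `node`; returns True if it drained in this pass."""
    state = cluster.node_state.get(node, IDLE)
    # Stale mark: node says Draining but no longer holds the lock
    # (e.g. it was rebooted mid-drain). Optionally reset it.
    if recover_stale and state == DRAINING and cluster.lock_holder != node:
        cluster.node_state[node] = IDLE
    if not needs_drain or not cluster.try_acquire(node):
        return False
    if other_node_draining(cluster, node):
        cluster.release(node)  # wait for the other node to finish
        return False
    cluster.node_state[node] = DRAINING
    # ... drain the node and apply the config here (omitted) ...
    cluster.node_state[node] = IDLE
    cluster.release(node)
    return True


def simulate(recover_stale: bool, rounds: int = 3) -> bool:
    """worker-a was interrupted while Draining; can worker-b ever drain?"""
    cluster = Cluster(node_state={"worker-a": DRAINING, "worker-b": IDLE})
    for _ in range(rounds):
        reconcile(cluster, "worker-a", needs_drain=False, recover_stale=recover_stale)
        if reconcile(cluster, "worker-b", needs_drain=True, recover_stale=recover_stale):
            return True
    return False


print(simulate(recover_stale=False))  # False: worker-b waits forever
print(simulate(recover_stale=True))   # True: stale mark cleared, worker-b drains
```

Without recovery, `worker-a` has no pending work and never touches its own state again, so `worker-b` can take the lock but always backs off. The recovery check keys on lock ownership rather than on time, which is one design choice among several; a real fix would also have to decide what to do if the interrupted node still had work pending.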
