Bug 1852648 - [sriov] Nodes are drained simultaneously if 2 policies are applied at the same time
Summary: [sriov] Nodes are drained simultaneously if 2 policies are applied at the same time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.5.z
Assignee: Peng Liu
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On: 1852647
Blocks:
 
Reported: 2020-07-01 03:01 UTC by Peng Liu
Modified: 2020-07-22 12:21 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1852647
Environment:
Last Closed: 2020-07-22 12:20:42 UTC
Target Upstream Version:
Embargoed:




Links
GitHub: openshift/sriov-network-operator pull 278 (closed) - [release-4.5] Bug 1852648: Solve issue of nodes being drained simultaneously (last updated 2020-08-07 15:05:46 UTC)
Red Hat Product Errata: RHBA-2020:2956 (last updated 2020-07-22 12:21:05 UTC)

Description Peng Liu 2020-07-01 03:01:31 UTC
+++ This bug was initially created as a clone of Bug #1852647 +++

Description of problem:
Nodes are drained in parallel if 2 policies are applied at the same time

Version-Release number of selected component (if applicable):
4.5

How reproducible:

Steps to Reproduce:
1. Deploy the SR-IOV Network Operator on a cluster with at least 2 SR-IOV-capable worker nodes

2. Apply the following policies together

```
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-net-2
spec:
  resourceName: nic2
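  # Matches only worker-0: the hostname label restricts this policy to that single node.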
  nodeSelector:
    kubernetes.io/hostname: worker-0
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: 99
  mtu: 9000
  numVfs: 4
  nicSelector:
    pfNames: ['ens803f0#0-0']
  isRdma: false
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-net-2-vfio
spec:
  resourceName: nic2vfio
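  # No hostname restriction: this policy matches every node carrying the SR-IOV capability label.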
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: 99
  mtu: 9000
  numVfs: 4
  nicSelector:
    vendor: "8086"
    pfNames: ['ens803f0#0-0']
  deviceType: vfio-pci
  isRdma: false
```
3. Watch the node status (e.g. with 'oc get nodes -w') while the operator syncs the policies.

Actual results:
More than 1 worker node was drained and set to 'unschedulable' in parallel.

Expected results:
The worker nodes should be drained one at a time, in sequence.

Additional info:
The workaround is to apply the second policy only after the first one has fully synced on all nodes.
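A minimal sketch of that workaround check, assuming the standard kubernetes Python client: it polls the operator's SriovNetworkNodeState objects until every node reports syncStatus 'Succeeded', at which point the second policy can be applied.

```
# Sketch of the workaround: wait until every SriovNetworkNodeState
# reports syncStatus "Succeeded" before applying the second policy.
# Assumes the standard kubernetes Python client and cluster access.
import time
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

def all_nodes_synced():
    states = api.list_namespaced_custom_object(
        group="sriovnetwork.openshift.io",
        version="v1",
        namespace="openshift-sriov-network-operator",
        plural="sriovnetworknodestates",
    )
    return all(
        item.get("status", {}).get("syncStatus") == "Succeeded"
        for item in states["items"]
    )

while not all_nodes_synced():
    time.sleep(10)  # first policy still rolling out; keep waiting
# Safe to apply the second policy now.
```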

--- Additional comment from Peng Liu on 2020-07-01 02:58:48 UTC ---

Fixed by PRs https://github.com/openshift/sriov-network-operator/pull/249 and https://github.com/openshift/sriov-network-operator/pull/260
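For illustration only, one way to serialize drains cluster-wide is an exclusive lock object whose creation is atomic; the sketch below is a hedged illustration (the lock ConfigMap name is made up), not necessarily the exact mechanism those PRs use.

```
# Illustrative drain lock: creating a ConfigMap is atomic, so the API
# server returns 409 Conflict while another node already holds it.
# LOCK_NAME is hypothetical; the PRs may use a different mechanism.
import time
from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()
core = client.CoreV1Api()

NAMESPACE = "openshift-sriov-network-operator"
LOCK_NAME = "drain-lock"  # hypothetical lock object name

def acquire_drain_lock(holder):
    body = client.V1ConfigMap(
        metadata=client.V1ObjectMeta(
            name=LOCK_NAME, annotations={"holder": holder}))
    while True:
        try:
            core.create_namespaced_config_map(NAMESPACE, body)
            return  # lock acquired; safe to cordon and drain this node
        except ApiException as e:
            if e.status != 409:  # anything but "already exists" is fatal
                raise
            time.sleep(5)  # another node is draining; wait our turn

def release_drain_lock():
    core.delete_namespaced_config_map(LOCK_NAME, NAMESPACE)
```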

Comment 4 zhaozhanqi 2020-07-20 01:44:46 UTC
Verified this bug on 4.5.0-202007172106.p0

When creating a policy that matches 2 nodes, the nodes are set to SchedulingDisabled one by one.

Comment 6 errata-xmlrpc 2020-07-22 12:20:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2956

