Bug 1852648

Summary: [sriov] Nodes are drained simultaneously if 2 policies are applied at the same time
Product: OpenShift Container Platform
Component: Networking
Networking sub component: SR-IOV
Reporter: Peng Liu <pliu>
Assignee: Peng Liu <pliu>
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
CC: zshi, zzhao
Version: 4.5
Target Release: 4.5.z
Hardware: Unspecified
OS: Unspecified
Clone Of: 1852647
Bug Depends On: 1852647
Last Closed: 2020-07-22 12:20:42 UTC

Description Peng Liu 2020-07-01 03:01:31 UTC
+++ This bug was initially created as a clone of Bug #1852647 +++

Description of problem:
Nodes are drained in parallel if 2 policies are applied at the same time

Version-Release number of selected component (if applicable):
4.5

How reproducible:

Steps to Reproduce:
1. Deploy the SR-IOV network operator on a cluster with at least 2 SR-IOV-capable worker nodes

2. Apply the following two policies at the same time:

```
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-net-2
spec:
  resourceName: nic2
  nodeSelector:
    kubernetes.io/hostname: worker-0
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: 99
  mtu: 9000
  numVfs: 4
  nicSelector:
    pfNames: ['ens803f0#0-0']
  isRdma: false
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-net-2-vfio
spec:
  resourceName: nic2vfio
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: 99
  mtu: 9000
  numVfs: 4
  nicSelector:
    vendor: "8086"
    pfNames: ['ens803f0#0-0']
  deviceType: vfio-pci
  isRdma: false
```
3. Watch the status of the worker nodes while the policies are being synced

Actual results:
More than 1 worker node was drained and set to 'unschedulable' in parallel.

Expected results:
Worker nodes should be drained one at a time, in sequence.

Additional info:
The workaround is to apply the second policy only after the first one has been fully synced on all the nodes.

--- Additional comment from Peng Liu on 2020-07-01 02:58:48 UTC ---

Fixed in PR https://github.com/openshift/sriov-network-operator/pull/249 and https://github.com/openshift/sriov-network-operator/pull/260
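Conceptually, the fix serializes the drain operations so that at most one node is cordoned and drained at a time. Below is a minimal, hypothetical Python sketch of that behavior (a mutex guarding the drain step); it illustrates the intended semantics only and is not the operator's actual Go implementation, which coordinates draining through its own controller logic in the PRs above.

```python
import threading
import time

class DrainSerializer:
    """Allow only one simulated node drain at a time.

    Sketch of the desired 'one-by-one' behavior described in this bug,
    not the sriov-network-operator's real coordination mechanism.
    """

    def __init__(self):
        self._drain_lock = threading.Lock()    # serializes drains
        self._counter_lock = threading.Lock()  # protects the counters below
        self._active = 0
        self.max_concurrent = 0                # highest observed concurrency

    def drain(self, node, work=lambda n: time.sleep(0.05)):
        with self._drain_lock:  # only one node may be draining at once
            with self._counter_lock:
                self._active += 1
                self.max_concurrent = max(self.max_concurrent, self._active)
            work(node)          # cordon + pod eviction would happen here
            with self._counter_lock:
                self._active -= 1

def apply_policies(nodes):
    """Simulate policies selecting several nodes at once; returns the
    maximum number of nodes that were ever draining simultaneously."""
    serializer = DrainSerializer()
    threads = [threading.Thread(target=serializer.drain, args=(n,))
               for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return serializer.max_concurrent
```

With the lock in place, `apply_policies(["worker-0", "worker-1"])` reports a maximum concurrency of 1, matching the expected results; the buggy behavior corresponds to draining without the lock, where both workers go unschedulable in parallel.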

Comment 4 zhaozhanqi 2020-07-20 01:44:46 UTC
Verified this bug on 4.5.0-202007172106.p0

When creating a policy that matches 2 nodes, the nodes are set to SchedulingDisabled one by one.

Comment 6 errata-xmlrpc 2020-07-22 12:20:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2956