Bug 1852648

Summary: [sriov] Nodes are drained simultaneously if 2 policies are applied at the same time
Product: OpenShift Container Platform
Component: Networking
Networking sub component: SR-IOV
Reporter: Peng Liu <pliu>
Assignee: Peng Liu <pliu>
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
CC: zshi, zzhao
Version: 4.5
Target Release: 4.5.z
Hardware: Unspecified
OS: Unspecified
Clone Of: 1852647
Bug Depends On: 1852647
Last Closed: 2020-07-22 12:20:42 UTC

Description Peng Liu 2020-07-01 03:01:31 UTC
+++ This bug was initially created as a clone of Bug #1852647 +++

Description of problem:
Nodes are drained in parallel if 2 policies are applied at the same time

Version-Release number of selected component (if applicable):
4.5

How reproducible:

Steps to Reproduce:
1. Deploy the SR-IOV network operator on a cluster with at least 2 SR-IOV-capable worker nodes

2. Apply the following two policies at the same time:

```
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-net-2
spec:
  resourceName: nic2
  nodeSelector:
    kubernetes.io/hostname: worker-0
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: 99
  mtu: 9000
  numVfs: 4
  nicSelector:
    pfNames: ['ens803f0#0-0']
  isRdma: false
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-net-2-vfio
spec:
  resourceName: nic2vfio
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: 99
  mtu: 9000
  numVfs: 4
  nicSelector:
    vendor: "8086"
    pfNames: ['ens803f0#0-0']
  deviceType: vfio-pci
  isRdma: false
```
3. Watch the status of the worker nodes while the policies are being synced

Actual results:
More than 1 worker node was drained and set to 'unschedulable' in parallel.

Expected results:
Worker nodes should be drained one at a time, in sequence.

Additional info:
The workaround is to apply the second policy only after the first one has been fully synced on all the nodes.

--- Additional comment from Peng Liu on 2020-07-01 02:58:48 UTC ---

Fixed in PR https://github.com/openshift/sriov-network-operator/pull/249 and https://github.com/openshift/sriov-network-operator/pull/260
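Conceptually, the fix serializes the drain operations so that at most one node is cordoned and drained at a time. Below is a minimal, hypothetical Python sketch of that behavior (a mutex guarding the drain step); it illustrates the intended semantics only and is not the operator's actual Go implementation, which coordinates draining through its own controller logic in the PRs above.

```python
import threading
import time

class DrainSerializer:
    """Allow only one simulated node drain at a time.

    Sketch of the desired 'one-by-one' behavior described in this bug,
    not the sriov-network-operator's real coordination mechanism.
    """

    def __init__(self):
        self._drain_lock = threading.Lock()    # serializes drains
        self._counter_lock = threading.Lock()  # protects the counters below
        self._active = 0
        self.max_concurrent = 0                # highest observed concurrency

    def drain(self, node, work=lambda n: time.sleep(0.05)):
        with self._drain_lock:  # only one node may be draining at once
            with self._counter_lock:
                self._active += 1
                self.max_concurrent = max(self.max_concurrent, self._active)
            work(node)          # cordon + pod eviction would happen here
            with self._counter_lock:
                self._active -= 1

def apply_policies(nodes):
    """Simulate policies selecting several nodes at once; returns the
    maximum number of nodes that were ever draining simultaneously."""
    serializer = DrainSerializer()
    threads = [threading.Thread(target=serializer.drain, args=(n,))
               for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return serializer.max_concurrent
```

With the lock in place, `apply_policies(["worker-0", "worker-1"])` reports a maximum concurrency of 1, matching the expected results; the buggy behavior corresponds to draining without the lock, where both workers go unschedulable in parallel.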

Comment 4 zhaozhanqi 2020-07-20 01:44:46 UTC
Verified this bug on 4.5.0-202007172106.p0

When creating a policy that matches 2 nodes, the nodes are set to SchedulingDisabled one by one.

Comment 6 errata-xmlrpc 2020-07-22 12:20:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2956