Bug 1852648 - [sriov] Nodes are drained simultaneously if 2 policies are applied at the same time
Summary: [sriov] Nodes are drained simultaneously if 2 policies are applied at the same time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.5.z
Assignee: Peng Liu
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On: 1852647
Blocks:
 
Reported: 2020-07-01 03:01 UTC by Peng Liu
Modified: 2020-07-22 12:21 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1852647
Environment:
Last Closed: 2020-07-22 12:20:42 UTC
Target Upstream Version:
Embargoed:




Links
GitHub: openshift/sriov-network-operator pull 278 (closed) - [release-4.5] Bug 1852648: Solve issue of nodes being drained simultaneously (last updated 2020-08-07 15:05:46 UTC)
Red Hat Product Errata: RHBA-2020:2956 (last updated 2020-07-22 12:21:05 UTC)

Description Peng Liu 2020-07-01 03:01:31 UTC
+++ This bug was initially created as a clone of Bug #1852647 +++

Description of problem:
Nodes are drained in parallel if 2 policies are applied at the same time

Version-Release number of selected component (if applicable):
4.5

How reproducible:

Steps to Reproduce:
1. Deploy the SR-IOV Network Operator on a cluster with at least 2 SR-IOV-capable worker nodes

2. Apply the following policies together

```
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-net-2
spec:
  resourceName: nic2
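  # Matches only worker-0: the hostname label restricts this policy to that single node.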
  nodeSelector:
    kubernetes.io/hostname: worker-0
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: 99
  mtu: 9000
  numVfs: 4
  nicSelector:
    pfNames: ['ens803f0#0-0']
  isRdma: false
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-net-2-vfio
spec:
  resourceName: nic2vfio
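  # No hostname restriction: this policy matches every node carrying the SR-IOV capability label.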
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: 99
  mtu: 9000
  numVfs: 4
  nicSelector:
    vendor: "8086"
    pfNames: ['ens803f0#0-0']
  deviceType: vfio-pci
  isRdma: false
```
3. Watch the node status (e.g. with 'oc get nodes -w') while the operator syncs the policies.

Actual results:
More than 1 worker node was drained and set to 'unschedulable' in parallel.

Expected results:
The worker nodes should be drained one at a time, in sequence.

Additional info:
The workaround is to apply the second policy only after the first one has fully synced on all nodes.
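A minimal sketch of that workaround check, assuming the standard kubernetes Python client: it polls the operator's SriovNetworkNodeState objects until every node reports syncStatus 'Succeeded', at which point the second policy can be applied.

```
# Sketch of the workaround: wait until every SriovNetworkNodeState
# reports syncStatus "Succeeded" before applying the second policy.
# Assumes the standard kubernetes Python client and cluster access.
import time
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

def all_nodes_synced():
    states = api.list_namespaced_custom_object(
        group="sriovnetwork.openshift.io",
        version="v1",
        namespace="openshift-sriov-network-operator",
        plural="sriovnetworknodestates",
    )
    return all(
        item.get("status", {}).get("syncStatus") == "Succeeded"
        for item in states["items"]
    )

while not all_nodes_synced():
    time.sleep(10)  # first policy still rolling out; keep waiting
# Safe to apply the second policy now.
```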

--- Additional comment from Peng Liu on 2020-07-01 02:58:48 UTC ---

Fixed by PRs https://github.com/openshift/sriov-network-operator/pull/249 and https://github.com/openshift/sriov-network-operator/pull/260
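For illustration only, one way to serialize drains cluster-wide is an exclusive lock object whose creation is atomic; the sketch below is a hedged illustration (the lock ConfigMap name is made up), not necessarily the exact mechanism those PRs use.

```
# Illustrative drain lock: creating a ConfigMap is atomic, so the API
# server returns 409 Conflict while another node already holds it.
# LOCK_NAME is hypothetical; the PRs may use a different mechanism.
import time
from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()
core = client.CoreV1Api()

NAMESPACE = "openshift-sriov-network-operator"
LOCK_NAME = "drain-lock"  # hypothetical lock object name

def acquire_drain_lock(holder):
    body = client.V1ConfigMap(
        metadata=client.V1ObjectMeta(
            name=LOCK_NAME, annotations={"holder": holder}))
    while True:
        try:
            core.create_namespaced_config_map(NAMESPACE, body)
            return  # lock acquired; safe to cordon and drain this node
        except ApiException as e:
            if e.status != 409:  # anything but "already exists" is fatal
                raise
            time.sleep(5)  # another node is draining; wait our turn

def release_drain_lock():
    core.delete_namespaced_config_map(LOCK_NAME, NAMESPACE)
```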

Comment 4 zhaozhanqi 2020-07-20 01:44:46 UTC
Verified this bug on 4.5.0-202007172106.p0

When creating a policy that matches 2 nodes, the nodes are set to SchedulingDisabled one by one.

Comment 6 errata-xmlrpc 2020-07-22 12:20:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2956

