Bug 2027420 - [SNO] SR-IOV operator fails to install after CNV is installed
Summary: [SNO] SR-IOV operator fails to install after CNV is installed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Jed Lejosne
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-11-29 15:36 UTC by Kedar Bidarkar
Modified: 2022-03-16 15:57 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-16 15:57:01 UTC
Target Upstream Version:
Embargoed:


Links
System ID                              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata RHSA-2022:0947  0        None      None    None     2022-03-16 15:57:17 UTC

Description Kedar Bidarkar 2021-11-29 15:36:17 UTC
Description of problem:

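On a single-node OpenShift (SNO) setup, the SR-IOV operator fails to install once CNV is present. The sriov-network-config-daemon log shows it looping on evictions of openshift-cnv pods that are rejected because of their pod disruption budgets: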

]$ oc logs daemonset/sriov-network-config-daemon     --namespace=openshift-sriov-network-operator     --container=sriov-network-config-daemon
...
I1125 20:16:32.376476 2285905 daemon.go:133] evicting pod openshift-cnv/virt-api-5b9d4b6767-n8bc4
I1125 20:16:32.376576 2285905 daemon.go:133] evicting pod openshift-cnv/virt-controller-8464dfc565-9ch8s
E1125 20:16:32.384313 2285905 daemon.go:133] error when evicting pods/"virt-controller-8464dfc565-9ch8s" -n "openshift-cnv" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
E1125 20:16:32.384381 2285905 daemon.go:133] error when evicting pods/"virt-api-5b9d4b6767-n8bc4" -n "openshift-cnv" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I1125 20:16:37.384840 2285905 daemon.go:133] evicting pod openshift-cnv/virt-api-5b9d4b6767-n8bc4
I1125 20:16:37.384907 2285905 daemon.go:133] evicting pod openshift-cnv/virt-controller-8464dfc565-9ch8s
E1125 20:16:37.391466 2285905 daemon.go:133] error when evicting pods/"virt-api-5b9d4b6767-n8bc4" -n "openshift-cnv" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
E1125 20:16:37.391480 2285905 daemon.go:133] error when evicting pods/"virt-controller-8464dfc565-9ch8s" -n "openshift-cnv" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I1125 20:16:42.117093 2285905 daemon.go:312] Run(): period refresh
I1125 20:16:42.120240 2285905 daemon.go:972] tryCreateSwitchdevUdevRule()
I1125 20:16:42.120286 2285905 daemon.go:1030] tryCreateNMUdevRule()
I1125 20:16:42.392461 2285905 daemon.go:133] evicting pod openshift-cnv/virt-controller-8464dfc565-9ch8s
I1125 20:16:42.392560 2285905 daemon.go:133] evicting pod openshift-cnv/virt-api-5b9d4b6767-n8bc4
E1125 20:16:42.398659 2285905 daemon.go:133] error when evicting pods/"virt-api-5b9d4b6767-n8bc4" -n "openshift-cnv" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
E1125 20:16:42.398658 2285905 daemon.go:133] error when evicting pods/"virt-controller-8464dfc565-9ch8s" -n "openshift-cnv" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
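
The retry loop in the log above can be summarized by counting the failed eviction attempts per pod. This is a minimal sketch that assumes the log line format shown in the excerpt; `count_eviction_failures` is a hypothetical helper name, not part of any operator tooling.

```shell
# Count "error when evicting" lines per pod from a daemon log on stdin.
# Assumes the line format shown in the excerpt above.
# Prints one "<namespace>/<pod> <count>" line per pod.
count_eviction_failures() {
  grep 'error when evicting' \
    | sed -n 's/.*error when evicting pods\/"\([^"]*\)" -n "\([^"]*\)".*/\2\/\1/p' \
    | sort | uniq -c | awk '{print $2, $1}'
}
```

It could be fed from the same command used above, for example `oc logs daemonset/sriov-network-config-daemon --namespace=openshift-sriov-network-operator --container=sriov-network-config-daemon | count_eviction_failures`. A count that keeps growing for the same pods suggests the drain is stuck rather than slow.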


Version-Release number of selected component (if applicable):
CNV-4.10 on SNO

How reproducible:
Always

Steps to Reproduce:
1. Install OCP in SNO
2. Install CNV in SNO
3. Install SR-IOV operator

Actual results:
SR-IOV operator fails to install.

E1125 20:16:42.398659 2285905 daemon.go:133] error when evicting pods/"virt-api-5b9d4b6767-n8bc4" -n "openshift-cnv" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
E1125 20:16:42.398658 2285905 daemon.go:133] error when evicting pods/"virt-controller-8464dfc565-9ch8s" -n "openshift-cnv" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

Expected results:
SR-IOV operator installs successfully.



Additional info:

a) Installing the SR-IOV operator succeeds when it is done before CNV is installed.
b) Installing the SR-IOV operator after CNV is installed fails.

Comment 1 Petr Horáček 2021-11-29 15:40:19 UTC
The operator reboots nodes as part of the configuration. Because the virt components refuse to be evicted from the node, it cannot perform that operation.

Comment 2 sgott 2021-11-29 16:11:36 UTC
Jed, I assume this is because we still have 2 replicas and PDBs present?
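
The interaction between replicas and a PodDisruptionBudget raised in this comment can be sketched with the budget arithmetic: an eviction is allowed only while the number of healthy pods exceeds the budget's minimum. The sketch below is illustrative only. The function name is hypothetical, and the `minAvailable` value of 1 is an assumption that this report does not confirm.

```shell
# Allowed voluntary disruptions for a PDB that sets minAvailable:
# healthy pods minus minAvailable, never below zero.
pdb_disruptions_allowed() {
  healthy=$1
  min_available=$2
  allowed=$((healthy - min_available))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

# Hypothetical single-node walk-through (both values are assumptions):
pdb_disruptions_allowed 2 1   # both replicas healthy: one eviction is allowed
pdb_disruptions_allowed 1 1   # evicted replica has no other node to run on: none allowed
```

On a single node that is being drained, an evicted replica has nowhere else to be scheduled, so under these assumptions the remaining replica could never be evicted. On a live cluster, `oc get pdb --namespace=openshift-cnv` shows the actual budgets and their allowed disruptions.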

Comment 4 Kedar Bidarkar 2021-12-13 19:46:15 UTC
]$  oc logs daemonset/sriov-network-config-daemon     --namespace=openshift-sriov-network-operator     --container=sriov-network-config-daemon | grep "error when evicting"
]$ oc get nodes
NAME                                             STATUS   ROLES           AGE     VERSION
node-23.cnvqe.redhat.com   Ready    master,worker   3h18m   v1.22.1+6859754
]$ 

---
 
+ sed /home/kbidarka/git_world/cnv-qe-automation/ocp/bm/sriov/10_sriov_network_node_policy_cr.yaml -e 's/^\( \+pfNames\): .*/\1: ["eno1"]/' -e 's/^\( \+rootDevices\): .*/\1: ["0000:19:00.0"]/' -e 's/^\( \+numVfs\): .*/\1: 32/'
+ oc create --filename=-
sriovnetworknodepolicy.sriovnetwork.openshift.io/sriov-network-policy created
+ oc create --filename=/home/kbidarka/git_world/cnv-qe-automation/ocp/bm/sriov/11_sriov_network_cr.yaml
sriovnetwork.sriovnetwork.openshift.io/sriov-network created
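
The sed expressions in the trace above rewrite three fields of the policy manifest before it is piped to `oc create`. The sketch below applies the same substitutions to a made-up fragment. The sample field values and the `patch_policy` name are assumptions, since the actual content of the manifest file is not shown in this report.

```shell
# Apply the same three substitutions to a manifest fragment on stdin.
# Requires GNU sed: \+ is a GNU extension in basic regular expressions.
patch_policy() {
  sed -e 's/^\( \+pfNames\): .*/\1: ["eno1"]/' \
      -e 's/^\( \+rootDevices\): .*/\1: ["0000:19:00.0"]/' \
      -e 's/^\( \+numVfs\): .*/\1: 32/'
}

# Sample input (hypothetical values); only the three matched keys change.
printf '%s\n' \
  'spec:' \
  '  numVfs: 8' \
  '  nicSelector:' \
  '    pfNames: ["PLACEHOLDER"]' \
  '    rootDevices: ["PLACEHOLDER"]' \
  | patch_policy
```

Each expression captures the leading indentation together with the key, so the rewritten line keeps its place in the YAML structure. Lines without leading spaces, such as `spec:`, never match.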

---
 
]$ oc get vmi vm-rhel84-nfs3 -o yaml | grep -A 4 interfaces
  interfaces:
  - interfaceName: eth0
    mac: 02:0c:a5:00:00:00
    name: sriov-net

[kbidarka@localhost sriov]$ virtctl console vm-rhel84-nfs3
Successfully connected to vm-rhel84-nfs3 console. The escape sequence is ^]

Red Hat Enterprise Linux 8.4 (Ootpa)
Kernel 4.18.0-305.30.1.el8_4.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm-rhel84-nfs3 login: cloud-user
Password: 
[cloud-user@vm-rhel84-nfs3 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 02:0c:a5:00:00:00 brd ff:ff:ff:ff:ff:ff
    inet xx.yy.zz.aa/24 brd xx.yy.zz.255 scope global dynamic noprefixroute eth0
       valid_lft 1769sec preferred_lft 1769sec


[cloud-user@vm-rhel84-nfs3 ~]$ ping google.com -4
PING google.com (142.250.188.206) 56(84) bytes of data.
64 bytes from iad23s94-in-f14.1e100.net (142.250.188.206): icmp_seq=1 ttl=54 time=7.68 ms
64 bytes from iad23s94-in-f14.1e100.net (142.250.188.206): icmp_seq=2 ttl=54 time=7.60 ms

--- google.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 7.601/7.683/7.755/0.054 ms
[cloud-user@vm-rhel84-nfs3 ~]$ 

---
Once https://bugzilla.redhat.com/show_bug.cgi?id=2026336 was fixed, this bug was fixed as well.
---

VERIFIED with the build:
"image": "registry-proxy.engineering.redhat.com/rh-osbs/iib:146913",
"hcoVersion": "v4.10.0-464"

Comment 9 errata-xmlrpc 2022-03-16 15:57:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947

