Bug 2030686 - must-gather | missing SRIOV namespace subdir under collected dir
Summary: must-gather | missing SRIOV namespace subdir under collected dir
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 4.10.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: oshoval
QA Contact: Meni Yakove
URL:
Whiteboard:
Depends On: 2048960
Blocks:
 
Reported: 2021-12-09 13:37 UTC by ibesso
Modified: 2022-03-16 15:57 UTC (History)

Fixed In Version: 4.10.0-103
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-16 15:57:01 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github kubevirt must-gather pull 119 0 None Merged sriov: Update sriov namespace 2022-01-30 15:43:03 UTC
Red Hat Product Errata RHSA-2022:0947 0 None None None 2022-03-16 15:57:17 UTC

Description ibesso 2021-12-09 13:37:14 UTC
Description of problem:
----------------------
One of the automation tests examines SRIOV logs by looking for certain files under the must-gather dump dir, inside the namespaces dir, in the subdir of the dedicated SRIOV namespace.
Examining the namespaces dir, there is no "openshift-sriov-network-operator" subdir.


Version-Release number of selected component (if applicable):
------------------------------------------------------------
4.10.0-432


How reproducible:
----------------
100%


Steps to Reproduce:
------------------
1. Run the default must-gather command:
oc adm must-gather --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8@sha256:4793d2331f033f734c12bb3d6784e2cf2efdbfde26d99dc5e77ce7c2e544c4c3 --dest-dir=/tmp/pytest/must_gather0

2. Examine the following subdir:
..../must_gather0/quay-io-openshift-cnv-container-native-virtualization-cnv-must-gather-rhel8-sha256-71ea2a2d72e63642b6fdea3cead532d6a23376f3f9ecf023c6c2bed302c66bfe/namespaces/


Actual results:
--------------
1. No namespace subdir for the SRIOV namespace (openshift-sriov-network-operator).
2. The test attempts to locate the following file:
...../namespaces/openshift-sriov-network-operator/pods/sriov-device-plugin-jmsxb/sriov-device-plugin/sriov-device-plugin/logs/current.log
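The lookup the test performs can be approximated with a small shell helper. This is only a sketch, not the test's actual code: the function name find_device_plugin_logs is hypothetical, and it assumes the directory layout shown in the path above (pods/<pod name>/.../logs/current.log). Because the pod name carries a random suffix, the helper matches any sriov-device-plugin-* pod rather than a fixed name.

```shell
# Hypothetical helper (not part of must-gather or the test suite).
# $1 = must-gather image directory, i.e. the one that contains "namespaces/".
# Prints every sriov-device-plugin current.log found; returns non-zero if the
# namespace directory is missing or no such log file exists.
find_device_plugin_logs() {
    ns_dir="$1/namespaces/openshift-sriov-network-operator"
    if [ ! -d "$ns_dir" ]; then
        echo "missing namespace dir: $ns_dir" >&2
        return 1
    fi
    # "grep ." passes the paths through and fails when find printed nothing.
    find "$ns_dir/pods" -type f \
        -path '*/sriov-device-plugin-*/logs/current.log' | grep .
}
```

Run against the dump described in this bug, the helper would print the "missing namespace dir" message and return 1, which matches the failure the test reported.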


Expected results:
----------------
All SRIOV log files should be collected.


Additional info:
---------------
There are several sriov-related files found under the must-gather dump dir:
......../namespaces/default/k8s.cni.cncf.io/network-attachment-definitions/sriov-network.yaml
......../nodes/cnv-qe-infra-28.cnvqe2.lab.eng.rdu2.redhat.com/sys_sriov_numvfs
......../nodes/cnv-qe-infra-28.cnvqe2.lab.eng.rdu2.redhat.com/sys_sriov_totalvfs

The SR-IOV operator pod and its related pods are running in the correct namespace (reproduced on a BM cluster running 4.10.0-439):

$ ll tests-collected-info/must_gather/registry-redhat-io-container-native-virtualization-cnv-must-gather-rhel8-sha256-71ea2a2d72e63642b6fdea3cead532d6a23376f3f9ecf023c6c2bed302c66bfe/namespaces/
total 16
drwxr-xr-x.  3 cnv-qe-jenkins cnv-qe-jenkins   29 Dec 19 15:00 default
drwxr-xr-x.  4 cnv-qe-jenkins cnv-qe-jenkins   36 Dec 19 15:00 node-gather-unprivileged
drwxr-xr-x.  4 cnv-qe-jenkins cnv-qe-jenkins   49 Dec 19 15:00 openshift
drwxr-xr-x. 23 cnv-qe-jenkins cnv-qe-jenkins 4096 Dec 19 15:00 openshift-cnv
drwxr-xr-x.  3 cnv-qe-jenkins cnv-qe-jenkins   34 Dec 19 15:00 openshift-machine-api
drwxr-xr-x. 19 cnv-qe-jenkins cnv-qe-jenkins 4096 Dec 19 15:00 openshift-marketplace
drwxr-xr-x. 18 cnv-qe-jenkins cnv-qe-jenkins 4096 Dec 19 15:00 openshift-operator-lifecycle-manager
drwxr-xr-x. 18 cnv-qe-jenkins cnv-qe-jenkins 4096 Dec 19 15:00 openshift-sdn
drwxr-xr-x.  3 cnv-qe-jenkins cnv-qe-jenkins   18 Dec 19 15:00 openshift-storage
drwxr-xr-x.  3 cnv-qe-jenkins cnv-qe-jenkins   32 Dec 19 15:00 openshift-virtualization-os-images

$ oc get pod -A |grep sriov
openshift-sriov-network-operator                   network-resources-injector-6xsdh                                   1/1     Running     0                5d22h
openshift-sriov-network-operator                   network-resources-injector-t72zn                                   1/1     Running     0                5d22h
openshift-sriov-network-operator                   network-resources-injector-xdcxm                                   1/1     Running     0                5d22h
openshift-sriov-network-operator                   sriov-device-plugin-4pt9p                                          1/1     Running     0                67m
openshift-sriov-network-operator                   sriov-device-plugin-z2qm7                                          1/1     Running     0                67m
openshift-sriov-network-operator                   sriov-network-config-daemon-24fnr                                  3/3     Running     0                5d21h
openshift-sriov-network-operator                   sriov-network-config-daemon-4s4kh                                  3/3     Running     6                5d22h
openshift-sriov-network-operator                   sriov-network-config-daemon-7c9mc                                  3/3     Running     0                5d21h
openshift-sriov-network-operator                   sriov-network-config-daemon-b7s8l                                  3/3     Running     9                5d22h
openshift-sriov-network-operator                   sriov-network-config-daemon-cwxpk                                  3/3     Running     0                5d21h
openshift-sriov-network-operator                   sriov-network-config-daemon-j4m89                                  3/3     Running     6                5d22h
openshift-sriov-network-operator                   sriov-network-operator-588f484747-s5hzb                            1/1     Running     0                5d22h
$

Comment 1 Petr Horáček 2021-12-16 09:48:53 UTC
Was the SR-IOV operator installed in the cluster?

Comment 2 ibesso 2021-12-16 10:01:01 UTC
@phoracek, it was (CSV - succeeded, pods - running), as I had several must-gather tests passing.
I don't have the BM cluster or the terminal buffer I used to access it, but if necessary, I will have one redeployed and add the relevant output.

Comment 3 Petr Horáček 2021-12-16 12:34:25 UTC
To which CSV are you referring, CNV or SR-IOV?

Comment 4 ibesso 2021-12-16 12:59:21 UTC
I referred to SR-IOV CSV, but CNV CSV was also in Succeeded.
Petr, please let me know if I should redeploy and update the bug.

Comment 5 Petr Horáček 2021-12-17 09:12:32 UTC
It would be helpful if you redeployed, confirmed that the operator is running in the expected namespace (openshift-sriov-network-operator) and, after running must-gather, checked its logs to see what failed during the gathering.

Comment 6 ibesso 2021-12-19 15:06:58 UTC
@phoracek, I reproduced and observed that the operator is running:

$ ll tests-collected-info/must_gather/registry-redhat-io-container-native-virtualization-cnv-must-gather-rhel8-sha256-71ea2a2d72e63642b6fdea3cead532d6a23376f3f9ecf023c6c2bed302c66bfe/namespaces/
total 16
drwxr-xr-x.  3 cnv-qe-jenkins cnv-qe-jenkins   29 Dec 19 15:00 default
drwxr-xr-x.  4 cnv-qe-jenkins cnv-qe-jenkins   36 Dec 19 15:00 node-gather-unprivileged
drwxr-xr-x.  4 cnv-qe-jenkins cnv-qe-jenkins   49 Dec 19 15:00 openshift
drwxr-xr-x. 23 cnv-qe-jenkins cnv-qe-jenkins 4096 Dec 19 15:00 openshift-cnv
drwxr-xr-x.  3 cnv-qe-jenkins cnv-qe-jenkins   34 Dec 19 15:00 openshift-machine-api
drwxr-xr-x. 19 cnv-qe-jenkins cnv-qe-jenkins 4096 Dec 19 15:00 openshift-marketplace
drwxr-xr-x. 18 cnv-qe-jenkins cnv-qe-jenkins 4096 Dec 19 15:00 openshift-operator-lifecycle-manager
drwxr-xr-x. 18 cnv-qe-jenkins cnv-qe-jenkins 4096 Dec 19 15:00 openshift-sdn
drwxr-xr-x.  3 cnv-qe-jenkins cnv-qe-jenkins   18 Dec 19 15:00 openshift-storage
drwxr-xr-x.  3 cnv-qe-jenkins cnv-qe-jenkins   32 Dec 19 15:00 openshift-virtualization-os-images
[cnv-qe-jenkins@cnvqe-01 master_cnv-tests]$ oc get pod -A |grep sriov
openshift-sriov-network-operator                   network-resources-injector-6xsdh                                   1/1     Running     0                5d22h
openshift-sriov-network-operator                   network-resources-injector-t72zn                                   1/1     Running     0                5d22h
openshift-sriov-network-operator                   network-resources-injector-xdcxm                                   1/1     Running     0                5d22h
openshift-sriov-network-operator                   sriov-device-plugin-4pt9p                                          1/1     Running     0                67m
openshift-sriov-network-operator                   sriov-device-plugin-z2qm7                                          1/1     Running     0                67m
openshift-sriov-network-operator                   sriov-network-config-daemon-24fnr                                  3/3     Running     0                5d21h
openshift-sriov-network-operator                   sriov-network-config-daemon-4s4kh                                  3/3     Running     6                5d22h
openshift-sriov-network-operator                   sriov-network-config-daemon-7c9mc                                  3/3     Running     0                5d21h
openshift-sriov-network-operator                   sriov-network-config-daemon-b7s8l                                  3/3     Running     9                5d22h
openshift-sriov-network-operator                   sriov-network-config-daemon-cwxpk                                  3/3     Running     0                5d21h
openshift-sriov-network-operator                   sriov-network-config-daemon-j4m89                                  3/3     Running     6                5d22h
openshift-sriov-network-operator                   sriov-network-operator-588f484747-s5hzb                            1/1     Running     0                5d22h
$ 

Editing my description with this update.

Comment 7 Petr Horáček 2022-01-06 14:22:03 UTC
I see. Thanks Issac.

Comment 8 oshoval 2022-01-24 07:32:31 UTC
Proposed https://github.com/kubevirt/must-gather/pull/119
It updates the SR-IOV namespace that must-gather collects from: the old value, sriov-network-operator,
is replaced with the actual namespace name, openshift-sriov-network-operator.
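
A name mismatch like this can be caught before gathering by comparing the namespaces a script expects against the ones that exist in the cluster. The following is a sketch of such a sanity check, not the change made in the pull request; the function name missing_namespaces is hypothetical, and it assumes its input is in the "namespace/<name>" form printed by `oc get namespaces -o name`.

```shell
# Hypothetical pre-flight check (not taken from kubevirt/must-gather).
# stdin  = one "namespace/<name>" line per existing namespace
# args   = namespace names the gather script expects to collect from
# Prints each expected namespace that does not exist.
missing_namespaces() {
    existing=$(cat)
    for ns in "$@"; do
        # -x: whole-line match, -F: treat the name as a fixed string
        printf '%s\n' "$existing" | grep -qxF "namespace/${ns}" || echo "$ns"
    done
}
```

For example, `oc get namespaces -o name | missing_namespaces sriov-network-operator` would print sriov-network-operator on the cluster from this report, where only openshift-sriov-network-operator exists.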

Comment 9 Yossi Segev 2022-02-01 11:40:20 UTC
Verified on a cluster with SR-IOV, with the following components
Client (oc) Version: 4.8.0-202106281541.p0.git.1077b05.assembly.stream-1077b05
Server Version: 4.10.0-fc.4
Kubernetes Version: v1.23.0+d30ebbc
CNV must-gather: cnv-must-gather-rhel8:v4.10.0-105
CNV: v4.10.0-636


1. Find the URL of the CNV must-gather image in CNV CSV:
$ oc get csv -n openshift-cnv kubevirt-hyperconverged-operator.v4.10.0 -oyaml | less

2. Search the output for the must-gather image:
registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8@sha256:e3bb1c448fb13cde927f4b7f4a200de6fab151928722e21aeabba1d127513874

3. Run must-gather using the CNV image:
$ oc adm must-gather --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8@sha256:e3bb1c448fb13cde927f4b7f4a200de6fab151928722e21aeabba1d127513874 --dest-dir=yossi/mg-out

4. Verify that the SR-IOV namespace directory exists and has contents:
[cnv-qe-jenkins@cnv-qe-01 ~]$ ll yossi/mg-out/registry-redhat-io-container-native-virtualization-cnv-must-gather-rhel8-sha256-e3bb1c448fb13cde927f4b7f4a200de6fab151928722e21aeabba1d127513874/namespaces/openshift-sriov-network-operator/
total 8
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins  102 Feb  1 11:30 apps
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   36 Feb  1 11:30 apps.openshift.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   43 Feb  1 11:30 autoscaling
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   44 Feb  1 11:30 batch
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   50 Feb  1 11:30 build.openshift.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   82 Feb  1 11:30 cdi.kubevirt.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins  198 Feb  1 11:30 core
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   33 Feb  1 11:30 discovery.k8s.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   40 Feb  1 11:30 flavor.kubevirt.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   34 Feb  1 11:30 hco.kubevirt.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   31 Feb  1 11:30 image.openshift.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins  225 Feb  1 11:30 kubevirt.io
-rwxr-xr-x.  1 cnv-qe-jenkins cnv-qe-jenkins  567 Feb  1 11:29 openshift-sriov-network-operator.yaml
drwxr-xr-x. 12 cnv-qe-jenkins cnv-qe-jenkins 4096 Feb  1 11:30 pods
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   39 Feb  1 11:30 policy
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   38 Feb  1 11:30 pool.kubevirt.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   25 Feb  1 11:30 route.openshift.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins  120 Feb  1 11:30 snapshot.kubevirt.io
[cnv-qe-jenkins@cnv-qe-01 ~]$ 
[cnv-qe-jenkins@cnv-qe-01 ~]$ 
[cnv-qe-jenkins@cnv-qe-01 ~]$ du -hs yossi/mg-out/registry-redhat-io-container-native-virtualization-cnv-must-gather-rhel8-sha256-e3bb1c448fb13cde927f4b7f4a200de6fab151928722e21aeabba1d127513874/namespaces/openshift-sriov-network-operator/
62M	yossi/mg-out/registry-redhat-io-container-native-virtualization-cnv-must-gather-rhel8-sha256-e3bb1c448fb13cde927f4b7f4a200de6fab151928722e21aeabba1d127513874/namespaces/openshift-sriov-network-operator/
[cnv-qe-jenkins@cnv-qe-01 ~]$
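
The manual parts of the verification above (searching the CSV for the image in step 2, and checking the namespace directory in step 4) could be scripted roughly as follows. This is a sketch under stated assumptions, not the procedure that was actually run: both function names are hypothetical, and the image search assumes the reference contains the string "cnv-must-gather", as it does in the output shown in this comment.

```shell
# Hypothetical helpers for steps 2 and 4 of the verification.

# Step 2: read CSV YAML on stdin and print each distinct image reference
# that contains "cnv-must-gather" (registry path plus tag or digest).
find_must_gather_image() {
    grep -oE '[A-Za-z0-9./_:@-]*cnv-must-gather[A-Za-z0-9./_:@-]*' | sort -u
}

# Step 4: succeed only if the namespace directory exists and is not empty.
# $1 = must-gather image directory (contains "namespaces/"), $2 = namespace.
ns_dir_has_content() {
    ns_dir="$1/namespaces/$2"
    [ -d "$ns_dir" ] && [ -n "$(ls -A "$ns_dir")" ]
}
```

The first helper can be fed from the command in step 1, for example `oc get csv -n openshift-cnv kubevirt-hyperconverged-operator.v4.10.0 -oyaml | find_must_gather_image`. The second takes the image directory created under the --dest-dir from step 3 and the namespace openshift-sriov-network-operator.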

Comment 14 errata-xmlrpc 2022-03-16 15:57:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947

