Description of problem:
----------------------
One of the automation tests examines SRIOV logs by looking for certain files under the must-gather dump dir, in the namespaces dir, under the dedicated SRIOV namespace.
Examining the namespaces dir, there is no "openshift-sriov-network-operator" subdir.

Version-Release number of selected component (if applicable):
------------------------------------------------------------
4.10.0-432

How reproducible:
----------------
100%

Steps to Reproduce:
------------------
1. Run the default must-gather command:
   oc adm must-gather --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8@sha256:4793d2331f033f734c12bb3d6784e2cf2efdbfde26d99dc5e77ce7c2e544c4c3 --dest-dir=/tmp/pytest/must_gather0
2. Examine the following subdir:
   ..../must_gather0/quay-io-openshift-cnv-container-native-virtualization-cnv-must-gather-rhel8-sha256-71ea2a2d72e63642b6fdea3cead532d6a23376f3f9ecf023c6c2bed302c66bfe/namespaces/

Actual results:
--------------
1. No namespace subdir for the SRIOV namespace (openshift-sriov-network-operator).
2. The test attempts to locate the following file:
   ...../namespaces/openshift-sriov-network-operator/pods/sriov-device-plugin-jmsxb/sriov-device-plugin/sriov-device-plugin/logs/current.log

Expected results:
----------------
All SRIOV log files should be collected.

Additional info:
---------------
There are several sriov-related files found under the must-gather dump dir:
......../namespaces/default/k8s.cni.cncf.io/network-attachment-definitions/sriov-network.yaml
......../nodes/cnv-qe-infra-28.cnvqe2.lab.eng.rdu2.redhat.com/sys_sriov_numvfs
......../nodes/cnv-qe-infra-28.cnvqe2.lab.eng.rdu2.redhat.com/sys_sriov_totalvfs

The sriov operator pod is running, as are the related pods, within the correct namespace (reproduced with a BM cluster running 4.10.0-439):

$ ll tests-collected-info/must_gather/registry-redhat-io-container-native-virtualization-cnv-must-gather-rhel8-sha256-71ea2a2d72e63642b6fdea3cead532d6a23376f3f9ecf023c6c2bed302c66bfe/namespaces/
total 16
drwxr-xr-x.  3 cnv-qe-jenkins cnv-qe-jenkins   29 Dec 19 15:00 default
drwxr-xr-x.  4 cnv-qe-jenkins cnv-qe-jenkins   36 Dec 19 15:00 node-gather-unprivileged
drwxr-xr-x.  4 cnv-qe-jenkins cnv-qe-jenkins   49 Dec 19 15:00 openshift
drwxr-xr-x. 23 cnv-qe-jenkins cnv-qe-jenkins 4096 Dec 19 15:00 openshift-cnv
drwxr-xr-x.  3 cnv-qe-jenkins cnv-qe-jenkins   34 Dec 19 15:00 openshift-machine-api
drwxr-xr-x. 19 cnv-qe-jenkins cnv-qe-jenkins 4096 Dec 19 15:00 openshift-marketplace
drwxr-xr-x. 18 cnv-qe-jenkins cnv-qe-jenkins 4096 Dec 19 15:00 openshift-operator-lifecycle-manager
drwxr-xr-x. 18 cnv-qe-jenkins cnv-qe-jenkins 4096 Dec 19 15:00 openshift-sdn
drwxr-xr-x.  3 cnv-qe-jenkins cnv-qe-jenkins   18 Dec 19 15:00 openshift-storage
drwxr-xr-x.  3 cnv-qe-jenkins cnv-qe-jenkins   32 Dec 19 15:00 openshift-virtualization-os-images

$ oc get pod -A |grep sriov
openshift-sriov-network-operator   network-resources-injector-6xsdh          1/1   Running   0   5d22h
openshift-sriov-network-operator   network-resources-injector-t72zn          1/1   Running   0   5d22h
openshift-sriov-network-operator   network-resources-injector-xdcxm          1/1   Running   0   5d22h
openshift-sriov-network-operator   sriov-device-plugin-4pt9p                 1/1   Running   0   67m
openshift-sriov-network-operator   sriov-device-plugin-z2qm7                 1/1   Running   0   67m
openshift-sriov-network-operator   sriov-network-config-daemon-24fnr         3/3   Running   0   5d21h
openshift-sriov-network-operator   sriov-network-config-daemon-4s4kh         3/3   Running   6   5d22h
openshift-sriov-network-operator   sriov-network-config-daemon-7c9mc         3/3   Running   0   5d21h
openshift-sriov-network-operator   sriov-network-config-daemon-b7s8l         3/3   Running   9   5d22h
openshift-sriov-network-operator   sriov-network-config-daemon-cwxpk         3/3   Running   0   5d21h
openshift-sriov-network-operator   sriov-network-config-daemon-j4m89         3/3   Running   6   5d22h
openshift-sriov-network-operator   sriov-network-operator-588f484747-s5hzb   1/1   Running   0   5d22h
$
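For reference, the lookup the test performs can be sketched roughly as follows. This is not the actual test code: the helper names are hypothetical, and the sketch assumes that the must-gather destination dir contains exactly one image-named subdir with a "namespaces" dir inside it, as in the paths quoted above.

```python
from pathlib import Path

SRIOV_NAMESPACE = "openshift-sriov-network-operator"


def find_namespaces_dir(dest_dir: Path) -> Path:
    """Return <dest_dir>/<image-dir>/namespaces (assumes a single image dir)."""
    image_dirs = [p for p in dest_dir.iterdir() if p.is_dir()]
    if len(image_dirs) != 1:
        raise RuntimeError(
            f"expected one image dir under {dest_dir}, found {len(image_dirs)}"
        )
    return image_dirs[0] / "namespaces"


def sriov_device_plugin_logs(dest_dir: Path) -> list[Path]:
    """Return collected sriov-device-plugin current.log files, or [] if none."""
    ns_dir = find_namespaces_dir(dest_dir) / SRIOV_NAMESPACE
    if not ns_dir.is_dir():
        # This is the failure described in this bug: the namespace dir is absent.
        return []
    pattern = (
        "pods/sriov-device-plugin-*/sriov-device-plugin/"
        "sriov-device-plugin/logs/current.log"
    )
    return sorted(ns_dir.glob(pattern))
```

With the dump described in this report, sriov_device_plugin_logs() would return an empty list, because the namespace subdir was never created.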
Was the SR-IOV operator installed in the cluster?
@phoracek, it was (CSV: Succeeded, pods: Running), as I had several must-gather tests passing. I no longer have the BM cluster or the terminal buffer I used to access it, but if necessary, I will have one redeployed and add the relevant output.
To which CSV are you referring: CNV or SR-IOV?
I was referring to the SR-IOV CSV, but the CNV CSV was also in Succeeded. Petr, please let me know if I should redeploy and update the bug.
It would be helpful if you redeployed, confirmed that the operator is running in the expected namespace (openshift-sriov-network-operator), and, after running must-gather, checked its logs to see what failed during gathering.
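The namespace check requested here could be scripted along these lines. This is only a sketch: the function name is hypothetical, and it assumes the plain-text column layout of `oc get pod -A` (namespace first, pod name second), which approximates `grep sriov` by matching on those two columns.

```python
def sriov_pods_by_namespace(oc_output: str) -> dict[str, list[str]]:
    """Group pods whose namespace or name contains 'sriov' by namespace.

    Expects the plain-text output of `oc get pod -A`.
    """
    grouped: dict[str, list[str]] = {}
    for line in oc_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 2 or fields[0] == "NAMESPACE":
            continue
        namespace, pod = fields[0], fields[1]
        if "sriov" in namespace or "sriov" in pod:
            grouped.setdefault(namespace, []).append(pod)
    return grouped
```

If the operator is installed where expected, the only key in the result should be openshift-sriov-network-operator.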
@phoracek, I reproduced and observed that the operator is running:

$ ll tests-collected-info/must_gather/registry-redhat-io-container-native-virtualization-cnv-must-gather-rhel8-sha256-71ea2a2d72e63642b6fdea3cead532d6a23376f3f9ecf023c6c2bed302c66bfe/namespaces/
total 16
drwxr-xr-x.  3 cnv-qe-jenkins cnv-qe-jenkins   29 Dec 19 15:00 default
drwxr-xr-x.  4 cnv-qe-jenkins cnv-qe-jenkins   36 Dec 19 15:00 node-gather-unprivileged
drwxr-xr-x.  4 cnv-qe-jenkins cnv-qe-jenkins   49 Dec 19 15:00 openshift
drwxr-xr-x. 23 cnv-qe-jenkins cnv-qe-jenkins 4096 Dec 19 15:00 openshift-cnv
drwxr-xr-x.  3 cnv-qe-jenkins cnv-qe-jenkins   34 Dec 19 15:00 openshift-machine-api
drwxr-xr-x. 19 cnv-qe-jenkins cnv-qe-jenkins 4096 Dec 19 15:00 openshift-marketplace
drwxr-xr-x. 18 cnv-qe-jenkins cnv-qe-jenkins 4096 Dec 19 15:00 openshift-operator-lifecycle-manager
drwxr-xr-x. 18 cnv-qe-jenkins cnv-qe-jenkins 4096 Dec 19 15:00 openshift-sdn
drwxr-xr-x.  3 cnv-qe-jenkins cnv-qe-jenkins   18 Dec 19 15:00 openshift-storage
drwxr-xr-x.  3 cnv-qe-jenkins cnv-qe-jenkins   32 Dec 19 15:00 openshift-virtualization-os-images

[cnv-qe-jenkins@cnvqe-01 master_cnv-tests]$ oc get pod -A |grep sriov
openshift-sriov-network-operator   network-resources-injector-6xsdh          1/1   Running   0   5d22h
openshift-sriov-network-operator   network-resources-injector-t72zn          1/1   Running   0   5d22h
openshift-sriov-network-operator   network-resources-injector-xdcxm          1/1   Running   0   5d22h
openshift-sriov-network-operator   sriov-device-plugin-4pt9p                 1/1   Running   0   67m
openshift-sriov-network-operator   sriov-device-plugin-z2qm7                 1/1   Running   0   67m
openshift-sriov-network-operator   sriov-network-config-daemon-24fnr         3/3   Running   0   5d21h
openshift-sriov-network-operator   sriov-network-config-daemon-4s4kh         3/3   Running   6   5d22h
openshift-sriov-network-operator   sriov-network-config-daemon-7c9mc         3/3   Running   0   5d21h
openshift-sriov-network-operator   sriov-network-config-daemon-b7s8l         3/3   Running   9   5d22h
openshift-sriov-network-operator   sriov-network-config-daemon-cwxpk         3/3   Running   0   5d21h
openshift-sriov-network-operator   sriov-network-config-daemon-j4m89         3/3   Running   6   5d22h
openshift-sriov-network-operator   sriov-network-operator-588f484747-s5hzb   1/1   Running   0   5d22h
$

Editing my description with this update.
I see. Thanks Issac.
Proposed https://github.com/kubevirt/must-gather/pull/119, which updates the SR-IOV namespace we collect from, changing sriov-network-operator to the actual name, openshift-sriov-network-operator.
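The PR itself is not reproduced here. As an illustration only, a guard of the following shape could flag this kind of mismatch between the namespace names a gather step is configured with and the namespaces that actually exist on the cluster. The function name and both inputs are hypothetical and are not taken from the must-gather code.

```python
def missing_namespaces(configured: list[str], existing: set[str]) -> list[str]:
    """Return configured namespace names that do not exist on the cluster.

    Both arguments are hypothetical inputs: `configured` stands for the names
    a gather step would collect from, `existing` for the cluster's namespaces.
    """
    return [ns for ns in configured if ns not in existing]


# The situation in this bug: the configured name lacked the "openshift-" prefix.
stale = missing_namespaces(
    ["openshift-cnv", "sriov-network-operator"],
    {"openshift-cnv", "openshift-sriov-network-operator"},
)
print(stale)
```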
Verified on a cluster with SR-IOV, with the following components:
Client (oc) Version: 4.8.0-202106281541.p0.git.1077b05.assembly.stream-1077b05
Server Version: 4.10.0-fc.4
Kubernetes Version: v1.23.0+d30ebbc
CNV must-gather: cnv-must-gather-rhel8:v4.10.0-105
CNV: v4.10.0-636

1. Find the URL of the CNV must-gather image in the CNV CSV:
   $ oc get csv -n openshift-cnv kubevirt-hyperconverged-operator.v4.10.0 -oyaml | less
2. Search for the must-gather image:
   registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8@sha256:e3bb1c448fb13cde927f4b7f4a200de6fab151928722e21aeabba1d127513874
3. Run must-gather using the CNV image:
   $ oc adm must-gather --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8@sha256:e3bb1c448fb13cde927f4b7f4a200de6fab151928722e21aeabba1d127513874 --dest-dir=yossi/mg-out
4. Verify the SR-IOV namespace directory exists and has contents:

[cnv-qe-jenkins@cnv-qe-01 ~]$ ll yossi/mg-out/registry-redhat-io-container-native-virtualization-cnv-must-gather-rhel8-sha256-e3bb1c448fb13cde927f4b7f4a200de6fab151928722e21aeabba1d127513874/namespaces/openshift-sriov-network-operator/
total 8
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins  102 Feb  1 11:30 apps
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   36 Feb  1 11:30 apps.openshift.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   43 Feb  1 11:30 autoscaling
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   44 Feb  1 11:30 batch
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   50 Feb  1 11:30 build.openshift.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   82 Feb  1 11:30 cdi.kubevirt.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins  198 Feb  1 11:30 core
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   33 Feb  1 11:30 discovery.k8s.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   40 Feb  1 11:30 flavor.kubevirt.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   34 Feb  1 11:30 hco.kubevirt.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   31 Feb  1 11:30 image.openshift.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins  225 Feb  1 11:30 kubevirt.io
-rwxr-xr-x.  1 cnv-qe-jenkins cnv-qe-jenkins  567 Feb  1 11:29 openshift-sriov-network-operator.yaml
drwxr-xr-x. 12 cnv-qe-jenkins cnv-qe-jenkins 4096 Feb  1 11:30 pods
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   39 Feb  1 11:30 policy
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   38 Feb  1 11:30 pool.kubevirt.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins   25 Feb  1 11:30 route.openshift.io
drwxr-xr-x.  2 cnv-qe-jenkins cnv-qe-jenkins  120 Feb  1 11:30 snapshot.kubevirt.io

[cnv-qe-jenkins@cnv-qe-01 ~]$ du -hs yossi/mg-out/registry-redhat-io-container-native-virtualization-cnv-must-gather-rhel8-sha256-e3bb1c448fb13cde927f4b7f4a200de6fab151928722e21aeabba1d127513874/namespaces/openshift-sriov-network-operator/
62M	yossi/mg-out/registry-redhat-io-container-native-virtualization-cnv-must-gather-rhel8-sha256-e3bb1c448fb13cde927f4b7f4a200de6fab151928722e21aeabba1d127513874/namespaces/openshift-sriov-network-operator/
[cnv-qe-jenkins@cnv-qe-01 ~]$
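The "exists and has contents" check could also be automated. The following is a minimal sketch with hypothetical helper names; note that it sums apparent file sizes, so the number will not match `du -hs` exactly, since du reports disk usage.

```python
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root (apparent size)."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def namespace_was_collected(namespaces_dir: Path, namespace: str) -> bool:
    """True if the namespace dir exists and holds at least one non-empty file."""
    ns_dir = namespaces_dir / namespace
    return ns_dir.is_dir() and dir_size_bytes(ns_dir) > 0
```

Here namespaces_dir would be the .../namespaces/ directory under the must-gather output, and namespace would be openshift-sriov-network-operator.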
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0947