Description of problem:
The CNV must-gather image fails to run on our (CNV QE) BM-RHCOS environment. The default must-gather image runs successfully on that environment, and the same CNV image runs successfully on PSI clusters, which are built from VM nodes.

Version-Release number of selected component (if applicable):
CNV v2.4.0
$ oc version
Client Version: 4.4.3
Server Version: 4.5.0-rc.2
Kubernetes Version: v1.18.3+91d0edd
CNV must-gather: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.4.0

How reproducible:
Always

Steps to Reproduce:
1. On CNV-QE's BM-RHCOS machine (10.0.98.16), run CNV must-gather:
[cnv-qe-jenkins@cnv-executor-bm-rhcos ~]$ oc adm must-gather --image=registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.4.0 --dest-dir=/home/cnv-qe-jenkins/yossi/mg

Actual results:
The following output is printed, and eventually nothing happens (no data is collected, and not even the dest-dir is created):
[cnv-qe-jenkins@cnv-executor-bm-rhcos cnv-tests]$ oc adm must-gather --image=registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.4.0 --dest-dir=/home/cnv-qe-jenkins/yossi/mg
[must-gather ] OUT Using must-gather plugin-in image: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.4.0
[must-gather ] OUT namespace/openshift-must-gather-4zlql created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-24zmg created
[must-gather ] OUT pod for plug-in image registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.4.0 created
[must-gather-8bmw5] POD Gathering data for ns/openshift-cnv...
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Error from server (NotFound): namespaces "kubevirt-hyperconverged" not found
[must-gather-8bmw5] POD Gathering data for ns/openshift-operator-lifecycle-manager...
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Gathering data for ns/openshift-marketplace...
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Error from server (NotFound): namespaces "cluster-network-addons" not found
[must-gather-8bmw5] POD Gathering data for ns/openshift-sdn...
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Error from server (NotFound): namespaces "sriov-network-operator" not found
[must-gather-8bmw5] POD Error from server (NotFound): namespaces "kubevirt-web-ui" not found
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Error from server (NotFound): namespaces "cdi" not found
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD Wrote inspect data to must-gather.
[must-gather-8bmw5] POD No resources found
[must-gather-8bmw5] POD No resources found
[must-gather-8bmw5] POD Error from server (AlreadyExists): error when creating "/etc/node-gather-crd.yaml": namespaces "node-gather" already exists
[must-gather-8bmw5] POD Error from server (AlreadyExists): error when creating "/etc/node-gather-crd.yaml": serviceaccounts "node-gather" already exists
[must-gather-8bmw5] POD securitycontextconstraints.security.openshift.io/privileged added to: ["system:serviceaccount:node-gather:node-gather"]
[must-gather-8bmw5] POD Error from server (AlreadyExists): error when creating "/etc/node-gather-ds.yaml": daemonsets.apps "node-gather-daemonset" already exists
[must-gather-8bmw5] OUT gather logs unavailable: unexpected EOF
[must-gather-8bmw5] OUT waiting for gather to complete
[must-gather-8bmw5] OUT gather never finished: timed out waiting for the condition
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-24zmg deleted
[must-gather ] OUT namespace/openshift-must-gather-4zlql deleted
error: gather never finished for pod must-gather-8bmw5: timed out waiting for the condition

Expected results:
1. must-gather finishes successfully.
2. dest-dir is created.
3. dest-dir is filled with the collected log data.

Additional info:
1. No error is seen in the must-gather pod description.
2. This was already debugged by Yuval. I quote his resolution from our email correspondence:
> Basically, gathering nodes data took too long due to many calls to `oc exec` in order to fetch the sriov information. The PR fixes this by using a single exec call for the sriov, and it also parallelizes the rest of node gathering. It looks like the timeouts happened actually in `oc must-gather` and not in our code simply because it didn't have any data to read for quite a long time (they have a timeout on read), so I added some verbosity in places that I couldn't see an obvious optimization.
3. Yuval has already submitted a PR with the fix: https://github.com/kubevirt/must-gather/pull/68
He built a local image with this fix (quay.io/yuvalturg/must-gather:latest). I tested this image (oc adm must-gather --image=quay.io/yuvalturg/must-gather:latest), and it seems to solve the issue.
4. Thank you very much, Yuval!
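
Note: the approach Yuval describes (one batched exec per node instead of many, node gathering run in parallel, and a progress line so the reading side does not sit idle until its read timeout) can be sketched roughly as below. This is an illustration under assumptions, not the code from the PR: run_in_pod, gather_node and gather_all_nodes are hypothetical names, the batched commands are placeholders, and run_in_pod runs locally so that the sketch works without a cluster.

```shell
#!/usr/bin/env bash
# Sketch only; all function names here are hypothetical.

# Stand-in for the remote call. Against a live cluster this could be
# something like (assumption, not taken from the PR):
#   oc exec -n node-gather "$pod" -- /bin/bash -c "$script"
# Here the script runs locally so the example needs no cluster.
run_in_pod() {
    local pod="$1" script="$2"
    bash -c "${script}"
}

# Collect data for one node-gather pod with a single batched invocation,
# instead of one exec call per command.
gather_node() {
    local pod="$1" out_dir="$2"
    mkdir -p "${out_dir}/${pod}"
    # Placeholder commands; the real sriov queries would be joined here.
    run_in_pod "${pod}" 'uname -s; echo "sriov-placeholder"' \
        > "${out_dir}/${pod}/node-info.txt"
    # Progress line, so the log stream is not silent for long periods.
    echo "Gathered node data from ${pod}"
}

# Run gather_node for every pod in the background, then wait for all of them.
gather_all_nodes() {
    local out_dir="$1"
    shift
    local pod
    for pod in "$@"; do
        gather_node "${pod}" "${out_dir}" &
    done
    wait
}
```

A call such as `gather_all_nodes /tmp/nodes pod-a pod-b` would write one node-info.txt per pod and print one progress line for each. The order of the progress lines is not fixed, because the jobs run concurrently.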
Verified using the same setup as in the original reproduction scenario.
On CNV-QE's BM-RHCOS machine, running the same command:
[cnv-qe-jenkins@cnv-executor-bm-rhcos ~]$ oc adm must-gather --image=registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.4.0 --dest-dir=/home/cnv-qe-jenkins/yossi/mg
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:3194