Description of problem:
Some of the node info is not collected by the CNV must-gather.

Version-Release number of selected component (if applicable):
Client Version: v4.2.0
Server Version: 4.2.0
Kubernetes Version: v1.14.6+2e5ed54
CNV 2.1

How reproducible:
Always

Steps to Reproduce:
1. Run must-gather with the CNV must-gather image:
   # oc adm must-gather --image=registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.1.1-22
2. Check the must-gather output.

Actual results:
Some of the node info is not collected by the CNV must-gather; e.g. lspci, ip, bridge, vlan, var-lib-cni-bin, dev_vfio and dmesg are missing, and only the kubelet and NetworkManager logs are collected.

$ ls must-gather.local.236191780723349203/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-4df332d18097850382150327af94935989b11051030dd387d70de9a9179fe544/nodes/cnv-executor-yadu-rhel-worker-1/
cnv-executor-yadu-rhel-worker-1_logs_kubelet  cnv-executor-yadu-rhel-worker-1_logs_NetworkManager

Expected results:
All of the node info should be collected.

Additional info:
Running must-gather with the quay image on the same cluster works:
# oc adm must-gather --image=quay.io/kubevirt/must-gather

$ ls must-gather-log-quay/quay-io-kubevirt-must-gather-sha256-9953dbc7e9ff24fb4089b576cce6603d68e47396f29337cff83fd832aff26877/nodes/cnv-executor-yadu-rhel-worker-1/
bridge  cnv-executor-yadu-rhel-worker-1_logs_kubelet  cnv-executor-yadu-rhel-worker-1_logs_NetworkManager  dev_vfio  dmesg  etc/  ip.txt  lspci  opt-cni-bin  proc_cmdline  var-lib-cni-bin
I asked Marcin to look into it if he finds some spare time. It may be caused by a difference between the upstream (u/s) and downstream (d/s) images, but we don't know yet.
The image is broken: the /etc/node-gather-ds.yaml manifest it ships has an empty image field for the node-gather DaemonSet ("image:").
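A quick way to confirm this directly from the shipped image (a sketch; it assumes podman is available locally and uses the image tag and the /etc/node-gather-ds.yaml path mentioned in this bug):

# Print the DaemonSet manifest shipped in the must-gather image and look for the image field;
# an empty "image:" line confirms the broken template.
podman run --rm --entrypoint cat \
  registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.1.1-22 \
  /etc/node-gather-ds.yaml | grep -n 'image:'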
There seems to be more than one problem, probably connected to a change in the base image.

1. It looks like the d/s base image does not include the "hostname" utility (it is available in the u/s CentOS image). This causes node_gather (l.11):

   POD_NAME=$(oc get pods --field-selector=status.podIP=$(hostname -I) -n $NAMESPACE -o'custom-columns=name:metadata.name' --no-headers)

to fail. The field selector is probably redundant, as we only have one pod in this namespace, so this one could be fixed by just removing the field selector. Line 11 would then look like:

   POD_NAME=$(oc get pods -n $NAMESPACE -o'custom-columns=name:metadata.name' --no-headers)
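One way to double-check whether the utility is really absent from the d/s image (a sketch; it assumes podman is available locally and uses the image tag from the bug description):

# Prints the path of the hostname binary if it exists inside the image,
# and exits non-zero if it is missing.
podman run --rm --entrypoint sh \
  registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.1.1-22 \
  -c 'command -v hostname'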
I don't think this should block 2.1.1, despite being a regression. I hope Avram can take a look at this when he is back.
Do you know whether all the nodes were running without any disruptions? We relaxed the constraints on having a node-gather pod running on every node due to a reported issue: we saw that when one of the nodes was not stable, cnv-must-gather failed with a timeout without providing any logs.
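For reference, a minimal sketch of how node health and node-gather pod placement could be checked before re-running must-gather (the node-gather namespace name is an assumption; it may differ in the actual deployment):

# Look for nodes that are NotReady and for node-gather pods that were not scheduled or have restarted.
oc get nodes -o wide
oc get pods -n node-gather -o wide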
Created attachment 1636124 [details] debug instructions
Hi Piotr, I checked all the node directories under must-gather_xxx/.../.../nodes/ and only logs_kubelet and logs_NetworkManager exist for all the nodes. I ran must-gather again with the upstream image on the same cluster, and all the logs were collected normally. So I think the nodes were running well, without any disruptions.
Thank you. Marcin M. mentioned that the base image got changed and some tools which used to be part of the image are now missing. We need to understand which ones are missing and install them accordingly.
Building with ubi8-minimal and adding hostname to the package list fixes it. Since there can be such a discrepancy between centos:7 (the u/s image) and ubi8/ubi8-minimal, would using UBI (minimal or not) as both the u/s and d/s base image, and ensuring that hostname is a dependency of both, be better long term? ubi8 is also missing hostname, so it is not merely an issue of minimal vs. standard. If need be, I can easily compile a list of packages present in centos:7 that are not also in ubi8/ubi8-minimal.
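In case it helps, a minimal sketch of how such a package list could be compiled, assuming podman is available locally (the output file names are arbitrary):

# Dump the RPM package names of each base image and keep only the packages that
# exist in centos:7 but not in ubi8-minimal; these are candidates for explicit
# installation in the d/s Dockerfile.
podman run --rm centos:7 rpm -qa --qf '%{NAME}\n' | sort -u > centos7-pkgs.txt
podman run --rm registry.access.redhat.com/ubi8/ubi-minimal rpm -qa --qf '%{NAME}\n' | sort -u > ubi8-minimal-pkgs.txt
comm -23 centos7-pkgs.txt ubi8-minimal-pkgs.txt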
Client Version: 4.3.0-0.nightly-2020-01-11-070223
Server Version: 4.3.0-0.nightly-2020-01-11-070223
Kubernetes Version: v1.16.2
CNV 2.2
registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.2.0-13

The issue has been fixed.

$ ls nodes/host-172-16-0-49/
bridge  dev_vfio  dmesg  etc  host-172-16-0-49_logs_kubelet  host-172-16-0-49_logs_NetworkManager  ip.txt  lspci  nft-ip-filter  nft-ip-nat  opt-cni-bin  proc_cmdline  var-lib-cni-bin  vlan
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:0307
Test Environment:
=================
[cloud-user@ocp-psi-executor must_gather]$ oc version
Client Version: 4.4.0-0.nightly-2020-02-17-022408
Server Version: 4.4.0-0.nightly-2020-03-06-170328
Kubernetes Version: v1.17.1

Test Cases Affected:
====================
1. https://polarion.engineering.redhat.com/polarion/#/project/CNV/workitem?id=CNV-2732
2. https://polarion.engineering.redhat.com/polarion/#/project/CNV/workitem?id=CNV-2809

Test Scenario:
==============
The two test cases above look for more information under nodes/ while running:

oc adm must-gather --image=registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.3.0-32 --dest-dir=/tmp/pytest-of-cloud-user/pytest-36/must_gather

But the nodes folder only contains the NetworkManager and kubelet information:

[cloud-user@ocp-psi-executor nodes]$ ls
slave-gkapoor-nbrjn-master-0  slave-gkapoor-nbrjn-master-1  slave-gkapoor-nbrjn-master-2  slave-gkapoor-nbrjn-worker-bbxzc  slave-gkapoor-nbrjn-worker-x7q8b  slave-gkapoor-nbrjn-worker-z7hnj
[cloud-user@ocp-psi-executor nodes]$ ls slave-gkapoor-nbrjn-worker-x7q8b
slave-gkapoor-nbrjn-worker-x7q8b_logs_kubelet  slave-gkapoor-nbrjn-worker-x7q8b_logs_NetworkManager

Expected Result:
================
More information is needed for ip, nft-ip-filter, and nft-ip-nat.
Test Env:
=========
$ oc version
Client Version: 4.4.0-0.nightly-2020-03-03-195752
Server Version: 4.4.0-0.nightly-2020-03-02-011520
Kubernetes Version: v1.17.1

A few other files also seem to be missing, and as a result a lot of test cases/automation are broken:

CNV-3042 failure message: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/cluster-scoped-resources/networkaddonsoperator.network.kubevirt.io/networkaddonsconfigs/cluster.yaml'

CNV-2720 failure message: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/k8s.cni.cncf.io/network-attachment-definitions/mgnad.yaml'

CNV-3043 failure message: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/node-gather-unprivileged/kubevirt.io/virtualmachines/vm.yaml'

CNV-2721 failure message: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/pods/bridge-marker-46m6c/bridge-marker-46m6c.yaml'

CNV-2705 failure message: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/pods/kube-cni-linux-bridge-plugin-6r9gt/kube-cni-linux-bridge-plugin-6r9gt.yaml'

CNV-2983 failure message: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/pods/kubemacpool-mac-controller-manager-578577887c-bzrhn/kubemacpool-mac-controller-manager-578577887c-bzrhn.yaml'

CNV-2984 failure message: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/pods/nmstate-handler-6h2fq/nmstate-handler-6h2fq.yaml'

CNV-2985 failure message: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/pods/cluster-network-addons-operator-685958bc9c-gkkfc/cluster-network-addons-operator-685958bc9c-gkkfc.yaml'

CNV-2986 failure message: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/pods/ovs-cni-amd64-pv69k/ovs-cni-amd64-pv69k.yaml'

CNV-2718 failure message: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/pods/kubemacpool-mac-controller-manager-578577887c-bzrhn/kubemacpool-mac-controller-manager-578577887c-bzrhn.yaml'

CNV-2715 failure message: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/pods/kube-cni-linux-bridge-plugin-6r9gt/cni-plugins/cni-plugins/logs/previous.log'
Geetika, sorry for noticing this only now, but typically you should not reopen a bug that was already verified, closed and delivered to customers. Please open a fresh specific bug about the regression that you are seeing.
Moving the bug back to Closed, per comment 18.