Bug 1771916 - [must gather] Some of the node's info is not collected by cnv must gather
Summary: [must gather] Some of the node's info is not collected by cnv must gather
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Providers
Version: 2.1.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 2.4.0
Assignee: Avram Levitter
QA Contact: Tareq Alayan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-11-13 08:55 UTC by Yan Du
Modified: 2020-06-22 06:59 UTC

Fixed In Version: cnv-must-gather-container-v2.2.0-6
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-22 06:58:27 UTC
Target Upstream Version:
Embargoed:


Attachments
debug instructions (1.72 KB, text/plain)
2019-11-14 12:39 UTC, Marcin Mirecki


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2020:0307 0 None None None 2020-01-30 16:27:42 UTC

Description Yan Du 2019-11-13 08:55:42 UTC
Description of problem:
Some of the node's info is not collected by cnv must gather


Version-Release number of selected component (if applicable):
Client Version: v4.2.0
Server Version: 4.2.0
Kubernetes Version: v1.14.6+2e5ed54
CNV2.1


How reproducible:
Always


Steps to Reproduce:
1. Run must gather with cnv must gather image
# oc adm must-gather --image=registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.1.1-22
2. Check the must gather log


Actual results:
Some node information is not collected by the CNV must-gather image.
For example, lspci, ip, bridge, vlan, var-lib-cni-bin, dev_vfio and dmesg are not collected; only the kubelet and NetworkManager logs are.

$ ls must-gather.local.236191780723349203/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-4df332d18097850382150327af94935989b11051030dd387d70de9a9179fe544/nodes/cnv-executor-yadu-rhel-worker-1/
cnv-executor-yadu-rhel-worker-1_logs_kubelet  cnv-executor-yadu-rhel-worker-1_logs_NetworkManager


Expected results:
All node information should be collected.


Additional info:
Running must-gather with the quay.io image works:
# oc adm must-gather --image=quay.io/kubevirt/must-gather


$ ls must-gather-log-quay/quay-io-kubevirt-must-gather-sha256-9953dbc7e9ff24fb4089b576cce6603d68e47396f29337cff83fd832aff26877/nodes/cnv-executor-yadu-rhel-worker-1/
bridge                                               dmesg                                                opt-cni-bin
cnv-executor-yadu-rhel-worker-1_logs_kubelet         etc/                                                 proc_cmdline
cnv-executor-yadu-rhel-worker-1_logs_NetworkManager  ip.txt                                               var-lib-cni-bin
dev_vfio                                             lspci

Comment 1 Petr Horáček 2019-11-13 13:21:45 UTC
I asked Marcin to look into it if he finds some spare time. It may be caused by a difference between U/S and D/S, but we don't know yet.

Comment 2 Marcin Mirecki 2019-11-13 15:02:14 UTC
The image is corrupt.
The /etc/node-gather-ds.yaml manifest has an empty image field for the node gathering daemonset: "image:"
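
An empty image field of this kind could be caught before the image ships. The following is only a minimal sketch and not part of the must-gather image: the function name has_empty_image is hypothetical, and it assumes the manifest writes the field on its own line, as quoted above.

```shell
# Hypothetical check: succeed (exit 0) if the manifest contains an "image:"
# key with no value after it, as described for /etc/node-gather-ds.yaml.
has_empty_image() {
    manifest="$1"
    # Match "image:" (optionally as a "- image:" list item) followed only by whitespace.
    grep -Eq '^[[:space:]]*(-[[:space:]]+)?image:[[:space:]]*$' "$manifest"
}

# Example usage (path taken from the comment above):
# if has_empty_image /etc/node-gather-ds.yaml; then
#     echo "node-gather daemonset has no image set" >&2
# fi
```

A check like this only detects the symptom; it does not explain why the field ended up empty in the build.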

Comment 3 Marcin Mirecki 2019-11-14 09:30:39 UTC
There seems to be more than one problem, probably connected to a change in the base image.

1.
The base image for d/s does not appear to include "hostname" (it is available in the u/s centos image).
As a result, line 11 of node_gather fails:
POD_NAME=$(oc get pods --field-selector=status.podIP=$(hostname -I) -n $NAMESPACE -o'custom-columns=name:metadata.name' --no-headers)
The field selector is probably redundant, as we only have one pod in this namespace,
so this could be fixed by simply removing the field selector. Line 11 would then look like:
POD_NAME=$(oc get pods -n $NAMESPACE -o'custom-columns=name:metadata.name' --no-headers)
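
The proposed change can be wrapped in a small function so that it no longer depends on the hostname binary. This is a sketch under assumptions: get_pod_name is a hypothetical name, and the head -n 1 guard is an addition that is not part of the proposal above.

```shell
# Hypothetical wrapper around the proposed line 11: list pod names in the
# namespace without the hostname-based field selector.
get_pod_name() {
    namespace="$1"
    # head -n 1 is an added safeguard (an assumption, not in the proposal)
    # in case more than one pod is ever listed.
    oc get pods -n "$namespace" -o 'custom-columns=name:metadata.name' --no-headers | head -n 1
}

# Example usage inside node_gather:
# POD_NAME=$(get_pod_name "$NAMESPACE")
```

If the namespace can hold more than one pod, taking the first name may pick the wrong one, so the single-pod assumption stated in the comment still has to hold.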

Comment 6 Dan Kenigsberg 2019-11-14 12:14:06 UTC
I don't think this should block 2.1.1, despite being a regression.

I hope that Avram could take a look at this when he is back.

Comment 7 Piotr Kliczewski 2019-11-14 12:37:24 UTC
Do you know whether all the nodes were running without any disruptions? We relaxed the constraints on having the node-gather pod running on all the nodes due to a reported issue. We saw that when one of the nodes was not stable, cnv-must-gather failed with a timeout without providing any logs.

Comment 8 Marcin Mirecki 2019-11-14 12:39:33 UTC
Created attachment 1636124
debug instructions

Comment 9 Yan Du 2019-11-18 03:04:38 UTC
Hi Piotr,

I checked every node directory under must-gather_xxx/.../.../nodes/, and only logs_kubelet and logs_NetworkManager exist for each node. Running must-gather again with the upstream image on the same cluster collected all the logs normally, so I think the nodes were running well without any disruptions.

Comment 10 Piotr Kliczewski 2019-11-18 08:31:15 UTC
Thank you. Marcin M. mentioned that the base image was changed and that some tools which used to be part of the image are now missing.
We need to understand which ones are missing and install them accordingly.
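
One way to find out which tools are missing is to probe for them inside the image. The sketch below is illustrative only: missing_tools is a hypothetical helper, and the tool list in the usage example is an assumption derived from the file names the node gathering is expected to produce.

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example usage; the tool list is an assumption based on the expected
# output files (ip.txt, lspci, bridge, ...):
# missing_tools hostname ip lspci bridge
```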

Comment 11 Avram Levitter 2019-11-26 14:44:07 UTC
Building with the ubi8-minimal and adding hostname to the packages fixes it.
Since there can be such a discrepancy between centos:7 (the u/s image) and ubi8-minimal/ubi8, would using the ubi (minimal or not) as both u/s and d/s image and ensuring that hostname is a dependency for both be better long term? ubi8 also is missing hostname so it's not merely an issue of minimal vs standard.
If need be, I can easily compile a list of packages present in centos:7 that aren't also in ubi8/ubi8-minimal.
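
Such a list could be compiled by comparing package name dumps from the two images. This is a rough sketch, assuming each dump holds one package name per line; packages_only_in_first and the file names in the usage example are hypothetical.

```shell
# Hypothetical helper: print package names that appear in the first list
# but not in the second. Each file holds one package name per line.
packages_only_in_first() {
    first_sorted=$(mktemp)
    second_sorted=$(mktemp)
    sort -u "$1" > "$first_sorted"
    sort -u "$2" > "$second_sorted"
    # comm -23 keeps only the lines unique to the first (sorted) input.
    comm -23 "$first_sorted" "$second_sorted"
    rm -f "$first_sorted" "$second_sorted"
}

# Example usage, assuming the lists were produced inside each image with
# something like: rpm -qa --qf '%{NAME}\n' > centos7.txt
# packages_only_in_first centos7.txt ubi8-minimal.txt
```

Package names can differ between releases, so a name-based comparison like this is only a starting point for deciding what to install.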

Comment 12 Yan Du 2020-01-13 07:17:54 UTC
Client Version: 4.3.0-0.nightly-2020-01-11-070223
Server Version: 4.3.0-0.nightly-2020-01-11-070223
Kubernetes Version: v1.16.2
CNV2.2
registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.2.0-13

The issue has been fixed.
$ ls nodes/host-172-16-0-49/
bridge  dev_vfio  dmesg  etc  host-172-16-0-49_logs_kubelet  host-172-16-0-49_logs_NetworkManager  ip.txt  lspci  nft-ip-filter  nft-ip-nat  opt-cni-bin  proc_cmdline  var-lib-cni-bin  vlan
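
A verification like this could be scripted so that a regression is noticed early. The following is a sketch, not the QE automation: missing_node_entries is a hypothetical helper, and the list of expected entries is copied from the listing above.

```shell
# Hypothetical helper: print the expected entries that are absent from a
# node directory in the must-gather output.
missing_node_entries() {
    node_dir="$1"
    # Names taken from the listing above; the per-node log files are skipped
    # because their names include the node name.
    for entry in bridge dev_vfio dmesg etc ip.txt lspci nft-ip-filter nft-ip-nat \
                 opt-cni-bin proc_cmdline var-lib-cni-bin vlan; do
        [ -e "$node_dir/$entry" ] || echo "$entry"
    done
}

# Example usage:
# missing_node_entries nodes/host-172-16-0-49
```

An empty result means every listed entry is present; it says nothing about whether the files have useful content.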

Comment 14 errata-xmlrpc 2020-01-30 16:27:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0307

Comment 15 Geetika Kapoor 2020-03-13 11:30:04 UTC
Test Environment:
================

[cloud-user@ocp-psi-executor must_gather]$ oc version
Client Version: 4.4.0-0.nightly-2020-02-17-022408
Server Version: 4.4.0-0.nightly-2020-03-06-170328
Kubernetes Version: v1.17.1


Test Cases Affected:
====================

1. https://polarion.engineering.redhat.com/polarion/#/project/CNV/workitem?id=CNV-2732
2. https://polarion.engineering.redhat.com/polarion/#/project/CNV/workitem?id=CNV-2809


Test Scenario:
==============

The two test cases above look for more information under nodes when running:

oc adm must-gather --image=registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-cnv-must-gather-rhel8:v2.3.0-32 --dest-dir=/tmp/pytest-of-cloud-user/pytest-36/must_gather

But the nodes folder has only NetworkManager and kubelet information.

[cloud-user@ocp-psi-executor nodes]$ ls
slave-gkapoor-nbrjn-master-0  slave-gkapoor-nbrjn-master-2      slave-gkapoor-nbrjn-worker-x7q8b
slave-gkapoor-nbrjn-master-1  slave-gkapoor-nbrjn-worker-bbxzc  slave-gkapoor-nbrjn-worker-z7hnj

[cloud-user@ocp-psi-executor nodes]$ ls slave-gkapoor-nbrjn-worker-x7q8b
slave-gkapoor-nbrjn-worker-x7q8b_logs_kubelet  slave-gkapoor-nbrjn-worker-x7q8b_logs_NetworkManager


Expected Result:
===============

More information needed for ip, nft-ip-filter, nft-ip-nat.

Comment 16 Geetika Kapoor 2020-03-23 14:36:21 UTC
Test Env:
=======

$ oc version
Client Version: 4.4.0-0.nightly-2020-03-03-195752
Server Version: 4.4.0-0.nightly-2020-03-02-011520
Kubernetes Version: v1.17.1


A few other files seem to be missing, and as a result many test cases and automation jobs are broken.

	"CNV-3042" failure message: "FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/cluster-scoped-resources/networkaddonsoperator.network.kubevirt.io/networkaddonsconfigs/cluster.yaml'"
	"CNV-2720" failure message: "FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/k8s.cni.cncf.io/network-attachment-definitions/mgnad.yaml'"
	"CNV-3043" failure message: "FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/node-gather-unprivileged/kubevirt.io/virtualmachines/vm.yaml'"

	"CNV-2721" failure message: "FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/pods/bridge-marker-46m6c/bridge-marker-46m6c.yaml'"
	"CNV-2705" failure message: "FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/pods/kube-cni-linux-bridge-plugin-6r9gt/kube-cni-linux-bridge-plugin-6r9gt.yaml'"
	"CNV-2983" failure message: "FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/pods/kubemacpool-mac-controller-manager-578577887c-bzrhn/kubemacpool-mac-controller-manager-578577887c-bzrhn.yaml'"
	"CNV-2984" failure message: "FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/pods/nmstate-handler-6h2fq/nmstate-handler-6h2fq.yaml'"
	"CNV-2985" failure message: "FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/pods/cluster-network-addons-operator-685958bc9c-gkkfc/cluster-network-addons-operator-685958bc9c-gkkfc.yaml'"
	"CNV-2986" failure message: "FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/pods/ovs-cni-amd64-pv69k/ovs-cni-amd64-pv69k.yaml'"
	"CNV-2718" failure message: "FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/pods/kubemacpool-mac-controller-manager-578577887c-bzrhn/kubemacpool-mac-controller-manager-578577887c-bzrhn.yaml'"

	"CNV-2715" failure message: "FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-cnv-qe-jenkins/pytest-4/must_gather0/registry-proxy-engineering-redhat-com-rh-osbs-container-native-virtualization-cnv-must-gather-rhel8-sha256-c71f96cec17db16095fc244cdba72587ece3056d8dddbb1ec70bdf4164817b85/namespaces/openshift-cnv/pods/kube-cni-linux-bridge-plugin-6r9gt/cni-plugins/cni-plugins/logs/previous.log'"

Comment 17 Dan Kenigsberg 2020-06-18 06:04:27 UTC
Geetika, sorry for noticing this only now, but typically you should not reopen a bug that was already verified, closed and delivered to customers. Please open a fresh specific bug about the regression that you are seeing.

Comment 19 Yan Du 2020-06-22 06:58:27 UTC
Moving the bug to Closed per comment 18.

