Bug 2161937 - collect kernel and journal logs from all worker nodes
Summary: collect kernel and journal logs from all worker nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: must-gather
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: unspecified
Target Milestone: ---
Target Release: ODF 4.13.0
Assignee: yati padia
QA Contact: Oded
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-01-18 11:25 UTC by yati padia
Modified: 2023-12-08 04:31 UTC
CC List: 10 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-21 15:23:05 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage odf-must-gather pull 21 0 None open removes label to collect logs from all nodes 2023-04-12 10:50:43 UTC
Github red-hat-storage odf-must-gather pull 22 0 None open Bug 2161937: [release-4.13] removes label to collect logs from all nodes 2023-04-13 09:36:35 UTC
Red Hat Product Errata RHBA-2023:3742 0 None None None 2023-06-21 15:24:13 UTC

Description yati padia 2023-01-18 11:25:18 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
The collection of kernel and journal logs, which was recently added to must-gather, is incomplete. The must-gather does not collect logs from all the worker nodes.


Actual results:

mrajanna@fedora rook $ oc get nodes
NAME              STATUS   ROLES                  AGE   VERSION
compute-0         Ready    worker                 8d    v1.25.4+77bec7a
compute-1         Ready    worker                 8d    v1.25.4+77bec7a
compute-2         Ready    worker                 8d    v1.25.4+77bec7a
compute-3         Ready    worker                 8d    v1.25.4+77bec7a
compute-4         Ready    worker                 8d    v1.25.4+77bec7a
compute-5         Ready    worker                 8d    v1.25.4+77bec7a


[DIR]	journal_compute-0/ 	2023-01-16 00:39 	- 	 
[DIR]	journal_compute-3/ 	2023-01-16 00:39 	- 	 
[DIR]	journal_compute-5/ 	2023-01-16 00:40 	- 	 
[DIR]	kernel_compute-0/ 	2023-01-16 00:40 	- 	 
[DIR]	kernel_compute-3/ 	2023-01-16 00:40 	- 	 
[DIR]	kernel_compute-5/ 	2023-01-16 00:40 	- 	


Expected results:

Kernel and journal logs must be collected from all the worker nodes.


Additional info:
Refer https://bugzilla.redhat.com/show_bug.cgi?id=2160034#c24 for more details.
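Per the linked PR title ("removes label to collect logs from all nodes"), the likely root cause is that the gather script selected nodes by the OCS storage label rather than iterating over every worker node. A minimal before/after sketch of that selection, as an assumption inferred from the PR title rather than a copy of the odf-must-gather script:

# Assumed "before": only nodes carrying the OCS storage label were gathered
nodes=$(oc get nodes -l cluster.ocs.openshift.io/openshift-storage= \
    -o jsonpath='{.items[*].metadata.name}')

# "After": every worker node is gathered, with or without the OCS label
nodes=$(oc get nodes -l node-role.kubernetes.io/worker= \
    -o jsonpath='{.items[*].metadata.name}')

for node in ${nodes}; do
    echo "gathering kernel and journal logs from ${node}"
done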

Comment 9 Mudit Agarwal 2023-03-06 14:06:53 UTC
Moved it back, automated query

Comment 16 Mudit Agarwal 2023-04-13 09:35:18 UTC
PR was never backported to 4.13, do not move it to ON_QA until there is a build available with the fix.

Comment 20 Oded 2023-04-24 12:01:40 UTC
Bug Fixed:
OCP Version: 4.13.0-0.nightly-2023-04-21-084440
ODF Version: odf-operator.v4.13.0-172.stable
Platform: vSphere

Test Procedure:
1. Collect must-gather:
oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.13

2. Verify that kernel and journal logs from all worker nodes exist:

$ oc get nodes
NAME              STATUS   ROLES                  AGE   VERSION
compute-0         Ready    worker                 26h   v1.26.3+379cd9f
compute-1         Ready    worker                 26h   v1.26.3+379cd9f
compute-2         Ready    worker                 26h   v1.26.3+379cd9f
control-plane-0   Ready    control-plane,master   26h   v1.26.3+379cd9f
control-plane-1   Ready    control-plane,master   26h   v1.26.3+379cd9f
control-plane-2   Ready    control-plane,master   26h   v1.26.3+379cd9f

oviner:quay-io-rhceph-dev-ocs-must-gather-sha256-79f522ddb035becf5878305c4af24de6d83610b42e849505b5159ab20b8bb5fa$ find -name "*kernel_*"
./ceph/kernel_compute-0
./ceph/kernel_compute-0/kernel_compute-0.gz
./ceph/kernel_compute-1
./ceph/kernel_compute-1/kernel_compute-1.gz
./ceph/kernel_compute-2
./ceph/kernel_compute-2/kernel_compute-2.gz

oviner:quay-io-rhceph-dev-ocs-must-gather-sha256-79f522ddb035becf5878305c4af24de6d83610b42e849505b5159ab20b8bb5fa$ find -name "*journal_*"
./ceph/journal_compute-0
./ceph/journal_compute-0/journal_compute-0.gz
./ceph/journal_compute-1
./ceph/journal_compute-1/journal_compute-1.gz
./ceph/journal_compute-2
./ceph/journal_compute-2/journal_compute-2.gz
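As a quick cross-check (a hypothetical helper, not part of the must-gather tooling; assumes it is run from the must-gather output root, where the ceph/ directory lives):

for node in $(oc get nodes -l node-role.kubernetes.io/worker= -o name | cut -d/ -f2); do
    # each worker should have both a journal_ and a kernel_ directory
    if [ -d "ceph/journal_${node}" ] && [ -d "ceph/kernel_${node}" ]; then
        echo "${node}: collected"
    else
        echo "${node}: MISSING"
    fi
done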

Comment 23 Oded 2023-05-16 08:48:41 UTC
Hi Ilya,
In this setup, we collected journal and kernel logs from all worker nodes.

Setup:
ODF 4.13, OCP 4.13, vSphere UPI.

OCS MG DIR: https://url.corp.redhat.com/53c1088

JOB Link: https://url.corp.redhat.com/48355c2

@ypadia Do we need to collect these logs on worker nodes without the OCS label [cluster.ocs.openshift.io/openshift-storage]?

compute-5         Ready    worker                 62m   v1.26.2+22308ca   10.1.112.80   10.1.112.80   Red Hat Enterprise Linux CoreOS 413.92.202304101935-0 (Plow)   5.14.0-284.10.1.el9_2.x86_64   cri-o://1.26.3-2.rhaos4.13.gitafec31f.el9   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-5,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=data-2
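(For reference, worker nodes without the OCS label can be listed with a negated label selector; this is standard oc/kubectl selector syntax, not anything specific to must-gather:)

oc get nodes -l 'node-role.kubernetes.io/worker=,!cluster.ocs.openshift.io/openshift-storage'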

Comment 26 Oded 2023-05-17 11:41:23 UTC
Only journal_compute-4 and kernel_compute-4 logs were collected on a setup with 6 worker nodes, 3 of which carry the OCS label.

Test Process:
1. Deploy OCP cluster with 6 worker nodes
2. Install ODF operator
3. Install storage cluster [enable OCS on 3 worker nodes; see the labeling sketch below]
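For reference, the OCS labeling in step 3 can be applied per node; a minimal sketch, assuming compute-0 through compute-2 are the chosen storage nodes (the storage cluster installation can also apply this label for you):

for node in compute-0 compute-1 compute-2; do
    # empty-value label used by ODF to mark storage nodes
    oc label node "${node}" cluster.ocs.openshift.io/openshift-storage=''
done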

$ oc get nodes --show-labels 
NAME              STATUS   ROLES                  AGE   VERSION           LABELS
compute-0         Ready    worker                 21h   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-0,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,node.openshift.io/os_id=rhcos,topology.rook.io/rack=rack1
compute-1         Ready    worker                 21h   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-1,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,node.openshift.io/os_id=rhcos,topology.rook.io/rack=rack0
compute-2         Ready    worker                 21h   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-2,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,node.openshift.io/os_id=rhcos,topology.rook.io/rack=rack2
compute-3         Ready    worker                 21h   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-3,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,node.openshift.io/os_id=rhcos
compute-4         Ready    worker                 22h   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-4,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,node.openshift.io/os_id=rhcos
compute-5         Ready    worker                 21h   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-5,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,node.openshift.io/os_id=rhcos
control-plane-0   Ready    control-plane,master   22h   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-16gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-0,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-16gb.os-unknown,node.openshift.io/os_id=rhcos
control-plane-1   Ready    control-plane,master   22h   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-16gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-16gb.os-unknown,node.openshift.io/os_id=rhcos
control-plane-2   Ready    control-plane,master   22h   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-16gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-2,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-16gb.os-unknown,node.openshift.io/os_id=rhcos


oviner:ClusterPath$ oc get nodes --show-labels | grep ocs
compute-0         Ready    worker                 21h   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-0,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,node.openshift.io/os_id=rhcos,topology.rook.io/rack=rack1
compute-1         Ready    worker                 21h   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-1,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,node.openshift.io/os_id=rhcos,topology.rook.io/rack=rack0
compute-2         Ready    worker                 21h   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-2,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,node.openshift.io/os_id=rhcos,topology.rook.io/rack=rack2
 
4. Collect OCS must-gather:
$ oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.13

5. Check the content of the must-gather directory:
oviner:ceph$ ls -la
total 8
drwxr-xr-x. 1 oviner oviner  250 May 17 14:36 .
drwxrwxrwx. 1 oviner oviner  172 May 17 14:36 ..
-rw-r--r--. 1 oviner oviner 3336 May 17 14:27 event-filter.html
drwxr-xr-x. 1 oviner oviner   40 May 17 14:36 journal_compute-4
drwxr-xr-x. 1 oviner oviner   38 May 17 14:36 kernel_compute-4
drwxr-xr-x. 1 oviner oviner  666 May 17 14:36 logs
drwxr-xr-x. 1 oviner oviner 2378 May 17 14:36 must_gather_commands
drwxr-xr-x. 1 oviner oviner 4254 May 17 14:36 must_gather_commands_json_output
drwxr-xr-x. 1 oviner oviner   34 May 17 14:36 namespaces
-rw-r--r--. 1 oviner oviner  768 May 17 14:27 timestamp

Comment 29 Oded 2023-05-25 15:55:28 UTC
Fixed:

Test Process:
1. Deploy cluster with 6 worker nodes
2. Label 3 worker nodes with the OCS label
3. Collect must-gather:
oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.13
4. Verify journal and kernel logs are collected for all 6 worker nodes [with and without the OCS label]; a count check is sketched below
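A minimal count check for step 4 (a hypothetical helper, assuming it is run from the root of the must-gather output directory; not part of the must-gather tooling):

# number of worker nodes vs. number of collected journal_* directories
expected=$(oc get nodes -l node-role.kubernetes.io/worker= --no-headers | wc -l)
collected=$(find . -type d -name 'journal_*' | wc -l)
echo "worker nodes: ${expected}, journal dirs: ${collected}"  # both should be 6 here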

$ oc get nodes --show-labels 
NAME              STATUS   ROLES                  AGE     VERSION           LABELS
compute-0         Ready    worker                 7h51m   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-0,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,node.openshift.io/os_id=rhcos,topology.rook.io/rack=rack0
compute-1         Ready    worker                 7h50m   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-1,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,node.openshift.io/os_id=rhcos,topology.rook.io/rack=rack1
compute-2         Ready    worker                 7h51m   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-2,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,node.openshift.io/os_id=rhcos,topology.rook.io/rack=rack2
compute-3         Ready    worker                 7h52m   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-3,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,node.openshift.io/os_id=rhcos
compute-4         Ready    worker                 7h53m   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-4,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,node.openshift.io/os_id=rhcos
compute-5         Ready    worker                 7h51m   v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-5,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-16.mem-64gb.os-unknown,node.openshift.io/os_id=rhcos
control-plane-0   Ready    control-plane,master   8h      v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-16gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-0,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-16gb.os-unknown,node.openshift.io/os_id=rhcos
control-plane-1   Ready    control-plane,master   8h      v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-16gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-16gb.os-unknown,node.openshift.io/os_id=rhcos
control-plane-2   Ready    control-plane,master   8h      v1.26.3+b404935   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-16gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-2,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-16gb.os-unknown,node.openshift.io/os_id=rhcos

$ pwd
/home/oviner/ClusterPath/must-gather.local.4357408987461476974/quay-io-rhceph-dev-ocs-must-gather-sha256-10071ddc29383af01d60eadfa4d6f2bd631cfd4c06fcdf7efdb655a84b13a4f1/ceph
oviner:ceph$ ls -l
total 8
drwxr-xr-x. 1 oviner oviner    6 May 25 18:47 ceph_daemon_log_compute-0
drwxr-xr-x. 1 oviner oviner    6 May 25 18:47 ceph_daemon_log_compute-1
drwxr-xr-x. 1 oviner oviner    6 May 25 18:47 ceph_daemon_log_compute-2
-rw-r--r--. 1 oviner oviner 3336 May 25 18:46 event-filter.html
drwxr-xr-x. 1 oviner oviner   40 May 25 18:47 journal_compute-0
drwxr-xr-x. 1 oviner oviner   40 May 25 18:47 journal_compute-1
drwxr-xr-x. 1 oviner oviner   40 May 25 18:47 journal_compute-2
drwxr-xr-x. 1 oviner oviner   40 May 25 18:47 journal_compute-3
drwxr-xr-x. 1 oviner oviner   40 May 25 18:47 journal_compute-4
drwxr-xr-x. 1 oviner oviner   40 May 25 18:47 journal_compute-5
drwxr-xr-x. 1 oviner oviner   38 May 25 18:47 kernel_compute-0
drwxr-xr-x. 1 oviner oviner   38 May 25 18:47 kernel_compute-1
drwxr-xr-x. 1 oviner oviner   38 May 25 18:47 kernel_compute-2
drwxr-xr-x. 1 oviner oviner   38 May 25 18:47 kernel_compute-3
drwxr-xr-x. 1 oviner oviner   38 May 25 18:47 kernel_compute-4
drwxr-xr-x. 1 oviner oviner   38 May 25 18:47 kernel_compute-5
drwxr-xr-x. 1 oviner oviner  666 May 25 18:47 logs
drwxr-xr-x. 1 oviner oviner 2648 May 25 18:47 must_gather_commands
drwxr-xr-x. 1 oviner oviner 4254 May 25 18:47 must_gather_commands_json_output
drwxr-xr-x. 1 oviner oviner   34 May 25 18:47 namespaces
-rw-r--r--. 1 oviner oviner  770 May 25 18:46 timestamp

Comment 33 errata-xmlrpc 2023-06-21 15:23:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742

Comment 34 Red Hat Bugzilla 2023-12-08 04:31:58 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
