Bug 2076379 - must-gather: ruletables and qemu logs collected as a part of gather_vm_details scripts are zero bytes file
Summary: must-gather: ruletables and qemu logs collected as a part of gather_vm_detail...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Logging
Version: 4.11.0
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: 4.11.0
Assignee: Simone Tiraboschi
QA Contact: Debarati Basu-Nag
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-04-18 20:53 UTC by Debarati Basu-Nag
Modified: 2023-11-13 08:11 UTC (History)
5 users (show)

Fixed In Version: hco-bundle-registry-container-v4.11.0-557
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-14 19:30:56 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
vm gather with namespace specified. (151.37 KB, text/plain)
2022-04-18 20:53 UTC, Debarati Basu-Nag
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github kubevirt must-gather pull 148 0 None Merged gather_vms_details: adapt to rootless virt-launcher 2022-06-08 08:49:04 UTC
Github kubevirt must-gather pull 155 0 None Merged Try to collect qemu logs also from another path 2022-07-06 13:02:28 UTC
Red Hat Issue Tracker CNV-17667 0 None None None 2023-11-13 08:11:05 UTC
Red Hat Product Errata RHSA-2022:6526 0 None None None 2022-09-14 19:32:19 UTC

Description Debarati Basu-Nag 2022-04-18 20:53:42 UTC
Created attachment 1873398 [details]
vm gather with namespace specified.

Description of problem: On running gather_vm_details against 4.11, we are seeing the following warning on console:
===================
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (containers "gather", "copy" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "gather", "copy" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or containers "gather", "copy" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "gather", "copy" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
===================
And rulestable, qemu log information are no longer getting collected.



Version-Release number of selected component (if applicable):
4.11.0-224

How reproducible:
100%

Steps to Reproduce:
1. Run vm gather commmand: oc adm must-gather --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8@sha256:e49438a284e8e9121137132f3810ecf49909892574668cb9cd2b19b8b46d08a0 --dest-dir=/tmp/pytest-EX4WtwouaHUoyZacyicyzk/must_gather_test_vm_gather_alternate_namespace0 -- NS=must-gather-alternate gather_vms_details 

or 
oc adm must-gather --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8@sha256:e49438a284e8e9121137132f3810ecf49909892574668cb9cd2b19b8b46d08a0 --dest-dir=/tmp/pytest-EX4WtwouaHUoyZacyicyzk/must_gather_scope_module1 -- gather_vms_details
2.
3.

Actual results:
rulestable and qemu logs collected are empty.
Following error message is seen on mustgather output:
=======================
[must-gather-9vccn] POD 2022-04-18T20:43:26.119439364Z cat: /var/log/libvirt/qemu/must-gather-alternate_must-gather-vm-2-1650314512-3492553.log: Permission denied
[must-gather-9vccn] POD 2022-04-18T20:43:26.128515074Z command terminated with exit code 1
[must-gather-9vccn] POD 2022-04-18T20:43:26.193723643Z cat: /var/log/libvirt/qemu/must-gather-alternate_must-gather-vm-3-1650314512-5202754.log: Permission denied
[must-gather-9vccn] POD 2022-04-18T20:43:26.195080289Z command terminated with exit code 1
[must-gather-9vccn] POD 2022-04-18T20:43:26.277463123Z cat: /var/log/libvirt/qemu/must-gather-alternate_must-gather-vm-0-1650314511-7316048.log: Permission denied
[must-gather-9vccn] POD 2022-04-18T20:43:26.289674838Z command terminated with exit code 1
[must-gather-9vccn] POD 2022-04-18T20:43:26.322750933Z cat: /var/log/libvirt/qemu/must-gather-alternate_must-gather-vm-1-1650314511-9379501.log: Permission denied
[must-gather-9vccn] POD 2022-04-18T20:43:26.325656916Z command terminated with exit code 1
[must-gather-9vccn] POD 2022-04-18T20:43:26.370199606Z cat: /var/log/libvirt/qemu/must-gather-alternate_must-gather-vm-4-1650314512-6850836.log: Permission denied
[must-gather-9vccn] POD 2022-04-18T20:43:26.371093484Z command terminated with exit code 1
[must-gather-9vccn] POD 2022-04-18T20:43:27.814192970Z Defaulted container "virt-handler" out of: virt-handler, virt-launcher (init)
[must-gather-9vccn] POD 2022-04-18T20:43:28.049858518Z command terminated with exit code 1
[must-gather-9vccn] POD 2022-04-18T20:43:28.178720318Z Defaulted container "virt-handler" out of: virt-handler, virt-launcher (init)
[must-gather-9vccn] POD 2022-04-18T20:43:28.257773254Z Defaulted container "virt-handler" out of: virt-handler, virt-launcher (init)
[must-gather-9vccn] POD 2022-04-18T20:43:28.310118977Z Defaulted container "virt-handler" out of: virt-handler, virt-launcher (init)
[must-gather-9vccn] POD 2022-04-18T20:43:28.670392002Z Defaulted container "virt-handler" out of: virt-handler, virt-launcher (init)
[must-gather-9vccn] POD 2022-04-18T20:43:28.911945137Z command terminated with exit code 1
[must-gather-9vccn] POD 2022-04-18T20:43:28.919012468Z command terminated with exit code 1
[must-gather-9vccn] POD 2022-04-18T20:43:28.982849444Z command terminated with exit code 1
[must-gather-9vccn] POD 2022-04-18T20:43:28.995467881Z command terminated with exit code 1
=======================

Expected results:
All valid must-gather information should be collected.

Additional info:
Will attach vm gather output.

Comment 2 Simone Tiraboschi 2022-04-22 09:30:47 UTC
The issue is clearly reproducible:
must-gather execute this command to fetch qemu logs: 
https://github.com/kubevirt/must-gather/blob/main/collection-scripts/gather_vms_details#L24

but it fails when executed against CNV 4.11 virt-launcher images:

stirabos@t14s:~$ oc exec virt-launcher-testvm-r9bml -n default -c compute -- cat "/var/log/libvirt/qemu/default_virt-launcher-testvm-r9bml.log" > "virt-launcher-testvm-r9bml.qemu.log"
cat: /var/log/libvirt/qemu/default_virt-launcher-testvm-r9bml.log: Permission denied
command terminated with exit code 1

the reason is that now the pod is running as qemu user:

stirabos@t14s:~$ oc exec virt-launcher-testvm-r9bml -n default -c compute -- whoami
qemu


while /var/log/libvirt is accessible only by root:

stirabos@t14s:~$ oc exec virt-launcher-testvm-r9bml -n default -c compute -- ls -l "/var/log" 
total 88
-rw-r--r--. 1 root root 20961 Apr 14 01:52 dnf.librepo.log
-rw-r--r--. 1 root root 50816 Apr 14 01:52 dnf.log
-rw-r--r--. 1 root root  7809 Apr 14 01:52 dnf.rpm.log
-rw-r--r--. 1 root root   600 Apr 14 01:52 hawkey.log
drwx------. 3 root root    18 Apr 14 01:52 libvirt
drwxr-xr-x. 1 root root    22 Apr 14 01:50 rhsm
drwxr-xr-x. 3 root root    21 Apr 14 01:52 swtpm

I think that this is a side effect of the NonRoot feature gate which is now enabled by default:
https://github.com/kubevirt/hyperconverged-cluster-operator/pull/1846

@Stu, should this be fixed on the virt-launcher image?

Comment 6 Ohad 2022-06-14 13:45:15 UTC
The is still failing with this error against version v4.11.0-459

AssertionError: Following errors found in must-gather dict_keys(['file_not_found']). defaultdict(<class 'dict'>, {'file_not_found': {'must-gather-vm-0-1655142241-572693': ['qemu.log'], 'must-gather-vm-1-1655142241-7908125': ['qemu.log'], 'must-gather-vm-2-1655142242-0629196': ['qemu.log'], 'must-gather-vm-3-1655142242-3891873': ['qemu.log'], 'must-gather-vm-4-1655142242-6430938': ['qemu.log'] 


This is from must_gather output:
13:46:08  [must-gather-wqxnv] POD 2022-06-13T17:45:47.910589030Z tar: Removing leading `/' from member names
13:46:08  [must-gather-wqxnv] POD 2022-06-13T17:45:47.910589030Z tar: /var/log/libvirt/qemu: Warning: Cannot stat: Permission denied
13:46:08  [must-gather-wqxnv] POD 2022-06-13T17:45:47.910589030Z tar: /var/run/libvirt/qemu/log: Warning: Cannot stat: No such file or directory
13:46:08  [must-gather-wqxnv] POD 2022-06-13T17:45:47.995736930Z tar: Removing leading `/' from member names
13:46:08  [must-gather-wqxnv] POD 2022-06-13T17:45:47.996125170Z tar: /var/log/libvirt/qemu: Warning: Cannot stat: Permission denied
13:46:08  [must-gather-wqxnv] POD 2022-06-13T17:45:47.996125170Z tar: /var/run/libvirt/qemu/log: Warning: Cannot stat: No such file or directory
13:46:08  [must-gather-wqxnv] POD 2022-06-13T17:45:48.111512128Z tar: Removing leading `/' from member names
13:46:08  [must-gather-wqxnv] POD 2022-06-13T17:45:48.111512128Z tar: /var/log/libvirt/qemu: Warning: Cannot stat: Permission denied
13:46:08  [must-gather-wqxnv] POD 2022-06-13T17:45:48.111512128Z tar: /var/run/libvirt/qemu/log: Warning: Cannot stat: No such file or directory
13:46:08  [must-gather-wqxnv] POD 2022-06-13T17:45:48.141320316Z tar: Removing leading `/' from member names
13:46:08  [must-gather-wqxnv] POD 2022-06-13T17:45:48.141320316Z tar: /var/log/libvirt/qemu: Warning: Cannot stat: Permission denied
13:46:08  [must-gather-wqxnv] POD 2022-06-13T17:45:48.141320316Z tar: /var/run/libvirt/qemu/log: Warning: Cannot stat: No such file or directory
13:46:08  [must-gather-wqxnv] POD 2022-06-13T17:45:48.209867896Z tar: Removing leading `/' from member names

Comment 7 Simone Tiraboschi 2022-06-15 10:12:26 UTC
Now we are using tar to collect the whole content of /var/run/libvirt/qemu/log/ (for rootless virt-launcher) and /var/log/libvirt/qemu/ (for backward compatibility).
This provides a few advantages:
- a single command can work for for the scenarios
- we honor the original file names
- we also collect log-rotated files (.log.0 log.1) for long running VMs

Since only one of /var/run/libvirt/qemu/log/ or /var/log/libvirt/qemu/ is actually containing logs (depending on how virt-launcher is configured) the other is supposed to produce a warning that we can safely ignore.
We can easily suppress tar stderr to keep the collection logs clean but this can potentially hide real problems.

Moving back to ON_QA for further verification.

Comment 8 Debarati Basu-Nag 2022-06-27 20:42:52 UTC
@stirabos Did the location and name of the qemu log changed? I still don't see any qemu related files under the previous location:
==================
[cnv-qe-jenkins@c01-dbn-411-qgqm5-executor must_gather_scope_module0]$ ls /tmp/pytest-DN4RLd9XwiVKgWkAX885t8/must_gather_scope_module0/registry-redhat-io-container-native-virtualization-cnv-must-gather-rhel8-sha256-64915e55c37ab9167e96eb81998aa081c751a0dc0957c6833cb8752a99d4ae9e/namespaces/node-gather-unprivileged/vms/must-gather-vm-2-1656360982-6516314/
virt-launcher-must-gather-vm-2-1656360982-6516314-4vfbz.bridge.txt   virt-launcher-must-gather-vm-2-1656360982-6516314-4vfbz.ip.txt
virt-launcher-must-gather-vm-2-1656360982-6516314-4vfbz.dumpxml.xml  virt-launcher-must-gather-vm-2-1656360982-6516314-4vfbz.ruletables.txt
[cnv-qe-jenkins@c01-dbn-411-qgqm5-executor must_gather_scope_module0]$ 


Attaching the current command and output files.

Comment 10 Simone Tiraboschi 2022-07-06 06:42:09 UTC
With https://github.com/kubevirt/kubevirt/pull/7713 the path of the logs in virt-launcher instances changed again,
we need also https://github.com/kubevirt/must-gather/pull/155 to collect also from the latest path.

Comment 11 Debarati Basu-Nag 2022-07-13 18:37:01 UTC
Verified against build CNV-v4.11.0-566, file is getting collected:

===============
[cnv-qe-jenkins@c01-db-411-8xdfz-executor must-gather-vm-2-1657737005-1774642]$ ls -lt
total 36
drwxr-xr-x. 2 cnv-qe-jenkins cnv-qe-jenkins    6 Jul 13 18:32 log
-rw-r--r--. 1 cnv-qe-jenkins cnv-qe-jenkins 1496 Jul 13 18:32 virt-launcher-must-gather-vm-2-1657737005-1774642-ltmz7.ruletables.txt
-rw-r--r--. 1 cnv-qe-jenkins cnv-qe-jenkins 2008 Jul 13 18:32 virt-launcher-must-gather-vm-2-1657737005-1774642-ltmz7.bridge.txt
-rw-r--r--. 1 cnv-qe-jenkins cnv-qe-jenkins 2179 Jul 13 18:32 virt-launcher-must-gather-vm-2-1657737005-1774642-ltmz7.ip.txt
-rw-r--r--. 1 cnv-qe-jenkins cnv-qe-jenkins 9449 Jul 13 18:32 virt-launcher-must-gather-vm-2-1657737005-1774642-ltmz7.dumpxml.xml
-rw-------. 1 cnv-qe-jenkins cnv-qe-jenkins 8264 Jul 13 18:32 node-gather-unprivileged_must-gather-vm-2-1657737005-1774642.log
[cnv-qe-jenkins@c01-db-411-8xdfz-executor must-gather-vm-2-1657737005-1774642]$

Comment 14 errata-xmlrpc 2022-09-14 19:30:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.11.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6526


Note You need to log in before you can comment on or make changes to this bug.