Bug 1716387
Summary: | internal error: Failed to parse bitmap '' when starting guest | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Andrea Bolognani <abologna> |
Component: | libvirt | Assignee: | Andrea Bolognani <abologna> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.7 | CC: | abologna, dzheng, jdenemar, jiyan, jomurphy, jsuchane, mdeng, mtessun, ngu, qzhang, virt-bugs |
Target Milestone: | rc | Keywords: | Automation |
Target Release: | --- | ||
Hardware: | ppc64le | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | libvirt-4.5.0-19.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | 1703661 | Environment: | |
Last Closed: | 2019-08-06 13:14:55 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Andrea Bolognani
2019-06-03 11:56:24 UTC
Alright, I finally managed to reproduce this issue as well, and looking at the log the cause is pretty clear:

debug : virCommandRunAsync:2499 : About to run /usr/bin/numad -w 8:1024
debug : virFileClose:111 : Closed fd 21
debug : virFileClose:111 : Closed fd 24
debug : virFileClose:111 : Closed fd 26
debug : virCommandRunAsync:2502 : Command result 0, with PID 83032
error : virCommandWait:2623 : internal error: Child process (/usr/bin/numad -w 8:1024) unexpected exit status 1: Mon Jun 3 07:22:27 2019: Cannot open numad log file (errno: 13) -- using stderr
Mon Jun 3 07:22:27 2019: msgget failed
debug : virCommandRun:2350 : Result status 0, stdout: '' stderr: 'Mon Jun 3 07:22:27 2019: Cannot open numad log file (errno: 13) -- using stderr
Mon Jun 3 07:22:27 2019: msgget failed
'
debug : virFileClose:111 : Closed fd 25
error : virNumaGetAutoPlacementAdvice:72 : internal error: Failed to query numad for the advisory nodeset
debug : virFileClose:111 : Closed fd 22
debug : qemuProcessPrepareDomainNUMAPlacement:5812 : Nodeset returned from numad:
error : virBitmapParseSeparator:581 : invalid argument: Failed to parse bitmap ''

libvirt calls numad asking it to calculate a nodeset for the guest, but numad fails and prints nothing on stdout; libvirt then tries to parse the empty output as a bitmap and fails, which is expected, because '' is indeed not a valid bitmap.

There are two issues involved here: the first is that libvirt is clearly not reporting a good error on numad failure, and the second is that numad fails in the first place.
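The failure chain above can be sketched in simplified C. Note that `parse_bitmap` below is an illustrative stand-in for libvirt's actual bitmap-parsing code, not its real implementation: the point is simply that an empty string is not a valid bitmap, which is exactly why "Failed to parse bitmap ''" appears once the empty numad output reaches the parser.

```c
#include <stdio.h>

/* Illustrative stand-in for virBitmapParse-style parsing: an empty
 * string is rejected up front, producing the "Failed to parse
 * bitmap ''" error seen in the log above. */
static int parse_bitmap(const char *str)
{
    if (str == NULL || *str == '\0') {
        fprintf(stderr, "invalid argument: Failed to parse bitmap '%s'\n",
                str ? str : "");
        return -1;
    }
    /* real range parsing ("0-1", "0,2-3", ...) elided */
    return 0;
}
```

Since numad exited with status 1 without printing a nodeset, the parser is handed '' and can only fail; the real bug is that parsing is attempted at all.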
I'll cook up patches for the libvirt issue in a minute; as for the numad issue, the stderr output provides a clue about what's going on and audit.log confirms it:

type=VIRT_MACHINE_ID msg=audit(1559565132.493:29425): pid=81775 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 msg='virt=kvm vm="avocado-vt-vm1" uuid=2a946185-d7ad-4cc0-be1b-642b34591465 vm-ctx=system_u:system_r:svirt_t:s0:c528,c731 img-ctx=system_u:object_r:svirt_image_t:s0:c528,c731 model=selinux exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? res=success'
type=VIRT_MACHINE_ID msg=audit(1559565132.493:29426): pid=81775 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 msg='virt=kvm vm="avocado-vt-vm1" uuid=2a946185-d7ad-4cc0-be1b-642b34591465 vm-ctx=+107:+107 img-ctx=+107:+107 model=dac exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? res=success'
type=AVC msg=audit(1559565132.523:29427): avc: denied { open } for pid=85288 comm="numad" path="/var/log/numad.log" dev="dm-0" ino=101386427 scontext=system_u:system_r:numad_t:s0-s0:c0.c1023 tcontext=system_u:object_r:var_log_t:s0 tclass=file permissive=0
type=SYSCALL msg=audit(1559565132.523:29427): arch=c0000015 syscall=5 success=no exit=-13 a0=13b01a7b8 a1=441 a2=1b6 a3=1 items=0 ppid=81775 pid=85288 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="numad" exe="/usr/bin/numad" subj=system_u:system_r:numad_t:s0-s0:c0.c1023 key=(null)
type=PROCTITLE msg=audit(1559565132.523:29427): proctitle=2F7573722F62696E2F6E756D6164002D7700383A31303234
type=AVC msg=audit(1559565132.523:29428): avc: denied { unix_read unix_write } for pid=85288 comm="numad" key=-559038737 scontext=system_u:system_r:numad_t:s0-s0:c0.c1023 tcontext=system_u:system_r:unconfined_service_t:s0 tclass=msgq permissive=0
type=SYSCALL msg=audit(1559565132.523:29428): arch=c0000015 syscall=117 success=no exit=-13 a0=d a1=ffffffffdeadbeef a2=3b0 a3=0 items=0 ppid=81775 pid=85288 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="numad" exe="/usr/bin/numad" subj=system_u:system_r:numad_t:s0-s0:c0.c1023 key=(null)
type=PROCTITLE msg=audit(1559565132.523:29428): proctitle=2F7573722F62696E2F6E756D6164002D7700383A31303234
type=VIRT_RESOURCE msg=audit(1559565132.543:29429): pid=81775 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 msg='virt=kvm resrc=disk reason=start vm="avocado-vt-vm1" uuid=2a946185-d7ad-4cc0-be1b-642b34591465 old-disk="?" new-disk="/var/lib/avocado/data/avocado-vt/images/jeos-27-ppc64le.qcow2" exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? res=success'
type=VIRT_RESOURCE msg=audit(1559565132.543:29430): pid=81775 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 msg='virt=kvm resrc=net reason=start vm="avocado-vt-vm1" uuid=2a946185-d7ad-4cc0-be1b-642b34591465 old-net="?" new-net="52:54:00:22:d8:18" exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? res=success'
type=VIRT_RESOURCE msg=audit(1559565132.543:29431): pid=81775 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 msg='virt=kvm resrc=rng reason=start vm="avocado-vt-vm1" uuid=2a946185-d7ad-4cc0-be1b-642b34591465 old-rng="?" new-rng="/dev/urandom" exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? res=success'
type=VIRT_RESOURCE msg=audit(1559565132.543:29432): pid=81775 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 msg='virt=kvm resrc=mem reason=start vm="avocado-vt-vm1" uuid=2a946185-d7ad-4cc0-be1b-642b34591465 old-mem=0 new-mem=1048576 exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? res=success'
type=VIRT_RESOURCE msg=audit(1559565132.543:29433): pid=81775 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 msg='virt=kvm resrc=vcpu reason=start vm="avocado-vt-vm1" uuid=2a946185-d7ad-4cc0-be1b-642b34591465 old-vcpu=0 new-vcpu=8 exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? res=success'
type=VIRT_CONTROL msg=audit(1559565132.543:29434): pid=81775 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 msg='virt=kvm op=start reason=booted vm="avocado-vt-vm1" uuid=2a946185-d7ad-4cc0-be1b-642b34591465 vm-pid=-1 exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? res=failed'

Indeed, after running 'setenforce 0' the guest can be started successfully, so it would look like selinux-policy might need to be amended so that the above is permitted. I'll report a separate bug for that.

Incidentally, the machine I was originally testing this on had SELinux in permissive mode, and that's why I could not reproduce the issue there.

Patch posted upstream.

https://www.redhat.com/archives/libvir-list/2019-June/msg00054.html

Fix merged upstream.

commit b34fb1fb6f99628932ad68db1ce4985a06def17f
Author: Andrea Bolognani <abologna>
Date:   Mon Jun 3 18:45:27 2019 +0200

    util: Propagate numad failures correctly

    Right now, if numad fails, we raise an error but return an
    empty string to the caller instead of a NULL pointer, which
    means processing will continue and the user will see

      # virsh start guest
      error: Failed to start domain guest
      error: invalid argument: Failed to parse bitmap ''

    instead of a more reasonable

      # virsh start guest
      error: Failed to start domain guest
      error: operation failed: Failed to query numad for the
      advisory nodeset

    Make sure the user gets a better error message.
    https://bugzilla.redhat.com/show_bug.cgi?id=1716387

    Signed-off-by: Andrea Bolognani <abologna>
    Reviewed-by: Daniel Henrique Barboza <danielhb413>
    Reviewed-by: Ján Tomko <jtomko>

v5.4.0-46-gb34fb1fb6f

Reproduce:
env:
# rpm -q libvirt
libvirt-4.5.0-18.el7.ppc64le
step:
1. Edit the guest xml with "<numatune><memory mode="strict" placement="auto" /></numatune>"
2. Try to start the guest
result:
report the following error message:
error: Failed to start domain avocado-vt-vm1
error: invalid argument: Failed to parse bitmap ''

Verify:
env:
# rpm -q libvirt
libvirt-4.5.0-19.el7.ppc64le
step:
1. Edit the guest xml with "<numatune><memory mode="strict" placement="auto" /></numatune>"
2. Try to start the guest
result:
report the following error message:
error: Failed to start domain avocado-vt-vm1
error: cannot set CPU affinity on process 156584: Invalid argument

In summary, this problem has been fixed.

(In reply to Junxiang Li from comment #7)
> result:
> report the following error message:
> error: Failed to start domain avocado-vt-vm1
> error: cannot set CPU affinity on process 156584: Invalid argument
>
> In summary, this problem has been fixed.

This is not exactly the outcome I expected. Did you update libvirt only between the two test runs?
As per the commit message included in Comment 4, libvirt should now report

error: operation failed: Failed to query numad for the advisory nodeset

whereas the error message you have above would indicate that numad managed to run successfully the second time, and that you're now hitting Bug 1703661 (which makes sense, since that bug is fixed by the very next libvirt build).

Can you confirm that you didn't update packages other than libvirt between the two test runs, and that SELinux was in enforcing mode in both cases? It would also be nice to get libvirtd logs such as the one included in Comment 2.

I reprovisioned a machine with RHEL-7.7-20190403.0 and installed the known problematic libvirt-4.5.0-15.el7, as described in Comment 0 and also as I remember doing to reproduce the issue before writing Comment 2, but I can't get numad to fail despite having SELinux in enforcing mode, so I'm just hitting Bug 1703661 as well.

I'm kind of out of ideas at the moment, so I did something nasty to force the bug to reproduce:

# cat >/usr/bin/numad <<EOF
> #!/bin/sh
> exit 1
> EOF
# restorecon /usr/bin/numad

Sure enough, the bug reproduces after that:

# virsh start guest
error: Failed to start domain guest
error: invalid argument: Failed to parse bitmap ''

and goes away after upgrading to libvirt-4.5.0-20.el7.ppc64le:

# virsh start guest
error: Failed to start domain guest
error: operation failed: Failed to query numad for the advisory nodeset

That's good enough to mark the bug as VERIFIED in my book. It's weird that we can't seem to make the original, SELinux-related failure show up again, but at this point chasing that specific ghost doesn't seem like the best use of anyone's time.

Since the numad problem could not be reproduced reliably, the approach from Comment 9 was used to reproduce and verify again.

Prepare:
# cat >/usr/bin/numad <<EOF
> #!/bin/sh
> exit 1
> EOF
# restorecon /usr/bin/numad

Reproduce:
env:
# rpm -q libvirt
libvirt-4.5.0-18.el7.ppc64le
step:
1. Edit the guest xml with "<numatune><memory mode="strict" placement="auto" /></numatune>"
2. Try to start the guest
result:
report the following error message:
error: Failed to start domain test2
error: invalid argument: Failed to parse bitmap ''

Verify:
env:
# rpm -q libvirt
libvirt-4.5.0-19.el7.ppc64le
step:
1. Edit the guest xml with "<numatune><memory mode="strict" placement="auto" /></numatune>"
2. Try to start the guest
result:
report the following error message:
error: Failed to start domain test2
error: operation failed: Failed to query numad for the advisory nodeset

Reproduce again:
env (downgraded from 4.5.0-19):
# rpm -q libvirt
libvirt-4.5.0-18.el7.ppc64le
step:
1. Edit the guest xml with "<numatune><memory mode="strict" placement="auto" /></numatune>"
2. Try to start the guest
result:
report the following error message:
error: Failed to start domain test2
error: invalid argument: Failed to parse bitmap ''

The bug is verified.

Also reproduced this bug on libvirt-4.5.0-18.el7.x86_64.
Version:
libvirt-4.5.0-18.el7.x86_64
kernel-3.10.0-1057.el7.x86_64
qemu-kvm-rhev-2.12.0-33.el7.x86_64

Steps:
# cat /usr/bin/numad
#!/bin/sh
exit 1
# restorecon /usr/bin/numad

# virsh domstate avocado-vt-vm1
shut off

# virsh dumpxml avocado-vt-vm1 --inactive |grep "<numatune" -A2
  <numatune>
    <memory mode='strict' placement='auto'/>
  </numatune>

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: invalid argument: Failed to parse bitmap ''

Also verified this bug on libvirt-4.5.0-19.el7.x86_64.

Version:
libvirt-4.5.0-19.el7.x86_64
kernel-3.10.0-1057.el7.x86_64
qemu-kvm-rhev-2.12.0-33.el7.x86_64

Steps:
# virsh domstate avocado-vt-vm1
shut off

# virsh dumpxml avocado-vt-vm1 --inactive |grep "<numatune" -A2
  <numatune>
    <memory mode='strict' placement='auto'/>
  </numatune>

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: operation failed: Failed to query numad for the advisory nodeset

# yum downgrade libvirt* -y
# systemctl restart libvirtd
# rpm -qa libvirt
libvirt-4.5.0-18.el7.x86_64

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: invalid argument: Failed to parse bitmap ''

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2294