Bug 1777212
Summary: | internal error for domxml-to-native qemu-argv with vsock | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | smitterl |
Component: | libvirt | Assignee: | Ján Tomko <jtomko> |
libvirt sub component: | General | QA Contact: | Lili Zhu <lizhu> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | low | ||
Priority: | medium | CC: | berrange, chwen, dzheng, jdenemar, jsuchane, jtomko, lmen, rbalakri, smitterl, thuth, virt-maint, xuzhang, yalzhang |
Version: | 9.0 | Keywords: | Automation, Reopened, Triaged |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | libvirt-8.9.0-1.el9 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2023-05-09 07:26:10 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | 8.9.0 |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1776036 |
Description
smitterl
2019-11-27 07:32:21 UTC
This problem does not happen on x86_64. It is s390x only.

Run on x86_64. Test packages:
kernel-4.18.0-151.el8.x86_64
qemu-kvm-2.12.0-88.module+el8.1.0+4233+bc44be3f.x86_64
libvirt-4.5.0-35.module+el8.1.0+4227+b2722cb3.x86_64

# virsh domxml-to-native qemu-argv --domain vm2
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=vm2,debug-threads=on -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain--1-vm2/master-key.aes -machine pc-q35-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 3da05f15-6445-4249-923b-79b2042c5c8f -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain--1-vm2/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device qemu-xhci,id=usb,bus=pci.1,addr=0x0 -drive file=/tmp/RHEL-8.2-x86_64-latest.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.2,addr=0x0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -device cirrus-vga,id=video0,bus=pcie.0,addr=0x1 -device virtio-balloon-pci,id=balloon0,bus=pci.4,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

Can you provide the XML of the guest you are feeding to the command? There's likely some specific feature in the XML that's tripping it up.
smitterl is on PTO this week. I copied the guest XML from his job.

<domain type="kvm"> <name>{name}</name> <memory unit="KiB">1048576</memory> <currentMemory unit="KiB">1048576</currentMemory> <vcpu placement="static">2</vcpu> <resource> <partition>/machine</partition> </resource> <os> <type arch="s390x" machine="s390-ccw-virtio">hvm</type> <boot dev="hd"/> </os> <clock offset="utc"/> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>destroy</on_crash> <devices> <emulator>/usr/libexec/qemu-kvm</emulator> <disk type="file" device="disk"> <driver name="qemu" type="qcow2"/> <source file="{image_path}"/> <backingStore/> <target dev="vda" bus="virtio"/> </disk> <controller type="scsi" index="0" model="virtio-scsi"> </controller> <controller type="virtio-serial" index="0"> </controller> <interface type="bridge"> <source bridge="virbr0"/> <target dev="vnet0"/> <model type="virtio"/> </interface> <serial type='pty'> </serial> <console type="pty" tty="/dev/pts/0"> <source path="/dev/pts/0"/> <target type="virtio" port="0"/> </console> <input type='keyboard' bus='virtio'/> <input type='mouse' bus='virtio'/> <graphics type='vnc' port='-1' autoport='yes'> <listen type='address'/> </graphics> <video> <model type='virtio' heads='1' primary='yes'/> </video> <vsock model='virtio'> <cid auto='yes'/> <address type='ccw'/> </vsock> <memballoon model="virtio"> </memballoon> <panic model="s390"/> </devices> <seclabel type="dynamic" model="selinux" relabel="yes"> <label>system_u:system_r:svirt_t:s0:c508,c708</label> <imagelabel>system_u:object_r:svirt_image_t:s0:c508,c708</imagelabel> </seclabel> <seclabel type="dynamic" model="dac" relabel="yes"> <label>+107:+107</label> <imagelabel>+107:+107</imagelabel> </seclabel> </domain>

Thank you @dzheng @berrange. The culprit seems to be //devices/vsock; when I remove it, the command succeeds.
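For anyone who needs the same workaround in a script, removing the vsock device before conversion could look roughly like the sketch below. This is only an illustration under assumptions: `strip_vsock` is a hypothetical helper name, not part of libvirt or tp-libvirt, and it only handles `<vsock>` elements that are direct children of `<devices>`.

```python
import xml.etree.ElementTree as ET


def strip_vsock(domain_xml: str) -> str:
    """Return the domain XML with every <vsock> child of <devices> removed.

    Hypothetical helper for illustration; not a libvirt or tp-libvirt API.
    """
    root = ET.fromstring(domain_xml.strip())
    for devices in root.findall("devices"):
        # findall() returns a list, so removing while looping is safe here.
        for vsock in devices.findall("vsock"):
            devices.remove(vsock)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = """
    <domain type="kvm">
      <name>vm2</name>
      <devices>
        <emulator>/usr/libexec/qemu-kvm</emulator>
        <vsock model='virtio'>
          <cid auto='yes'/>
          <address type='ccw'/>
        </vsock>
        <memballoon model="virtio"/>
      </devices>
    </domain>
    """
    # Print the definition without the vsock device.
    print(strip_vsock(sample))
```

The result could then be written to a file and passed to `virsh domxml-to-native qemu-argv --xml <file>` instead of using `--domain`. Note that the resulting command line no longer contains the vsock device at all, so this only helps when the rest of the QEMU arguments are what you are after.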
Original domain xml to reproduce issue: <domain type='kvm'> <name>avocado-vt-vm1</name> <uuid>178d1bb1-4103-45a4-9926-410ee1c4bb5b</uuid> <memory unit='KiB'>1048576</memory> <currentMemory unit='KiB'>1048576</currentMemory> <vcpu placement='static'>2</vcpu> <resource> <partition>/machine</partition> </resource> <os> <type arch='s390x' machine='s390-ccw-virtio-rhel7.6.0'>hvm</type> <boot dev='hd'/> </os> <clock offset='utc'/> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>destroy</on_crash> <devices> <emulator>/usr/libexec/qemu-kvm</emulator> <disk type='file' device='disk'> <driver name='qemu' type='qcow2'/> <source file='/var/lib/avocado/data/avocado-vt/images/jeos-27-s390x.qcow2'/> <target dev='vda' bus='virtio'/> <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0000'/> </disk> <controller type='scsi' index='0' model='virtio-scsi'> <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0003'/> </controller> <controller type='virtio-serial' index='0'> <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0004'/> </controller> <controller type='pci' index='0' model='pci-root'/> <interface type='bridge'> <mac address='52:54:00:71:f6:a0'/> <source bridge='virbr0'/> <model type='virtio'/> <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0001'/> </interface> <serial type='pty'> <target type='sclp-serial' port='0'> <model name='sclpconsole'/> </target> </serial> <console type='pty'> <target type='serial' port='0'/> </console> <console type='pty'> <target type='virtio' port='1'/> </console> <input type='keyboard' bus='virtio'> <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0005'/> </input> <input type='mouse' bus='virtio'> <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0006'/> </input> <graphics type='vnc' port='-1' autoport='yes'> <listen type='address'/> </graphics> <video> <model type='virtio' heads='1' primary='yes'/> <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0002'/> </video> <memballoon model='virtio'> 
<address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0007'/> </memballoon> <panic model='s390'/> <vsock model='virtio'> <cid auto='yes'/> <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0008'/> </vsock> </devices> <seclabel type='dynamic' model='selinux' relabel='yes'/> <seclabel type='dynamic' model='dac' relabel='yes'/> </domain>

For vsock, libvirt pre-opens the vsock file descriptor on domain startup and passes it to QEMU.

While QEMU is capable of opening the fd on its own (this might be racy for the <cid auto='yes'/> case), this possibility is not implemented in libvirt.

I'm not convinced implementing it is worthwhile, but this case definitely deserves a better error message.

I can reproduce the problem on x86, too, by simply adding a "<vsock model='virtio'/>" to my guest definition ==> Changing "Hardware" to "All".

(In reply to Ján Tomko from comment #6)
> For vsock, libvirt pre-opens the vsock file descriptor on domain startup and
> passes it to QEMU.
>
> While QEMU is capable of opening the fd on its own (this might be racy for
> the <cid auto='yes'/> case),
> this possibility is not implemented in libvirt.
>
> I'm not convinced implementing it is worthwhile, but this case definitely
> deserves a better error message.

If this isn't fixed, I believe we should also update the manpage for it. What do you think? Are there other devices that are likely to be incompatible with the `virsh domxml-to-native` call? tp-libvirt would need an update to remove any incompatible devices before calling the command in https://github.com/autotest/tp-libvirt/blob/master/libvirt/tests/src/virsh_cmd/domain/virsh_domxml_to_native.py

(In reply to smitterl from comment #10)
> If this isn't fixed I believe we also should update the manpage for it. What
> do you think?

I can't say much about the details of libvirt, but IMHO there should definitely be a better error message in this case.
Thus I'm bumping the approaching stale date, hoping that Ján could comment on how this could be improved on the upstream libvirt side...

(In reply to Thomas Huth from comment #11)
> (In reply to smitterl from comment #10)
> > If this isn't fixed I believe we also should update the manpage for it. What
> > do you think?
>
> I can't say much about the details of libvirt, but IMHO there should
> definitely be a better error message in this case. Thus I'm bumping the
> approaching stale date, hoping that
> Ján could comment on how this could be improved on the upstream libvirt
> side...

Thanks Thomas! Regarding the manpage, it currently only says
"
domxml-to-native format { [--xml] xml | --domain domain-name-or-id-or-uuid }
Convert the file xml into domain XML format or convert an existing --domain to the native guest configuration format named by format. The xml and --domain arguments are mutually exclusive. For the types of format argument, refer to domxml-from-native.
"

I think a new paragraph like the following makes sense:
"
This feature doesn't support all domain definitions. The following definitions are not supported: <vsock>.
"
Or, if the error message is improved, a more generic:
"
This feature doesn't support all domain definitions. If you run into issues, please try removing elements from the definition and run again.
"

But then, what if I really was interested in the very call that's made to QEMU/KVM for the <vsock> device? Would it make more sense to add a "This feature is experimental and doesn't support all domain definitions." to set user expectations?

I've sent a proposal for a slightly better error message for the vsock case, as well as a disclaimer in the API and virsh command for domxml-to-native:
https://listman.redhat.com/archives/libvir-list/2021-July/msg00803.html

I suspect that especially for network devices, the usefulness of domxml-to-native is limited.
Keeping the list of unsupported devices in the man page seems like a sure recipe to give users stale information.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Reopening it since the patch is ready.

Pushed as:
commit 0b1da01ef2fb0f00b8df5967e14c3dd10396680d
Author: Ján Tomko <jtomko>
CommitDate: 2022-10-24 15:36:33 +0200

    qemu: do not attempt to pass unopened vsock FD

git describe: v8.8.0-178-g0b1da01ef2

Test this bug with:
libvirt-8.9.0-2.el9.x86_64

1. Prepare a guest with the following XML snippet:
<vsock model='virtio'> <cid auto='yes' address='3'/> <alias name='vsock0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/> </vsock>

2. Convert the domain XML to the native guest configuration:
# virsh domxml-to-native qemu-argv --domain avocado-vt-vm1
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin HOME=/var/lib/libvirt/qemu/domain--1-avocado-vt-vm1 XDG_DATA_HOME=/var/lib/libvirt/qemu/domain--1-avocado-vt-vm1/.local/share XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain--1-avocado-vt- ... device '{"driver":"vhost-vsock-pci","id":"vsock0","guest-cid":3,"vhostfd":"-1","bus":"pci.0","addr":"0x9"}' -msg timestamp=on

The QEMU command line is displayed.

This issue existed in RHEL 8.8 (libvirt version: 8.0.0) as well.
A snippet of the output follows:
(.libvirt-ci-venv-ci-runtest-m5nsn3) [root@s390x-kvm-virtqez1 ~]# virsh domxml-to-native qemu-argv --domain avocado-vt-vm1
error: internal error: invalid use of command API

The VM avocado-vt-vm1 has the following XML snippet:
(.libvirt-ci-venv-ci-runtest-m5nsn3) [root@s390x-kvm-virtqez1 ~]# virsh dumpxml avocado-vt-vm1 |grep -A10 vsock
<vsock model='virtio'> <cid auto='yes'/> <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0008'/> </vsock> </devices> <seclabel type='dynamic' model='selinux' relabel='yes'/> <seclabel type='dynamic' model='dac' relabel='yes'/> </domain>

Verify this bug with:
libvirt-8.9.0-2.el9.x86_64
The testing steps are the same as those in comment #20; marking this bug as verified.

(In reply to chunfu wen from comment #23)
> This issue existed in RHEL 8.8 (libvirt version: 8.0.0) as well.
> A snippet of the output follows:
> (.libvirt-ci-venv-ci-runtest-m5nsn3) [root@s390x-kvm-virtqez1 ~]# virsh
> domxml-to-native qemu-argv --domain avocado-vt-vm1
> error: internal error: invalid use of command API

Hi Jan,
Do we need to backport this patch to libvirt-8.0?
Thanks

Hello Lili,
I don't think it's important enough to be backported.

Confirmed it works on s390x, too.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2023:2171