Created attachment 1743383 [details] domain.xml Description of problem: I have problems with getting memory stats via https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainMemoryStats when setting the ballooning model to `virtio-transitional` or `virtio-non-transitional`. Only `virtio` seems to work. QEMU commandline: > {"component":"virt-launcher","level":"info","msg":"LC_ALL=C \\PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \\HOME=/var/lib/libvirt/qemu/domain-1-default_vmi-fedora \\XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-default_vmi-fedora/.local/share \\XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-default_vmi-fedora/.cache \\XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-default_vmi-fedora/.config \\QEMU_AUDIO_DRV=none \\/usr/libexec/qemu-kvm \\-name guest=default_vmi-fedora,debug-threads=on \\-S \\-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-default_vmi-fedora/master-key.aes \\-machine pc-q35-rhel8.3.0,accel=kvm,usb=off,dump-guest-core=off \\-cpu Skylake-Client-IBRS,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on \\-m 977 \\-overcommit mem-lock=off \\-smp 1,sockets=1,dies=1,cores=1,threads=1 \\-object iothread,id=iothread1 \\-uuid 01847eb3-5f3b-4c22-95de-c70369fdeafb \\-smbios type=1,manufacturer=KubeVirt,product=None,uuid=01847eb3-5f3b-4c22-95de-c70369fdeafb,family=KubeVirt \\-no-user-config \\-nodefaults \\-chardev socket,id=charmonitor,fd=18,server,nowait \\-mon chardev=charmonitor,id=monitor,mode=control \\-rtc base=utc \\-no-shutdown \\-boot strict=on \\-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \\-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \\-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \\-device 
pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \\-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \\-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \\-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \\-device pcie-root-port,port=0x17,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7 \\-device virtio-scsi-pci,id=scsi0,bus=pci.2,addr=0x0 \\-device virtio-serial-pci-non-transitional,id=virtio-serial0,bus=pci.3,addr=0x0 \\-blockdev '{\"driver\":\"file\",\"filename\":\"/var/run/kubevirt/container-disks/disk_0.img\",\"node-name\":\"libvirt-3-storage\",\"cache\":{\"direct\":true,\"no-flush\":false},\"auto-read-only\":true,\"discard\":\"unmap\"}' \\-blockdev '{\"node-name\":\"libvirt-3-format\",\"read-only\":true,\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"qcow2\",\"file\":\"libvirt-3-storage\",\"backing\":null}' \\-blockdev '{\"driver\":\"file\",\"filename\":\"/var/run/kubevirt-ephemeral-disks/disk-data/containerdisk/disk.qcow2\",\"node-name\":\"libvirt-2-storage\",\"cache\":{\"direct\":true,\"no-flush\":false},\"auto-read-only\":true,\"discard\":\"unmap\"}' \\-blockdev '{\"node-name\":\"libvirt-2-format\",\"read-only\":false,\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"qcow2\",\"file\":\"libvirt-2-storage\",\"backing\":\"libvirt-3-format\"}' \\-device virtio-blk-pci-non-transitional,bus=pci.4,addr=0x0,drive=libvirt-2-format,id=ua-containerdisk,bootindex=1,write-cache=on \\-blockdev '{\"driver\":\"file\",\"filename\":\"/var/run/kubevirt-ephemeral-disks/cloud-init-data/default/vmi-fedora/noCloud.iso\",\"node-name\":\"libvirt-1-storage\",\"cache\":{\"direct\":true,\"no-flush\":false},\"auto-read-only\":true,\"discard\":\"unmap\"}' \\-blockdev '{\"node-name\":\"libvirt-1-format\",\"read-only\":false,\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":\"libvirt-1-storage\"}' \\-device 
virtio-blk-pci-non-transitional,bus=pci.5,addr=0x0,drive=libvirt-1-format,id=ua-cloudinitdisk,write-cache=on \\-netdev tap,fd=20,id=hostua-default,vhost=on,vhostfd=21 \\-device virtio-net-pci-non-transitional,host_mtu=1440,netdev=hostua-default,id=ua-default,mac=1a:a3:3d:b3:65:0a,bus=pci.1,addr=0x0,romfile= \\-chardev socket,id=charserial0,fd=22,server,nowait \\-device isa-serial,chardev=charserial0,id=serial0 \\-chardev socket,id=charchannel0,fd=23,server,nowait \\-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \\-vnc vnc=unix:/var/run/kubevirt-private/ce5be773-07e1-41b8-874f-6b6218b2859c/virt-vnc \\-device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 \\-device virtio-balloon-pci-non-transitional,id=balloon0,bus=pci.6,addr=0x0 \\-object rng-random,id=objrng0,filename=/dev/urandom \\-device virtio-rng-pci-non-transitional,rng=objrng0,id=rng0,bus=pci.7,addr=0x0 \\-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \\-msg timestamp=on","subcomponent":"qemu","timestamp":"2020-12-31T08:38:25.442441Z"}

The failure log line:
> {"component":"virt-launcher","level":"error","msg":"internal error: Cannot determine balloon device path","pos":"qemuMonitorInitBalloonObjectPath:1022","subcomponent":"libvirt","thread":"32","timestamp":"2020-12-31T08:40:38.917000Z"}

Version-Release number of selected component (if applicable):
libvirt version: 6.6.0
qemu version: qemu-kvm-5.1.0
Reproduced on:
libvirt-6.6.0-11.module+el8.3.1+9196+74a80ca4.x86_64
qemu-kvm-5.1.0-17.module+el8.3.1+9213+7ace09c3.x86_64

Reproduced Steps: check the memory stats after the guest has fully booted.

1. With virtio-transitional memballoon.
# virsh dumpxml lmn | grep /memballoon -B3
<memballoon model='virtio-transitional'>
  <alias name='balloon0'/>
  <address type='pci' domain='0x0000' bus='0x09' slot='0x01' function='0x0'/>
</memballoon>
# virsh dommemstat lmn
actual 1572864
rss 635228
--------Can't get expected stats

2. With virtio-non-transitional memballoon.
# virsh dumpxml lmn | grep /memballoon -B3
<memballoon model='virtio-non-transitional'>
  <alias name='balloon0'/>
  <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
</memballoon>
# virsh dommemstat lmn
actual 1572864
rss 530152
--------Can't get expected stats

3. With virtio memballoon.
# virsh dumpxml lmn | grep /memballoon -B3
<memballoon model='virtio'>
  <alias name='balloon0'/>
  <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
</memballoon>
# virsh dommemstat lmn
actual 1572864
swap_in 0
swap_out 0
major_fault 269
minor_fault 138820
unused 1186396
available 1344380
usable 1145904
last_update 1609903086
disk_caches 61452
hugetlb_pgalloc 0
hugetlb_pgfail 0
rss 523996
-------Can get expected stats
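For anyone who wants to script the check above, here is a minimal sketch that decides whether a `virsh dommemstat` dump contains guest-reported balloon stats. The helper names and the choice of `GUEST_KEYS` are assumptions for illustration: the keys are simply some of those that appear only in the working `virtio` output in this comment, not an official list.

```python
# Keys that show up only in the working `virtio` output above.
# Assumption: their presence indicates the balloon stats were collected.
GUEST_KEYS = {"unused", "available", "usable", "last_update"}


def parse_dommemstat(text):
    """Parse `virsh dommemstat` output (name/value pairs) into a dict of ints."""
    tokens = text.split()
    # Tokens alternate between stat name and integer value.
    return {key: int(value) for key, value in zip(tokens[0::2], tokens[1::2])}


def has_guest_stats(text):
    """Return True if every key in GUEST_KEYS is present in the output."""
    return GUEST_KEYS.issubset(parse_dommemstat(text))
```

With the outputs from steps 1 and 2 this returns False (only `actual` and `rss` are present), and with the output from step 3 it returns True.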
I am working around this by sticking with `virtio`. Giving it priority high since I don't know what the default is on q35 if I simply choose `virtio`. Feel free to lower the priority if `virtio` defaults to `virtio-transitional`, because then we have no issue on older guests, and let me know.
Andrea, can you please investigate what is the behavior behind this model and reply to comment 3? Adding Cole to CC, as he made the original changes I believe. Thanks.
(In reply to Jaroslav Suchanek from comment #4)
> Andrea, can you please investigate what is the behavior behind this model
> and reply to comment 3? Adding Cole to CC, as he made the original changes I
> believe. Thanks.

I'll dig further tomorrow, but the issue seems to be that in qemuMonitorInitBalloonObjectPath() we look for the memballoon based on two pieces of data: its alias and its type, where the latter is expected to be either virtio-balloon-pci or virtio-balloon-ccw based on the device's address type.

https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu_monitor.c#L990

However, the (non-)transitional devices have different QOM types:

# <memballoon model='virtio'>
$ virsh qemu-monitor-command test --hmp qom-list /machine/peripheral/ | grep balloon
balloon0 (child<virtio-balloon-pci>)

# <memballoon model='virtio-non-transitional'>
$ virsh qemu-monitor-command test --hmp qom-list /machine/peripheral/ | grep balloon
balloon0 (child<virtio-balloon-pci-non-transitional>)

Since libvirt always expects the type to be virtio-balloon-pci, it can't find the memballoon when (non-)transitional devices are used.

(In reply to Roman Mohr from comment #3)
> I am working around this by sticking with `virtio`.
>
> Giving it priority high since I don't know what the default is on q35 if I
> simply choose `virtio`.
> The priority can be lowered if `virtio` defaults to `virtio-transitional`
> because then we have no issue on older guests.
> Feel free to lower the priority if `virtio` defaults to
> `virtio-transitional` in this case and let me know.

As you can see above, virtio is just virtio :) It's not an alias for either one of the other options, which QEMU considers to be completely separate devices - though obviously they share most of the code.

When leaving PCI address assignment to libvirt, on a q35 machine type the memballoon will be placed behind a pcie-root-port and so it will behave like the non-transitional device. This is consistent with all other virtio devices.
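To make the mismatch concrete, here is a rough Python sketch of an alias-plus-type lookup over `qom-list` output. It is an illustration only, not libvirt's actual code: the function names and the model-to-type table are assumptions. The `virtio-balloon-pci` and `virtio-balloon-pci-non-transitional` type names come from the output quoted above, while `virtio-balloon-pci-transitional` is assumed by analogy and is not shown in this report.

```python
import re

# QOM type per memballoon model for PCI devices.
# Assumption: the "-transitional" name follows the same pattern as the
# two names visible in the qom-list output above.
PCI_BALLOON_TYPES = {
    "virtio": "virtio-balloon-pci",
    "virtio-transitional": "virtio-balloon-pci-transitional",
    "virtio-non-transitional": "virtio-balloon-pci-non-transitional",
}

# Matches lines such as: balloon0 (child<virtio-balloon-pci>)
ENTRY_RE = re.compile(r"^\s*(?P<name>\S+)\s+\(child<(?P<type>[^>]+)>\)\s*$")


def parse_qom_list(text):
    """Turn HMP `qom-list` output into a {node name: QOM type} dict."""
    nodes = {}
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            nodes[match.group("name")] = match.group("type")
    return nodes


def find_balloon_path(qom_text, alias, model):
    """Return the balloon's QOM path, or None if alias and type do not both match."""
    expected_type = PCI_BALLOON_TYPES[model]
    if parse_qom_list(qom_text).get(alias) == expected_type:
        return "/machine/peripheral/" + alias
    return None
```

Looking up `balloon0` with the expected type fixed to that of `virtio` fails against the non-transitional output shown above, which mirrors the "Cannot determine balloon device path" error; passing the device's real model lets the lookup succeed.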
Patch posted upstream. https://www.redhat.com/archives/libvir-list/2021-January/msg00621.html
Fix pushed upstream.

commit 0a6cb05e953d315b7a05103d707cff4d36221211
Author: Andrea Bolognani <abologna>
Date:   Tue Jan 12 17:17:44 2021 +0100

    qemu: Fix memstat for (non-)transitional memballoon

    Depending on the memballoon model, the corresponding QOM node will
    have a different type and we need to account for this when searching
    for it in the QOM tree.

    https://bugzilla.redhat.com/show_bug.cgi?id=1911786

    Signed-off-by: Andrea Bolognani <abologna>
    Reviewed-by: Daniel Henrique Barboza <danielhb413>
    Reviewed-by: Michal Privoznik <mprivozn>

v7.0.0-rc2-2-g0a6cb05e95
Thanks Andrea. As discussed on github, it would be great to get a backport to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.
Verified Version:
libvirt-7.0.0-2.module+el8.4.0+9520+ef609c5f.x86_64
qemu-kvm-5.2.0-3.module+el8.4.0+9499+42e58f08.x86_64

Verified Steps:

1. Start a guest with virtio-non-transitional memballoon device.
# virsh dumpxml lmn | grep /memballoon -B3
<memballoon model='virtio-non-transitional'>
  <alias name='balloon0'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</memballoon>
# virsh start lmn
Domain 'lmn' started

2. Check memory statistics.
# virsh dommemstat lmn
actual 1572864
swap_in 0
swap_out 0
major_fault 249
minor_fault 134381
unused 1197908
available 1343988
usable 1156248
last_update 1611553595
disk_caches 59292
hugetlb_pgalloc 0
hugetlb_pgfail 0
rss 604824

3. Start a guest with virtio-transitional memballoon device.
# virsh dumpxml lmn | grep /memballoon -B3
<memballoon model='virtio-transitional'>
  <alias name='balloon0'/>
  <address type='pci' domain='0x0000' bus='0x09' slot='0x01' function='0x0'/>
</memballoon>
# virsh start lmn
Domain 'lmn' started

4. Check memory statistics.
# virsh dommemstat lmn
actual 1572864
swap_in 0
swap_out 0
major_fault 248
minor_fault 135906
unused 1199488
available 1344332
usable 1157784
last_update 1611553887
disk_caches 58996
hugetlb_pgalloc 0
hugetlb_pgfail 0
rss 608008

The test results are as expected.
(In reply to Roman Mohr from comment #12)
> Thanks Andrea. As discussed on github, it would be great to get a backport
> to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.

This has been handled with Bug 1918364, but on second thought there's something that I should confirm: will AV 8.3.1 work for CNV? Or does the fix need to be backported to AV 8.3.0.z before you can consume it?
(In reply to Andrea Bolognani from comment #14)
> (In reply to Roman Mohr from comment #12)
> > Thanks Andrea. As discussed on github, it would be great to get a backport
> > to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.
>
> This has been handled with Bug 1918364, but on second thought there's
> something that I should confirm: will AV 8.3.1 work for CNV? Or does
> the fix need to be backported to AV 8.3.0.z before you can consume it?

Created https://bugzilla.redhat.com/show_bug.cgi?id=1927853 now to track this separately on the CNV level.

Dan, without this, el6 guests work properly but some memory stats are not reported since the guest does not recognize the ballooning device. I am not sure how important you judge this. CNV 2.6, 2.6.1 or 2.7?
(In reply to Roman Mohr from comment #15)
> (In reply to Andrea Bolognani from comment #14)
> > (In reply to Roman Mohr from comment #12)
> > > Thanks Andrea. As discussed on github, it would be great to get a backport
> > > to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.
> >
> > This has been handled with Bug 1918364, but on second thought there's
> > something that I should confirm: will AV 8.3.1 work for CNV? Or does
> > the fix need to be backported to AV 8.3.0.z before you can consume it?

@phoracek had the very same question, I hope he has an answer by now.

>
> Created https://bugzilla.redhat.com/show_bug.cgi?id=1927853 now to track
> this separately on the CNV level.
>
> Dan, without this, el6 guests work properly but some memory stats are not
> reported since the guest does not recognize the ballooning device.
> I am not sure how important you judge this. CNV 2.6, 2.6.1 or 2.7?

It is clearly not a 2.6.0 blocker. Depending on the work and risk for upstream, I trust your decision whether this should go to 2.6.1 or to 4.8.0 (there's no CNV-2.7, btw, we renumber to match OCP)
(In reply to Roman Mohr from comment #15)
> (In reply to Andrea Bolognani from comment #14)
> > (In reply to Roman Mohr from comment #12)
> > > Thanks Andrea. As discussed on github, it would be great to get a backport
> > > to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.
> >
> > This has been handled with Bug 1918364, but on second thought there's
> > something that I should confirm: will AV 8.3.1 work for CNV? Or does
> > the fix need to be backported to AV 8.3.0.z before you can consume it?
>
> Created https://bugzilla.redhat.com/show_bug.cgi?id=1927853 now to track
> this separately on the CNV level.
>
> Dan, without this, el6 guests work properly but some memory stats are not
> reported since the guest does not recognize the ballooning device.
> I am not sure how important you judge this. CNV 2.6, 2.6.1 or 2.7?

Are you sure memory stats are the only thing that's not going to work when using a non-transitional virtio-memballoon with a RHEL 6 guest?

As far as I understand, the entire device will be non-functional, which is not limited to not relaying stats but also to the actual ballooning functionality (increasing and decreasing the amount of memory available to the guest OS) not working.
(In reply to Andrea Bolognani from comment #17)
> (In reply to Roman Mohr from comment #15)
> > (In reply to Andrea Bolognani from comment #14)
> > > (In reply to Roman Mohr from comment #12)
> > > > Thanks Andrea. As discussed on github, it would be great to get a backport
> > > > to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.
> > >
> > > This has been handled with Bug 1918364, but on second thought there's
> > > something that I should confirm: will AV 8.3.1 work for CNV? Or does
> > > the fix need to be backported to AV 8.3.0.z before you can consume it?
> >
> > Created https://bugzilla.redhat.com/show_bug.cgi?id=1927853 now to track
> > this separately on the CNV level.
> >
> > Dan, without this, el6 guests work properly but some memory stats are not
> > reported since the guest does not recognize the ballooning device.
> > I am not sure how important you judge this. CNV 2.6, 2.6.1 or 2.7?
>
> Are you sure memory stats are the only thing that's not going to
> work when using a non-transitional virtio-memballoon with a RHEL 6
> guest?
>
> As far as I understand, the entire device will be non-functional,
> which is not limited to not relaying stats but also to the actual
> ballooning functionality (increasing and decreasing the amount of
> memory available to the guest OS) not working.

You are absolutely right. It is just that CNV currently only supports reading from the ballooning device. Therefore only metrics collection is affected.
(In reply to Dan Kenigsberg from comment #16)
> (In reply to Roman Mohr from comment #15)
> > (In reply to Andrea Bolognani from comment #14)
> > > (In reply to Roman Mohr from comment #12)
> > > > Thanks Andrea. As discussed on github, it would be great to get a backport
> > > > to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.
> > >
> > > This has been handled with Bug 1918364, but on second thought there's
> > > something that I should confirm: will AV 8.3.1 work for CNV? Or does
> > > the fix need to be backported to AV 8.3.0.z before you can consume it?
>
> @phoracek had the very same question, I hope he has an answer by
> now.
>
> >
> > Created https://bugzilla.redhat.com/show_bug.cgi?id=1927853 now to track
> > this separately on the CNV level.
> >
> > Dan, without this, el6 guests work properly but some memory stats are not
> > reported since the guest does not recognize the ballooning device.
> > I am not sure how important you judge this. CNV 2.6, 2.6.1 or 2.7?
>
> It is clearly not a 2.6.0 blocker. Depending on the work and risk for
> upstream, I trust your decision whether this should go to 2.6.1 or to 4.8.0
> (there's no CNV-2.7, btw, we renumber to match OCP)

Ok, I think there is little risk, but we are pretty full right now. I would say it can wait for 4.8. Thanks.
> > This has been handled with Bug 1918364, but on second thought there's > > something that I should confirm: will AV 8.3.1 work for CNV? Or does > > the fix need to be backported to AV 8.3.0.z before you can consume it? IIUIC 8.3.1 is enough. We always consume the latest AV version in CNV. We stick to 8.Y.0.z only with EUS CNV.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2098