Bug 1911786 - Can't connect to ballooning device when using virtio-transitional or virtio-non-transitional
Summary: Can't connect to ballooning device when using virtio-transitional or virtio-n...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: 8.4
Assignee: Andrea Bolognani
QA Contact: Meina Li
URL:
Whiteboard:
Depends On:
Blocks: 1918364
Reported: 2020-12-31 08:53 UTC by Roman Mohr
Modified: 2021-05-25 06:46 UTC (History)

Fixed In Version: libvirt-7.0.0-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1918364
Environment:
Last Closed: 2021-05-25 06:46:31 UTC
Type: Bug
Target Upstream Version: 7.0.0
Embargoed:


Attachments
domain.xml (8.34 KB, text/plain)
2020-12-31 08:53 UTC, Roman Mohr


Links
System ID: Red Hat Bugzilla 1911662
Private: 0
Priority: unspecified
Status: CLOSED
Summary: el6 guests don't work properly if virtio bus is specified on various devices
Last Updated: 2021-08-18 09:26:23 UTC

Internal Links: 1927853

Description Roman Mohr 2020-12-31 08:53:05 UTC
Created attachment 1743383
domain.xml

Description of problem:

I have problems with getting memory stats via https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainMemoryStats when setting the ballooning model to `virtio-transitional` or `virtio-non-transitional`. Only `virtio` seems to work.


QEMU commandline: 

> {"component":"virt-launcher","level":"info","msg":"LC_ALL=C \\PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \\HOME=/var/lib/libvirt/qemu/domain-1-default_vmi-fedora \\XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-default_vmi-fedora/.local/share \\XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-default_vmi-fedora/.cache \\XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-default_vmi-fedora/.config \\QEMU_AUDIO_DRV=none \\/usr/libexec/qemu-kvm \\-name guest=default_vmi-fedora,debug-threads=on \\-S \\-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-default_vmi-fedora/master-key.aes \\-machine pc-q35-rhel8.3.0,accel=kvm,usb=off,dump-guest-core=off \\-cpu Skylake-Client-IBRS,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on \\-m 977 \\-overcommit mem-lock=off \\-smp 1,sockets=1,dies=1,cores=1,threads=1 \\-object iothread,id=iothread1 \\-uuid 01847eb3-5f3b-4c22-95de-c70369fdeafb \\-smbios type=1,manufacturer=KubeVirt,product=None,uuid=01847eb3-5f3b-4c22-95de-c70369fdeafb,family=KubeVirt \\-no-user-config \\-nodefaults \\-chardev socket,id=charmonitor,fd=18,server,nowait \\-mon chardev=charmonitor,id=monitor,mode=control \\-rtc base=utc \\-no-shutdown \\-boot strict=on \\-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \\-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \\-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \\-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \\-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \\-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \\-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \\-device 
pcie-root-port,port=0x17,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7 \\-device virtio-scsi-pci,id=scsi0,bus=pci.2,addr=0x0 \\-device virtio-serial-pci-non-transitional,id=virtio-serial0,bus=pci.3,addr=0x0 \\-blockdev '{\"driver\":\"file\",\"filename\":\"/var/run/kubevirt/container-disks/disk_0.img\",\"node-name\":\"libvirt-3-storage\",\"cache\":{\"direct\":true,\"no-flush\":false},\"auto-read-only\":true,\"discard\":\"unmap\"}' \\-blockdev '{\"node-name\":\"libvirt-3-format\",\"read-only\":true,\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"qcow2\",\"file\":\"libvirt-3-storage\",\"backing\":null}' \\-blockdev '{\"driver\":\"file\",\"filename\":\"/var/run/kubevirt-ephemeral-disks/disk-data/containerdisk/disk.qcow2\",\"node-name\":\"libvirt-2-storage\",\"cache\":{\"direct\":true,\"no-flush\":false},\"auto-read-only\":true,\"discard\":\"unmap\"}' \\-blockdev '{\"node-name\":\"libvirt-2-format\",\"read-only\":false,\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"qcow2\",\"file\":\"libvirt-2-storage\",\"backing\":\"libvirt-3-format\"}' \\-device virtio-blk-pci-non-transitional,bus=pci.4,addr=0x0,drive=libvirt-2-format,id=ua-containerdisk,bootindex=1,write-cache=on \\-blockdev '{\"driver\":\"file\",\"filename\":\"/var/run/kubevirt-ephemeral-disks/cloud-init-data/default/vmi-fedora/noCloud.iso\",\"node-name\":\"libvirt-1-storage\",\"cache\":{\"direct\":true,\"no-flush\":false},\"auto-read-only\":true,\"discard\":\"unmap\"}' \\-blockdev '{\"node-name\":\"libvirt-1-format\",\"read-only\":false,\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":\"libvirt-1-storage\"}' \\-device virtio-blk-pci-non-transitional,bus=pci.5,addr=0x0,drive=libvirt-1-format,id=ua-cloudinitdisk,write-cache=on \\-netdev tap,fd=20,id=hostua-default,vhost=on,vhostfd=21 \\-device virtio-net-pci-non-transitional,host_mtu=1440,netdev=hostua-default,id=ua-default,mac=1a:a3:3d:b3:65:0a,bus=pci.1,addr=0x0,romfile= \\-chardev 
socket,id=charserial0,fd=22,server,nowait \\-device isa-serial,chardev=charserial0,id=serial0 \\-chardev socket,id=charchannel0,fd=23,server,nowait \\-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \\-vnc vnc=unix:/var/run/kubevirt-private/ce5be773-07e1-41b8-874f-6b6218b2859c/virt-vnc \\-device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 \\-device virtio-balloon-pci-non-transitional,id=balloon0,bus=pci.6,addr=0x0 \\-object rng-random,id=objrng0,filename=/dev/urandom \\-device virtio-rng-pci-non-transitional,rng=objrng0,id=rng0,bus=pci.7,addr=0x0 \\-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \\-msg timestamp=on","subcomponent":"qemu","timestamp":"2020-12-31T08:38:25.442441Z"}

The failure log line:

> {"component":"virt-launcher","level":"error","msg":"internal error: Cannot determine balloon device path","pos":"qemuMonitorInitBalloonObjectPath:1022","subcomponent":"libvirt","thread":"32","timestamp":"2020-12-31T08:40:38.917000Z"}


Version-Release number of selected component (if applicable):

libvirt version: 6.6.0
qemu version: qemu-kvm-5.1.0


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Meina Li 2021-01-06 03:19:17 UTC
Reproduced on:
libvirt-6.6.0-11.module+el8.3.1+9196+74a80ca4.x86_64
qemu-kvm-5.1.0-17.module+el8.3.1+9213+7ace09c3.x86_64

Steps to reproduce: check the memory stats after the guest has fully booted.
1. With virtio-transitional memballoon.
# virsh dumpxml lmn | grep /memballoon -B3
    <memballoon model='virtio-transitional'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x09' slot='0x01' function='0x0'/>
    </memballoon>
# virsh dommemstat lmn
actual 1572864
rss 635228
--------Can't get expected stats

2. With virtio-non-transitional memballoon.
# virsh dumpxml lmn | grep /memballoon -B3
    <memballoon model='virtio-non-transitional'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </memballoon>
# virsh dommemstat lmn
actual 1572864
rss 530152
--------Can't get expected stats

3.  With virtio memballoon.
# virsh dumpxml lmn | grep /memballoon -B3
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </memballoon>
# virsh dommemstat lmn
actual 1572864
swap_in 0
swap_out 0
major_fault 269
minor_fault 138820
unused 1186396
available 1344380
usable 1145904
last_update 1609903086
disk_caches 61452
hugetlb_pgalloc 0
hugetlb_pgfail 0
rss 523996
-------Can get expected stats
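
The manual comparison in the three steps above can be sketched in Python. This is only an illustration: it assumes `virsh dommemstat` prints one `name value` pair per line, as in the outputs shown, and it treats `actual` and `rss` as the only fields available without the guest balloon driver, which is an assumption drawn from these outputs. The helper names `parse_dommemstat` and `has_guest_stats` are hypothetical.

```python
# Fields that appear even when the balloon device cannot be queried
# (assumption based on the failing outputs in this comment).
HOST_ONLY_FIELDS = {"actual", "rss"}


def parse_dommemstat(output):
    """Parse `virsh dommemstat` text ("name value" per line) into a dict."""
    stats = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and anything that is not a name/value pair
        name, value = parts
        try:
            stats[name] = int(value)
        except ValueError:
            continue  # ignore lines whose value is not an integer
    return stats


def has_guest_stats(stats):
    """Return True if any field beyond the host-only ones was reported."""
    return any(name not in HOST_ONLY_FIELDS for name in stats)


if __name__ == "__main__":
    failing = "actual 1572864\nrss 635228\n"
    working = "actual 1572864\nswap_in 0\nunused 1186396\nrss 523996\n"
    print(has_guest_stats(parse_dommemstat(failing)))
    print(has_guest_stats(parse_dommemstat(working)))
```

With the outputs from steps 1 and 2 only the host-only fields are present, while step 3 also carries guest-side fields such as `unused` and `available`.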

Comment 3 Roman Mohr 2021-01-07 12:34:43 UTC
I am working around this by sticking with `virtio`.

Giving it priority high since I don't know what the default is on q35 if I simply choose `virtio`.
The priority can be lowered if `virtio` defaults to `virtio-transitional` because then we have no issue on older guests.
Feel free to lower the priority if `virtio` defaults to `virtio-transitional` in this case and let me know.

Comment 4 Jaroslav Suchanek 2021-01-07 16:18:33 UTC
Andrea, can you please investigate what is the behavior behind this model and reply to comment 3? Adding Cole to CC, as he made the original changes I believe. Thanks.

Comment 5 Andrea Bolognani 2021-01-11 18:29:40 UTC
(In reply to Jaroslav Suchanek from comment #4)
> Andrea, can you please investigate what is the behavior behind this model
> and reply to comment 3? Adding Cole to CC, as he made the original changes I
> believe. Thanks.

I'll dig further tomorrow, but the issue seems to be that in
qemuMonitorInitBalloonObjectPath() we look for the memballoon based
on two pieces of data: its alias and its type, where the latter is
expected to be either virtio-balloon-pci or virtio-balloon-ccw based
on the device's address type.

  https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu_monitor.c#L990

However, the (non-)transitional devices have different QOM types:

  # <memballoon model='virtio'>
  $ virsh qemu-monitor-command test --hmp qom-list /machine/peripheral/ | grep balloon
  balloon0 (child<virtio-balloon-pci>)

  # <memballoon model='virtio-non-transitional'>
  $ virsh qemu-monitor-command test --hmp qom-list /machine/peripheral/ | grep balloon
  balloon0 (child<virtio-balloon-pci-non-transitional>)

Since libvirt always expects the type to be virtio-balloon-pci, it
can't find the memballoon when (non-)transitional devices are used.
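
The mismatch can be illustrated with a small Python sketch. It is not libvirt's code: `expected_qom_type` and `find_balloon` are hypothetical helpers that mimic a lookup by alias and type over `qom-list` output in the format shown above. The type name `virtio-balloon-pci-transitional` does not appear in this report and is assumed by analogy with the non-transitional one, and the sketch also assumes that ccw has a single balloon type.

```python
import re

# Matches one line of HMP `qom-list` output, for example:
#   balloon0 (child<virtio-balloon-pci>)
QOM_CHILD_RE = re.compile(r"^\s*(\S+) \(child<([^>]+)>\)\s*$")

# Suffix appended to the PCI type name for each memballoon model (assumed mapping).
PCI_SUFFIX = {
    "virtio": "",
    "virtio-transitional": "-transitional",
    "virtio-non-transitional": "-non-transitional",
}


def expected_qom_type(model, address_type):
    """Return the QOM type to search for, given the model and address type."""
    if address_type == "ccw":
        return "virtio-balloon-ccw"
    if address_type != "pci" or model not in PCI_SUFFIX:
        raise ValueError(f"unsupported combination: {model!r}, {address_type!r}")
    return "virtio-balloon-pci" + PCI_SUFFIX[model]


def find_balloon(qom_list_output, alias, qom_type):
    """Return the QOM path of the child matching both alias and type, or None."""
    for line in qom_list_output.splitlines():
        match = QOM_CHILD_RE.match(line)
        if match and match.group(1) == alias and match.group(2) == qom_type:
            return "/machine/peripheral/" + alias
    return None


if __name__ == "__main__":
    listing = "balloon0 (child<virtio-balloon-pci-non-transitional>)"
    # Searching for the fixed type misses the device.
    print(find_balloon(listing, "balloon0", "virtio-balloon-pci"))
    # Searching for the model-dependent type finds it.
    wanted = expected_qom_type("virtio-non-transitional", "pci")
    print(find_balloon(listing, "balloon0", wanted))
```

Deriving the type from the model, instead of hard-coding `virtio-balloon-pci`, is the idea the sketch is meant to show.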

(In reply to Roman Mohr from comment #3)
> I am working around this by sticking with `virtio`.
>
> Giving it priority high since I don't know what the default is on q35 if I
> simply choose `virtio`.
> The priority can be lowered if `virtio` defaults to `virtio-transitional`
> because then we have no issue on older guests.
> Feel free to lower the priority if `virtio` defaults to
> `virtio-transitional` in this case and let me know.

As you can see above, virtio is just virtio :) It's not an alias for
either one of the other options, which QEMU considers to be
completely separate devices - though obviously they share most of the
code.

When leaving PCI address assignment to libvirt, on a q35 machine type
the memballoon will be placed behind a pcie-root-port and so it will
behave like the non-transitional device. This is consistent with all
other virtio devices.
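
As a rough sketch of the rule described above, the following Python function maps a memballoon model and its placement to the behavior the guest sees. `effective_virtio_flavor` is a hypothetical name. The branch for a plain `virtio` device on a conventional PCI slot is not stated in this report; it is an assumption based on how QEMU generally treats plain virtio PCI devices.

```python
def effective_virtio_flavor(model, behind_pcie_port):
    """Return "transitional" or "non-transitional" for a memballoon model.

    behind_pcie_port: True when the device sits behind a pcie-root-port,
    which is where libvirt places it on q35 when it assigns the address.
    """
    if model == "virtio-transitional":
        return "transitional"
    if model == "virtio-non-transitional":
        return "non-transitional"
    if model == "virtio":
        # Plain virtio follows its slot: PCIe behaves like non-transitional
        # (stated above); conventional PCI is assumed to behave like transitional.
        return "non-transitional" if behind_pcie_port else "transitional"
    raise ValueError(f"unknown memballoon model: {model!r}")


if __name__ == "__main__":
    # q35 with libvirt-assigned addresses: plain virtio behaves as non-transitional.
    print(effective_virtio_flavor("virtio", behind_pcie_port=True))
```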

Comment 6 Andrea Bolognani 2021-01-12 17:48:36 UTC
Patch posted upstream.

  https://www.redhat.com/archives/libvir-list/2021-January/msg00621.html

Comment 8 Andrea Bolognani 2021-01-13 14:19:12 UTC
Fix pushed upstream.

  commit 0a6cb05e953d315b7a05103d707cff4d36221211
  Author: Andrea Bolognani <abologna>
  Date:   Tue Jan 12 17:17:44 2021 +0100

    qemu: Fix memstat for (non-)transitional memballoon
    
    Depending on the memballoon model, the corresponding QOM node
    will have a different type and we need to account for this
    when searching for it in the QOM tree.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1911786
    
    Signed-off-by: Andrea Bolognani <abologna>
    Reviewed-by: Daniel Henrique Barboza <danielhb413>
    Reviewed-by: Michal Privoznik <mprivozn>

  v7.0.0-rc2-2-g0a6cb05e95

Comment 12 Roman Mohr 2021-01-19 17:20:03 UTC
Thanks Andrea. As discussed on github, it would be great to get a backport to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.

Comment 13 Meina Li 2021-01-25 05:53:40 UTC
Verified Version:
libvirt-7.0.0-2.module+el8.4.0+9520+ef609c5f.x86_64
qemu-kvm-5.2.0-3.module+el8.4.0+9499+42e58f08.x86_64


Verified Steps:
1. Start a guest with virtio-non-transitional memballoon device.
# virsh dumpxml lmn | grep /memballoon -B3
    <memballoon model='virtio-non-transitional'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </memballoon>
# virsh start lmn
Domain 'lmn' started

2. Check memory statistics.
# virsh dommemstat lmn
actual 1572864
swap_in 0
swap_out 0
major_fault 249
minor_fault 134381
unused 1197908
available 1343988
usable 1156248
last_update 1611553595
disk_caches 59292
hugetlb_pgalloc 0
hugetlb_pgfail 0
rss 604824

3. Start a guest with virtio-transitional memballoon device.
# virsh dumpxml lmn | grep /memballoon -B3
    <memballoon model='virtio-transitional'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x09' slot='0x01' function='0x0'/>
    </memballoon>
# virsh start lmn
Domain 'lmn' started

4. Check memory statistics.
# virsh dommemstat lmn
actual 1572864
swap_in 0
swap_out 0
major_fault 248
minor_fault 135906
unused 1199488
available 1344332
usable 1157784
last_update 1611553887
disk_caches 58996
hugetlb_pgalloc 0
hugetlb_pgfail 0
rss 608008

The test results are expected.

Comment 14 Andrea Bolognani 2021-01-27 18:25:23 UTC
(In reply to Roman Mohr from comment #12)
> Thanks Andrea. As discussed on github, it would be great to get a backport
> to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.

This has been handled with Bug 1918364, but on second thought there's
something that I should confirm: will AV 8.3.1 work for CNV? Or does
the fix need to be backported to AV 8.3.0.z before you can consume it?

Comment 15 Roman Mohr 2021-02-11 16:59:09 UTC
(In reply to Andrea Bolognani from comment #14)
> (In reply to Roman Mohr from comment #12)
> > Thanks Andrea. As discussed on github, it would be great to get a backport
> > to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.
> 
> This has been handled with Bug 1918364, but on second thought there's
> something that I should confirm: will AV 8.3.1 work for CNV? Or does
> the fix need to be backported to AV 8.3.0.z before you can consume it?

Created https://bugzilla.redhat.com/show_bug.cgi?id=1927853 now to track this separately on the CNV level.

Dan, without this, el6 guests work properly but some memory stats are not reported since the guest does not recognize the ballooning device.
I am not sure how important you judge this. CNV 2.6, 2.6.1 or 2.7?

Comment 16 Dan Kenigsberg 2021-02-11 18:17:15 UTC
(In reply to Roman Mohr from comment #15)
> (In reply to Andrea Bolognani from comment #14)
> > (In reply to Roman Mohr from comment #12)
> > > Thanks Andrea. As discussed on github, it would be great to get a backport
> > > to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.
> > 
> > This has been handled with Bug 1918364, but on second thought there's
> > something that I should confirm: will AV 8.3.1 work for CNV? Or does
> > the fix need to be backported to AV 8.3.0.z before you can consume it?

@phoracek had the very same question, I hope he has an answer by now.

> 
> Created https://bugzilla.redhat.com/show_bug.cgi?id=1927853 now to track
> this separately on the CNV level.
> 
> Dan, without this, el6 guests work properly but some memory stats are not
> reported since the guest does not recognize the ballooning device.
> I am not sure how important you judge this. CNV 2.6, 2.6.1 or 2.7?

It is clearly not a 2.6.0 blocker. Depending on the work and risk for upstream, I trust your decision whether this should go to 2.6.1 or to 4.8.0 (there's no CNV-2.7, btw, we renumber to match OCP)

Comment 17 Andrea Bolognani 2021-02-12 11:08:56 UTC
(In reply to Roman Mohr from comment #15)
> (In reply to Andrea Bolognani from comment #14)
> > (In reply to Roman Mohr from comment #12)
> > > Thanks Andrea. As discussed on github, it would be great to get a backport
> > > to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.
> > 
> > This has been handled with Bug 1918364, but on second thought there's
> > something that I should confirm: will AV 8.3.1 work for CNV? Or does
> > the fix need to be backported to AV 8.3.0.z before you can consume it?
> 
> Created https://bugzilla.redhat.com/show_bug.cgi?id=1927853 now to track
> this separately on the CNV level.
> 
> Dan, without this, el6 guests work properly but some memory stats are not
> reported since the guest does not recognize the ballooning device.
> I am not sure how important you judge this. CNV 2.6, 2.6.1 or 2.7?

Are you sure memory stats are the only thing that's not going to
work when using a non-transitional virtio-memballoon with a RHEL 6
guest?

As far as I understand, the entire device will be non-functional,
which is not limited to not relaying stats but also to the actual
ballooning functionality (increasing and decreasing the amount of
memory available to the guest OS) not working.

Comment 18 Roman Mohr 2021-02-12 12:02:57 UTC
(In reply to Andrea Bolognani from comment #17)
> (In reply to Roman Mohr from comment #15)
> > (In reply to Andrea Bolognani from comment #14)
> > > (In reply to Roman Mohr from comment #12)
> > > > Thanks Andrea. As discussed on github, it would be great to get a backport
> > > > to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.
> > > 
> > > This has been handled with Bug 1918364, but on second thought there's
> > > something that I should confirm: will AV 8.3.1 work for CNV? Or does
> > > the fix need to be backported to AV 8.3.0.z before you can consume it?
> > 
> > Created https://bugzilla.redhat.com/show_bug.cgi?id=1927853 now to track
> > this separately on the CNV level.
> > 
> > Dan, without this, el6 guests work properly but some memory stats are not
> > reported since the guest does not recognize the ballooning device.
> > I am not sure how important you judge this. CNV 2.6, 2.6.1 or 2.7?
> 
> Are you sure memory stats are the only thing that's not going to
> work when using a non-transitional virtio-memballoon with a RHEL 6
> guest?
> 
> As far as I understand, the entire device will be non-functional,
> which is not limited to not relaying stats but also to the actual
> ballooning functionality (increasing and decreasing the amount of
> memory available to the guest OS) not working.

You are absolutely right. It is just that CNV right now only supports reading from the ballooning device at this stage.
Therefore only metrics collection is affected.

Comment 19 Roman Mohr 2021-02-12 12:04:07 UTC
(In reply to Dan Kenigsberg from comment #16)
> (In reply to Roman Mohr from comment #15)
> > (In reply to Andrea Bolognani from comment #14)
> > > (In reply to Roman Mohr from comment #12)
> > > > Thanks Andrea. As discussed on github, it would be great to get a backport
> > > > to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.
> > > 
> > > This has been handled with Bug 1918364, but on second thought there's
> > > something that I should confirm: will AV 8.3.1 work for CNV? Or does
> > > the fix need to be backported to AV 8.3.0.z before you can consume it?
> 
> @phoracek had the very same question, I hope he has an answer by
> now.
> 
> > 
> > Created https://bugzilla.redhat.com/show_bug.cgi?id=1927853 now to track
> > this separately on the CNV level.
> > 
> > Dan, without this, el6 guests work properly but some memory stats are not
> > reported since the guest does not recognize the ballooning device.
> > I am not sure how important you judge this. CNV 2.6, 2.6.1 or 2.7?
> 
> It is clearly not a 2.6.0 blocker. Depending on the work and risk for
> upstream, I trust your decision whether this should go to 2.6.1 or to 4.8.0
> (there's no CNV-2.7, btw, we renumber to match OCP)

OK, I think there is little risk, but we are pretty full right now. I would say it can wait for 4.8. Thanks.

Comment 20 Petr Horáček 2021-02-15 08:23:13 UTC
> > This has been handled with Bug 1918364, but on second thought there's
> > something that I should confirm: will AV 8.3.1 work for CNV? Or does
> > the fix need to be backported to AV 8.3.0.z before you can consume it?

IIUC 8.3.1 is enough. We always consume the latest AV version in CNV. We stick to 8.Y.0.z only with EUS CNV.

Comment 22 errata-xmlrpc 2021-05-25 06:46:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2098

