Bug 1911786
| Summary: | Can't connect to ballooning device when using virtio-transitional or virtio-non-transitional | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Roman Mohr <rmohr> | ||||
| Component: | libvirt | Assignee: | Andrea Bolognani <abologna> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Meina Li <meili> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 8.3 | CC: | abologna, crobinso, danken, fdeutsch, jdenemar, jinqi, jsuchane, lmen, phoracek, virt-maint, xuzhang | ||||
| Target Milestone: | rc | Keywords: | Triaged | ||||
| Target Release: | 8.4 | Flags: | pm-rhel: mirror+ | ||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | libvirt-7.0.0-1.el8 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| : | 1918364 (view as bug list) | Environment: | |||||
| Last Closed: | 2021-05-25 06:46:31 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | 7.0.0 | ||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1918364 | ||||||
| Attachments: | |||||||
Description
Roman Mohr
2020-12-31 08:53:05 UTC
Reproduced on:
libvirt-6.6.0-11.module+el8.3.1+9196+74a80ca4.x86_64
qemu-kvm-5.1.0-17.module+el8.3.1+9213+7ace09c3.x86_64
Steps to reproduce: check the memory stats after the guest has fully booted
1. With virtio-transitional memballoon.
# virsh dumpxml lmn | grep /memballoon -B3
<memballoon model='virtio-transitional'>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x09' slot='0x01' function='0x0'/>
</memballoon>
# virsh dommemstat lmn
actual 1572864
rss 635228
--------Can't get expected stats
2. With virtio-non-transitional memballoon.
# virsh dumpxml lmn | grep /memballoon -B3
<memballoon model='virtio-non-transitional'>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
</memballoon>
# virsh dommemstat lmn
actual 1572864
rss 530152
--------Can't get expected stats
3. With virtio memballoon.
# virsh dumpxml lmn | grep /memballoon -B3
<memballoon model='virtio'>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
</memballoon>
# virsh dommemstat lmn
actual 1572864
swap_in 0
swap_out 0
major_fault 269
minor_fault 138820
unused 1186396
available 1344380
usable 1145904
last_update 1609903086
disk_caches 61452
hugetlb_pgalloc 0
hugetlb_pgfail 0
rss 523996
-------Can get expected stats
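The three runs above differ only in which fields `virsh dommemstat` prints: with the affected models only `actual` and `rss` appear, while the guest-side fields (`unused`, `available`, and so on) are missing. A minimal Python sketch of that check follows. The helper names `parse_dommemstat` and `has_balloon_stats`, and the choice of which fields count as balloon-provided, are assumptions made for illustration and are not part of libvirt.

```python
# Fields that only showed up in the working 'virtio' run above.
# Assumption: their presence is used here as the marker for
# statistics that come from the guest's balloon driver.
BALLOON_FIELDS = {
    "swap_in", "swap_out", "major_fault", "minor_fault",
    "unused", "available", "usable", "last_update",
}


def parse_dommemstat(text):
    """Turn 'name value' lines from `virsh dommemstat` into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and anything that is not a 'name number' pair.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def has_balloon_stats(stats):
    """Return True if at least one guest-side statistic is present."""
    return any(name in stats for name in BALLOON_FIELDS)


# Sample text copied from the transcripts above (shortened for the working case).
broken = parse_dommemstat("actual 1572864\nrss 635228\n")
working = parse_dommemstat(
    "actual 1572864\nswap_in 0\nunused 1186396\nrss 523996\n"
)
print(has_balloon_stats(broken), has_balloon_stats(working))
```

In practice the text would come from running `virsh dommemstat <domain>` on the host; the sample strings here stand in for that output.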
I am working around this by sticking with `virtio`.

Giving it priority high since I don't know what the default is on q35 if I simply choose `virtio`.
The priority can be lowered if `virtio` defaults to `virtio-transitional` because then we have no issue on older guests.
Feel free to lower the priority if `virtio` defaults to `virtio-transitional` in this case and let me know.

Andrea, can you please investigate what is the behavior behind this model and reply to comment 3? Adding Cole to CC, as he made the original changes I believe. Thanks.

(In reply to Jaroslav Suchanek from comment #4)
> Andrea, can you please investigate what is the behavior behind this model
> and reply to comment 3? Adding Cole to CC, as he made the original changes I
> believe. Thanks.

I'll dig further tomorrow, but the issue seems to be that in qemuMonitorInitBalloonObjectPath() we look for the memballoon based on two pieces of data: its alias and its type, where the latter is expected to be either virtio-balloon-pci or virtio-balloon-ccw based on the device's address type.

https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu_monitor.c#L990

However, the (non-)transitional devices have different QOM types:

# <memballoon model='virtio'>
$ virsh qemu-monitor-command test --hmp qom-list /machine/peripheral/ | grep balloon
balloon0 (child<virtio-balloon-pci>)

# <memballoon model='virtio-non-transitional'>
$ virsh qemu-monitor-command test --hmp qom-list /machine/peripheral/ | grep balloon
balloon0 (child<virtio-balloon-pci-non-transitional>)

Since libvirt always expects the type to be virtio-balloon-pci, it can't find the memballoon when (non)-transitional devices are used.

(In reply to Roman Mohr from comment #3)
> I am working around this by sticking with `virtio`.
>
> Giving it priority high since I don't know what the default is on q35 if I
> simply choose `virtio`.
> The priority can be lowered if `virtio` defaults to `virtio-transitional`
> because then we have no issue on older guests.
> Feel free to lower the priority if `virtio` defaults to
> `virtio-transitional` in this case and let me know.

As you can see above, virtio is just virtio :) It's not an alias for either one of the other options, which QEMU considers to be completely separate devices - though obviously they share most of the code.

When leaving PCI address assignment to libvirt, on a q35 machine type the memballoon will be placed behind a pcie-root-port and so it will behave like the non-transitional device. This is consistent with all other virtio devices.

Patch posted upstream.
https://www.redhat.com/archives/libvir-list/2021-January/msg00621.html

Fix pushed upstream.
commit 0a6cb05e953d315b7a05103d707cff4d36221211
Author: Andrea Bolognani <abologna>
Date: Tue Jan 12 17:17:44 2021 +0100
qemu: Fix memstat for (non-)transitional memballoon
Depending on the memballoon model, the corresponding QOM node
will have a different type and we need to account for this
when searching for it in the QOM tree.
https://bugzilla.redhat.com/show_bug.cgi?id=1911786
Signed-off-by: Andrea Bolognani <abologna>
Reviewed-by: Daniel Henrique Barboza <danielhb413>
Reviewed-by: Michal Privoznik <mprivozn>
v7.0.0-rc2-2-g0a6cb05e95
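The commit message says the QOM node type depends on the memballoon model, so the lookup has to take the model into account. The following Python sketch illustrates that idea against the `qom-list` output quoted earlier in this report. It is not libvirt's C implementation: the function names `expected_qom_type` and `find_balloon` are hypothetical, and the `virtio-balloon-pci-transitional` name is an assumption derived by analogy with the non-transitional type shown in the monitor output.

```python
import re


def expected_qom_type(model, address_type):
    """Build the QOM type name to search for (illustrative only)."""
    if address_type == "ccw":
        return "virtio-balloon-ccw"
    base = "virtio-balloon-pci"
    if model == "virtio":
        return base
    # Assumption: the suffix mirrors the libvirt model name, as in the
    # 'virtio-balloon-pci-non-transitional' node shown by qom-list.
    return base + "-" + model[len("virtio-"):]


def find_balloon(qom_list_output, alias, qom_type):
    """Return True if 'alias (child<qom_type>)' appears in HMP qom-list output."""
    pattern = re.compile(r"^\s*(\S+) \(child<([^>]+)>\)\s*$")
    for line in qom_list_output.splitlines():
        match = pattern.match(line)
        if match and match.group(1) == alias and match.group(2) == qom_type:
            return True
    return False


output = "balloon0 (child<virtio-balloon-pci-non-transitional>)\n"

# Looking for the plain PCI type regardless of model misses the node.
print(find_balloon(output, "balloon0", "virtio-balloon-pci"))

# A model-aware lookup finds it.
wanted = expected_qom_type("virtio-non-transitional", "pci")
print(find_balloon(output, "balloon0", wanted))
```

The first lookup corresponds to the behavior described in the analysis above, where only `virtio-balloon-pci` was expected; the second shows why matching on a model-dependent type resolves it.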
Thanks Andrea. As discussed on github, it would be great to get a backport to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.

Verified Version:
libvirt-7.0.0-2.module+el8.4.0+9520+ef609c5f.x86_64
qemu-kvm-5.2.0-3.module+el8.4.0+9499+42e58f08.x86_64
Verified Steps:
1. Start a guest with virtio-non-transitional memballoon device.
# virsh dumpxml lmn | grep /memballoon -B3
<memballoon model='virtio-non-transitional'>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</memballoon>
# virsh start lmn
Domain 'lmn' started
2. Check memory statistics.
# virsh dommemstat lmn
actual 1572864
swap_in 0
swap_out 0
major_fault 249
minor_fault 134381
unused 1197908
available 1343988
usable 1156248
last_update 1611553595
disk_caches 59292
hugetlb_pgalloc 0
hugetlb_pgfail 0
rss 604824
3. Start a guest with virtio-transitional memballoon device.
# virsh dumpxml lmn | grep /memballoon -B3
<memballoon model='virtio-transitional'>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x09' slot='0x01' function='0x0'/>
</memballoon>
# virsh start lmn
Domain 'lmn' started
4. Check memory statistics.
# virsh dommemstat lmn
actual 1572864
swap_in 0
swap_out 0
major_fault 248
minor_fault 135906
unused 1199488
available 1344332
usable 1157784
last_update 1611553887
disk_caches 58996
hugetlb_pgalloc 0
hugetlb_pgfail 0
rss 608008
The test results are expected.
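When repeating the verification for several models, it can help to read the memballoon model, alias and address type directly from the domain XML instead of relying on `grep`. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `memballoon_info` is hypothetical, and the input is the `<memballoon>` fragment shown in the steps above rather than a full `virsh dumpxml` document.

```python
import xml.etree.ElementTree as ET


def memballoon_info(xml_text):
    """Extract model, alias and address type of the first <memballoon>."""
    root = ET.fromstring(xml_text)
    # Accept either a bare <memballoon> fragment or a document containing one.
    elem = root if root.tag == "memballoon" else root.find(".//memballoon")
    if elem is None:
        return None
    alias = elem.find("alias")
    address = elem.find("address")
    return {
        "model": elem.get("model"),
        "alias": alias.get("name") if alias is not None else None,
        "address_type": address.get("type") if address is not None else None,
    }


fragment = """
<memballoon model='virtio-non-transitional'>
  <alias name='balloon0'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</memballoon>
"""
info = memballoon_info(fragment)
print(info["model"], info["alias"], info["address_type"])
```

The same function also works on a larger XML document that contains a `<memballoon>` element somewhere below the root, which is the shape `virsh dumpxml` produces.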
(In reply to Roman Mohr from comment #12)
> Thanks Andrea. As discussed on github, it would be great to get a backport
> to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.

This has been handled with Bug 1918364, but on second thought there's something that I should confirm: will AV 8.3.1 work for CNV? Or does the fix need to be backported to AV 8.3.0.z before you can consume it?

(In reply to Andrea Bolognani from comment #14)
> (In reply to Roman Mohr from comment #12)
> > Thanks Andrea. As discussed on github, it would be great to get a backport
> > to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.
>
> This has been handled with Bug 1918364, but on second thought there's
> something that I should confirm: will AV 8.3.1 work for CNV? Or does
> the fix need to be backported to AV 8.3.0.z before you can consume it?

Created https://bugzilla.redhat.com/show_bug.cgi?id=1927853 now to track this separately on the CNV level.

Dan, without this, el6 guests work properly but some memory stats are not reported since the guest does not recognize the ballooning device.
I am not sure how important you judge this. CNV 2.6, 2.6.1 or 2.7?

(In reply to Roman Mohr from comment #15)
> (In reply to Andrea Bolognani from comment #14)
> > (In reply to Roman Mohr from comment #12)
> > > Thanks Andrea. As discussed on github, it would be great to get a backport
> > > to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.
> >
> > This has been handled with Bug 1918364, but on second thought there's
> > something that I should confirm: will AV 8.3.1 work for CNV? Or does
> > the fix need to be backported to AV 8.3.0.z before you can consume it?

@phoracek had the very same question, I hope he has an answer by now.

>
> Created https://bugzilla.redhat.com/show_bug.cgi?id=1927853 now to track
> this separately on the CNV level.
>
> Dan, without this, el6 guests work properly but some memory stats are not
> reported since the guest does not recognize the ballooning device.
> I am not sure how important you judge this. CNV 2.6, 2.6.1 or 2.7?

It is clearly not a 2.6.0 blocker. Depending on the work and risk for upstream, I trust your decision whether this should go to 2.6.1 or to 4.8.0 (there's no CNV-2.7, btw, we renumber to match OCP)

(In reply to Roman Mohr from comment #15)
> (In reply to Andrea Bolognani from comment #14)
> > (In reply to Roman Mohr from comment #12)
> > > Thanks Andrea. As discussed on github, it would be great to get a backport
> > > to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.
> >
> > This has been handled with Bug 1918364, but on second thought there's
> > something that I should confirm: will AV 8.3.1 work for CNV? Or does
> > the fix need to be backported to AV 8.3.0.z before you can consume it?
>
> Created https://bugzilla.redhat.com/show_bug.cgi?id=1927853 now to track
> this separately on the CNV level.
>
> Dan, without this, el6 guests work properly but some memory stats are not
> reported since the guest does not recognize the ballooning device.
> I am not sure how important you judge this. CNV 2.6, 2.6.1 or 2.7?

Are you sure memory stats are the only thing that's not going to work when using a non-transitional virtio-memballoon with a RHEL 6 guest?

As far as I understand, the entire device will be non-functional, which is not limited to not relaying stats but also to the actual ballooning functionality (increasing and decreasing the amount of memory available to the guest OS) not working.

(In reply to Andrea Bolognani from comment #17)
> (In reply to Roman Mohr from comment #15)
> > (In reply to Andrea Bolognani from comment #14)
> > > (In reply to Roman Mohr from comment #12)
> > > > Thanks Andrea. As discussed on github, it would be great to get a backport
> > > > to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.
> > >
> > > This has been handled with Bug 1918364, but on second thought there's
> > > something that I should confirm: will AV 8.3.1 work for CNV? Or does
> > > the fix need to be backported to AV 8.3.0.z before you can consume it?
> >
> > Created https://bugzilla.redhat.com/show_bug.cgi?id=1927853 now to track
> > this separately on the CNV level.
> >
> > Dan, without this, el6 guests work properly but some memory stats are not
> > reported since the guest does not recognize the ballooning device.
> > I am not sure how important you judge this. CNV 2.6, 2.6.1 or 2.7?
>
> Are you sure memory stats are the only thing that's not going to
> work when using a non-transitional virtio-memballoon with a RHEL 6
> guest?
>
> As far as I understand, the entire device will be non-functional,
> which is not limited to not relaying stats but also to the actual
> ballooning functionality (increasing and decreasing the amount of
> memory available to the guest OS) not working.

You are absolutely right. It is just that CNV right now only supports reading from the ballooning device at this stage. Therefore only metrics collection is affected.

(In reply to Dan Kenigsberg from comment #16)
> (In reply to Roman Mohr from comment #15)
> > (In reply to Andrea Bolognani from comment #14)
> > > (In reply to Roman Mohr from comment #12)
> > > > Thanks Andrea. As discussed on github, it would be great to get a backport
> > > > to 8.3, so that we can consume it in CNV 2.6 which is based on 8.3.
> > >
> > > This has been handled with Bug 1918364, but on second thought there's
> > > something that I should confirm: will AV 8.3.1 work for CNV? Or does
> > > the fix need to be backported to AV 8.3.0.z before you can consume it?
>
> @phoracek had the very same question, I hope he has an answer by
> now.
>
> >
> > Created https://bugzilla.redhat.com/show_bug.cgi?id=1927853 now to track
> > this separately on the CNV level.
> >
> > Dan, without this, el6 guests work properly but some memory stats are not
> > reported since the guest does not recognize the ballooning device.
> > I am not sure how important you judge this. CNV 2.6, 2.6.1 or 2.7?
>
> It is clearly not a 2.6.0 blocker. Depending on the work and risk for
> upstream, I trust your decision whether this should go to 2.6.1 or to 4.8.0
> (there's no CNV-2.7, btw, we renumber to match OCP)

Ok, I think there is little risk, but we are pretty full right now. I would say it can wait for 4.8. Thanks.

> > This has been handled with Bug 1918364, but on second thought there's
> > something that I should confirm: will AV 8.3.1 work for CNV? Or does
> > the fix need to be backported to AV 8.3.0.z before you can consume it?
IIUIC 8.3.1 is enough. We always consume the latest AV version in CNV. We stick to 8.Y.0.z only with EUS CNV.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2098