Bug 1458638
| Summary: | Guest cannot be started when the guest memory is to be mapped as "shared" | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Lili Zhu <lizhu> |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
| Status: | CLOSED ERRATA | QA Contact: | Jing Qi <jinqi> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 7.4 | CC: | dyuan, jsuchane, lhuang, mprivozn, ovasik, rbalakri, xuzhang, yalzhang, zpeng |
| Target Milestone: | rc | Keywords: | Regression, Upstream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-3.7.0-1.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-04-10 10:46:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1469590, 1473046 | | |
Description (Lili Zhu, 2017-06-05 05:21:56 UTC)
Patch proposed upstream: https://www.redhat.com/archives/libvir-list/2017-June/msg00258.html

Test results from bug 1214369:

1. hugepages + dimm device

xml:

```xml
<memoryBacking>
  <hugepages/>
</memoryBacking>
<numa>
  <cell id='0' cpus='0-2' memory='524288' unit='KiB'/>
  <cell id='1' cpus='3-5' memory='524288' unit='KiB'/>
</numa>
<memory model='dimm' access='private'>
  <target>
    <size unit='KiB'>524287</size>
    <node>0</node>
  </target>
  <address type='dimm' slot='0'/>
</memory>
```

cmdline:

```
-mem-path /dev/hugepages/libvirt/qemu/2-r7
-numa node,nodeid=0,cpus=0-2,mem=512
-numa node,nodeid=1,cpus=3-5,mem=512
-object memory-backend-file,id=memdimm0,mem-path=/var/lib/libvirt/qemu/ram,share=no,size=536870912
-device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0
```

You can see that libvirt uses /var/lib/libvirt/qemu/ram for the dimm device. That is not right; it should use the hugepage path.

2. hugepages + memAccess='shared' (just like comment 0)

xml:

```xml
<memoryBacking>
  <hugepages/>
</memoryBacking>
<cpu mode='host-model' check='partial'>
  <model fallback='allow'>qemu64</model>
  <numa>
    <cell id='0' cpus='0-2' memory='524288' unit='KiB' memAccess='shared'/>
    <cell id='1' cpus='3-5' memory='524288' unit='KiB'/>
  </numa>
</cpu>
```

cmdline:

```
-object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram,share=yes,size=536870912
-numa node,nodeid=0,cpus=0-2,memdev=ram-node0
-object memory-backend-ram,id=ram-node1,size=536870912
-numa node,nodeid=1,cpus=3-5,memdev=ram-node1
```

This is just like problem 1: libvirt uses /var/lib/libvirt/qemu/ram instead of the hugepage path, and node 1 does not use huge pages at all. This is still a regression, even though the guest can start up.

Please take a look at comment 7. The issue exists in the latest version.
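The misbehaviour above comes down to how the mem-path for a memory-backend-file object is chosen. A minimal Python sketch of that decision, assuming the behaviour described in this report (function names, parameters, and the page-size convention are hypothetical illustrations, not libvirt's actual code):

```python
# Hypothetical sketch of the mem-path decision described in this bug.
# pagesize_kib == 0 models a user who wrote <hugepages/> without an
# explicit page size, which should mean "default huge page size",
# not "no huge pages".

def buggy_backend_path(pagesize_kib: int, hugepage_mount: str,
                       default_ram_path: str) -> str:
    # Old behaviour: only a non-zero page size selected the hugepage
    # path, so <hugepages/> with the size omitted silently fell back
    # to the plain RAM file under /var/lib/libvirt/qemu.
    return hugepage_mount if pagesize_kib > 0 else default_ram_path

def fixed_backend_path(domain_hugepages: bool, pagesize_kib: int,
                       hugepage_mount: str, default_ram_path: str) -> str:
    # Fixed behaviour: a separate boolean tracks "huge pages requested",
    # independent of whether a page size was spelled out.
    use_hugepage = domain_hugepages or pagesize_kib > 0
    return hugepage_mount if use_hugepage else default_ram_path

# <hugepages/> with no explicit size: the old logic picks the wrong path,
# which is exactly the cmdline shown above.
print(buggy_backend_path(0, "/dev/hugepages/libvirt/qemu/2-r7",
                         "/var/lib/libvirt/qemu/ram"))
print(fixed_backend_path(True, 0, "/dev/hugepages/libvirt/qemu/2-r7",
                         "/var/lib/libvirt/qemu/ram"))
```

The point of the sketch is only that the page size alone cannot carry the "huge pages requested" signal once the size is allowed to be omitted.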
Patch proposed upstream: https://www.redhat.com/archives/libvir-list/2017-August/msg00260.html

I've just pushed the patch upstream:

```
commit e255cf02b2a24d19412d9bf08dfa654150d9a31b
Author:     Michal Privoznik <mprivozn>
AuthorDate: Tue Aug 8 16:51:30 2017 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Aug 10 17:26:30 2017 +0200

    qemuBuildMemoryBackendStr: Handle one more corner case

    https://bugzilla.redhat.com/show_bug.cgi?id=1458638

    This code is so complicated because we allow enabling the same bits
    in many places. Just like in this case: huge pages can be enabled by
    the global <hugepages/> element under <memoryBacking>, or on a per
    <memory/> basis. To complicate things a bit more, users are allowed
    to omit the page size, in which case the default page size is used.
    And this is what is causing this bug. If no page size is specified,
    @pagesize keeps a value of zero throughout the whole function.
    Therefore we need yet another boolean to hold the [use, don't use]
    information, as we can't use @pagesize for that.

    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Martin Kletzander <mkletzan>

v3.6.0-70-ge255cf02b
```

Verified with libvirt-3.7.0-1.el7.x86_64 and qemu-kvm-rhev-2.10.0-1.el7.x86_64. Using the xml below in a domain, the domain can be started successfully.

```xml
<currentMemory unit='KiB'>51200</currentMemory>
<memoryBacking>
  <hugepages/>
</memoryBacking>
<numa>
  <cell id='0' cpus='0-1' memory='51200' unit='KiB' memAccess='shared'/>
</numa>
```

Checked the qemu command line; /dev/hugepages is being used in mem-path:

```
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/1-avocado-vt-vm1,share=yes,size=52428800,host-nodes=3,host-nodes=5,policy=bind
-numa node,nodeid=0,cpus=0-1,memdev=ram-node0
```

When I prepared a guest on the latest rhel7.4.z with the same xml as in the bug description:

```xml
....
<memoryBacking>
  <hugepages/>
</memoryBacking>
....
<numa>
  <cell id='0' cpus='0-1' memory='512000' unit='KiB' memAccess='shared'/>
  <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
</numa>
```

the guest can now be started. But when I then tried to migrate the guest to a rhel7.5 host, the migration failed:

```
# virsh migrate --live rhel74 qemu+ssh://{target IP}/system --verbose
root.73.81's password:
error: internal error: qemu unexpectedly closed the monitor: 2017-12-04T11:10:45.625821Z qemu-kvm: Unknown ramblock "ram-node1", cannot accept migration
2017-12-04T11:10:45.625836Z qemu-kvm: error while loading state for instance 0x0 of device 'ram'
2017-12-04T11:10:45.626204Z qemu-kvm: load of migration failed: Invalid argument
```

Please have a check.

Can you attach debug logs from src and destination please?

Hi, Michal. Debug logs are in https://bugzilla.redhat.com/show_bug.cgi?id=1458638#c17

See comment 18. I don't think there's much we can do here. This bug is about us regressing in generating the correct command line (we have to put memory-backend-file on the cmd line), but at the same time we can't always do that, because of migration from older systems where the wrong cmd line was generated. Worst case scenario, we can put something into the migration cookies (which libvirt exchanges when trying to migrate a guest) that will preserve this buggy behaviour, and only use the right way when starting a new guest freshly.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704
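The migration-cookie compromise suggested in the comments above can be sketched very loosely in Python. This is only an illustration of the idea, not libvirt's actual cookie format; the cookie key and the layout names are hypothetical:

```python
# Hypothetical sketch of the migration-cookie idea: a freshly started
# guest gets the corrected memory-backend-file layout, while a guest
# migrating in from an older host keeps the legacy layout so that QEMU
# ramblock names still match (avoiding 'Unknown ramblock "ram-node1"').

def choose_memory_layout(cookie: dict) -> str:
    # The source host would record which layout it used in the cookie
    # it sends to the destination during migration.
    if cookie.get("legacy-memory-backend", False):
        return "legacy"  # reproduce the old command line for compatibility
    return "fixed"       # new guests use memory-backend-file correctly

print(choose_memory_layout({"legacy-memory-backend": True}))  # incoming migration
print(choose_memory_layout({}))                               # fresh start
```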