Bug 1324092
Summary: | Ballooning probably doesn't work on ARM | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Karen Noel <knoel> |
Component: | qemu-kvm-rhev | Assignee: | Wei Huang (AMD) <wehuang> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.3 | CC: | chayang, dgilbert, drjones, fweimer, jcm, juzhang, knoel, michen, qzhang, thuth, virt-bugs, virt-maint |
Target Milestone: | rc | Keywords: | OtherQA |
Target Release: | --- | ||
Hardware: | aarch64 | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | 1323988 | Environment: | |
Last Closed: | 2016-07-18 14:51:39 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1173755 |
Description
Karen Noel
2016-04-05 13:45:48 UTC
Most of our existing RHELSA tests are using the same page table sizes (HOST and TARGET sizes are 64KB). I did some fresh installation and testing with RHELSA 7.3.

Testing setup:
1) start a VM
2) stress VM's virtual memory with a stress program
3) observe the host free memory size
4) use "setmem vm mem_size --config --live"
5) observe the host free memory size and compare it with (3)

1. HOST 64KB + TARGET 64KB
The testing didn't reveal any problems. Memory can be returned to host without any problem.

2. HOST 64KB + TARGET 4KB
Lots of issues. I saw the guest VM hang for various reasons. See L1 for one example.

3. HOST 64KB + TARGET 16KB
Guest VM failed to boot.

In summary, the overall usability of ballooning in AArch64 depends on guest page size. We saw lots of issues when host/guest page sizes are mismatched, though the symptoms are different from this BZ's description.

-Wei

========= LOG =========
L1. Dump (host 64KB + target 64KB)
[ 149.303569] systemd[1]: unhandled level 1 translation fault (11) at 0x00000018, esr 0x92000005
[ 149.305176] pgd = ffffffc131641000
[ 149.305773] [00000018] *pgd=0000000000000000, *pud=0000000000000000
[ 149.306913]
[ 149.307166] CPU: 0 PID: 1 Comm: systemd Not tainted 4.5.0-rc7 #1
[ 149.308205] Hardware name: linux,dummy-virt (DT)
[ 149.308990] task: ffffffc139930000 ti: ffffffc1398bc000 task.ti: ffffffc1398bc000
[ 149.310277] PC is at 0x7f827e5ffc
[ 149.310859] LR is at 0x7f827e81b8
[ 149.311406] pc : [<0000007f827e5ffc>] lr : [<0000007f827e81b8>] pstate: 80000000
[ 149.312695] sp : 0000007fd7b88070
[ 149.313300] x29: 0000007fd7b88070 x28: 0000007fd7b88100
[ 149.314222] x27: 0000007f828dd590 x26: 0000007f828dd550
[ 149.315130] x25: 0000007f828dd540 x24: 0000000000000000
[ 149.316071] x23: 0000007f828dd598 x22: 0000000000000000
[ 149.316993] x21: 00000055924fd2f0 x20: 0000000000000000
[ 149.317922] x19: 00000055924fd2f0 x18: 000000557a43a050
[ 149.318812] x17: 0000007f827ea084 x16: 0000007f828dd008
[ 149.319697] x15: 003b9aca00000000 x14: 001b45f4dc000000
[ 149.320656] x13: ffffffffa8f809c7 x12: 0fd5c362e5e3ac00
[ 149.321583] x11: 0000007fd7b879f8 x10: 0000007fd7b879d0
[ 149.322498] x9 : 0000000000000000 x8 : 0000000000000038
[ 149.323408] x7 : 0fd5c362e5e3ac00 x6 : 0000000000000001
[ 149.324309] x5 : 0000000000000000 x4 : 0000000000000000
[ 149.325223] x3 : 0000000000000001 x2 : 0000000000000091
[ 149.326143] x1 : 0000000000000000 x0 : 0000000000000000
[ 149.327074]

(In reply to Wei Huang from comment #1)
> Most of our existing RHELSA tests are using the same page table sizes (HOST
> and TARGET sizes are 64KB). I did some fresh installation and testing with
> RHELSA 7.3.
>
> Testing setup:
> 1) start a VM
> 2) stress VM's virtual memory with a stress program
> 3) observe the host free memory size
> 4) use "setmem vm mem_size --config --live"
> 5) observe the host free memory size and compare it with (3)
>
>
> 1. HOST 64KB + TARGET 64KB
> The testing didn't reveal any problems. Memory can be returned to host
> without any problem.
>
> 2. HOST 64KB + TARGET 4KB
> Lots of issues. I saw guest VM hang with various reasons. See L1 for one
> example.
>
> 3. HOST 64KB + TARGET 16KB
> Guest VM failed to boot.

When you say 'TARGET' here, is this the 'TARGET_PAGE_SIZE' setting in QEMU or just the guest's page size? Note that they are very different.

(In reply to Dr. David Alan Gilbert from comment #2)
> (In reply to Wei Huang from comment #1)
> > Most of our existing RHELSA tests are using the same page table sizes (HOST
> > and TARGET sizes are 64KB). I did some fresh installation and testing with
> > RHELSA 7.3.
>
> When you say 'TARGET' here is this the 'TARGET_PAGE_SIZE' setting in QEMU or
> just the guest's page size; note that they are very different.

Right. TARGET_PAGE_SIZE is always 1K in QEMU, while a Linux guest kernel may be compiled to use 4, 16, or 64K. RHEL only cares about 64K guest kernels, so testing with any other guest kernel isn't really relevant to this RHEL BZ.
Based on Wei's testing though, it seems we don't have a problem with 64K guests? I guess we should double check the code to try and understand why it works, and/or double check Wei's test steps to ensure the expected problem would indeed be apparent.

(In reply to Andrew Jones from comment #3)
> (In reply to Dr. David Alan Gilbert from comment #2)
> > (In reply to Wei Huang from comment #1)
> > > Most of our existing RHELSA tests are using the same page table sizes (HOST
> > > and TARGET sizes are 64KB). I did some fresh installation and testing with
> > > RHELSA 7.3.
> >
> > When you say 'TARGET' here is this the 'TARGET_PAGE_SIZE' setting in QEMU or
> > just the guest's page size; note that they are very different.
>
> Right. TARGET_PAGE_SIZE is always 1K in QEMU, while a Linux guest kernel may
> be compiled to use 4, 16, or 64K. RHEL only cares about 64K guest kernels,
> so testing with any other guest kernel isn't really relevant to this RHEL BZ.
>
> Based on Wei's testing though, it seems we don't have a problem with 64K
> guests? I guess we should double check the code to try and understand why it
> works, and/or double check Wei's test steps to ensure the expected problem
> would indeed be apparent.

Yeh we should understand it; as far as I understand all this code is shared between all the architectures, and David Gibson seems to agree on the Power bug that this shouldn't work.

(In reply to Dr. David Alan Gilbert from comment #2)
> (In reply to Wei Huang from comment #1)
> > Most of our existing RHELSA tests are using the same page table sizes (HOST
> > and TARGET sizes are 64KB). I did some fresh installation and testing with
> > RHELSA 7.3.
> >
> > Testing setup:
> > 1) start a VM
> > 2) stress VM's virtual memory with a stress program
> > 3) observe the host free memory size
> > 4) use "setmem vm mem_size --config --live"
> > 5) observe the host free memory size and compare it with (3)
> >
> >
> > 1. HOST 64KB + TARGET 64KB
> > The testing didn't reveal any problems. Memory can be returned to host
> > without any problem.
> >
> > 2. HOST 64KB + TARGET 4KB
> > Lots of issues. I saw guest VM hang with various reasons. See L1 for one
> > example.
> >
> > 3. HOST 64KB + TARGET 16KB
> > Guest VM failed to boot.
>
> When you say 'TARGET' here is this the 'TARGET_PAGE_SIZE' setting in QEMU or
> just the guest's page size; note that they are very different.

Guest kernel's page size...

See Thomas's comments on the Power equivalent:
https://bugzilla.redhat.com/show_bug.cgi?id=1323988
He noticed that the madvise will actually work on aligned addresses but destroys the whole host page; so if you're lucky and the guest does the right thing it will work; if you're unlucky it'll remove more than you might have expected.

(In reply to Dr. David Alan Gilbert from comment #6)
> See Thomas's comments on the Power equivalent;
> https://bugzilla.redhat.com/show_bug.cgi?id=1323988
> he noticed that actually the madvise will work on aligned addresses but
> destroys the whole host page; so if you're lucky and the guest does the
> right thing it will work; if you're unlucky it'll remove more than you might
> have expected.

David Gibson's comment (bug 1323988 comment 6) states what Wei's testing has shown. 64k guest page on 64k host page configurations will work, although only by luck.

We're only planning to support 64K hosts obviously, but we might have a case in which someone runs another Linux distro with 4K guest pages (e.g. Ubuntu). So worth tracking this for sure.

Suggested patch upstream:
https://patchwork.ozlabs.org/patch/609982/

The final accepted patch for this is 01310e2aa "hw/virtio/balloon: Replace TARGET_PAGE_SIZE with BALLOON_PAGE_SIZE"

We should test it on AArch64 and give Thomas feedback. I presume he'll be backporting something for 7.3 for Power, which should resolve this bug too.

I'm afraid that's not the full fix.
The problem with 4k guests on 64k page size hosts still remains - that part of the patch has not been accepted, since there are currently some other reworks of the balloon code on the list which might help to fix this problem later in a more proper way...

*** Bug 1323988 has been marked as a duplicate of this bug. ***

Based on the test results, I no longer see this problem on RHELSA 7.3 release. We can close this BZ now.
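
For anyone revisiting the "works only by luck" point made earlier in this thread, here is a minimal Python sketch of the arithmetic. It is a model, not QEMU code. It assumes the balloon releases memory in 4 KiB units, and that the host madvise rejects an address that is not aligned to the host page size and rounds the length up to a whole host page. The names `HOST_PAGE`, `BALLOON_UNIT`, `host_range_discarded` and `bytes_discarded_for_guest_page` are made up for illustration.

```python
HOST_PAGE = 64 * 1024     # host page size used in this BZ
BALLOON_UNIT = 4 * 1024   # assumed granularity of one balloon request


def host_range_discarded(addr, length, host_page=HOST_PAGE):
    """Model one madvise(addr, length, MADV_DONTNEED) call on the host.

    Returns the (start, end) byte range the host would drop, or None when
    the address is not host-page aligned (the call is assumed to fail).
    """
    if addr % host_page != 0:
        return None
    # The length is rounded up to a whole number of host pages.
    rounded = -(-length // host_page) * host_page
    return (addr, addr + rounded)


def bytes_discarded_for_guest_page(guest_addr, guest_page_size):
    """Total host bytes dropped when the guest balloons one of its pages."""
    ranges = set()
    for off in range(0, guest_page_size, BALLOON_UNIT):
        r = host_range_discarded(guest_addr + off, BALLOON_UNIT)
        if r is not None:
            ranges.add(r)
    return sum(end - start for start, end in ranges)


# 64K guest page on a 64K host: exactly the guest page is dropped.
print(bytes_discarded_for_guest_page(0x10000, 64 * 1024))  # 65536
# 4K guest page that happens to be 64K aligned: 60K of neighbours go too.
print(bytes_discarded_for_guest_page(0x10000, 4 * 1024))   # 65536
# 4K guest page that is not 64K aligned: nothing is returned to the host.
print(bytes_discarded_for_guest_page(0x11000, 4 * 1024))   # 0
```

Under these assumptions the 64K-on-64K case lines up exactly, while a 4K guest either loses neighbouring pages it never gave up or returns nothing, which is consistent with the mismatched-page-size results reported above.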