Bug 630975
Summary: | KVM guest limited to 40bit of physical address space | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Andrea Arcangeli <aarcange>
Component: | seabios | Assignee: | Andrea Arcangeli <aarcange>
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | medium | Docs Contact: |
Priority: | low | |
Version: | 6.1 | CC: | acathrow, amit.shah, dshaks, jburke, john.cooper, juzhang, khong, michen, minovotn, mkenneth, perfbz, rjones, syeghiay, tburke, virt-maint, xfu
Target Milestone: | beta | Keywords: | Triaged
Target Release: | 6.1 | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | seabios-0.6.1.2-7.el6 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 743391 (view as bug list) | Environment: |
Last Closed: | 2011-12-06 17:00:51 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 580953, 743391, 748554 | |
Attachments: | | |
Description
Andrea Arcangeli
2010-09-07 14:16:49 UTC
Created attachment 443516 [details]
show the host physical address bits in KVM guest
Some results on an 8-socket Intel EXT machine (128-way), with 2TB physical memory:

Host:

    [root@hp-dl980g7-01 ~]# uname -a
    Linux hp-dl980g7-01.lab.bos.redhat.com 2.6.32-71.el6.x86_64 #1 SMP Wed Sep 1 01:33:01 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Guest:

    [root@hp-dl980g7-01 kvm]# cat rhel6kvm_r6.sh

WORKS at 1TB

    /usr/libexec/qemu-kvm -m 1028576 -smp 64 -name rhel6 -uuid 6b7fe2a1-0073-49f5-b28c-cf17f8312ac8 -monitor pty -no-kvm-pit-reinjection -boot c -drive file=/shak/kvm/rhel6.img,if=ide,index=0,boot=on -net nic,macaddr=52:54:00:12:34:56,vlan=0,model=virtio -net tap,script=/etc/qemu-ifup0,vlan=0,ifname=vnet0 -usb -vnc 127.0.0.1:0 &

Breaks at > 1T

    -m 1.1 TB
    -m 1.4 TB
    -m 1.8 TB

    [root@hp-dl980g7-01 kvm]# cat rhel6kvm_r6.sh
    /usr/libexec/qemu-kvm -m 1128576 -smp 64 -name rhel6 -uuid 6b7fe2a1-0073-49f5-b28c-cf17f8312ac8 -monitor pty -no-kvm-pit-reinjection -boot c -drive file=/shak/kvm/rhel6.img,if=ide,index=0,boot=on -net nic,macaddr=52:54:00:12:34:56,vlan=0,model=virtio -net tap,script=/etc/qemu-ifup0,vlan=0,ifname=vnet0 -usb -vnc 127.0.0.1:0 &

Can ssh too - dhcp47-223.lab.bos.redhat.com

--------------------------------------------------------------------------------

    Sep 7 09:57:54 localhost kernel: BIOS-provided physical RAM map:
    Sep 7 09:57:54 localhost kernel: BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
    Sep 7 09:57:54 localhost kernel: BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
    Sep 7 09:57:54 localhost kernel: BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
    Sep 7 09:57:54 localhost kernel: BIOS-e820: 0000000000100000 - 00000000dfffa000 (usable)
    Sep 7 09:57:54 localhost kernel: BIOS-e820: 00000000dfffa000 - 00000000e0000000 (reserved)
    Sep 7 09:57:54 localhost kernel: BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
    Sep 7 09:57:54 localhost kernel: BIOS-e820: 0000000100000000 - 000000be8e000000 (usable)
    Sep 7 09:57:54 localhost kernel: DMI 2.4 present.
    Sep 7 09:57:54 localhost kernel: last_pfn = 0xbe8e000 max_arch_pfn = 0x400000000
    Sep 7 09:57:54 localhost kernel: PAT not supported by CPU.
    Sep 7 09:57:54 localhost kernel: last_pfn = 0xdfffa max_arch_pfn = 0x400000000
    Sep 7 09:57:54 localhost kernel: init_memory_mapping: 0000000000000000-00000000dfffa000
    Sep 7 09:57:54 localhost kernel: init_memory_mapping: 0000000100000000-000000be8e000000
    Sep 7 09:57:54 localhost kernel: RAMDISK: 373aa000 - 37fef6fb
    Sep 7 09:57:54 localhost kernel: ACPI: RSDP 00000000000f7bf0 00014 (v00 BOCHS )
    Sep 7 09:57:54 localhost kernel: ACPI: RSDT 00000000dfffc4b0 00030 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)
    Sep 7 09:57:54 localhost kernel: ACPI: FACP 00000000dfffe9e0 00074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
    Sep 7 09:57:54 localhost kernel: ACPI: DSDT 00000000dfffcb30 01E4B (v01 BXPC BXDSDT 00000001 INTL 20090123)
    Sep 7 09:57:54 localhost kernel: ACPI: FACS 00000000dfffe980 00040
    Sep 7 09:57:54 localhost kernel: ACPI: SSDT 00000000dfffc7c0 0036B (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001)
    Sep 7 09:57:54 localhost kernel: ACPI: APIC 00000000dfffc4e0 0026A (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
    Sep 7 09:57:54 localhost kernel: No NUMA configuration found
    Sep 7 09:57:54 localhost kernel: Faking a node at 0000000000000000-000000be8e000000
    Sep 7 09:57:54 localhost kernel: Bootmem setup node 0 0000000000000000-000000be8e000000
    Sep 7 09:57:54 localhost kernel: NODE_DATA [000000000000c000 - 000000000003ffff]
    Sep 7 09:57:54 localhost kernel: bootmap [0000000001c2c000 - 00000000033fdbff] pages 17d2
    Sep 7 09:57:54 localhost kernel: (8 early reservations) ==> bootmem [0000000000 - be8e000000]
    Sep 7 09:57:54 localhost kernel: #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
    Sep 7 09:57:54 localhost kernel: #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
    Sep 7 09:57:54 localhost kernel: #2 [0001000000 - 0001c2a630] TEXT DATA BSS ==> [0001000000 - 0001c2a630]
    Sep 7 09:57:54 localhost kernel: #3 [00373aa000 - 0037fef6fb] RAMDISK ==> [00373aa000 - 0037fef6fb]
    Sep 7 09:57:54 localhost kernel: #4 [000009f400 - 0000100000] BIOS reserved ==> [000009f400 - 0000100000]
    Sep 7 09:57:54 localhost kernel: #5 [0001c2b000 - 0001c2b079] BRK ==> [0001c2b000 - 0001c2b079]
    Sep 7 09:57:54 localhost kernel: #6 [0000008000 - 000000c000] PGTABLE ==> [0000008000 - 000000c000]
    Sep 7 09:57:54 localhost kernel: #7 [0000100000 - 00003f8000] PGTABLE ==> [0000100000 - 00003f8000]
    Sep 7 09:57:54 localhost kernel: found SMP MP-table at [ffff8800000f7c40] f7c40
    Sep 7 09:57:54 localhost kernel: Reserving 4096MB of memory at 4096MB for crashkernel (System RAM: 780512MB)

I found the bug is in seabios:

    u64 high = ((inb_cmos(CMOS_MEM_HIGHMEM_LOW) << 16)
                | ((u32)inb_cmos(CMOS_MEM_HIGHMEM_MID) << 24)
                | ((u64)inb_cmos(CMOS_MEM_HIGHMEM_HIGH) << 32));
    RamSizeOver4G = high;

And the corresponding bug in qemu-kvm, which sends the info to seabios:

    if (above_4g_mem_size) {
        rtc_set_memory(s, 0x5b, (unsigned int)above_4g_mem_size >> 16);
        rtc_set_memory(s, 0x5c, (unsigned int)above_4g_mem_size >> 24);
        rtc_set_memory(s, 0x5d, (uint64_t)above_4g_mem_size >> 32);
    }

We need to add the 40-48 bit range too, by adding a >>40 line in the two places above; right now seabios is limited to 40 bits (1T). That is in addition to the 0x80000008 patch I already attached.

Created attachment 443549 [details]
seabios patch
Created attachment 443551 [details]
seabios patch
Created attachment 443552 [details]
kvm seabios patch
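The truncation described in the report can be reproduced in isolation. The following is a minimal sketch, not the attached patches: `struct highmem_cmos`, `encode_highmem`, `decode_highmem_40bit`, and `decode_highmem_48bit` are hypothetical names standing in for the `rtc_set_memory()` writes in qemu-kvm and the `inb_cmos()` reads in seabios. The fourth byte (`top`) only illustrates the ">>40" extension proposed in the description and makes no claim about which CMOS register the real fix uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the CMOS bytes that carry the amount of RAM
 * above 4G from qemu-kvm to seabios. */
struct highmem_cmos {
    uint8_t low;   /* bits 16-23 of the size above 4G */
    uint8_t mid;   /* bits 24-31 */
    uint8_t high;  /* bits 32-39 */
    uint8_t top;   /* bits 40-47: the extra byte proposed in the report */
};

/* Writer side: split the size into one-byte fields. */
struct highmem_cmos encode_highmem(uint64_t above_4g)
{
    struct highmem_cmos c;

    /* Four bytes cover bits 16-47 only. */
    assert((above_4g >> 48) == 0);
    c.low  = (uint8_t)(above_4g >> 16);
    c.mid  = (uint8_t)(above_4g >> 24);
    c.high = (uint8_t)(above_4g >> 32);
    c.top  = (uint8_t)(above_4g >> 40);
    return c;
}

/* Reader side as quoted in the report: three bytes, so every bit above
 * bit 39 is silently dropped. */
uint64_t decode_highmem_40bit(struct highmem_cmos c)
{
    return ((uint64_t)c.low << 16)
         | ((uint64_t)c.mid << 24)
         | ((uint64_t)c.high << 32);
}

/* Reader side with the proposed extra byte included. */
uint64_t decode_highmem_48bit(struct highmem_cmos c)
{
    return decode_highmem_40bit(c) | ((uint64_t)c.top << 40);
}
```

Assuming the 1.8 TB run used `-m 1828576` (the report only says "1.8 TB", so the exact value is a guess) with 3584 MB mapped below 4G as the e820 map shows, the size above 4G is 1824992 MB. The three-byte decode returns 776416 MB, and adding the 4G base gives 0xbe8e000000, which matches the top of the last e820 range and the 780512MB of System RAM in the log.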
With all 3 patches applied my simulation seems to be working better. I can't get the guest to finish booting (so I can't be sure this is enough) because 16G of RAM isn't enough (even with another 16G of swap), but it is definitely using more than the previous 256G of RAM.
As you can see from attachment 443516 [details], the comment talks about some 42bit limit in exec.c. Nobody could ever have booted anything larger than 1TB because of the seabios limit anyway, so that limit has surely never been exercised. But I don't see that limit in the code, so maybe >42bits already works just fine. I'm optimistic that at least up to 4TB (42bits) should work now, so the next test is to see whether >4TB works too.
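The bit widths mentioned in this thread map to sizes as follows. This is a small sketch with hypothetical helper names (`phys_bits_from_eax`, `addr_space_bytes`); it decodes an EAX value as returned by CPUID leaf 0x80000008, where bits 7:0 hold the physical address width, and does not execute CPUID itself.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 0x80000008 reports the physical address width in EAX[7:0]
 * (EAX[15:8] holds the linear address width). */
unsigned phys_bits_from_eax(uint32_t eax)
{
    return eax & 0xffu;
}

/* Size in bytes of an address space that is `bits` wide. */
uint64_t addr_space_bytes(unsigned bits)
{
    assert(bits < 64);  /* a shift by 64 or more would be undefined */
    return 1ULL << bits;
}
```

For example, an EAX value of 0x3028 decodes to 40 physical bits, which is 1 TiB; 42 bits is 4 TiB and 48 bits is 256 TiB.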
Two patches posted to rhvirt-patches with Message-ID: <20110930173155.GO7768> and Message-ID: <20110930173650.GP7768>.

Reproduced on qemu-kvm-0.12.1.2-2.193.el6.x86_64 and seabios-0.6.1.2-5.el6.x86_64 on a 4T host.

1. Boot a guest with >1T memory; the guest identifies <1T memory.

1.1 # /usr/libexec/qemu-kvm -M rhel6.2.0 -enable-kvm -m 1600G

1.2 In the guest, the memory is just 567G, not 1.6T

    # free -g
                 total       used       free     shared    buffers     cached
    Mem:           567          6        561          0          0          0
    -/+ buffers/cache:          6        561
    Swap:            3          0          3

2. Boot a guest with =1T memory; the guest identifies 1T memory.

2.1 # /usr/libexec/qemu-kvm -M rhel6.2.0 -enable-kvm -m 1024G

2.2 In the guest, the memory is 1T

    # free -g
                 total       used       free     shared    buffers     cached
    Mem:          1009         10        999          0          0          0
    -/+ buffers/cache:          9        999
    Swap:            3          0          3

Tested with qemu-206 using a rhel6.2-snap2 guest; the host RAM is 4T. I tested 2 conditions:

1. Boot a guest with the host's free memory: the guest boots successfully in 1-2 mins with RAM = host free memory (3998G).

    # qemu-kvm -m 3998G .....

In the guest:

    # free -g
                 total       used       free     shared    buffers     cached
    Mem:          3942         37       3905          0          0          0
    -/+ buffers/cache:         37       3905
    Swap:            3          0          3

2. Boot a guest with >= 4T: qemu-kvm aborts with a core dump.

    (gdb) bt
    #0 0x0000003eea032885 in raise () from /lib64/libc.so.6
    #1 0x0000003eea034065 in abort () from /lib64/libc.so.6
    #2 0x0000000000484710 in qemu_memalign (alignment=2097152, size=4399120252928) at osdep.c:112
    #3 0x00000000004ecc52 in qemu_ram_alloc_from_ptr (dev=<value optimized out>, name=<value optimized out>, size=4399120252928, host=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/exec.c:2730
    #4 0x0000000000452863 in pc_init1 (ram_size=3758096384, boot_device=0x7fffffffdfc0 "cad", kernel_filename=0x0, kernel_cmdline=0x642ee2 "", initrd_filename=0x0, cpu_model=0x631d85 "cpu64-rhel6", pci_enabled=1) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/pc.c:1115
    #5 0x000000000040d4ae in main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6254

2.1 Swap size in host

    Swap: 2031608k total

Hi, Andrea

According to comment24, would you please answer two questions for me?

1. If we boot a guest with >= 4T, qemu-kvm aborts with a core dump. Is that normal? As far as I know, a KVM guest supports over-commit. In this case the host RAM is 4T, and if we boot a guest with >= 4T, qemu-kvm aborts with a core dump. Would you please elaborate on this?

2.if question problem is normal,according to our results,can we set this issue as verified?

Thanks
> 2.if question problem is normal,according to our results,can we set this
> issue as verified?
A little update.
2 if question 1 problem is normal,according to our results,can we set this
issue as verified?
According to comment24, I'm going to set this issue as verified. If you think comment25 question 1 is a problem, please let us know and we will open a new issue.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1680.html