Bug 821663 - Kernel panic when booting guest (rhel5.8x64, rhel5.7x64, rhel4.9x64) with -smp 65 on rhel6.3 host
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.3
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Gleb Natapov
QA Contact: Virtualization Bugs
Depends On:
Blocks:
Reported: 2012-05-15 06:01 EDT by FuXiangChun
Modified: 2013-12-08 19:57 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-14 03:13:52 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
full log (48.73 KB, application/octet-stream)
2012-06-12 07:53 EDT, FuXiangChun

Description FuXiangChun 2012-05-15 06:01:54 EDT
Description of problem:
Booting a rhel5.8x64, rhel5.7x64, or rhel4.9x64 guest with an -smp value greater than 64 causes a guest kernel panic. If the -smp value is 64 or less, the guest works well.

Version-Release number of selected component (if applicable):
# rpm -qa | grep qemu
qemu-kvm-0.12.1.2-2.292.el6.x86_64

# uname -r
2.6.32-270.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1./usr/libexec/qemu-kvm -M rhel6.3.0 -cpu host --enable-kvm -m 512G -smp 64,maxcpus=161 -name rhel6.3 -uuid ddcbfb49-3411-1701-3c36-6bdbc00bedbc -rtc base=utc,clock=host,driftfix=slew -drive file=/home/images/RHEL-Server-5.7-64-virtio.qcow2,if=none,id=virtio,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,drive=virtio,id=drive-virtio0-0-0,bootindex=1 -netdev tap,id=hostnet1 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=86:12:50:a4:35:75 -spice port=5911,disable-ticketing -vga qxl -device sga -chardev socket,id=serial0,path=/var/test3,server,nowait -device isa-serial,chardev=serial0 -balloon virtio -monitor unix:/tmp/monitor3,server,nowait -monitor stdio
Actual results:
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: 
 [<ffffffff8008d4ca>] sd_degenerate+0x31/0x44
PGD 0 
Oops: 0000 [1] SMP 
last sysfs file: 
CPU 0 
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18-300.el5 #1
RIP: 0010:[<ffffffff8008d4ca>]  [<ffffffff8008d4ca>] sd_degenerate+0x31/0x44
RSP: 0000:ffff81801fc17be0  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff810001c02be0 RCX: 000000000000004d
RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000000
RBP: ffff81801fc17bf0 R08: 0000000000000040 R09: 0000000000000000
R10: ffff81801fc17e40 R11: 000000d000000000 R12: ffff810001c02be0
R13: ffff810001c02700 R14: ffff810001c01420 R15: 0000000000000027
FS:  0000000000000000(0000) GS:ffffffff8042f000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff81801fc16000, task ffff81801fc057a0)
Stack:  ffff810001c02a40 ffff810001c02a40 ffff81801fc17c30 ffffffff8008eac6
 0000004d000000d0 0000000000000040 00000000000000ff ffff810001c02a40
 00000000000000ff 0000000000000040 ffff81801fc17e30 ffffffff800917d0
Call Trace:
 [<ffffffff8008eac6>] cpu_attach_domain+0x4b/0xce
 [<ffffffff800917d0>] __build_sched_domains+0xd42/0x13d3
 [<ffffffff80155d7f>] __next_cpu+0x19/0x28
 [<ffffffff80091f57>] arch_init_sched_domains+0x2e/0x35
 [<ffffffff8047e049>] sched_init_smp+0x1e/0xa5
 [<ffffffff8046b9e8>] init+0x183/0x2f7
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8018835c>] acpi_ds_init_one_object+0x0/0x80
 [<ffffffff8046b865>] init+0x0/0x2f7
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

Code: 48 3b 00 75 08 31 d2 f6 c1 70 0f 94 c2 59 5b c9 89 d0 c3 55 
RIP  [<ffffffff8008d4ca>] sd_degenerate+0x31/0x44
 RSP <ffff81801fc17be0>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Fatal exception


Expected results:
The guest boots successfully.

Additional info:
The rhel5.8x86, rhel5.7x86, and rhel4.9x86 guests do not hit this issue.
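To compare oopses across guests, it can help to reduce the text under "Actual results" to the faulting symbol, the faulting address (CR2), and the call chain. The following is only a minimal sketch: the helper name `summarize_oops` and the regular expression are illustrative assumptions, not part of any kernel or Red Hat tooling, and the sample text is an excerpt of the oops above.

```python
import re

# Excerpt of the oops reported under "Actual results".
OOPS = """\
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff8008d4ca>] sd_degenerate+0x31/0x44
Pid: 1, comm: swapper Not tainted 2.6.18-300.el5 #1
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Call Trace:
 [<ffffffff8008eac6>] cpu_attach_domain+0x4b/0xce
 [<ffffffff800917d0>] __build_sched_domains+0xd42/0x13d3
 [<ffffffff80155d7f>] __next_cpu+0x19/0x28
 [<ffffffff80091f57>] arch_init_sched_domains+0x2e/0x35
 [<ffffffff8047e049>] sched_init_smp+0x1e/0xa5
"""

# Matches frames such as " [<ffffffff8008d4ca>] sd_degenerate+0x31/0x44".
FRAME = re.compile(r"\[<([0-9a-f]{16})>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(text):
    """Return (faulting_symbol, fault_address, call_trace_symbols).

    Hypothetical helper: it only understands the 2.6.18-style layout
    shown in this report.
    """
    head, _, tail = text.partition("Call Trace:")
    fault = FRAME.search(head)                 # first frame before the trace
    cr2 = re.search(r"CR2: ([0-9a-f]{16})", text)
    trace = [m.group(2) for m in FRAME.finditer(tail)]
    return (
        fault.group(2) if fault else None,
        int(cr2.group(1), 16) if cr2 else None,
        trace,
    )


symbol, address, trace = summarize_oops(OOPS)
```

For the excerpt above, this reports a fault in `sd_degenerate` at address 0, reached through `cpu_attach_domain` while the scheduler domains were being built during SMP initialization.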
Comment 1 Gleb Natapov 2012-06-10 04:14:48 EDT
Check our product limits before testing: https://home.corp.redhat.com/wiki/enterprise-linux-product-limits. We do not support more than 64 CPUs with any of these products, and the x86 variants do not even try to initialize all available vCPUs.

Your "reproduce" command line has "smp 64,maxcpus=161". Does this mean you hit this issue with 64 CPUs, or is the command line incorrect? Regardless, I do not hit this issue (which looks like a kernel bug) with many more than 64 vCPUs on rhel5. What host CPU do you have? Please attach the full console log.
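A quick way to sanity-check a command line against the 64-CPU limit mentioned above is to parse the -smp argument before launching the guest. This is a rough sketch only: `parse_smp`, `boot_vcpus_supported`, and the constant `GUEST_VCPU_LIMIT` are hypothetical names, and the value 64 is taken from this comment rather than read from any product-limits source.

```python
# Limit quoted in this comment; an assumption hard-coded for illustration.
GUEST_VCPU_LIMIT = 64


def parse_smp(value):
    """Parse a qemu-kvm -smp argument such as "65,maxcpus=161".

    A bare leading number is treated as the boot-time CPU count ("cpus").
    """
    opts = {}
    for index, part in enumerate(value.split(",")):
        key, sep, val = part.strip().partition("=")
        if not sep:
            if index != 0:
                raise ValueError("only the first -smp field may omit its key")
            key, val = "cpus", key
        opts[key] = int(val)
    return opts


def boot_vcpus_supported(value, limit=GUEST_VCPU_LIMIT):
    """Return True if the boot-time vCPU count is within the limit."""
    opts = parse_smp(value)
    if "cpus" not in opts:
        raise ValueError("this sketch requires an explicit CPU count")
    return opts["cpus"] <= limit


original = parse_smp("64,maxcpus=161")
ok_64 = boot_vcpus_supported("64,maxcpus=161")
ok_65 = boot_vcpus_supported("65,maxcpus=161")
```

With the command line as originally posted, the boot-time count is 64 and passes the check; with 65 it does not. The sketch deliberately looks only at the boot-time count and ignores `maxcpus`.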
Comment 2 FuXiangChun 2012-06-11 06:22:39 EDT
My "reproduce" command line should be "smp 65,maxcpus=161".
I first reproduced it on an AMD 6172 (48 cores).
I can also reproduce it on my local host:
AMD Phenom(tm) 9600B Quad-Core Processor, 4 cores.

This is the full console log.

Booting 'Red Hat Enterprise Linux Server (2.6.18-274.el5)'

root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
kernel /vmlinuz-2.6.18-274.el5 ro root=/dev/VolGroup00/LogVol00 crashkernel=128
M@32M console=tty0 console=ttyS0,115200n8 rhgb quiet
   [Linux-bzImage, setup=0x1e00, size=0x20029c]
initrd /initrd-2.6.18-274.el5.img
   [Linux-initrd @ 0x37c9d000, 0x3520f5 bytes]

WARNING calibrate_APIC_clock: the APIC timer calibration may be wrong.
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: 
 [<ffffffff8008cecd>] sd_degenerate+0x31/0x44
PGD 0 
Oops: 0000 [1] SMP 
last sysfs file: 
CPU 0 
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18-274.el5 #1
RIP: 0010:[<ffffffff8008cecd>]  [<ffffffff8008cecd>] sd_degenerate+0x31/0x44
RSP: 0000:ffff81007ffbfbe0  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff81000a755e60 RCX: 000000000000004d
RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000000
RBP: ffff81007ffbfbf0 R08: 0000000000000040 R09: 0000000000000000
R10: ffff81007ffbfe40 R11: 000000d000000000 R12: ffff81000a755e60
R13: ffff81000a755980 R14: ffff81000a7546a0 R15: 0000000000000027
FS:  0000000000000000(0000) GS:ffffffff8042a000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff81007ffbe000, task ffff81007ffad7a0)
Stack:  ffff81000a755cc0 ffff81000a755cc0 ffff81007ffbfc30 ffffffff8008e4c9
 0000004d000000d0 0000000000000040 00000000000000ff ffff81000a755cc0
 00000000000000ff 0000000000000040 ffff81007ffbfe30 ffffffff800911d3
Call Trace:
 [<ffffffff8008e4c9>] cpu_attach_domain+0x4b/0xce
 [<ffffffff800911d3>] __build_sched_domains+0xd42/0x13d3
 [<ffffffff8009195a>] arch_init_sched_domains+0x2e/0x35
 [<ffffffff8047810a>] sched_init_smp+0x1e/0xa5
 [<ffffffff804659e8>] init+0x183/0x2f7
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8018681a>] acpi_ds_init_one_object+0x0/0x80
 [<ffffffff80465865>] init+0x0/0x2f7
 [<ffffffff8005dfa7>] child_rip+0x0/0x11


Code: 48 3b 00 75 08 31 d2 f6 c1 70 0f 94 c2 59 5b c9 89 d0 c3 55 
RIP  [<ffffffff8008cecd>] sd_degenerate+0x31/0x44
 RSP <ffff81007ffbfbe0>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Fatal exception
 
(In reply to comment #1)
> Check our product limits before testing:
> https://home.corp.redhat.com/wiki/enterprise-linux-product-limits. We do not
> support more than 64 CPUs with any of these products, and the x86 variants do
> not even try to initialize all available vCPUs.
> 
> Your "reproduce" command line has "smp 64,maxcpus=161". Does this mean you
> hit this issue with 64 CPUs, or is the command line incorrect? Regardless, I
> do not hit this issue (which looks like a kernel bug) with many more than 64
> vCPUs on rhel5. What host CPU do you have? Please attach the full console log.

You can log in to my testing host:
 ip: 10.66.9.97
 user/password: redhat/redhat
 image path: /home/
Comment 3 Gleb Natapov 2012-06-11 07:18:18 EDT
(In reply to comment #2)
> My "reproduce" command line should be "smp 65,maxcpus=161".
> I first reproduced it on an AMD 6172 (48 cores).
> I can also reproduce it on my local host:
> AMD Phenom(tm) 9600B Quad-Core Processor, 4 cores.
> 
Do you see the same on Intel?

> This is the full console log.
This is not the full console log. To get the full console log, remove "rhgb quiet" from the kernel command line.
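As an illustration of that change, the sketch below drops the two flags from the kernel line shown in the GRUB output of comment 2 (rejoined here onto one line). The function name `drop_quiet_boot_flags` is made up for this example; in practice the line would be edited in the guest's GRUB configuration or at the boot prompt.

```python
def drop_quiet_boot_flags(kernel_line, flags=("rhgb", "quiet")):
    """Return the kernel line without the flags that suppress boot output.

    Only whole, whitespace-separated tokens are removed, so options such
    as console=ttyS0,115200n8 are left untouched.
    """
    return " ".join(tok for tok in kernel_line.split() if tok not in flags)


# Kernel line from the GRUB entry in comment 2.
KERNEL_LINE = (
    "kernel /vmlinuz-2.6.18-274.el5 ro root=/dev/VolGroup00/LogVol00 "
    "crashkernel=128M@32M console=tty0 console=ttyS0,115200n8 rhgb quiet"
)

verbose_line = drop_quiet_boot_flags(KERNEL_LINE)
```

The serial console option stays in place, so the full boot messages still reach the serial socket defined on the qemu-kvm command line.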
Comment 4 FuXiangChun 2012-06-12 07:52:42 EDT
I have attached the full console log. I do not hit this issue on Intel.
Comment 5 FuXiangChun 2012-06-12 07:53:30 EDT
Created attachment 591169
full log
Comment 6 Gleb Natapov 2012-06-14 03:13:52 EDT
I am closing the bug since this is not a supported configuration. It looks like a guest kernel limitation.