| Field | Value |
|---|---|
| Summary | RHEL4.9 guest NMI watchdog detects LOCKUP during boot on RHEL6.1 AMD host |
| Product | Red Hat Enterprise Linux 6 |
| Component | qemu-kvm |
| Version | 6.1 |
| Hardware | Unspecified |
| OS | Unspecified |
| Status | CLOSED WONTFIX |
| Severity | medium |
| Priority | unspecified |
| Reporter | Qingtang Zhou <qzhou> |
| Assignee | Gleb Natapov <gleb> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| CC | knoel, michen, mkenneth, tburke, virt-maint |
| Target Milestone | rc |
| Target Release | --- |
| Doc Type | Bug Fix |
| Doc Text | nmi_watchdog is not working for rhel4 guest kernels |
| Last Closed | 2011-06-03 18:09:37 UTC |
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
nmi_watchdog is not working for rhel4 guest kernels
(In reply to comment #2)
> Technical note added. If any revisions are required, please edit the
> "Technical Notes" field accordingly. All revisions will be proofread by the
> Engineering Content Services team.
>
> New Contents:
> nmi_watchdog is not working for rhel4 guest kernels

I do not see how nmi_watchdog is to blame here. First of all, cache=writethrough should not be used. Can you reproduce with cache=none? Does it hang with virtio block too?

(In reply to comment #4)
> First of all, cache=writethrough should not be used. Can you reproduce with
> cache=none? Does it hang with virtio block too?

Hi Gleb,
I have tried ide and virtio_blk drives with "cache=none"; the guest still hangs during boot.

By the way, do you mean "cache=writethrough" should not be used for any RHEL4.9 guest with a raw-format image? Or should we use "cache=none" for all guests with raw images?

(In reply to comment #5)
> I have tried ide and virtio_blk drives with "cache=none"; the guest still
> hangs during boot.

Can you post the dmesg of the hang with virtio_blk, please?

> By the way, do you mean "cache=writethrough" should not be used for any
> RHEL4.9 guest with a raw-format image? Or should we use "cache=none" for all
> guests with raw images?

IIRC we should always use cache=none regardless of disk format.

Created attachment 487237 [details]
guest dmesg (virtio block)

(In reply to comment #7)
> Created attachment 487237 [details]
> guest dmesg (virtio block)

I do not see any lockup message there. Can you post one with the lockup message? BTW, what do "info status" and "info cpus" show in the monitor when it hangs?

(In reply to comment #8)
> I do not see any lockup message there. Can you post one with the lockup
> message? BTW, what do "info status" and "info cpus" show in the monitor when
> it hangs?

Yep, it didn't lock up this time. I tried about 20 times today; no lockup occurred, but it hung 80% of the time. I'll keep trying to reproduce the lockup.

Monitor output:

(qemu) info status
VM status: running
(qemu) info cpus
* CPU #0: pc=0xffffffff80110919 thread_id=5052
  CPU #1: pc=0xffffffff8011cd16 thread_id=5060

Continuing to reboot the guest, I found a halted CPU:

(qemu) info status
VM status: running
(qemu) info cpus
* CPU #0: pc=0xffffffff80110919 thread_id=5170
  CPU #1: pc=0xffffffff8010e7a9 (halted) thread_id=5171
(qemu)

Still no NMI lockup has occurred.

Using gdb to attach to the CPU #1 thread:

(gdb) bt
#0  0x00000036122dde87 in ioctl () from /lib64/libc.so.6
#1  0x000000000042d03f in kvm_run (env=0x2c30010) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:927
#2  0x000000000042d4c9 in kvm_cpu_exec (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1663
#3  0x000000000042e20f in kvm_main_loop_cpu (_env=0x2c30010) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1931
#4  ap_main_loop (_env=0x2c30010) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1981
#5  0x00000036126077e1 in start_thread () from /lib64/libpthread.so.0
#6  0x00000036122e5dcd in clone () from /lib64/libc.so.6

(In reply to comment #11)
> (gdb) bt
> #0  0x00000036122dde87 in ioctl () from /lib64/libc.so.6
> [...]

That is useless. Better do ftrace.

NMI watchdog in a guest is not supported. Closing.
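The "Better do ftrace" suggestion refers to host-side tracing of KVM events rather than attaching gdb to the vCPU thread. A minimal sketch of how such a capture is typically done is below; it uses the standard ftrace interface under /sys/kernel/debug/tracing (requires root and a mounted debugfs) and is an illustration, not a procedure taken from this bug report:

```shell
#!/bin/sh
# Capture kvm:* trace events on the host while reproducing the guest hang.
TRACING=/sys/kernel/debug/tracing
if [ ! -d "$TRACING" ]; then
    echo "debugfs not mounted; try: mount -t debugfs none /sys/kernel/debug" >&2
    exit 1
fi
echo 0 > "$TRACING/tracing_on"        # stop tracing while we set up
echo   > "$TRACING/trace"             # clear the ring buffer
echo 1 > "$TRACING/events/kvm/enable" # enable kvm_entry, kvm_exit, ...
echo 1 > "$TRACING/tracing_on"
# ... reproduce the guest hang here, then stop and save the trace:
echo 0 > "$TRACING/tracing_on"
cp "$TRACING/trace" /tmp/kvm-trace.txt
```

The resulting log shows which exits the stuck vCPU thread is taking (or that it is parked in HLT), which is the information the gdb backtrace above cannot provide.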
Created attachment 486962 [details]
guest dmesg

Description of problem:
A RHEL4.9 guest has roughly a 10% probability of a kernel panic during boot, and roughly an 80% probability of hanging during boot.

Version-Release number of selected component (if applicable):
host:
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.150.el6.x86_64
# rpm -q kernel
2.6.32-120.el6.x86_64

How reproducible:
10% kernel panic, 80% hang

Steps to Reproduce:
1. Start the guest with this command:

qemu -name 'vm1' \
  -chardev socket,id=human_monitor_MTkB,path=/tmp/monitor-humanmonitor1-20110319-152410-zRkw,server,nowait \
  -mon chardev=human_monitor_MTkB,mode=readline \
  -chardev socket,id=serial_Yxc0,path=/tmp/serial-20110319-152410-zRkw,server,nowait \
  -device isa-serial,chardev=serial_Yxc0 \
  -drive file='RHEL-4.9-64-virtio.raw',index=0,if=none,id=drive-ide0-0-0,media=disk,cache=writethrough,format=raw,aio=native \
  -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
  -device virtio-net-pci,netdev=idOpH6lL,mac=9a:f6:24:35:ce:19,id=ndev00idOpH6lL,bus=pci.0,addr=0x3 \
  -netdev tap,id=idOpH6lL,vhost=on,ifname='t0-152410-zRkw',script='qemu-ifup-switch',downscript='no' \
  -m 4096 -smp 2,cores=1,threads=1,sockets=2 \
  -cpu cpu64-rhel6,vendor="RED HAT PROD",+sse2,+x2apic \
  -spice port=8000,disable-ticketing -vga qxl \
  -rtc base=utc,clock=host,driftfix=none -M rhel6.1.0 \
  -boot order=cdn,once=c,menu=off -usbdevice tablet \
  -no-kvm-pit-reinjection -enable-kvm

2. Observe the guest during boot.

Actual results:
The guest hangs (or the kernel panics).

Expected results:
The guest runs well: no hang, no panic.

Additional info:
Guest call trace:

NMI Watchdog detected LOCKUP, CPU=0, registers:
CPU 0
Modules linked in: parport_pc lp parport autofs4 sunrpc ds yenta_socket pcmcia_core cpufreq_powersave loop joydev button battery ac uhci_hcd floppy virtio_blk virtio_net virtio_pci virtio_ring virtio dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod
Pid: 468, comm: kjournald Not tainted 2.6.9-100.ELsmp
RIP: 0010:[<ffffffff80110919>] <ffffffff80110919>{iret_label+0}
RSP: 0000:ffffffff804732d8  EFLAGS: 00000086
RAX: 0000000000000001 RBX: ffffffff804e5de0 RCX: 00000000000001f6
RDX: 000000000000c000 RSI: 000000000000c000 RDI: 0000000000000001
RBP: ffffffff804e5f28 R08: 0000000000004e1f R09: 0000000000000000
R10: 0000010037dbc000 R11: ffffffff8027d272 R12: ffffffff804e5de0
R13: ffffffff804e5de0 R14: 0000000000000008 R15: 00000101190c5db0
FS: 0000002a95aad6e0(0000) GS:ffffffff80506900(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000
CR0: 000000008005003b CR2: 0000002a958860ec CR3: 0000000000101000 CR4: 00000000000006e0
Process kjournald (pid: 468, threadinfo 000001011be3c000, task 0000010037dd9030)
Stack: ffffffff80110919 0000000000000010 0000000000000086 ffffffff804732d8
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000
Call Trace: <ffffffff80110919>{iret_label+0} <EOE> <ffffffff80110919>{iret_label+0}
Code: 48 cf 0f ba e2 03 73 1b fb 57 e8 62 43 20 00 5f 65 48 8b 0c
Kernel panic - not syncing: nmi watchdog

Badness in panic at kernel/panic.c:121
Call Trace: <ffffffff8013881d>{panic+558} <ffffffff801118dc>{show_stack+241} <ffffffff80111a06>{show_registers+277} <ffffffff80111d0d>{die_nmi+130} <ffffffff8011de1b>{nmi_watchdog_tick+276} <ffffffff801125de>{default_do_nmi+116} <ffffffff8011df05>{do_nmi+115} <ffffffff801111eb>{paranoid_exit+0} <ffffffff8027d272>{__ide_dma_begin+0} <ffffffff80110919>{iret_label+0} <EOE> <ffffffff80110919>{iret_label+0}

Badness in i8042_panic_blink at drivers/input/serio/i8042.c:987
Call Trace: <ffffffff80249cf5>{i8042_panic_blink+239} <ffffffff801387cb>{panic+476} <ffffffff801118dc>{show_stack+241} <ffffffff80111a06>{show_registers+277} <ffffffff80111d0d>{die_nmi+130} <ffffffff8011de1b>{nmi_watchdog_tick+276} <ffffffff801125de>{default_do_nmi+116} <ffffffff8011df05>{do_nmi+115} <ffffffff801111eb>{paranoid_exit+0} <ffffffff8027d272>{__ide_dma_begin+0} <ffffffff80110919>{iret_label+0} <EOE> <ffffffff80110919>{iret_label+0}

Badness in i8042_panic_blink at drivers/input/serio/i8042.c:990
Call Trace: <ffffffff80249d87>{i8042_panic_blink+385} <ffffffff801387cb>{panic+476} <ffffffff801118dc>{show_stack+241} <ffffffff80111a06>{show_registers+277} <ffffffff80111d0d>{die_nmi+130} <ffffffff8011de1b>{nmi_watchdog_tick+276} <ffffffff801125de>{default_do_nmi+116} <ffffffff8011df05>{do_nmi+115} <ffffffff801111eb>{paranoid_exit+0} <ffffffff8027d272>{__ide_dma_begin+0} <ffffffff80110919>{iret_label+0} <EOE> <ffffffff80110919>{iret_label+0}

Badness in i8042_panic_blink at drivers/input/serio/i8042.c:992
Call Trace: <ffffffff80249dec>{i8042_panic_blink+486} <ffffffff801387cb>{panic+476} <ffffffff801118dc>{show_stack+241} <ffffffff80111a06>{show_registers+277} <ffffffff80111d0d>{die_nmi+130} <ffffffff8011de1b>{nmi_watchdog_tick+276} <ffffffff801125de>{default_do_nmi+116} <ffffffff8011df05>{do_nmi+115} <ffffffff801111eb>{paranoid_exit+0} <ffffffff8027d272>{__ide_dma_begin+0} <ffffffff80110919>{iret_label+0} <EOE> <ffffffff80110919>{iret_label+0}
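Since the bug was closed WONTFIX ("nmi watchdog in a guest is not supported"), the practical mitigation is to disable the watchdog inside the guest and to follow the cache-mode advice from the thread. The sketch below is an assumption based on those two statements, not a fix recorded in this bug; the grub.conf path is the RHEL4 default and may differ on a given guest:

```shell
# Guest side (RHEL4): disable the NMI watchdog on the kernel command line.
# Append nmi_watchdog=0 to the kernel line in /boot/grub/grub.conf, e.g.:
#   kernel /vmlinuz-2.6.9-100.ELsmp ro root=/dev/VolGroup00/LogVol00 nmi_watchdog=0

# Host side: per the advice in the thread ("we should always use cache=none
# regardless of disk format"), replace the -drive option from the reproducer with:
#   -drive file='RHEL-4.9-64-virtio.raw',index=0,if=none,id=drive-ide0-0-0,media=disk,cache=none,format=raw,aio=native
```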