
Bug 690044

Summary: RHEL4.9 guest NMI watchdog detect LOCKUP during boot on RHEL6.1 AMD host
Product: Red Hat Enterprise Linux 6
Reporter: Qingtang Zhou <qzhou>
Component: qemu-kvm
Assignee: Gleb Natapov <gleb>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: unspecified
Version: 6.1
CC: knoel, michen, mkenneth, tburke, virt-maint
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
nmi_watchdog is not working for rhel4 guest kernels
Last Closed: 2011-06-03 18:09:37 UTC
Attachments:
  guest dmesg (flags: none)
  guest dmesg (virtio block) (flags: none)

Description Qingtang Zhou 2011-03-23 06:08:31 UTC
Created attachment 486962
guest dmesg

Description of problem:
The RHEL4.9 guest kernel panics during boot about 10% of the time and hangs during boot about 80% of the time.

Version-Release number of selected component (if applicable):
host:
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.150.el6.x86_64
# rpm -q kernel
2.6.32-120.el6.x86_64


How reproducible:
10% kernel panic
80% hang

Steps to Reproduce:
1. Start the guest with the following command:
qemu -name 'vm1' \
  -chardev socket,id=human_monitor_MTkB,path=/tmp/monitor-humanmonitor1-20110319-152410-zRkw,server,nowait \
  -mon chardev=human_monitor_MTkB,mode=readline \
  -chardev socket,id=serial_Yxc0,path=/tmp/serial-20110319-152410-zRkw,server,nowait \
  -device isa-serial,chardev=serial_Yxc0 \
  -drive file='RHEL-4.9-64-virtio.raw',index=0,if=none,id=drive-ide0-0-0,media=disk,cache=writethrough,format=raw,aio=native \
  -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
  -device virtio-net-pci,netdev=idOpH6lL,mac=9a:f6:24:35:ce:19,id=ndev00idOpH6lL,bus=pci.0,addr=0x3 \
  -netdev tap,id=idOpH6lL,vhost=on,ifname='t0-152410-zRkw',script='qemu-ifup-switch',downscript='no' \
  -m 4096 -smp 2,cores=1,threads=1,sockets=2 \
  -cpu cpu64-rhel6,vendor="RED HAT PROD",+sse2,+x2apic \
  -spice port=8000,disable-ticketing -vga qxl -rtc base=utc,clock=host,driftfix=none \
  -M rhel6.1.0 -boot order=cdn,once=c,menu=off -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm

2. The guest hangs (or the kernel panics).

  
Actual results:
The guest hangs (or the kernel panics).

Expected results:
The guest boots and runs normally, with no hang and no panic.

Additional info:
Guest call trace:

NMI Watchdog detected LOCKUP, CPU=0, registers:
CPU 0
Modules linked in: parport_pc lp parport autofs4 sunrpc ds yenta_socket pcmcia_core cpufreq_powersave loop joydev button battery ac uhci_hcd floppy virtio_blk virtio_net virtio_pci virtio_ring virtio dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod
Pid: 468, comm: kjournald Not tainted 2.6.9-100.ELsmp
RIP: 0010:[<ffffffff80110919>] <ffffffff80110919>{iret_label+0}
RSP: 0000:ffffffff804732d8  EFLAGS: 00000086
RAX: 0000000000000001 RBX: ffffffff804e5de0 RCX: 00000000000001f6
RDX: 000000000000c000 RSI: 000000000000c000 RDI: 0000000000000001
RBP: ffffffff804e5f28 R08: 0000000000004e1f R09: 0000000000000000
R10: 0000010037dbc000 R11: ffffffff8027d272 R12: ffffffff804e5de0
R13: ffffffff804e5de0 R14: 0000000000000008 R15: 00000101190c5db0
FS:  0000002a95aad6e0(0000) GS:ffffffff80506900(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a958860ec CR3: 0000000000101000 CR4: 00000000000006e0
Process kjournald (pid: 468, threadinfo 000001011be3c000, task 0000010037dd9030)
Stack: ffffffff80110919 0000000000000010 0000000000000086 ffffffff804732d8
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000
Call Trace:<ffffffff80110919>{iret_label+0}  <EOE> <ffffffff80110919>{iret_label+0}


Code: 48 cf 0f ba e2 03 73 1b fb 57 e8 62 43 20 00 5f 65 48 8b 0c
Kernel panic - not syncing: nmi watchdog
 Badness in panic at kernel/panic.c:121

Call Trace:<ffffffff8013881d>{panic+558} <ffffffff801118dc>{show_stack+241}
<ffffffff80111a06>{show_registers+277} <ffffffff80111d0d>{die_nmi+130}
<ffffffff8011de1b>{nmi_watchdog_tick+276} <ffffffff801125de>{default_do_nmi+116}
<ffffffff8011df05>{do_nmi+115} <ffffffff801111eb>{paranoid_exit+0}
<ffffffff8027d272>{__ide_dma_begin+0} <ffffffff80110919>{iret_label+0}
 <EOE> <ffffffff80110919>{iret_label+0}
Badness in i8042_panic_blink at drivers/input/serio/i8042.c:987

Call Trace:<ffffffff80249cf5>{i8042_panic_blink+239} <ffffffff801387cb>{panic+476}
<ffffffff801118dc>{show_stack+241} <ffffffff80111a06>{show_registers+277}
<ffffffff80111d0d>{die_nmi+130} <ffffffff8011de1b>{nmi_watchdog_tick+276}
<ffffffff801125de>{default_do_nmi+116} <ffffffff8011df05>{do_nmi+115}
<ffffffff801111eb>{paranoid_exit+0} <ffffffff8027d272>{__ide_dma_begin+0}
<ffffffff80110919>{iret_label+0}  <EOE> <ffffffff80110919>{iret_label+0}

Badness in i8042_panic_blink at drivers/input/serio/i8042.c:990

Call Trace:<ffffffff80249d87>{i8042_panic_blink+385} <ffffffff801387cb>{panic+476}
<ffffffff801118dc>{show_stack+241} <ffffffff80111a06>{show_registers+277}
<ffffffff80111d0d>{die_nmi+130} <ffffffff8011de1b>{nmi_watchdog_tick+276}
<ffffffff801125de>{default_do_nmi+116} <ffffffff8011df05>{do_nmi+115}
<ffffffff801111eb>{paranoid_exit+0} <ffffffff8027d272>{__ide_dma_begin+0}
<ffffffff80110919>{iret_label+0}  <EOE> <ffffffff80110919>{iret_label+0}

Badness in i8042_panic_blink at drivers/input/serio/i8042.c:992

Call Trace:<ffffffff80249dec>{i8042_panic_blink+486} <ffffffff801387cb>{panic+476}
<ffffffff801118dc>{show_stack+241} <ffffffff80111a06>{show_registers+277}
<ffffffff80111d0d>{die_nmi+130} <ffffffff8011de1b>{nmi_watchdog_tick+276}
<ffffffff801125de>{default_do_nmi+116} <ffffffff8011df05>{do_nmi+115}
<ffffffff801111eb>{paranoid_exit+0} <ffffffff8027d272>{__ide_dma_begin+0}
<ffffffff80110919>{iret_label+0}  <EOE> <ffffffff80110919>{iret_label+0}

Comment 2 Dor Laor 2011-03-23 14:14:54 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
nmi_watchdog is not working for rhel4 guest kernels

Comment 3 Gleb Natapov 2011-03-23 16:29:52 UTC
(In reply to comment #2)
>     Technical note added. If any revisions are required, please edit the
> "Technical Notes" field
>     accordingly. All revisions will be proofread by the Engineering Content
> Services team.
> 
>     New Contents:
> nmi_watchdog is not working for rhel4 guest kernels

I do not see how nmi_watchdog is to blame here.

Comment 4 Gleb Natapov 2011-03-23 16:31:05 UTC
First of all, cache=writethrough should not be used. Can you reproduce with cache=none? Does it hang with virtio block too?

Comment 5 Qingtang Zhou 2011-03-24 05:05:24 UTC
(In reply to comment #4)
> First of all cache=writethrough should not be used. Can you reproduce with
> cache=none? Does it hangs with virtio block too?
Hi Gleb,
I have tried both IDE and virtio_blk drives with "cache=none"; the guest still hangs during boot.

BTW, do you mean that "cache=writethrough" should not be used for any RHEL4.9 guest with a raw-format image, or that we should use "cache=none" for all guests with raw images?

Comment 6 Gleb Natapov 2011-03-24 05:10:37 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > First of all cache=writethrough should not be used. Can you reproduce with
> > cache=none? Does it hangs with virtio block too?
> Hi Gleb,
> I have tried ide/virtio_blk drive with "cache=none", guest still hang during
> boot.
Can you post the dmesg of the hang with virtio_blk, please?

> 
> 
> btw, do you mean "cache=writethrough" should not be used for all RHEL4.9 guest
> with raw format image? or we should use 'cache=none' for all guest with raw
> image?

IIRC, we should always use cache=none regardless of disk format.
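Switching the reproducer from cache=writethrough to cache=none only touches the `-drive` suboptions. A minimal sketch, assuming the option string is handled without its shell quoting; `with_cache_none` is a hypothetical helper name, not part of qemu-kvm:

```shell
# Hypothetical helper (not part of qemu-kvm): rewrite a -drive option string so
# that it uses cache=none, replacing any existing cache=<mode> suboption.
with_cache_none() {
  case ",$1," in
    *,cache=*)
      # Replace the first cache=<mode> suboption and keep everything else.
      printf '%s\n' "$1" | sed -E 's/(^|,)cache=[^,]*/\1cache=none/'
      ;;
    *)
      # No cache= suboption yet: append one.
      printf '%s,cache=none\n' "$1"
      ;;
  esac
}

# Drive options from the reproducer, with the quotes around the file name dropped.
drive='file=RHEL-4.9-64-virtio.raw,index=0,if=none,id=drive-ide0-0-0,media=disk,cache=writethrough,format=raw,aio=native'
with_cache_none "$drive"
```

This prints the same option string with `cache=none` in place of `cache=writethrough`; all other suboptions, including `aio=native`, are left unchanged.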

Comment 7 Qingtang Zhou 2011-03-24 07:20:44 UTC
Created attachment 487237
guest dmesg (virtio block)

Comment 8 Gleb Natapov 2011-03-24 07:40:37 UTC
(In reply to comment #7)
> Created attachment 487237 [details]
> guest dmesg (virtio block)

I do not see any lockup message there. Can you post one with the lockup message? BTW, what do "info status" and "info cpus" show in the monitor when it hangs?

Comment 9 Qingtang Zhou 2011-03-24 08:02:48 UTC
(In reply to comment #8)
> 
> I do not see any lockup message there. Can you post one with lockup message?
> BTW what "info status"  and "info cpus" shows in monitor when it hang?

Yes, it didn't lock up this time. I tried about 20 times today: no lockup occurred, but the guest hung about 80% of the time. I'll keep trying to trigger the lockup.

Monitor output:
(qemu) info status
info status
VM status: running
(qemu) info cpus
info cpus
* CPU #0: pc=0xffffffff80110919 thread_id=5052 
  CPU #1: pc=0xffffffff8011cd16 thread_id=5060

Comment 10 Qingtang Zhou 2011-03-24 09:15:10 UTC
While continuing to reboot the guest, I found a halted CPU:

(qemu) info status
info status
VM status: running
(qemu) info cpus
info cpus
* CPU #0: pc=0xffffffff80110919 thread_id=5170 
  CPU #1: pc=0xffffffff8010e7a9 (halted) thread_id=5171 
(qemu)


Still no NMI lockup occurred...
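When comparing several hangs, it helps to reduce the `info cpus` output to one line per vCPU. In both captures above, CPU #0 reports pc=0xffffffff80110919, the same address as `iret_label+0` in the NMI watchdog trace. A minimal sketch, assuming the monitor line format shown above; `summarize_info_cpus` is a hypothetical helper name, and "running" here only means the vCPU was not reported as "(halted)":

```shell
# Hypothetical helper: read `info cpus` monitor output on stdin and print one
# line per vCPU as "<index> <pc> <halted|running>". It assumes the line format
# shown above: "[*] CPU #N: pc=0x... [(halted)] thread_id=...".
summarize_info_cpus() {
  awk '$0 ~ "CPU #[0-9]+:" {
    cpu = $0; sub("^.*CPU #", "", cpu); sub(":.*$", "", cpu)   # vCPU index
    pc = $0;  sub("^.*pc=", "", pc);    sub(" .*$", "", pc)    # guest program counter
    state = (index($0, "(halted)") > 0) ? "halted" : "running"
    print cpu, pc, state
  }'
}

# Output captured in comment 10.
printf '%s\n' \
  '* CPU #0: pc=0xffffffff80110919 thread_id=5170' \
  '  CPU #1: pc=0xffffffff8010e7a9 (halted) thread_id=5171' |
  summarize_info_cpus
```

For the comment 10 capture this prints `0 0xffffffff80110919 running` and `1 0xffffffff8010e7a9 halted`.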

Comment 11 Qingtang Zhou 2011-03-24 10:10:08 UTC
I used gdb to attach to the CPU #1 thread:

(gdb) bt
#0  0x00000036122dde87 in ioctl () from /lib64/libc.so.6
#1  0x000000000042d03f in kvm_run (env=0x2c30010) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:927
#2  0x000000000042d4c9 in kvm_cpu_exec (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1663
#3  0x000000000042e20f in kvm_main_loop_cpu (_env=0x2c30010) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1931
#4  ap_main_loop (_env=0x2c30010) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1981
#5  0x00000036126077e1 in start_thread () from /lib64/libpthread.so.0
#6  0x00000036122e5dcd in clone () from /lib64/libc.so.6

Comment 12 Gleb Natapov 2011-03-24 10:16:12 UTC
(In reply to comment #11)
> use gdb to attach this cpu #1 thread:
> 
> (gdb) bt
> #0  0x00000036122dde87 in ioctl () from /lib64/libc.so.6
> #1  0x000000000042d03f in kvm_run (env=0x2c30010) at
> /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:927
> #2  0x000000000042d4c9 in kvm_cpu_exec (env=<value optimized out>) at
> /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1663
> #3  0x000000000042e20f in kvm_main_loop_cpu (_env=0x2c30010) at
> /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1931
> #4  ap_main_loop (_env=0x2c30010) at
> /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1981
> #5  0x00000036126077e1 in start_thread () from /lib64/libpthread.so.0
> #6  0x00000036122e5dcd in clone () from /lib64/libc.so.6

That is useless. Better to use ftrace.
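A userspace backtrace only shows the vCPU thread sitting in the KVM_RUN ioctl, so the interesting events happen in the host kernel. A minimal sketch of capturing the host's KVM trace events while the guest is hung, assuming it runs as root on the host, debugfs is mounted so the tracing directory is /sys/kernel/debug/tracing, and the kernel exposes `events/kvm/`. The function name and the /tmp/kvm-trace.txt output path are hypothetical:

```shell
# Sketch: record host-side KVM trace events for a few seconds while the guest hangs.
# Assumptions: run as root on the host; the tracing directory is
# /sys/kernel/debug/tracing (debugfs mounted); the kernel provides events/kvm/.
capture_kvm_trace() {
  tracing=${1:-/sys/kernel/debug/tracing}
  seconds=${2:-5}
  outfile=${3:-/tmp/kvm-trace.txt}          # hypothetical output path

  echo 0 > "$tracing/tracing_on"            # pause tracing while configuring
  echo > "$tracing/trace"                   # clear the ring buffer
  echo 1 > "$tracing/events/kvm/enable"     # enable every kvm:* event
  echo 1 > "$tracing/tracing_on"
  sleep "$seconds"                          # let the hung guest generate events
  echo 0 > "$tracing/tracing_on"
  cat "$tracing/trace" > "$outfile"         # save the buffer for the bug report
  echo 0 > "$tracing/events/kvm/enable"     # switch the kvm events off again
}
```

For example, `capture_kvm_trace /sys/kernel/debug/tracing 10 /tmp/kvm-trace.txt` records ten seconds of events. On hosts with trace-cmd installed, `trace-cmd record -e kvm` followed by `trace-cmd report` collects the same events.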

Comment 14 Gleb Natapov 2011-06-03 18:09:37 UTC
The NMI watchdog in a guest is not supported. Closing.
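Because the NMI watchdog is unsupported in the guest, one option is to keep it off on the RHEL4 guest's kernel command line with `nmi_watchdog=0`. Whether this avoids the boot hangs reported above was not established in this bug. A minimal sketch, assuming the guest boots through GRUB legacy from /boot/grub/grub.conf; `nmi_watchdog_off` is a hypothetical helper name, and it only prints the edited file:

```shell
# Sketch: print a GRUB legacy config with nmi_watchdog=0 appended to every
# "kernel" line that does not already set nmi_watchdog. The file is not modified.
# Assumption: the RHEL4 guest boots via /boot/grub/grub.conf.
nmi_watchdog_off() {
  conf=${1:-/boot/grub/grub.conf}
  sed -E '/^[[:space:]]*kernel[[:space:]]/{/nmi_watchdog=/!s/$/ nmi_watchdog=0/;}' "$conf"
}
```

For example, `nmi_watchdog_off /boot/grub/grub.conf > /tmp/grub.conf.new` writes the result to a separate file that can be reviewed before it replaces the original.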