Bug 689676 - [AMD] RHEL5.6 SMP guest VM hang or kernel panic in bootup after setting nmi_watchdog=1
Summary: [AMD] RHEL5.6 SMP guest VM hang or kernel panic in bootup after setting nmi_w...
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.1
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Gleb Natapov
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 580954
Reported: 2011-03-22 05:22 UTC by yacui
Modified: 2013-01-09 23:41 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-03 18:07:04 UTC
Target Upstream Version:


Attachments
serial log for guest file-system broken (32.74 KB, text/plain)
2011-03-24 05:50 UTC, yacui
screen dump for guest file system broken (33.55 KB, image/jpeg)
2011-03-24 05:52 UTC, yacui
serial log for guest kernel panic (31.42 KB, text/plain)
2011-03-24 05:53 UTC, yacui
screen dump for guest kernel panic (19.02 KB, image/png)
2011-03-24 05:54 UTC, yacui

Description yacui 2011-03-22 05:22:49 UTC
Description of problem:
On a RHEL6.1 host, boot a RHEL5.6 guest VM with more than one vCPU and nmi_watchdog=1. The guest may hang or hit a kernel panic during bootup.

Version-Release number of selected component (if applicable):
Host Kernel:
2.6.32-121.el6.x86_64
Guest Kernel:
2.6.18-247.el5
KVM Version:
qemu-kvm-debuginfo-0.12.1.2-2.150.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.150.el6.x86_64
qemu-kvm-0.12.1.2-2.150.el6.x86_64

How reproducible:
guest hang in bootup - 7 out of 300 times
guest kernel panic in bootup - 1 out of 300 times
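
For planning a retest, the counts above can be turned into rates and a rough number of boots needed to see at least one failure. This is only a sketch: it assumes every boot is an independent trial with a fixed failure probability, which the report does not establish, and the function names are made up for illustration.

```python
import math


def failure_rate(failures: int, runs: int) -> float:
    """Observed fraction of boot attempts that failed."""
    return failures / runs


def boots_needed(rate: float, confidence: float = 0.95) -> int:
    """Smallest n with 1 - (1 - rate)**n >= confidence.

    Assumes independent boots with a constant failure probability.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))


# Counts taken from the "How reproducible" section.
hang_rate = failure_rate(7, 300)
panic_rate = failure_rate(1, 300)

print(f"hang:  {hang_rate:.2%}, boots for 95% chance of one hit: {boots_needed(hang_rate)}")
print(f"panic: {panic_rate:.2%}, boots for 95% chance of one hit: {boots_needed(panic_rate)}")
```

With a rate this low, a handful of clean boots says little about whether a candidate fix worked, so a verification run has to be sized accordingly.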

Steps to Reproduce:
1. Boot a normal RHEL5.6 guest.
2. Add nmi_watchdog=1 to the kernel line.
3. Reboot the RHEL5.6 guest.
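
Step 2 can be scripted when the reproduction is repeated many times. The sketch below appends the parameter to the `kernel` lines of a GRUB legacy configuration, which is what RHEL 5 guests normally use (/boot/grub/grub.conf). The sample configuration text, including the root= device, is hypothetical and not taken from the affected guest, and the function name is made up for illustration.

```python
def add_kernel_param(grub_conf: str, param: str = "nmi_watchdog=1") -> str:
    """Append `param` to every GRUB legacy `kernel` line that lacks it."""
    out = []
    for line in grub_conf.splitlines():
        words = line.split()
        # Only touch "kernel ..." entries, and skip ones already patched.
        if words and words[0] == "kernel" and param not in words:
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical grub.conf content, for illustration only.
SAMPLE = """default=0
timeout=5
title Red Hat Enterprise Linux Server (2.6.18-247.el5)
\troot (hd0,0)
\tkernel /vmlinuz-2.6.18-247.el5 ro root=/dev/VolGroup00/LogVol00
\tinitrd /initrd-2.6.18-247.el5.img
"""

patched = add_kernel_param(SAMPLE)
print(patched)
```

The function works on text rather than on the file itself, so the result can be inspected before anything is written back. Running it twice leaves the configuration unchanged.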
  
Actual results:
After adding nmi_watchdog=1, the guest sometimes hangs during bootup.

Expected results:
The guest should boot up normally.

Additional info:
1 CommandLine:
qemu-kvm -name 'vm1' -chardev socket,id=human_monitor_kMoF,path=/tmp/monitor-humanmonitor1-20110315-134747-luGU,server,nowait -mon chardev=human_monitor_kMoF,mode=readline -chardev socket,id=serial_T9FN,path=/tmp/serial-20110315-134747-luGU,server,nowait -device isa-serial,chardev=serial_T9FN -drive file='/home/kvm-qe/autotest/client/tests/kvm/images/RHEL-Server-5.6-64.raw',index=0,if=none,id=drive-ide0-0-0,media=disk,cache=writethrough,snapshot=on,format=raw,aio=native -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -device e1000,netdev=idM8ohr1,mac=9a:54:b1:a9:34:c7,netdev=idM8ohr1,id=ndev00idM8ohr1,bus=pci.0,addr=0x3 -netdev tap,id=idM8ohr1,ifname='t0-134747-luGU',script='/home/kvm-qe/autotest/client/tests/kvm/scripts/qemu-ifup-switch',downscript='no' -m 8192 -smp 4,cores=1,threads=1,sockets=4 -cpu cpu64-rhel6,+sse2,+x2apic -vnc :0 -rtc base=utc,clock=host,driftfix=none  -boot order=cdn,once=c,menu=off   -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm

2 Host CPU Info:
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model  : 67
model name : Dual-Core AMD Opteron(tm) Processor 1216
stepping : 3
cpu MHz  : 1000.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id  : 1
cpu cores : 2
apicid  : 1
initial apicid : 1
fpu  : yes
fpu_exception : yes
cpuid level : 1
wp  : yes
flags  : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 2009.10
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

3 Serial Output:
3.1 Guest hang during the startup process
Last few lines of serial output in this scenario:
2011-03-16 22:25:56: uhci_hcd 0000:00:01.2: UHCI Host Controller
2011-03-16 22:25:56: uhci_hcd 0000:00:01.2: new USB bus registered, assigned bus number 1
2011-03-16 22:25:56: uhci_hcd 0000:00:01.2: irq 11, io base 0x0000c020
2011-03-16 22:25:56: usb usb1: configuration #1 chosen from 1 choice
2011-03-16 22:25:56: hub 1-0:1.0: USB hub found
2011-03-16 22:25:56: hub 1-0:1.0: 2 ports detected
2011-03-16 22:25:56: input: ImExPS/2 Generic Explorer Mouse as /class/input/input1
2011-03-16 22:25:56: usb 1-1: new full speed USB device using uhci_hcd and address 2
2011-03-16 22:25:56: SCSI subsystem initialized
2011-03-16 22:25:56: usb 1-1: configuration #1 chosen from 1 choice
2011-03-16 22:25:56: input: QEMU 0.12.1 QEMU USB Tablet as /class/input/input2
2011-03-16 22:25:56: input: USB HID v0.01 Pointer [QEMU 0.12.1 QEMU USB Tablet] on usb-0000:00:01.2-1
2011-03-16 22:25:56: device-mapper: uevent: version 1.0.3
2011-03-16 22:25:56: device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel
2011-03-16 22:25:57: device-mapper: dm-raid45: initialized v0.2594l
2011-03-16 22:26:18: kjournald starting.  Commit interval 5 seconds
2011-03-16 22:26:18: EXT3-fs: mounted filesystem with ordered data mode.
2011-03-16 22:26:18: type=1404 audit(1300285577.849:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
2011-03-16 22:26:18: type=1403 audit(1300285578.112:3): policy loaded auid=4294967295 ses=4294967295

3.2 Guest Kernel Panic
2011-03-22 04:46:02: TCP bic registered
2011-03-22 04:46:02: Initializing IPsec netlink socket
2011-03-22 04:46:02: input: AT Translated Set 2 keyboard as /class/input/input0
2011-03-22 04:46:02: NET: Registered protocol family 1
2011-03-22 04:46:02: NET: Registered protocol family 17
2011-03-22 04:46:02: ACPI: (supports S3 S4 S5)
2011-03-22 04:46:02: Initalizing network drop monitor service
2011-03-22 04:46:02: Freeing unused kernel memory: 224k freed
2011-03-22 04:46:02: Write protecting the kernel read-only data: 520k
2011-03-22 04:46:02: input: ImExPS/2 Generic Explorer Mouse as /class/input/input1
2011-03-22 04:46:12: Kernel panic - not syncing: Attempted to kill init!
2011-03-22 04:46:12:
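
When sorting through a few hundred serial logs like the two above, a small script can separate the panic case from the rest and show where the output stalled. The following is a sketch only: it assumes each line starts with the `YYYY-MM-DD HH:MM:SS:` prefix seen in these excerpts (apparently added by the test harness), treats lines without a prefix as wrapped continuations, and uses made-up function names. A long gap between lines is a hint about where to look, not proof of a hang.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the serial excerpts of this report.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): ?(.*)$")


def parse(log: str):
    """Return [timestamp, message] pairs; unprefixed lines extend the previous entry."""
    entries = []
    for raw in log.splitlines():
        m = LINE_RE.match(raw)
        if m:
            stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append([stamp, m.group(2)])
        elif entries and raw.strip():
            entries[-1][1] += " " + raw.strip()
    return entries


def classify(log: str) -> str:
    """Label a log 'panic' if any message carries the kernel panic signature."""
    if any("Kernel panic" in msg for _, msg in parse(log)):
        return "panic"
    return "no panic signature"


def largest_gap(log: str) -> float:
    """Longest pause, in seconds, between two consecutive timestamped lines."""
    entries = parse(log)
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(entries, entries[1:])]
    return max(gaps, default=0.0)


# Excerpts copied from sections 3.1 and 3.2.
HANG_EXCERPT = """2011-03-16 22:25:57: device-mapper: dm-raid45: initialized v0.2594l
2011-03-16 22:26:18: kjournald starting.  Commit interval 5 seconds
"""
PANIC_EXCERPT = """2011-03-22 04:46:02: input: ImExPS/2 Generic Explorer Mouse as /class/input/input1
2011-03-22 04:46:12: Kernel panic - not syncing: Attempted to kill init!
2011-03-22 04:46:12:
"""

print(classify(HANG_EXCERPT), largest_gap(HANG_EXCERPT))    # no panic signature 21.0
print(classify(PANIC_EXCERPT), largest_gap(PANIC_EXCERPT))  # panic 10.0
```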

Comment 2 yacui 2011-03-23 08:14:36 UTC
After more autotest runs, I found that the first symptom, "guest hang during the startup process" (7 out of 300 boots), is a broken file system that can be repaired manually with fsck.

So far there has been 1 guest kernel panic; the serial output is in section 3.2 of comment 1.

Comment 3 Avi Kivity 2011-03-23 12:13:59 UTC
Is it a guest file system issue?  Or a host file system issue?  What's the cause?

Comment 4 yacui 2011-03-24 05:48:55 UTC
It's a guest file system issue. I don't know why the guest file system gets damaged. The whole procedure is: first boot a normal guest, then set nmi_watchdog, and finally reboot the guest. The guest file system is sometimes found broken during that bootup.

I can also upload the full serial logs and screen dumps as attachments for reference (for both the broken file system case and the kernel panic case).

Comment 5 yacui 2011-03-24 05:50:44 UTC
Created attachment 487209 [details]
serial log for guest file-system broken

Comment 6 yacui 2011-03-24 05:52:04 UTC
Created attachment 487211 [details]
screen dump for guest file system broken

Comment 7 yacui 2011-03-24 05:53:14 UTC
Created attachment 487213 [details]
serial log for guest kernel panic

Comment 8 yacui 2011-03-24 05:54:30 UTC
Created attachment 487214 [details]
screen dump for guest kernel panic

Comment 9 Gleb Natapov 2011-03-24 09:53:57 UTC
(In reply to comment #8)
> Created attachment 487214 [details]
> screen dump for guest kernel panic

This one is also due to guest file system corruption. The panic happens because the file system can't be mounted.

Comment 10 Avi Kivity 2011-03-24 14:18:31 UTC
Don't see the breakage in the logs.

Comment 12 Gleb Natapov 2011-06-03 18:07:04 UTC
nmi_watchdog=1 is not supported. In addition, it turned out that the hang is due to guest fs corruption and not the NMI watchdog. Closing.

