Description of problem:
On a RHEL6.1 host, boot a RHEL5.6 guest VM with more than 1 vcpu and nmi_watchdog=1; the guest might hang or kernel panic during bootup.

Version-Release number of selected component (if applicable):
Host Kernel: 2.6.32-121.el6.x86_64
Guest Kernel: 2.6.18-247.el5
KVM Version:
qemu-kvm-debuginfo-0.12.1.2-2.150.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.150.el6.x86_64
qemu-kvm-0.12.1.2-2.150.el6.x86_64

How reproducible:
guest hang in bootup - 7 out of 300 times
guest kernel panic in bootup - 1 out of 300 times

Steps to Reproduce:
1. Boot up a normal RHEL5.6 guest
2. Add nmi_watchdog=1 to the kernel line
3. Reboot the RHEL5.6 guest

Actual results:
After adding nmi_watchdog=1, the guest sometimes hangs during the boot process.

Expected results:
The guest should boot up normally.

Additional info:

1 CommandLine:
qemu-kvm -name 'vm1' -chardev socket,id=human_monitor_kMoF,path=/tmp/monitor-humanmonitor1-20110315-134747-luGU,server,nowait -mon chardev=human_monitor_kMoF,mode=readline -chardev socket,id=serial_T9FN,path=/tmp/serial-20110315-134747-luGU,server,nowait -device isa-serial,chardev=serial_T9FN -drive file='/home/kvm-qe/autotest/client/tests/kvm/images/RHEL-Server-5.6-64.raw',index=0,if=none,id=drive-ide0-0-0,media=disk,cache=writethrough,snapshot=on,format=raw,aio=native -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -device e1000,netdev=idM8ohr1,mac=9a:54:b1:a9:34:c7,netdev=idM8ohr1,id=ndev00idM8ohr1,bus=pci.0,addr=0x3 -netdev tap,id=idM8ohr1,ifname='t0-134747-luGU',script='/home/kvm-qe/autotest/client/tests/kvm/scripts/qemu-ifup-switch',downscript='no' -m 8192 -smp 4,cores=1,threads=1,sockets=4 -cpu cpu64-rhel6,+sse2,+x2apic -vnc :0 -rtc base=utc,clock=host,driftfix=none -boot order=cdn,once=c,menu=off -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm

2 Host CPU Info:
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 67
model name : Dual-Core AMD Opteron(tm) Processor 1216
stepping : 3
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 2009.10
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

3 Serial Output:

3.1 The guest hangs during the startup process
Last few lines of serial output in this scenario:
2011-03-16 22:25:56: uhci_hcd 0000:00:01.2: UHCI Host Controller
2011-03-16 22:25:56: uhci_hcd 0000:00:01.2: new USB bus registered, assigned bus number 1
2011-03-16 22:25:56: uhci_hcd 0000:00:01.2: irq 11, io base 0x0000c020
2011-03-16 22:25:56: usb usb1: configuration #1 chosen from 1 choice
2011-03-16 22:25:56: hub 1-0:1.0: USB hub found
2011-03-16 22:25:56: hub 1-0:1.0: 2 ports detected
2011-03-16 22:25:56: input: ImExPS/2 Generic Explorer Mouse as /class/input/input1
2011-03-16 22:25:56: usb 1-1: new full speed USB device using uhci_hcd and address 2
2011-03-16 22:25:56: SCSI subsystem initialized
2011-03-16 22:25:56: usb 1-1: configuration #1 chosen from 1 choice
2011-03-16 22:25:56: input: QEMU 0.12.1 QEMU USB Tablet as /class/input/input2
2011-03-16 22:25:56: input: USB HID v0.01 Pointer [QEMU 0.12.1 QEMU USB Tablet] on usb-0000:00:01.2-1
2011-03-16 22:25:56: device-mapper: uevent: version 1.0.3
2011-03-16 22:25:56: device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel
2011-03-16 22:25:57: device-mapper: dm-raid45: initialized v0.2594l
2011-03-16 22:26:18: kjournald starting. Commit interval 5 seconds
2011-03-16 22:26:18: EXT3-fs: mounted filesystem with ordered data mode.
2011-03-16 22:26:18: type=1404 audit(1300285577.849:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
2011-03-16 22:26:18: type=1403 audit(1300285578.112:3): policy loaded auid=4294967295 ses=4294967295

3.2 Guest Kernel Panic
2011-03-22 04:46:02: TCP bic registered
2011-03-22 04:46:02: Initializing IPsec netlink socket
2011-03-22 04:46:02: input: AT Translated Set 2 keyboard as /class/input/input0
2011-03-22 04:46:02: NET: Registered protocol family 1
2011-03-22 04:46:02: NET: Registered protocol family 17
2011-03-22 04:46:02: ACPI: (supports S3 S4 S5)
2011-03-22 04:46:02: Initalizing network drop monitor service
2011-03-22 04:46:02: Freeing unused kernel memory: 224k freed
2011-03-22 04:46:02: Write protecting the kernel read-only data: 520k
2011-03-22 04:46:02: input: ImExPS/2 Generic Explorer Mouse as /class/input/input1
2011-03-22 04:46:12: Kernel panic - not syncing: Attempted to kill init!
2011-03-22 04:46:12:
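
Step 2 of the reproduction steps (adding nmi_watchdog=1 to the kernel line) could be scripted roughly as below. This is only a sketch: it assumes the guest uses a GRUB-legacy style config in which boot entries contain a line starting with "kernel", and it works on the config text rather than on a real file. The function name add_kernel_param and the sample entry (including root=LABEL=/) are hypothetical and not taken from the test setup.

```python
def add_kernel_param(grub_conf_text, param="nmi_watchdog=1"):
    """Append `param` to every `kernel` line that does not already carry it.

    Assumes GRUB-legacy syntax, where the first word of the line is `kernel`.
    """
    out = []
    for line in grub_conf_text.splitlines():
        words = line.split()
        if words and words[0] == "kernel" and param not in words:
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical boot entry, for illustration only.
sample = (
    "default=0\n"
    "title Red Hat Enterprise Linux Server (2.6.18-247.el5)\n"
    "\troot (hd0,0)\n"
    "\tkernel /vmlinuz-2.6.18-247.el5 ro root=LABEL=/ rhgb quiet\n"
    "\tinitrd /initrd-2.6.18-247.el5.img\n"
)

if __name__ == "__main__":
    print(add_kernel_param(sample), end="")
```

Running the function twice on the same text leaves it unchanged, so it is safe to call on every iteration of a reboot loop.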
After more autotest runs, I found that the first symptom, "guest hang during the startup process", which happens 7 out of 300 times, is a guest file system corruption that can be repaired manually with fsck. So far the guest kernel panic has occurred once; its serial output is in section 3.2 of comment 1.
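
For sorting the serial logs from repeated runs into the two failure modes above, something like the following might help. It is a minimal sketch, not part of autotest: the panic marker is the string seen in section 3.2, while treating a "login:" prompt as evidence of a successful boot is an assumption, and the function names are hypothetical.

```python
PANIC_MARKER = "Kernel panic - not syncing"  # string seen in the panic log


def classify_serial_log(text, booted_marker="login:"):
    """Return 'panic', 'booted' or 'hang' for one serial log.

    `booted_marker` is an assumed sign of a completed boot; adjust as needed.
    """
    if PANIC_MARKER in text:
        return "panic"
    if booted_marker in text:
        return "booted"
    return "hang"


def failure_rates(outcomes):
    """Fraction of runs that ended in 'hang' and in 'panic'."""
    total = len(outcomes)
    if total == 0:
        raise ValueError("no runs to summarise")
    return {kind: outcomes.count(kind) / total for kind in ("hang", "panic")}


if __name__ == "__main__":
    # Counts reported in this bug: 7 hangs and 1 panic out of 300 boots.
    outcomes = ["hang"] * 7 + ["panic"] * 1 + ["booted"] * 292
    for kind, rate in failure_rates(outcomes).items():
        print(f"{kind}: {rate:.2%}")
```

Note that a log classified as "hang" here only means that neither marker was found; as in this bug, the underlying cause still has to be confirmed by hand.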
Is it a guest file system issue? Or a host file system issue? What's the cause?
It's a guest file system issue. I don't know why the guest file system gets corrupted. The whole process is: boot a normal guest, set nmi_watchdog, then reboot the guest; during that boot the guest file system is sometimes found to be broken. I can also upload the full serial logs and screen dumps as attachments for reference (for both the file system corruption case and the kernel panic case).
Created attachment 487209 [details] serial log for guest file-system broken
Created attachment 487211 [details] screen dump for guest file system broken
Created attachment 487213 [details] serial log for guest kernel panic
Created attachment 487214 [details] screen dump for guest kernel panic
(In reply to comment #8)
> Created attachment 487214 [details]
> screen dump for guest kernel panic

This one is also due to guest file system corruption. The panic happens because the file system can't be mounted.
Don't see the breakage in the logs.
nmi_watchdog=1 is not supported. In addition, it turned out that the hang is due to guest fs corruption and not the NMI watchdog. Closing.