
Bug 689676

Summary: [AMD] RHEL5.6 SMP guest VM hang or kernel panic in bootup after setting nmi_watchdog=1
Product: Red Hat Enterprise Linux 6
Reporter: yacui
Component: qemu-kvm
Assignee: Gleb Natapov <gleb>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.1
CC: knoel, mkenneth, tburke, virt-maint, ypu
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-06-03 18:07:04 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 580954
Attachments:
serial log for guest file-system broken (flags: none)
screen dump for guest file system broken (flags: none)
serial log for guest kernel panic (flags: none)
screen dump for guest kernel panic (flags: none)

Description yacui 2011-03-22 05:22:49 UTC
Description of problem:
On a RHEL6.1 host, when a RHEL5.6 guest VM with more than one vCPU is booted with nmi_watchdog=1, the guest may hang or hit a kernel panic during bootup.

Version-Release number of selected component (if applicable):
Host Kernel:
2.6.32-121.el6.x86_64
Guest Kernel:
2.6.18-247.el5
KVM Version:
qemu-kvm-debuginfo-0.12.1.2-2.150.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.150.el6.x86_64
qemu-kvm-0.12.1.2-2.150.el6.x86_64

How reproducible:
guest hang in bootup - 7 out of 300 times
guest kernel panic in bootup - 1 out of 300 times
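
To put these counts in perspective, the sketch below converts them to rates and estimates how many boots would be needed to see at least one failure with 95% probability. It assumes every boot is an independent trial with the observed failure rate; that is an assumption for illustration, not something the test runs establish, and the helper name boots_needed is hypothetical.

```python
import math

def boots_needed(failures: int, runs: int, confidence: float = 0.95) -> int:
    """Boots needed to observe at least one failure with the given confidence.

    Assumes each boot is an independent trial with rate failures/runs.
    """
    rate = failures / runs
    # Solve 1 - (1 - rate)**n >= confidence for the smallest integer n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))

hang_rate = 7 / 300    # guest hang in bootup
panic_rate = 1 / 300   # guest kernel panic in bootup

print(f"hang rate:  {hang_rate:.2%}, boots for 95% chance: {boots_needed(7, 300)}")
print(f"panic rate: {panic_rate:.2%}, boots for 95% chance: {boots_needed(1, 300)}")
```

With rates this low, a short run that shows no failure says little about whether the problem is gone.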

Steps to Reproduce:
1. Boot up a normal RHEL5.6 guest
2. Add nmi_watchdog=1 to the kernel line
3. Reboot the RHEL5.6 guest
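
Step 2 can be automated when the reproduction has to be repeated many times. The following is a minimal sketch, assuming the guest uses GRUB legacy, where each boot entry has a line beginning with "kernel". The function name add_kernel_arg and the sample configuration (including its root= and console= values) are hypothetical; the real file in the guest will differ.

```python
import re

def add_kernel_arg(grub_conf: str, arg: str = "nmi_watchdog=1") -> str:
    """Append `arg` to every GRUB legacy `kernel` line that lacks it."""
    out = []
    for line in grub_conf.splitlines():
        is_kernel_line = re.match(r"\s*kernel\s", line) is not None
        if is_kernel_line and arg not in line.split():
            line = f"{line.rstrip()} {arg}"
        out.append(line)
    return "\n".join(out) + "\n"

# Hypothetical grub.conf excerpt, for illustration only.
sample = """default=0
timeout=5
title Red Hat Enterprise Linux Server (2.6.18-247.el5)
\troot (hd0,0)
\tkernel /vmlinuz-2.6.18-247.el5 ro root=/dev/VolGroup00/LogVol00 console=ttyS0
\tinitrd /initrd-2.6.18-247.el5.img
"""

updated = add_kernel_arg(sample)
print(updated)
```

The function only transforms text, so it can be checked on a copy of the configuration before anything is written back into the guest image. Running it twice leaves the text unchanged.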
  
Actual results:
After adding nmi_watchdog=1, the guest sometimes hangs during the boot process.

Expected results:
the guest should boot up normally

Additional info:
1 CommandLine:
qemu-kvm -name 'vm1' \
  -chardev socket,id=human_monitor_kMoF,path=/tmp/monitor-humanmonitor1-20110315-134747-luGU,server,nowait \
  -mon chardev=human_monitor_kMoF,mode=readline \
  -chardev socket,id=serial_T9FN,path=/tmp/serial-20110315-134747-luGU,server,nowait \
  -device isa-serial,chardev=serial_T9FN \
  -drive file='/home/kvm-qe/autotest/client/tests/kvm/images/RHEL-Server-5.6-64.raw',index=0,if=none,id=drive-ide0-0-0,media=disk,cache=writethrough,snapshot=on,format=raw,aio=native \
  -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
  -device e1000,netdev=idM8ohr1,mac=9a:54:b1:a9:34:c7,netdev=idM8ohr1,id=ndev00idM8ohr1,bus=pci.0,addr=0x3 \
  -netdev tap,id=idM8ohr1,ifname='t0-134747-luGU',script='/home/kvm-qe/autotest/client/tests/kvm/scripts/qemu-ifup-switch',downscript='no' \
  -m 8192 \
  -smp 4,cores=1,threads=1,sockets=4 \
  -cpu cpu64-rhel6,+sse2,+x2apic \
  -vnc :0 \
  -rtc base=utc,clock=host,driftfix=none \
  -boot order=cdn,once=c,menu=off \
  -usbdevice tablet \
  -no-kvm-pit-reinjection \
  -enable-kvm

2 Host CPU Info:
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model  : 67
model name : Dual-Core AMD Opteron(tm) Processor 1216
stepping : 3
cpu MHz  : 1000.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id  : 1
cpu cores : 2
apicid  : 1
initial apicid : 1
fpu  : yes
fpu_exception : yes
cpuid level : 1
wp  : yes
flags  : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36
clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext
3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 2009.10
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

3 Serial Output:
3.1 Guest hang during the startup process
Last few lines of serial output in this scenario:
2011-03-16 22:25:56: uhci_hcd 0000:00:01.2: UHCI Host Controller
2011-03-16 22:25:56: uhci_hcd 0000:00:01.2: new USB bus registered, assigned
bus number 1
2011-03-16 22:25:56: uhci_hcd 0000:00:01.2: irq 11, io base 0x0000c020
2011-03-16 22:25:56: usb usb1: configuration #1 chosen from 1 choice
2011-03-16 22:25:56: hub 1-0:1.0: USB hub found
2011-03-16 22:25:56: hub 1-0:1.0: 2 ports detected
2011-03-16 22:25:56: input: ImExPS/2 Generic Explorer Mouse as
/class/input/input1
2011-03-16 22:25:56: usb 1-1: new full speed USB device using uhci_hcd and
address 2
2011-03-16 22:25:56: SCSI subsystem initialized
2011-03-16 22:25:56: usb 1-1: configuration #1 chosen from 1 choice
2011-03-16 22:25:56: input: QEMU 0.12.1 QEMU USB Tablet as /class/input/input2
2011-03-16 22:25:56: input: USB HID v0.01 Pointer [QEMU 0.12.1 QEMU USB Tablet]
on usb-0000:00:01.2-1
2011-03-16 22:25:56: device-mapper: uevent: version 1.0.3
2011-03-16 22:25:56: device-mapper: ioctl: 4.11.5-ioctl (2007-12-12)
initialised: dm-devel
2011-03-16 22:25:57: device-mapper: dm-raid45: initialized v0.2594l
2011-03-16 22:26:18: kjournald starting.  Commit interval 5 seconds
2011-03-16 22:26:18: EXT3-fs: mounted filesystem with ordered data mode.
2011-03-16 22:26:18: type=1404 audit(1300285577.849:2): enforcing=1
old_enforcing=0 auid=4294967295 ses=4294967295
2011-03-16 22:26:18: type=1403 audit(1300285578.112:3): policy loaded
auid=4294967295 ses=4294967295

3.2 Guest Kernel Panic
2011-03-22 04:46:02: TCP bic registered
2011-03-22 04:46:02: Initializing IPsec netlink socket
2011-03-22 04:46:02: input: AT Translated Set 2 keyboard as /class/input/input0
2011-03-22 04:46:02: NET: Registered protocol family 1
2011-03-22 04:46:02: NET: Registered protocol family 17
2011-03-22 04:46:02: ACPI: (supports S3 S4 S5)
2011-03-22 04:46:02: Initalizing network drop monitor service
2011-03-22 04:46:02: Freeing unused kernel memory: 224k freed
2011-03-22 04:46:02: Write protecting the kernel read-only data: 520k
2011-03-22 04:46:02: input: ImExPS/2 Generic Explorer Mouse as /class/input/input1
2011-03-22 04:46:12: Kernel panic - not syncing: Attempted to kill init!
2011-03-22 04:46:12:
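
When hundreds of boots have to be checked, serial output like the excerpts above can be scanned automatically. The sketch below is one possible approach, assuming every complete log line starts with a "YYYY-MM-DD HH:MM:SS:" prefix as in these excerpts. It reports kernel panic messages and unusually long pauses between consecutive lines. The function name scan_serial_log and the gap threshold are hypothetical choices for illustration.

```python
from datetime import datetime

STAMP_LEN = len("2011-03-22 04:46:12")

def scan_serial_log(lines, max_gap_s=15):
    """Return (panic_messages, gaps) found in timestamped serial output."""
    panics, gaps, prev = [], [], None
    for line in lines:
        try:
            stamp = datetime.strptime(line[:STAMP_LEN], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # wrapped continuation line without a timestamp
        message = line[STAMP_LEN + 1:].strip()
        if "Kernel panic" in message:
            panics.append(message)
        # Record pauses longer than the threshold between consecutive lines.
        if prev is not None and (stamp - prev).total_seconds() > max_gap_s:
            gaps.append((prev, stamp))
        prev = stamp
    return panics, gaps

# Lines taken from the panic excerpt in section 3.2.
sample = [
    "2011-03-22 04:46:02: Write protecting the kernel read-only data: 520k",
    "2011-03-22 04:46:02: input: ImExPS/2 Generic Explorer Mouse as /class/input/input1",
    "2011-03-22 04:46:12: Kernel panic - not syncing: Attempted to kill init!",
    "2011-03-22 04:46:12:",
]

panics, gaps = scan_serial_log(sample, max_gap_s=5)
print(panics)
print(gaps)
```

A long gap is only a hint, since slow boot phases also produce pauses; it marks a log as worth reading by hand.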

Comment 2 yacui 2011-03-23 08:14:36 UTC
After more autotest runs, I found that the first symptom, "guest hang during the startup process", which happens 7 out of 300 times, is a file system corruption that can be fixed manually with fsck.

So far the guest kernel panic has occurred once; the serial output can be found in section 3.2 of comment 1.

Comment 3 Avi Kivity 2011-03-23 12:13:59 UTC
Is it a guest file system issue?  Or a host file system issue?  What's the cause?

Comment 4 yacui 2011-03-24 05:48:55 UTC
It's a guest file system issue. I don't know why the guest file system gets corrupted. The whole process is to first boot a normal guest, then set nmi_watchdog, and finally reboot the guest; the guest file system is sometimes found broken during that boot.

I can also upload the full serial logs and screen dumps as attachments for reference (for both the file system corruption and the kernel panic).

Comment 5 yacui 2011-03-24 05:50:44 UTC
Created attachment 487209 [details]
serial log for guest file-system broken

Comment 6 yacui 2011-03-24 05:52:04 UTC
Created attachment 487211 [details]
screen dump for guest file system broken

Comment 7 yacui 2011-03-24 05:53:14 UTC
Created attachment 487213 [details]
serial log for guest kernel panic

Comment 8 yacui 2011-03-24 05:54:30 UTC
Created attachment 487214 [details]
screen dump for guest kernel panic

Comment 9 Gleb Natapov 2011-03-24 09:53:57 UTC
(In reply to comment #8)
> Created attachment 487214 [details]
> screen dump for guest kernel panic

This one is also due to guest file system corruption. The panic happens because the file system can't be mounted.

Comment 10 Avi Kivity 2011-03-24 14:18:31 UTC
Don't see the breakage in the logs.

Comment 12 Gleb Natapov 2011-06-03 18:07:04 UTC
nmi_watchdog=1 is not supported. In addition, it turned out that the hang is due to guest fs corruption and not the NMI watchdog. Closing.