Bug 616337

Summary: guest hang when boot with nmi_watch=1
Product: Red Hat Enterprise Linux 6
Reporter: Suqin Huang <shuang>
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED DUPLICATE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: low
Version: 6.1
CC: dzickus, llim, ndai, tburke
Target Milestone: rc
Keywords: RHELNAK
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-07-28 12:52:37 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 562808, 580953

Description Suqin Huang 2010-07-20 08:10:00 UTC
Description of problem:


Version-Release number of selected component (if applicable):
2.6.32-44.1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. boot guest 
qemu -name 'vm1' -monitor stdio -drive file=rhel6-64.qcow2,if=none,id=drive-virtio-disk1,media=disk,cache=none,boot=on,format=qcow2 -device virtio-blk-pci,drive=drive-virtio-disk1,id=virtio-disk1 -net nic,vlan=0,netdev=idvf7Z8L,model=virtio,macaddr='02:30:25:46:aa:6e' -netdev tap,id=idvf7Z8L,script=qemu-ifup,downscript='no',vhost=on -m 4096 -smp 2 -vnc :0 -rtc base=utc,clock=host -M rhel6.0.0  -snapshot

2. update to 
3.
  
Actual results:


Expected results:


Additional info:
1. 
acpiphp: Slot [22] registered
acpiphp: Slot [23] registered
acpiphp: Slot [24] registered
acpiphp: Slot [25] registered
acpiphp: Slot [26] registered
acpiphp: Slot [27] registered
acpiphp: Slot [28] registered
acpiphp: Slot [29] registered
acpiphp: Slot [30] registered
acpiphp: Slot [31] registered
pci-stub: invalid id string ""
input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
ACPI: Power Button [PWRF]
ACPI: processor limited to max C-state 1
processor LNXCPU:00: registered as cooling_device0
processor LNXCPU:01: registered as cooling_device1
xen-platform-pci: failed Xen IOPORT backend handshake: unrecognised magic value
hpet_acpi_add: no address or irqs in _CRS
Non-volatile memory driver v1.3
Linux agpgart interface v0.103
crash memory driver: version 1.0
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled

2. top
15625 root      20   0 4532m 597m 3816 S 200.3  7.9  11:30.22 qemu-kvm  

3. kvm_stat
kvm statistics

 efer_reload                  0       0
 exits                 12730717    3029
 fpu_reload             3242820      58
 halt_exits               59087       0
 halt_wakeup              31998       0
 host_state_reload      3243276      58
 hypercalls                   0       0
 insn_emulation         8555544       0
 insn_emulation_fail          0       0
 invlpg                       0       0
 io_exits               3021048       0
 irq_exits               924870    3029
 irq_injections           96327       0
 irq_window                   0       0
 largepages                 105       0
 mmio_exits              148493       0
 mmu_cache_miss            1286       0
 mmu_flooded                  0       0
 mmu_pde_zapped               0       0
 mmu_pte_updated              0       0
 mmu_pte_write             4000       0
 mmu_recycled                 0       0
 mmu_shadow_zapped         1787       0
 mmu_unsync                   0       0
 nmi_injections           15274       0
 nmi_window               15173       0
 pf_fixed                  9008       0
 pf_guest                     0       0
 remote_tlb_flush           150       0
 request_irq                  0       0
 signal_exits                17       0
 tlb_flush                    0       0

4. (qemu) x /i $pc
0xffffffff8103baab:  leaveq 

5. guest: rhel6 (kernel 2.6.32-44.1.el6.x86_64)
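
For reading the kvm_stat dump in item 3, here is a minimal sketch (not part of the original report) that lists which counters are still changing. It assumes the first numeric column is the cumulative total and the second is the change over the last sampling interval; the function names and the shortened sample are illustrative only.

```python
def parse_kvm_stat(text):
    """Parse 'name total delta' lines into {name: (total, delta)}.

    Assumption: column 1 is the cumulative count and column 2 is the
    change during the last sampling interval.
    """
    counters = {}
    for line in text.splitlines():
        parts = line.split()  # splits on any whitespace, tabs included
        if len(parts) != 3:
            continue  # skip the title line and blank lines
        name, total, delta = parts
        if total.isdigit() and delta.isdigit():
            counters[name] = (int(total), int(delta))
    return counters


def active_counters(counters):
    """Return the names of counters whose last-interval delta is nonzero."""
    return sorted(name for name, (_, delta) in counters.items() if delta > 0)


# A subset of the rows from the report, copied verbatim.
SAMPLE = """\
kvm statistics

 exits                 12730717    3029
 fpu_reload             3242820      58
 halt_exits               59087       0
 host_state_reload	3243276      58
 irq_exits               924870    3029
 nmi_injections           15274       0
 nmi_window               15173       0
"""

if __name__ == "__main__":
    stats = parse_kvm_stat(SAMPLE)
    print(active_counters(stats))
```

On this sample, only exits, irq_exits, fpu_reload and host_state_reload have a nonzero second column, and the exits and irq_exits values are equal (3029); the NMI counters show 0 in that column.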

Comment 2 RHEL Program Management 2010-07-20 08:37:48 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 3 Don Zickus 2010-07-23 13:40:08 UTC
Did you mean to say you used 'nmi_watchdog=1' on the command line?

That is not expected to work for KVM (or any virt guest, really), as the IOAPIC timer interrupts aren't emulated as NMIs to the guest.

Though I wouldn't expect it to hang, because with nmi_watchdog autodetection, and the fact that there are no emulated performance counters, it defaults to the equivalent of nmi_watchdog=1.  In that case you just see an error message and the kernel proceeds.

So the priority should be low for this as it isn't expected to be a valid user configuration.  But it might be a bug that will need to be fixed somewhere.
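
To answer the question about which parameter was actually passed, a small sketch along these lines could be run inside the guest. The helper name and the example command-line strings are hypothetical; only the /proc/cmdline path and the nmi_watchdog parameter name come from standard Linux behavior and this report.

```python
def nmi_watchdog_setting(cmdline):
    """Return the value given to nmi_watchdog= on a kernel command line,
    or None if the parameter is absent (e.g. misspelled as nmi_watch=)."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "nmi_watchdog" and sep:
            return value
    return None


if __name__ == "__main__":
    # Hypothetical command lines, for illustration only.
    print(nmi_watchdog_setting("ro root=/dev/sda1 nmi_watchdog=1"))
    print(nmi_watchdog_setting("ro root=/dev/sda1 nmi_watch=1"))

    # On a running Linux guest, the real command line is in /proc/cmdline.
    try:
        with open("/proc/cmdline") as f:
            print(nmi_watchdog_setting(f.read()))
    except OSError:
        pass  # not on Linux, or /proc is unavailable
```

A result of None for the guest's own command line would mean the parameter was not set under the name the kernel looks for.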

Comment 4 Dor Laor 2010-07-28 12:52:37 UTC

*** This bug has been marked as a duplicate of bug 616296 ***