Bug 504711

Summary: "vcpu not ready for apic_round_robin" loop on guest panic
Product: Red Hat Enterprise Linux 5
Reporter: david ahern <daahern>
Component: kvm
Assignee: john cooper <john.cooper>
Status: CLOSED NOTABUG
QA Contact: Lawrence Lim <llim>
Severity: high
Priority: low
Version: 5.4
CC: bburns, ehabkost, nobody, sghosh, tburke, tools-bugs, virt-maint
Target Milestone: rc   
Target Release: 5.4   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-07-14 00:35:16 UTC
Attachments:
iso which causes panic (flags: none)

Description david ahern 2009-06-08 22:56:39 UTC
Created attachment 346942 [details]
iso which causes panic

Description of problem:

The guest boots a RHEL 5.2 kernel with a homegrown initrd. The init
script for the initrd is a bash script with a typo in it, which caused
the guest to halt with a guest-side panic. That in turn caused the host
to panic.

kdump is set up on the host, but it is running off a USB key which does not have
enough space for a vmcore.

Version-Release number of selected component (if applicable):
kvm-83-59

How reproducible:
Very. See uploaded iso.

Steps to Reproduce:
qemu-kvm -m 2048 -smp 2 -vnc :2 -cdrom /tmp/kvm-bug.iso -boot d

  
Actual results:
guest panics, host panics.

Expected results:
guest to panic (my typo); host should not.

Comment 1 john cooper 2009-06-29 20:27:33 UTC
Able to reproduce under a recent 5.4 kernel tree
with the following:

qemu/x86_64-softmmu/qemu-system-x86_64 -cdrom /workspace/bz504711/kvm-bug.iso -boot d


Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
 [<ffffffff887bf886>] :kvm:kvm_get_intr_delivery_bitmask+0x4e/0x86
PGD 4dd99067 PUD 4dc9d067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /class/misc/kvm/dev
CPU 3
Modules linked in: kvm_intel(U) kvm(U) i915(U) drm(U) ipt_MASQUERADE(U) iptable)
Pid: 7250, comm: qemu-system-x86 Tainted: G      2.6.18-prepgupf #7
RIP: 0010:[<ffffffff887bf886>]  [<ffffffff887bf886>] :kvm:kvm_get_intr_delivery6
RSP: 0018:ffff810052f4dbb8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff810052f4dbe8 RCX: ffffffff80307c28
RDX: ffffffff80307c28 RSI: 0000000000001000 RDI: ffffffff80307c20
RBP: ffff810052f4dbd8 R08: ffffffff80307c28 R09: 0000000000000046
R10: 0000000000000000 R11: 0000000000000080 R12: ffff81006cceda00
R13: ffff81006cceda20 R14: 0000000000000001 R15: 0000000000001000
FS:  0000000041af6940(0063) GS:ffff81007d63e640(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000000000028 CR3: 000000004fd12000 CR4: 00000000000026e0
Process qemu-system-x86 (pid: 7250, threadinfo ffff810052f4c000, task ffff81004)
Stack:  ffff810064935e80 ffff8100508a8000 ffff81006cceda00 ffffffff887be06b
 0000000000000001 ffff810000000001 0100000000000939 0000000000000001
 ffff810064935e80 ffff8100508a8000 00000000ffffffff 0000000000000001
Call Trace:
 [<ffffffff887be06b>] :kvm:ioapic_service+0x55/0x12a
 [<ffffffff887bfa6e>] :kvm:kvm_set_irq+0x65/0xa3
 [<ffffffff887b0c48>] :kvm:kvm_arch_vm_ioctl+0x37e/0x62e
 [<ffffffff8008cbf1>] sched_clock_cpu+0x116/0x124
 [<ffffffff8008cd1b>] update_rq_clock+0x17/0x20
 [<ffffffff80063489>] __sched_text_start+0xf9/0xbef
 [<ffffffff887aa643>] :kvm:kvm_vm_ioctl+0xa87/0xade
 [<ffffffff887a84ed>] :kvm:kvm_io_bus_find_dev+0x35/0x50
 [<ffffffff8008c4eb>] preempt_notifier_unregister+0x92/0xac
 [<ffffffff887a9ac4>] :kvm:vcpu_put+0x16/0x1f
 [<ffffffff887af789>] :kvm:kvm_arch_vcpu_ioctl_run+0x5fd/0x60b
 [<ffffffff887ab3d0>] :kvm:kvm_vcpu_ioctl+0x435/0x448
 [<ffffffff8008cf62>] default_wake_function+0x0/0xe
 [<ffffffff800424cf>] do_ioctl+0x21/0x6b
 [<ffffffff800306c7>] vfs_ioctl+0x457/0x4b9
 [<ffffffff800b7498>] audit_syscall_entry+0x16e/0x1a1
 [<ffffffff8004cae5>] sys_ioctl+0x59/0x78
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0


Code: 8b 40 28 0f ab 45 00 eb 2a e8 5e 2e 8d f7 85 c0 74 19 40 b6
RIP  [<ffffffff887bf886>] :kvm:kvm_get_intr_delivery_bitmask+0x4e/0x86
 RSP <ffff810052f4dbb8>
CR2: 0000000000000028
 <0>Kernel panic - not syncing: Fatal exception


Looks like kernel/x86/irq_comm.c:kvm_get_intr_delivery_bitmask()
isn't checking the vcpu * returned by kvm_get_lowest_prio_vcpu()
for NULL and dereferences a NULL pointer.
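
The oops is consistent with this reading: the Code bytes begin with 8b 40 28,
which decodes as mov 0x28(%rax),%eax, RAX is 0, and CR2 is 0x28, i.e. a read
of a field at offset 0x28 through a NULL pointer. A minimal user-space sketch
of the missing check follows. It is not the kernel code or the actual fix;
struct vcpu, pick_lowest_prio() and delivery_bitmask() are hypothetical
stand-ins invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's vcpu structure; only the id matters here. */
struct vcpu {
    int vcpu_id;
};

/*
 * Hypothetical stand-in for the lowest-priority lookup: returns the first
 * candidate in `mask` that is marked ready, or NULL when none is ready.
 */
static struct vcpu *pick_lowest_prio(struct vcpu *vcpus, const int *ready,
                                     int n, unsigned long mask)
{
    for (int i = 0; i < n; i++)
        if ((mask & (1UL << i)) && ready[i])
            return &vcpus[i];
    return NULL;
}

/*
 * Build the delivery bitmask for a lowest-priority interrupt.
 * Returns 0 on success. Returns -1 and leaves *bitmask cleared when no
 * vcpu can take the interrupt, instead of dereferencing a NULL pointer.
 */
static int delivery_bitmask(struct vcpu *vcpus, const int *ready, int n,
                            unsigned long candidates, unsigned long *bitmask)
{
    struct vcpu *v;

    assert(bitmask != NULL);
    *bitmask = 0;

    v = pick_lowest_prio(vcpus, ready, n, candidates);
    if (v == NULL)          /* the check the oops suggests was missing */
        return -1;

    *bitmask = 1UL << v->vcpu_id;
    return 0;
}
```

With two vcpus and neither marked ready, delivery_bitmask() returns -1 and
the caller can drop or defer the interrupt rather than fault.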

Comment 2 Eduardo Habkost 2009-07-09 18:52:47 UTC
It seems to be a duplicate of bug #504018. Could you test with a newer KVM package? (kvm-83-65.el5 or later).

Comment 3 david ahern 2009-07-09 19:34:56 UTC
The host side panic did not happen with kvm-83-82, but the host is spewing the following (from dmesg):

vcpu not ready for apic_round_robin

Comment 4 Eduardo Habkost 2009-07-09 19:46:30 UTC
(In reply to comment #3)
> The host side panic did not happen with kvm-83-82, but the host is spewing the
> following (from dmesg):
> 
> vcpu not ready for apic_round_robin  

Thanks, I'm updating the summary accordingly. It looks like a less serious issue, but allowing a guest to flood dmesg sounds undesirable too.
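
For illustration only, one common way to keep a repeated diagnostic from
flooding a log is to rate-limit it (the kernel provides printk_ratelimit()
for this purpose). Nothing in this bug says the message was or should be
throttled this way; the sketch below is a user-space model under that
assumption, and struct ratelimit, ratelimit_ok() and warn_vcpu_not_ready()
are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical limiter state: at most `burst` messages per `interval` ticks. */
struct ratelimit {
    unsigned long interval;      /* window length in ticks */
    unsigned int burst;          /* messages allowed per window */
    unsigned long window_start;  /* tick at which the current window began */
    unsigned int emitted;        /* messages emitted in the current window */
    unsigned int suppressed;     /* total messages dropped so far */
};

/* Returns 1 if a message may be emitted at time `now`, 0 if it is dropped. */
static int ratelimit_ok(struct ratelimit *rl, unsigned long now)
{
    assert(rl != NULL);

    /* Start a fresh window once the previous one has expired. */
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;
        rl->emitted = 0;
    }
    if (rl->emitted < rl->burst) {
        rl->emitted++;
        return 1;
    }
    rl->suppressed++;
    return 0;
}

/* Emit the diagnostic only when the limiter allows it; returns 1 if printed. */
static int warn_vcpu_not_ready(struct ratelimit *rl, unsigned long now)
{
    if (!ratelimit_ok(rl, now))
        return 0;
    fprintf(stderr, "vcpu not ready for apic_round_robin\n");
    return 1;
}
```

With interval 100 and burst 2, ten calls at ticks 0 through 9 print the
message twice and count eight suppressed; a call at tick 100 prints again.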

Comment 6 john cooper 2009-07-11 05:57:01 UTC
Correction: the host does not panic in this scenario
built from a tree after commit fb6eef649d07cc27680ee073fe5d9b5fbf4093c1.
I had reproduced the above failure inadvertently with an
older tree.

I didn't find the "vcpu not ready for apic_round_robin"
diagnostic generated either but that by itself isn't
conclusive w/r/t comment #3 above.  Except for this
nit I think we can close the case.

Comment 7 john cooper 2009-07-14 00:35:16 UTC
Fixed in current tree by fb6eef649d07cc27680ee073fe5d9b5fbf4093c1.