Bug 657149

Summary: guest with passthrough nic got kernel panic when send system_reset signal in QEMU monitor
Product: Red Hat Enterprise Linux 5
Reporter: Chao Yang <chayang>
Component: kvm
Assignee: Alex Williamson <alex.williamson>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: low
Version: 5.6
CC: ddutile, gcosta, Jes.Sorensen, juzhang, michen, mkenneth, tburke, virt-maint
Target Milestone: rc
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Fixed In Version: kvm-83-229.el5
Doc Type: Bug Fix
Doc Text:
When a system_reset signal was sent to a guest with a pass-through NIC (Network Interface Card) attached, a kernel panic occurred in the guest. This bug has been fixed, and the guest now reboots properly in the described scenario.
Last Closed: 2011-07-21 08:50:32 UTC
Bug Depends On: 689860    
Bug Blocks: 580948, 689880    
Attachments:
Description                          Flags
send system_reset in QEMU monitor    none
kernel console output                none
send system_reset on 32bit guest     none

Description Chao Yang 2010-11-25 04:27:48 UTC
Description of problem:


Version-Release number of selected component (if applicable):

--Host
# rpm -qa|grep kvm
kmod-kvm-83-215.el5
kmod-kvm-debug-83-215.el5
kvm-qemu-img-83-215.el5
kvm-tools-83-215.el5
etherboot-zroms-kvm-5.4.4-13.el5
kvm-debuginfo-83-215.el5
kvm-83-215.el5

# uname -r
2.6.18-232.el5

--Guest

# uname -r
2.6.32-71.el6.x86_64
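
The header of this report lists kvm-83-229.el5 as the fixed version, while the host above runs kvm-83-215.el5. As a rough aid, the sketch below checks whether a kvm package name is at or past that release. It is a simplification and not rpm's own version comparison: it only reads the integer release from names of the form kvm-83-<release>.el5, and the function names are made up for illustration. Builds from other streams (for example el5_6) may carry different backports, so treat the result as a hint only.

```python
import re

# Assumption: taken from "Fixed In Version: kvm-83-229.el5" in the bug header.
FIXED_RELEASE = 229


def kvm_release(nvr):
    """Return the integer release of a 'kvm-83-<release>.el5...' name, or None."""
    match = re.fullmatch(r"kvm-83-(\d+)\.el5\S*", nvr.strip())
    return int(match.group(1)) if match else None


def has_fix(nvr):
    """True if the release number is at or past the fixed release (heuristic)."""
    release = kvm_release(nvr)
    return release is not None and release >= FIXED_RELEASE
```

With the versions mentioned in this report, has_fix("kvm-83-215.el5") is False and has_fix("kvm-83-232.el5") is True.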


How reproducible:


Steps to Reproduce:
1. Boot the guest with the -pcidevice option:
# /usr/libexec/qemu-kvm -no-hpet -rtc-td-hack -usbdevice tablet -startdate now -name rhel6.0-64 -smp 2 -m 4G -boot c  -drive file=/root/zhangjunyi/rhel6.0_64.qcow2,media=disk,if=ide,cache=none,format=qcow2,werror=stop,boot=on -vnc :18 -cpu qemu64 -M rhel5.6.0 -notify all -balloon none -monitor stdio -net none -pcidevice host=03:00.1
2. Connect to the guest with VNC:
# vncviewer 10.66.72.60:18

3. Log in as root and send the system_reset command in the QEMU monitor.
  
Actual results:
The guest kernel panics; see the attachments for details.

Expected results:
The guest reboots correctly.

Additional info:

See the attachments.

Comment 1 Chao Yang 2010-11-25 04:30:11 UTC
Created attachment 462806 [details]
send system_reset in QEMU monitor

Comment 2 Chao Yang 2010-11-25 04:33:09 UTC
Created attachment 462808 [details]
kernel console output

Comment 3 Chao Yang 2010-11-25 04:46:05 UTC
Note:

1. If the RHEL-Server-6.0-64 guest is booted without the -pcidevice option, it reboots correctly via system_reset in the QEMU monitor. I tested this five times, and the guest rebooted correctly every time.

2. I also tried a RHEL-Server-6.0-32 guest. Booted without -pcidevice, it reboots correctly via system_reset. Booted with -pcidevice, the guest sometimes hangs and sometimes reboots with the message "do_IRQ: 0.89 No irq handler for vector (irq -1)" after system_reset. I will attach a screenshot and a log file.

Comment 4 Chao Yang 2010-11-25 04:54:14 UTC
Created attachment 462809 [details]
send system_reset on 32bit guest

Comment 6 RHEL Program Management 2011-01-11 20:53:07 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 7 RHEL Program Management 2011-01-11 22:56:47 UTC
This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 12 Alex Williamson 2011-03-02 23:29:13 UTC
I'm unable to reproduce.  What's the device being assigned?  Is the guest enabling and using it?  Are the errors you're reporting occurring as the guest is booting up again after the qemu system_reset?

Comment 13 Chao Yang 2011-03-15 12:36:37 UTC
(In reply to comment #12)
> I'm unable to reproduce.  What's the device being assigned?  Is the guest
> enabling and using it?  Are the errors you're reporting occurring as the guest
> is booting up again after the qemu system_reset?

Alex,
 I reproduced this on RHEL 6.1; please see bz685147.

Comment 18 Chao Yang 2011-05-13 03:37:23 UTC
Reproduced on kvm-83-224.el5_6.1 with the following steps:
1. Boot a guest with an 82576 NIC assigned:
 /usr/libexec/qemu-kvm -M rhel5.6.0 -no-hpet -rtc-td-hack -startdate now -name rhel6.0 -smp 2 -m 2048 -cpu qemu64,+sse2 -uuid `uuidgen` -boot c -net nic,vlan=1,macaddr=00:1F:29:03:23:89,model=virtio -net tap,vlan=1,script=/etc/qemu-ifup -drive file=/root/RHEL-Server-6.0-64.qcow2,media=disk,if=virtio,cache=none,boot=on,format=qcow2 -vnc :1 -notify all -balloon none -monitor stdio -pcidevice host=03:00.1  -serial unix:/tmp/chayang-system-reset.sock,server,nowait
2. Issue system_reset in the monitor.

Actual result:
After step 2, the guest kernel panics:
Modules linked in: ext4 mbcache jbd2 sr_mod cdrom virtio_blk pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mod [last unloaded: scsi_wait_scan]
Pid: 9, comm: events/0 Not tainted 2.6.32-70.el6.x86_64 #1 KVM
RIP: 0010:[<ffffffff81157bed>]  [<ffffffff81157bed>] free_block+0x16d/0x180
RSP: 0018:ffff88007cd0dd30  EFLAGS: 00010046
RAX: ffffea0001bffdd0 RBX: ffff880079122780 RCX: 0000000000000010
RDX: 0000000000000000 RSI: ffff8800790d6040 RDI: ffffffffffff6ec0
RBP: ffff88007cd0dd80 R08: ffff880037236ec0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000006
R13: ffff88007975a820 R14: 0000000000000001 R15: ffffea0000000000
FS:  0000000000000000(0000) GS:ffff880001e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f441e56c5e0 CR3: 00000000372a9000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process events/0 (pid: 9, threadinfo ffff88007cd0c000, task ffff88007cd0b560)
Stack:
 ffff880037236ec0 ffff8800372f7000 000000000000100c ffffffffffff6ec0
<0> 0000000000000000 ffff88007975a800 ffff880079122780 0000000000000006
<0> ffff880037236f00 ffff88007975a818 ffff88007cd0ddd0 ffffffff81157e31
Call Trace:
 [<ffffffff81157e31>] drain_array+0xc1/0x100
 [<ffffffff81158dfe>] cache_reap+0x7e/0x260
 [<ffffffff81158d80>] ? cache_reap+0x0/0x260
 [<ffffffff8108c610>] worker_thread+0x170/0x2a0
 [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8108c4a0>] ? worker_thread+0x0/0x2a0
 [<ffffffff81091936>] kthread+0x96/0xa0
 [<ffffffff810141ca>] child_rip+0xa/0x20
 [<ffffffff810918a0>] ? kthread+0x0/0xa0
 [<ffffffff810141c0>] ? child_rip+0x0/0x20
Code: 41 5f c9 c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 72 08 48 89 c7 e8 84 0e 11 00 e9 ff fe ff ff 48 8b 40 10 48 8b 10 e9 36 ff ff ff <0f> 0b 90 eb fd 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 
RIP  [<ffffffff81157bed>] free_block+0x16d/0x180
 RSP <ffff88007cd0dd30>
---[ end trace e805539d76eb35a9 ]---
Kernel panic - not syncing: Fatal exception
Pid: 9, comm: events/0 Tainted: G      D    ----------------  2.6.32-70.el6.x86_64 #1
Call Trace:
 [<ffffffff814c7b23>] panic+0x78/0x137
 [<ffffffff814cbbf4>] oops_end+0xe4/0x100
 [<ffffffff8101733b>] die+0x5b/0x90
 [<ffffffff814cb4a4>] do_trap+0xc4/0x160
 [<ffffffff81014ee5>] do_invalid_op+0x95/0xb0
 [<ffffffff81157bed>] ? free_block+0x16d/0x180
 [<ffffffff81013f5b>] invalid_op+0x1b/0x20
 [<ffffffff81157bed>] ? free_block+0x16d/0x180
 [<ffffffff81157e31>] drain_array+0xc1/0x100
 [<ffffffff81158dfe>] cache_reap+0x7e/0x260
 [<ffffffff81158d80>] ? cache_reap+0x0/0x260
 [<ffffffff8108c610>] worker_thread+0x170/0x2a0
 [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8108c4a0>] ? worker_thread+0x0/0x2a0
 [<ffffffff81091936>] kthread+0x96/0xa0
 [<ffffffff810141ca>] child_rip+0xa/0x20
 [<ffffffff810918a0>] ? kthread+0x0/0xa0
 [<ffffffff810141c0>] ? child_rip+0x0/0x20
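
When a serial console log like the one above is captured (here via the -serial unix: socket), the two most useful lines are the "RIP" summary and the "Kernel panic - not syncing" line. The following is a minimal sketch for pulling both out of a log. The function name and the regular expressions are illustrative assumptions based only on the lines shown in this comment, not a general oops parser.

```python
import re

# Matches the oops summary line, e.g.
#   "RIP  [<ffffffff81157bed>] free_block+0x16d/0x180"
RIP_RE = re.compile(r"RIP\s+\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")
# Matches e.g. "Kernel panic - not syncing: Fatal exception"
PANIC_RE = re.compile(r"Kernel panic - not syncing: (.*)")

SAMPLE = """\
RIP  [<ffffffff81157bed>] free_block+0x16d/0x180
 RSP <ffff88007cd0dd30>
---[ end trace e805539d76eb35a9 ]---
Kernel panic - not syncing: Fatal exception
"""


def summarize_console(log_text):
    """Return (panic_reason, faulting_symbol); either value may be None."""
    reason = None
    symbol = None
    for line in log_text.splitlines():
        if symbol is None:
            rip = RIP_RE.search(line)
            if rip:
                symbol = rip.group(1)
        if reason is None:
            panic = PANIC_RE.search(line)
            if panic:
                reason = panic.group(1).strip()
    return reason, symbol
```

For the excerpt in SAMPLE, summarize_console(SAMPLE) returns ("Fatal exception", "free_block"); a log with no panic returns (None, None).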


-----------------------------------

Verified with kvm-83-232.el5 on kernel 2.6.18-261.el5 using the same steps as above: issued system_reset 10 times, all *PASS*. The guest boots correctly and works well, with no kernel panic. So this bug has been fixed.
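
The repeated system_reset check could be scripted roughly as below. This is only a sketch under assumptions that are not part of the original steps: the guest would be started with the monitor on a Unix socket (for example -monitor unix:/tmp/monitor.sock,server,nowait, a hypothetical path) instead of -monitor stdio, and the delay between resets must be chosen long enough for the guest to boot. The function name send_resets is made up for illustration, and the script does not check the guest console for a panic.

```python
import socket
import time


def send_resets(monitor, count=10, delay=0.0):
    """Write 'system_reset' to an already-connected monitor socket `count` times.

    `delay` is the pause in seconds after each reset; pick a value that
    lets the guest finish booting before the next reset is sent.
    """
    for _ in range(count):
        monitor.sendall(b"system_reset\n")
        time.sleep(delay)
    return count


# Example use (hypothetical socket path, not from the original report):
#
#   with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as monitor:
#       monitor.connect("/tmp/monitor.sock")
#       send_resets(monitor, count=10, delay=120)
```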

Comment 20 Tomas Capek 2011-07-19 09:05:58 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
When a system_reset signal was sent to a guest with a pass-through NIC (Network Interface Card) attached, a kernel panic occurred in the guest. This bug has been fixed, and the guest now reboots properly in the described scenario.

Comment 21 errata-xmlrpc 2011-07-21 08:50:32 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1068.html

Comment 22 errata-xmlrpc 2011-07-21 11:49:13 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1068.html