Bug 1007160

Summary: host crashes when reloading the igb module with a VF in use
Product: Red Hat Enterprise Linux 6
Reporter: mazhang <mazhang>
Component: qemu-kvm
Assignee: Stefan Assmann <sassmann>
Status: CLOSED DUPLICATE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: high
Version: 6.5
CC: acathrow, bsarathy, chayang, flang, juzhang, mazhang, michen, mkenneth, qzhang, sassmann, virt-maint
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2013-09-23 13:49:40 UTC
Type: Bug
Attachments:
vmcore-dmesg.txt (flags: none)

Description mazhang 2013-09-12 03:42:29 UTC
Created attachment 796609: vmcore-dmesg.txt

Description of problem:
After a VF is assigned to a guest, unloading and then reloading the igb module on the host crashes the host.

Version-Release number of selected component (if applicable):

host:
RHEL6.5-20130905.1
qemu-kvm-0.12.1.2-2.400.el6.x86_64
kernel-2.6.32-417.el6.x86_64
# lspci -v -s 06:00.0
06:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
	Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
	Flags: bus master, fast devsel, latency 0, IRQ 38
	Memory at dd740000 (32-bit, non-prefetchable) [size=128K]
	Memory at dd800000 (32-bit, non-prefetchable) [size=4M]
	I/O ports at ecc0 [size=32]
	Memory at dd738000 (32-bit, non-prefetchable) [size=16K]
	Expansion ROM at dd000000 [disabled] [size=4M]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
	Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 90-e2-ba-ff-ff-05-63-5e
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
	Kernel driver in use: igb
	Kernel modules: igb
# lspci -v -s 06:10.0
06:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
	Subsystem: Intel Corporation Device a03c
	Flags: fast devsel
	[virtual] Memory at dd400000 (64-bit, non-prefetchable) [size=16K]
	[virtual] Memory at dd420000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [70] MSI-X: Enable- Count=3 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Kernel driver in use: pci-stub
	Kernel modules: igbvf


guest:
RHEL6.5-20130905.1

How reproducible:
2/2

Steps to Reproduce:
1. Unbind the VF from its host driver, assign it to the guest, and boot the guest:
# gdb --args /usr/libexec/qemu-kvm \
-M pc \
-cpu SandyBridge \
-m 2G \
-smp 4,sockets=2,cores=2,threads=1,maxcpus=16 \
-enable-kvm \
-name rhel6 \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 \
-k en-us \
-rtc base=localtime,clock=host,driftfix=slew \
-nodefaults \
-monitor stdio \
-qmp tcp:0:6666,server,nowait \
-monitor unix:/tmp/socket,server,nowait \
-device sga \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-vga qxl \
-spice port=5900,disable-ticketing \
-global PIIX4_PM.disable_s3=0 \
-global PIIX4_PM.disable_s4=0 \
-drive file=/home/rhel6u5.raw,if=none,id=drive-virtio-disk0,format=raw,cache=none,werror=stop,rerror=stop,aio=threads \
-device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-device pci-assign,host=06:10.0,id=vf,romfile=/home/808610ca.rom

2. In the guest, obtain an IP address via DHCP, then ping a remote host.

3. On the host, unload and reload the igb module:
# modprobe -r igb
# modprobe igb max_vfs=7
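The steps above do not show how the VF was detached from the host, nor how to confirm it is still in use when igb is removed. This shell sketch is a hedged illustration, not the reporter's procedure: it assumes the standard sysfs `new_id`/`unbind`/`bind` driver files and the `virtfnN` links under the PF device, and takes the VF ID 8086:10ca from the `808610ca.rom` file name. `stub_vf`, `list_vfs`, and the `SYSFS` override are hypothetical names.

```shell
#!/bin/sh
# Sketch only: sysfs helpers around the reproduction steps. Must run as root.
# SYSFS is a hypothetical override (defaults to /sys) so the helpers can be
# dry-run against a fake directory tree.
SYSFS=${SYSFS:-/sys}

# Step 1 (assumed procedure): detach a VF from igbvf and bind it to pci-stub.
#   $1 = full PCI address of the VF, e.g. 0000:06:10.0
#   $2 = "vendor device" ID pair, e.g. "8086 10ca"
stub_vf() {
    # Let pci-stub accept devices with this vendor/device ID.
    echo "$2" > "$SYSFS/bus/pci/drivers/pci-stub/new_id"
    # Release the VF from its current driver, if it has one.
    if [ -e "$SYSFS/bus/pci/devices/$1/driver" ]; then
        echo "$1" > "$SYSFS/bus/pci/devices/$1/driver/unbind"
    fi
    # Hand the VF to pci-stub so pci-assign can use it.
    echo "$1" > "$SYSFS/bus/pci/drivers/pci-stub/bind"
}

# Before step 3: print each VF of a PF and the driver it is bound to.
#   $1 = full PCI address of the PF, e.g. 0000:06:00.0
list_vfs() {
    for link in "$SYSFS/bus/pci/devices/$1"/virtfn*; do
        [ -e "$link" ] || continue          # no VFs enabled on this PF
        vf=$(basename "$(readlink "$link")")
        if [ -e "$link/driver" ]; then
            drv=$(basename "$(readlink "$link/driver")")
        else
            drv=none
        fi
        echo "$vf $drv"
    done
}
```

For example, `stub_vf 0000:06:10.0 "8086 10ca"` would run before starting qemu-kvm, and `list_vfs 0000:06:00.0` just before `modprobe -r igb`; in this setup the latter would be expected to report 0000:06:10.0 bound to pci-stub, consistent with the `lspci` output above.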

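Actual results:
The host panics with a kernel oops ("Oops: 0002") in igb_reset(), called from igb_probe() while modprobe loads the igb module. Output of the crash utility: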

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-417.el6.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 16
        DATE: Thu Sep 12 11:16:07 2013
      UPTIME: 00:19:34
LOAD AVERAGE: 0.27, 0.19, 0.11
       TASKS: 412
    NODENAME: intel-e5530-8-2.englab.nay.redhat.com
     RELEASE: 2.6.32-417.el6.x86_64
     VERSION: #1 SMP Fri Sep 6 17:19:12 EDT 2013
     MACHINE: x86_64  (2393 Mhz)
      MEMORY: 8 GB
       PANIC: "Oops: 0002 [#1] SMP " (check log for details)
         PID: 5222
     COMMAND: "modprobe"
        TASK: ffff88022bd14ae0  [THREAD_INFO: ffff88022b6f0000]
         CPU: 10
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 5222   TASK: ffff88022bd14ae0  CPU: 10  COMMAND: "modprobe"
 #0 [ffff88022b6f1820] machine_kexec at ffffffff81038f2b
 #1 [ffff88022b6f1880] crash_kexec at ffffffff810c5f32
 #2 [ffff88022b6f1950] oops_end at ffffffff8152b700
 #3 [ffff88022b6f1980] no_context at ffffffff8104a20b
 #4 [ffff88022b6f19d0] __bad_area_nosemaphore at ffffffff8104a495
 #5 [ffff88022b6f1a20] bad_area at ffffffff8104a5be
 #6 [ffff88022b6f1a50] __do_page_fault at ffffffff8104ad6f
 #7 [ffff88022b6f1b70] do_page_fault at ffffffff8152d64e
 #8 [ffff88022b6f1ba0] page_fault at ffffffff8152aa05
    [exception RIP: igb_reset+213]
    RIP: ffffffffa03d0da5  RSP: ffff88022b6f1c58  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff88022bd3c6e0  RCX: 000000000000e650
    RDX: 0000000000000001  RSI: 0000000000000000  RDI: ffff88022bd3c6e0
    RBP: ffff88022b6f1c88   R8: 00000000fffffffe   R9: 0000000000000000
    R10: 0000000000000007  R11: 000000000000000c  R12: 0000000000000040
    R13: ffff88022bd3c990  R14: ffff88012dd69000  R15: ffff88022bd3c020
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff88022b6f1c90] igb_probe at ffffffffa03eb864 [igb]
#10 [ffff88022b6f1d40] local_pci_probe at ffffffff812a4da7
#11 [ffff88022b6f1d50] pci_device_probe at ffffffff812a5f91
#12 [ffff88022b6f1db0] driver_probe_device at ffffffff8136dbb0
#13 [ffff88022b6f1de0] __driver_attach at ffffffff8136de5b
#14 [ffff88022b6f1e10] bus_for_each_dev at ffffffff8136d164
#15 [ffff88022b6f1e50] driver_attach at ffffffff8136d94e
#16 [ffff88022b6f1e60] bus_add_driver at ffffffff8136c998
#17 [ffff88022b6f1ea0] driver_register at ffffffff8136e1a6
#18 [ffff88022b6f1ee0] __pci_register_driver at ffffffff812a61f6
#19 [ffff88022b6f1f10] init_module at ffffffffa03fe05b [igb]
#20 [ffff88022b6f1f20] do_one_initcall at ffffffff8100204c
#21 [ffff88022b6f1f50] sys_init_module at ffffffff810bc6d1
#22 [ffff88022b6f1f80] system_call_fastpath at ffffffff8100b072
    RIP: 00007f1957e880fa  RSP: 00007fff58adffb8  RFLAGS: 00010246
    RAX: 00000000000000af  RBX: ffffffff8100b072  RCX: 0000000000000000
    RDX: 00000000009e9460  RSI: 0000000000051898  RDI: 00007f1958282010
    RBP: 0000000000000000   R8: 00007f19582d38a8   R9: 00007f1958345700
    R10: 00007fff58adffd0  R11: 0000000000000246  R12: 00000000009e9460
    R13: 00000000009ea710  R14: 00000000009ea760  R15: 00000000009ea710
    ORIG_RAX: 00000000000000af  CS: 0033  SS: 002b



Expected results:
The host does not crash.

Additional info:

Comment 3 mazhang 2013-09-12 08:34:09 UTC
IOMMU was enabled (intel_iommu=on on the kernel command line):

# cat /proc/cmdline 
ro root=/dev/mapper/vg_intele553082-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=vg_intele553082/lv_swap SYSFONT=latarcyrheb-sun16 crashkernel=129M@0M intel_iommu=on KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rd_LVM_LV=vg_intele553082/lv_root rhgb quiet

# dmesg |grep IOMMU
Intel-IOMMU: enabled
dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c90780106f0462 ecap f020fe
IOMMU 0xfed90000: using Queued invalidation
IOMMU: Setting RMRR:
IOMMU: Setting identity map for device 0000:00:1d.7 [0xcf4c2000 - 0xcf4c3000]
IOMMU: Setting identity map for device 0000:00:1a.7 [0xcf4c0000 - 0xcf4c1000]
IOMMU: Setting identity map for device 0000:00:1d.1 [0xcf4a7000 - 0xcf4a8000]
IOMMU: Setting identity map for device 0000:00:1d.0 [0xcf4a5000 - 0xcf4a6000]
IOMMU: Setting identity map for device 0000:00:1a.1 [0xcf4a3000 - 0xcf4a4000]
IOMMU: Setting identity map for device 0000:00:1a.0 [0xcf4a1000 - 0xcf4a2000]
IOMMU: Setting identity map for device 0000:00:1a.0 [0xcf4b1000 - 0xcf4c0000]
IOMMU: Setting identity map for device 0000:00:1a.1 [0xcf4b1000 - 0xcf4c0000]
IOMMU: Setting identity map for device 0000:00:1d.0 [0xcf4b1000 - 0xcf4c0000]
IOMMU: Setting identity map for device 0000:00:1d.1 [0xcf4b1000 - 0xcf4c0000]
IOMMU: Setting identity map for device 0000:00:1a.7 [0xcf4c8000 - 0xcf4e0000]
IOMMU: Setting identity map for device 0000:00:1d.7 [0xcf4c8000 - 0xcf4e0000]
IOMMU: Prepare 0-16MiB unity mapping for LPC
IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0x1000000]

Comment 4 Stefan Assmann 2013-09-23 13:49:40 UTC

*** This bug has been marked as a duplicate of bug 985733 ***