Bug 603293

Summary: Host crashed when running iperf from guest to host with vhost on
Product: Red Hat Enterprise Linux 6
Reporter: Keqin Hong <khong>
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED DUPLICATE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Docs Contact:
Priority: high
Version: 6.0
CC: amit.shah, herbert.xu, lihuang, michen, mst, ndai
Target Milestone: rc
Keywords: TestBlocker
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-06-15 08:06:59 EDT
Attachments:
complete ttyS0 log (flags: none)

Description Keqin Hong 2010-06-12 04:15:30 EDT
Description of problem:
The RHEL6 host crashed after starting a RHEL6 guest with vhost=on and running iperf from the guest to the host.

Version-Release number of selected component (if applicable):

kernel-2.6.32-33.el6.x86_64

qemu-kvm-0.12.1.2-2.73.el6 
iperf version 2.0.4 (http://sourceforge.net/projects/iperf/)

How reproducible:
100%

CLI:
/usr/libexec/qemu-kvm -m 2G -smp 2 -drive file=RHEL6.0-64-virtio-0603.1.qcow2,if=none,id=drive-virtio0,cache=none,boot=on -device virtio-blk-pci,drive=drive-virtio0,id=virtio-blk-pci0,addr=0x3 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,mac=76:00:40:3F:20:20,bus=pci.0,addr=0x4 -boot order=c,menu=on -uuid 17644ecc-d3a1-4d3c-a386-12daf50015f2 -rtc base=utc -rtc-td-hack -no-kvm-pit-reinjection -monitor stdio -cpu qemu64,+sse2 -balloon none -vnc :1


Steps to Reproduce:
1. Start VM with CLI listed above (vhost=on)
2. Start iperf server on host
host# iperf -s -w 256K
3. Run iperf client on guest
guest# iperf -c $host -w 256K

Note: you might need to run step 3 several times to reproduce the crash (or use a loop script instead).
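
A loop script for step 3 could look like the sketch below. The function name `run_iperf_loop` and the default of 20 runs are illustrative assumptions, not part of the original report; the iperf options are the ones from step 3.

```shell
# Sketch only: repeat the iperf client run from step 3 on the guest.
# run_iperf_loop and the default run count are hypothetical choices.
run_iperf_loop() {
    host=$1            # address of the host running "iperf -s -w 256K"
    runs=${2:-20}      # number of client runs (assumed default)
    i=1
    while [ "$i" -le "$runs" ]; do
        echo "iperf run $i/$runs"
        # Stop at the first failed run, e.g. when the host stops responding.
        iperf -c "$host" -w 256K || return 1
        i=$((i + 1))
    done
}
```

On the guest, after sourcing the function, `run_iperf_loop $host 50` would run the client up to 50 times and stop at the first failure.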
  
Actual results:
The host crashed and left an incomplete vmcore (/var/crash/2010-06-11-07\:38/vmcore-incomplete).

Expected results:
The host does not crash, and networking between guest and host works fine.

Additional info:
[crash dump]
Last login: Fri Jun 11 15:32:31 from dhcp-65-184.nay.redhat.com
[root@xxx]# BUG: unable to handle kernel NULL pointer dereference at 0000000000000400
IP: [<ffffffffa050a824>] __br_deliver+0x64/0xe0 [bridge]
PGD 20e3ee067 PUD 214fb9067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:06.0/0000:3f:00.0/irq
CPU 3 
Modules linked in: vhost_net(U) macvtap(U) macvlan(U) tun(U) ip6table_filter(U) ip6_tables(U) ebtable_nat(U) ebtables(U) ipt_MASQUERADE(U) iptable_nat(U) nf_nat(U) autofs4(U) sunrpc(U) cpufreq_ondemand(U) powernow_k8(U) freq_table(U) bridge(U) stp(U) llc(U) be2iscsi(U) bnx2i(U) cnic(U) uio(U) cxgb3i(U) cxgb3(U) mdio(U) ib_iser(U) rdma_cm(U) ib_cm(U) iw_cm(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) ipv6(U) iscsi_tcp(U) libiscsi_tcp(U) libiscsi(U) scsi_transport_iscsi(U) dm_mirror(U) dm_region_hash(U) dm_log(U) kvm_amd(U) kInitializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-33.el6.x86_64 (mockbuild@x86-006.build.bos.redhat.com) (gcc version 4.4.4 20100525 (Red Hat 4.4.4-5) (GCC) ) #1 SMP Thu Jun 3 13:00:03 EDT 2010
Command line: ro root=/dev/mapper/vg_dhcp91176-lv_root rd_LVM_LV=vg_dhcp91176/lv_root rd_LVM_LV=vg_dhcp91176/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us  console=tty0 console=ttyS0,115200nb irqpoll maxcpus=1 reset_devices memmap=exactmap memmap=640K@0K memmap=131436K@33408K elfcorehdr=164844K
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000100 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000cefb6b00 (usable)
 BIOS-e820: 00000000cefb6b00 - 00000000e0000000 (reserved)
 BIOS-e820: 00000000f4000000 - 00000000f8000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fed40000 (reserved)
 BIOS-e820: 00000000fed45000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000220000000 (usable)
last_pfn = 0x220000 max_arch_pfn = 0x400000000
user-defined physical RAM map:
 user: 0000000000000000 - 00000000000a0000 (usable)
 user: 00000000020a0000 - 000000000a0fb000 (usable)
DMI 2.5 present.
last_pfn = 0xa0fb max_arch_pfn = 0x400000000
x86 PAT enabled: cpu 0, old 0x7010600070106, new 0x7010600070106
Using GB pages for direct mapping
init_memory_mapping: 0000000000000000-000000000a0fb000
RAMDISK: 09741000 - 0a0eeaf2
ACPI: RSDP 00000000000e6a10 00014 (v00 COMPAQ)
ACPI: RSDT 00000000cefc6b40 00040 (v01 HPQOEM SLIC-BPC 20090226      00000000)
ACPI: FACP 00000000cefc6be8 00074 (v01 COMPAQ HP_RS780 00000001      00000000)
ACPI: DSDT 00000000cefc6f5f 09992 (v01 COMPAQ DSDT_PRJ 00000001 MSFT 0100000E)
ACPI: FACS 00000000cefc6b00 00040
ACPI: APIC 00000000cefc6c5c 00084 (v01 COMPAQ HP_RS780 00000001      00000000)
ACPI: ASF! 00000000cefc6ce0 00063 (v32 COMPAQ HP_RS780 00000001      00000000)
ACPI: MCFG 00000000cefc6d43 0003C (v01 COMPAQ HP_RS780 00000001      00000000)
ACPI: TCPA 00000000cefc6d7f 00032 (v01 COMPAQ HP_RS780 00000001      00000000)
ACPI: SLIC 00000000cefc6db1 00176 (v01 HPQOEM SLIC-BPC 00000001      00000000)
ACPI: HPET 00000000cefc6f27 00038 (v01 COMPAQ HP_RS780 00000001      00000000)
Scanning NUMA topology in Northbridge 24
No NUMA configuration found
Faking a node at 0000000000000000-000000000a0fb000
Bootmem setup node 0 0000000000000000-000000000a0fb000
  NODE_DATA [0000000000009000 - 000000000003cfff]
  bootmap [000000000003d000 -  000000000003e41f] pages 2
(7 early reservations) ==> bootmem [0000000000 - 000a0fb000]
  #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
  #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
  #2 [0003000000 - 0003cb9098]    TEXT DATA BSS ==> [0003000000 - 0003cb9098]
  #3 [0009741000 - 000a0eeaf2]          RAMDISK ==> [0009741000 - 000a0eeaf2]
  #4 [000009fc00 - 0000100000]    BIOS reserved ==> [000009fc00 - 0000100000]
  #5 [0003cba000 - 0003cba10c]              BRK ==> [0003cba000 - 0003cba10c]
  #6 [0000008000 - 0000009000]          PGTABLE ==> [0000008000 - 0000009000]
Zone PFN ranges:
  DMA      0x00000001 -> 0x00001000
  DMA32    0x00001000 -> 0x00100000
  Normal   0x00100000 -> 0x00100000
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0: 0x00000001 -> 0x000000a0
    0: 0x000020a0 -> 0x0000a0fb
ACPI: PM-Timer IO Port: 0xf808
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 4, version 33, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
Using ACPI (MADT) for SMP configuration information
ACPI: HPET id: 0x10028300 base: 0xfed00000
SMP: Allowing 4 CPUs, 0 hotplug CPUs
PM: Registered nosave memory: 00000000000a0000 - 00000000020a0000
Allocating PCI resources starting at a0fb000 (gap: a0fb000:f5f05000)
Booting paravirtualized kernel on bare hardware
NR_CPUS:4096 nr_cpumask_bits:4 nr_cpu_ids:4 nr_node_ids:1
PERCPU: Embedded 31 pages/cpu @ffff880002200000 s94744 r8192 d24040 u524288
pcpu-alloc: s94744 r8192 d24040 u524288 alloc=1*2097152
pcpu-alloc: [0] 0 1 2 3 
Built 1 zonelists in Node order, mobility grouping on.  Total pages: 32354
Policy zone: DMA32
Kernel command line: ro root=/dev/mapper/vg_dhcp91176-lv_root rd_LVM_LV=vg_dhcp91176/lv_root rd_LVM_LV=vg_dhcp91176/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us  console=tty0 console=ttyS0,115200nb irqpoll maxcpus=1 reset_devices memmap=exactmap memmap=640K@0K memmap=131436K@33408K elfcorehdr=164844K
Misrouted IRQ fixup and polling support enabled
This may significantly impact system performance
PID hash table entries: 512 (order: 0, 4096 bytes)
Checking aperture...
No AGP bridge found
Node 0: aperture @ 20000000 size 64 MB
Memory: 104056k/164844k available (4997k kernel code, 32772k absent, 28016k reserved, 3972k data, 1220k init)
Hierarchical RCU implementation.
NR_IRQS:33024 nr_irqs:440
Spurious LAPIC timer interrupt on cpu 0
Console: colour VGA+ 80x25
console [tty0] enabled
console [ttyS0] enabled
...

The complete dump message can be found in the attachment.
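
To pull the faulting symbol out of a saved serial console log, something like the following sketch may help. The function name `oops_ip` and the log filename used below are assumptions for illustration; the pattern matches the `IP: [<address>] symbol+offset/size [module]` line format seen in the dump above.

```shell
# Sketch only: print the symbol+offset part of every "IP:" oops line in a log.
# oops_ip is a hypothetical helper name; $1 is the path to the saved log.
oops_ip() {
    # Strip the leading "IP: [<hex address>] " and print what remains.
    sed -n 's/^IP: \[<[0-9a-f]*>\] //p' "$1"
}
```

For example, `oops_ip ttyS0.log` (assuming the attachment was saved under that name) would print the `__br_deliver+0x64/0xe0 [bridge]` part of the line quoted in the dump.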
Comment 1 Keqin Hong 2010-06-12 04:17:55 EDT
Created attachment 423459 [details]
complete ttyS0 log
Comment 3 Keqin Hong 2010-06-12 04:24:27 EDT
dmesg output after starting the VM with vhost=on:
# dmesg 
device tap0 entered promiscuous mode
breth0: port 2(tap0) entering forwarding state
New device tap0 does not support netpoll
Disabling netpoll for breth0
breth0: port 2(tap0) entering disabled state
device tap0 left promiscuous mode
breth0: port 2(tap0) entering disabled state
device tap0 entered promiscuous mode
breth0: port 2(tap0) entering forwarding state
New device tap0 does not support netpoll
Disabling netpoll for breth0
tap0: no IPv6 routers present
...
Comment 4 RHEL Product and Program Management 2010-06-12 04:33:03 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 7 Michael S. Tsirkin 2010-06-13 06:36:29 EDT
Do you have a vmcore?
Comment 9 Herbert Xu 2010-06-15 08:06:59 EDT
The crash log indicates that this is the same issue as 602927.

*** This bug has been marked as a duplicate of bug 602927 ***