Bug 588015

Summary: x86_64 host on Nehalem-EX machines will panic when installing a 4.8 GA kvm guest
Product: Red Hat Enterprise Linux 5
Reporter: Igor Zhang <yugzhang>
Component: kernel
Assignee: Herbert Xu <herbert.xu>
Status: CLOSED ERRATA
QA Contact: Network QE <network-qe>
Severity: high
Priority: urgent
Version: 5.5
CC: benny.won, cvantuin, cww, dhoward, hjia, knoel, lihuang, mjenner, qcai, sandy.garza, tao, tburke, virt-maint
Target Milestone: rc
Keywords: ZStream
Hardware: All
OS: Linux
Doc Type: Bug Fix
Clone Of:
: 594561
Last Closed: 2011-01-13 21:30:22 UTC
Bug Blocks: 580949, 594561, 616845, 648938
Attachments:
gro: Fix bogus gso_size on the first fraglist entry (flags: none)
gro: Fix illegal merging of trailer trash (flags: none)

Description Igor Zhang 2010-05-02 07:29:07 UTC
Description of problem:
An x86_64 host on Nehalem-EX machines running the 5.5.z, 5.5 GA, or 5.4 GA kernel reliably panics when installing a 4.8 GA KVM guest:
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=153408
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=153407
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=153615
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=153656
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=153658

Comment 1 Avi Kivity 2010-05-02 10:25:40 UTC
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at drivers/net/tun.c:476
invalid opcode: 0000 [1] SMP 
last sysfs file: /class/net/lo/ifindex
CPU 52 
Modules linked in: tun nls_utf8 nfs fscache nfs_acl ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i cxgb3 libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport ksm(U) kvm_intel(U) kvm(U) joydev sr_mod cdrom sg igb i2c_i801 8021q i2c_core pcspkr dca dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ahci libata shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 15689, comm: qemu-kvm Tainted: G      2.6.18-194.2.1.el5 #1
RIP: 0010:[<ffffffff887967d9>]  [<ffffffff887967d9>] :tun:tun_chr_readv+0x2b1/0x3a6
RSP: 0018:ffff810c75fd7e48  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff810c75fd7e98 RCX: 0000000010015101
RDX: ffff81046d478700 RSI: ffff810c75fd7e9e RDI: ffff810c75fd7e92
RBP: 0000000000010ff6 R08: 0000000000000000 R09: 0000000000000001
R10: ffff810c75fd7e94 R11: 00000000ffffffff R12: ffff81047ed87280
R13: ffff810472cd3d00 R14: 0000000000000000 R15: ffff810c75fd7ef8
FS:  00002acc01e25080(0000) GS:ffff81087ff95840(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002ab832aea490 CR3: 000000107beff000 CR4: 00000000000026e0
Process qemu-kvm (pid: 15689, threadinfo ffff810c75fd6000, task ffff810c7f9db860)
Stack:  ffff81047d903ea0 ffff81047d1441c0 0000000000000000 ffff810c7f9db860
 ffffffff8008d087 ffff810472cd3d30 ffff810472cd3d30 ffff81047e60f3d8
 000005a805ea0000 0000000000000000 000043b6503e1600 0000000000000000
Call Trace:
 [<ffffffff8008d087>] default_wake_function+0x0/0xe
 [<ffffffff887968e8>] :tun:tun_chr_read+0x1a/0x1f
 [<ffffffff8000b681>] vfs_read+0xcb/0x171
 [<ffffffff80011bd2>] sys_read+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0


Code: 0f 0b 68 f0 74 79 88 c2 dc 01 f6 42 0a 08 74 0c 80 4c 24 41 
RIP  [<ffffffff887967d9>] :tun:tun_chr_readv+0x2b1/0x3a6
 RSP <ffff810c75fd7e48>
 <0>Kernel panic - not syncing: Fatal exception
 [-- MARK -- Fri Apr 30 00:05:00 2010]

Comment 2 Herbert Xu 2010-05-14 10:40:22 UTC
What's the NIC? It's probably producing LRO packets, which are incompatible with bridging.

Comment 18 Herbert Xu 2010-05-21 02:51:50 UTC
Created attachment 415556
gro: Fix bogus gso_size on the first fraglist entry

When GRO produces fraglist entries, and the resulting skb hits
an interface that is incapable of TSO but capable of FRAGLIST,
we end up producing a bogus packet with gso_size non-zero.

This was reported in the field with older versions of KVM that
did not set the TSO bits on tuntap.

This patch fixes that.

Reported-by: Igor Zhang <yugzhang>
Signed-off-by: Herbert Xu <herbert.org.au>
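
To illustrate the failure mode described in this commit message, here is a minimal Python model. It is not the kernel code and not the attached patch. The names `Skb`, `wrap_in_fraglist`, and `bogus_entries` are hypothetical and exist only for this sketch. The sketch assumes that GRO moves the held skb into the frag_list of a new head skb, and that the fix amounts to clearing gso_size on the entry that was moved.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Skb:
    """Toy stand-in for a kernel sk_buff (hypothetical, illustration only)."""
    length: int
    gso_size: int = 0
    frag_list: List["Skb"] = field(default_factory=list)


def wrap_in_fraglist(p: Skb, clear_child_gso: bool) -> Skb:
    """Model GRO switching to frag_list mode: p becomes the first entry.

    The new head inherits p's gso_size. With clear_child_gso=False the
    entry keeps its stale gso_size (the assumed buggy behaviour); with
    True it is reset to 0 (the assumed fixed behaviour).
    """
    head = Skb(length=p.length, gso_size=p.gso_size, frag_list=[p])
    if clear_child_gso:
        p.gso_size = 0
    return head


def bogus_entries(head: Skb) -> List[Skb]:
    """Return frag_list entries that still claim to be GSO packets."""
    return [s for s in head.frag_list if s.gso_size != 0]
```

In this model, a device that accepts FRAGLIST but not TSO would see each entry returned by `bogus_entries` as a standalone packet with a non-zero gso_size, which is the bogus packet the commit message refers to.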

Comment 19 Herbert Xu 2010-05-21 04:10:03 UTC
Created attachment 415571
gro: Fix illegal merging of trailer trash

    gro: Fix illegal merging of trailer trash
    
    When we've merged skb's with page frags, and subsequently receive
    a trailer skb (< MSS) that is not completely non-linear (this can
    occur on Intel NICs if the packet size falls below the threshold),
    GRO ends up producing an illegal GSO skb with a frag_list.
    
    This is harmless unless the skb is then forwarded through an
    interface that requires software GSO, whereupon the GSO code
    will BUG.
    
    This patch detects this case in GRO and avoids merging the
    trailer skb.
    
    Reported-by: Mark Wagner <mwagner>
    Signed-off-by: Herbert Xu <herbert.org.au>
    Signed-off-by: David S. Miller <davem>
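
The detection described in this commit message can be sketched as a small Python model. This is an illustration under stated assumptions, not the kernel implementation: `HeldSkb`, `NewSkb`, and `gro_merge_check` are hypothetical names, and the rule "the held skb already carries more than one segment in page frags" is approximated by comparing its payload length against gso_size.

```python
import errno
from dataclasses import dataclass


@dataclass
class HeldSkb:
    """Toy model of the skb GRO is currently holding (hypothetical)."""
    payload_len: int          # payload merged so far
    gso_size: int             # segment size (MSS) of the flow
    has_frag_list: bool = False


@dataclass
class NewSkb:
    """Toy model of the incoming skb (hypothetical)."""
    payload_len: int
    fully_nonlinear: bool     # True if all payload lives in page frags


def gro_merge_check(held: HeldSkb, new: NewSkb) -> int:
    """Return 0 if merging is allowed, -E2BIG if it must be refused."""
    if held.has_frag_list:
        return 0              # already chained: appending is legal
    if new.fully_nonlinear:
        return 0              # page frags can simply be appended
    # The new skb has linear payload, so merging would need a frag_list.
    # Assumption: that is only legal if the held skb is still a single
    # segment, i.e. nothing has been merged into its page frags yet.
    if held.payload_len != held.gso_size:
        return -errno.E2BIG
    return 0
```

With this model, a trailer skb that is shorter than the MSS and partly linear is refused once two or more segments have been merged into page frags, so the flow is flushed instead of producing a GSO skb with a frag_list.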

Comment 20 Benny Wang 2010-06-01 09:16:56 UTC
An x86_64 host on a Nehalem (E5504) machine with a 5.4 kernel also reliably panics during file transfer to a RHEL 4.8 KVM guest when the virtio driver is enabled in the guest OS.

This occurs with the Intel 82576 NIC.

If the Intel 82576 is replaced with a BCM5709 NIC, it works fine.

Applying the patch to the RHEL 5.4 kernel source and rebuilding the kernel resolves the issue.

Comment 23 Jarod Wilson 2010-07-12 15:46:23 UTC
The fix is in kernel-2.6.18-206.el5.
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 25 Michael S. Tsirkin 2010-08-01 08:23:56 UTC
*** Bug 619255 has been marked as a duplicate of this bug. ***

Comment 31 Neil Horman 2011-01-06 15:55:30 UTC
*** Bug 549743 has been marked as a duplicate of this bug. ***

Comment 38 errata-xmlrpc 2011-01-13 21:30:22 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

Comment 39 IBM Bug Proxy 2011-03-26 14:12:29 UTC
------- Comment From linuxram.com 2010-04-20 17:51 EDT-------
It is a RHEL 5 host.

Oh well, I see the confusion. This problem is seen with virtio, not with vhost.
The qemu command is:

/usr/libexec/qemu-kvm -name rhel5 \
    -drive file=rhel5.img,boot=on,if=virtio \
    -net nic,macaddr=54:52:00:46:26:80,model=virtio \
    -net tap,script=/etc/qemu-if,ifname=vnet0 \
    -m 512

------- Comment From  2010-08-09 04:06 EDT-------
Redhat,

Any updates on this bug? Is this going to be fixed in RHEL5.6?

Thanks
Muni

------- Comment From coschult.com 2011-02-15 19:50 EDT-------
I have verified that this issue is not present in RHEL 5.6 RC1.