Description of problem:
When running traffic to a guest that is bridged to a NIC running with LRO, the host crashes.

Version-Release number of selected component (if applicable):
kernel 2.6.18-150

How reproducible:
Every time

Steps to Reproduce:
1. Configure a KVM guest to use an LRO-based NIC.
2. Bring up the guest.
3. Try to ssh to the guest from an external box over a link that supports LRO.

Actual results:
Host crashes.

Expected results:
It shouldn't crash the host.

Additional info:
Kernel BUG at drivers/net/tun.c:487
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 15
Modules linked in: iptable_raw iptable_nat ip_nat ip_conntrack nfnetlink iptable_filter ip_tables x_tables tun bridge ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth sunrpc cpufreq_ondemand acpi_cpufreq freq_table dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport ksm(U) kvm_intel(U) kvm(U) joydev sr_mod cdrom shpchp igb i2c_i801 sg i2c_core ixgbe serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 8482, comm: qemu-kvm Tainted: G 2.6.18-151.el5 #1
RIP: 0010:[<ffffffff884ce7ab>] [<ffffffff884ce7ab>] :tun:tun_chr_readv+0x2b1/0x3a6
RSP: 0018:ffff81031e71de48 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff81031e71de98 RCX: 0000000041551405
RDX: ffff8101b0a32580 RSI: ffff81031e71de9e RDI: ffff81031e71de92
RBP: 0000000000010ff6 R08: 0000000000000000 R09: 0000000000000001
R10: ffff81031e71de94 R11: 0000000000000048 R12: ffff8101b0e0fa80
R13: ffff8101bc091d00 R14: 0000000000000000 R15: ffff81031e71def8
FS: 00002afbce153fc0(0000) GS:ffff81033fcbf0c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b714c976000 CR3: 00000001aee46000 CR4: 00000000000026e0
Process qemu-kvm (pid: 8482, threadinfo ffff81031e71c000, task ffff81033d5c07e0)
Stack: ffff8101bd60bb00 ffff8101bb220e80 0000000000000000 ffff81033d5c07e0
 ffffffff8008cd53 ffff8101bc091d28 ffff8101bc091d28 ffff8101bca56ed0
 0000010000420000 0000000000000000 0000563412005452 0000000000000000
Call Trace:
 [<ffffffff8008cd53>] default_wake_function+0x0/0xe
 [<ffffffff884ce8ba>] :tun:tun_chr_read+0x1a/0x1f
 [<ffffffff8000bd4d>] vfs_read+0xcb/0x171
 [<ffffffff800121f6>] sys_read+0x45/0x6e
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

Code: 0f 0b 68 90 f4 4c 88 c2 e7 01 f6 42 0a 08 74 0c 80 4c 24 41
RIP [<ffffffff884ce7ab>] :tun:tun_chr_readv+0x2b1/0x3a6
 RSP <ffff81031e71de48>
 <0>Kernel panic - not syncing: Fatal exception
 <0>Rebooting in 10 seconds..

And here's the BUG:

    if (sinfo->gso_type & SKB_GSO_TCPV4)
        gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
    else if (sinfo->gso_type & SKB_GSO_TCPV6)
        gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
    else
        BUG();

And here's some quick debugging:

tun_put_user: skb: hdr_len 66 gso_size 256, gso_type 0
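A minimal userspace sketch of why the debug output above trips the BUG(): the mapping only recognizes the TCPv4 and TCPv6 GSO bits, and the LRO-merged skb arrives with gso_size 256 but gso_type 0, so neither branch matches. It assumes the mapping is only reached when gso_size is non-zero (which is what skb_is_gso() tests), and it returns an error where the original code calls BUG(). The struct and function names (`gso_info`, `map_gso_type`) are hypothetical, and the SKB_GSO_* bit values are assumed to mirror 2.6.18-era skbuff.h; this is not the tun.c code or the fix that was eventually applied.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-ins for the kernel's SKB_GSO_* flag bits (2.6.18-era values). */
#define SKB_GSO_TCPV4 (1u << 0)
#define SKB_GSO_TCPV6 (1u << 4)

/* virtio-net header GSO types, as defined by the virtio specification. */
#define VIRTIO_NET_HDR_GSO_NONE  0
#define VIRTIO_NET_HDR_GSO_TCPV4 1
#define VIRTIO_NET_HDR_GSO_TCPV6 4

/* Hypothetical subset of skb_shared_info that matters for this crash. */
struct gso_info {
    uint16_t gso_size; /* segment size of a merged packet; 0 means not GSO */
    uint32_t gso_type; /* SKB_GSO_* bits; the LRO skb in this bug has 0 here */
};

/*
 * Translate skb GSO metadata into a virtio-net GSO type.
 * Returns 0 on success, or -1 when the metadata cannot be described to
 * the guest (the case where the original code calls BUG()).
 */
static int map_gso_type(const struct gso_info *info, uint8_t *out)
{
    if (info->gso_size == 0) {
        /* Ordinary packet: no GSO header information needed. */
        *out = VIRTIO_NET_HDR_GSO_NONE;
        return 0;
    }
    if (info->gso_type & SKB_GSO_TCPV4) {
        *out = VIRTIO_NET_HDR_GSO_TCPV4;
        return 0;
    }
    if (info->gso_type & SKB_GSO_TCPV6) {
        *out = VIRTIO_NET_HDR_GSO_TCPV6;
        return 0;
    }
    /* gso_size set but no recognized type, e.g. an LRO-merged skb. */
    return -1;
}
```

Fed the values from the debug output (gso_size 256, gso_type 0), `map_gso_type()` returns -1. A caller in this position could only drop such a packet, which is why keeping LRO-merged packets away from tun in the first place (disabling LRO, or converting the drivers to GRO) is the more complete remedy.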
Suggested remedy: Convert all LRO drivers to GRO.
Here's the list of remaining LRO drivers in RHEL5:

enic
ehea
mlx4
benet
s2io

Do we have hardware for these so I can test them after converting them to GRO? If no hardware is available, I recommend that we disable LRO on these by default.
I have enic, benet (be2net), and s2io. Someone must have access to ehea, since patches for it get posted, and there are mlx4 cards floating around.

We could certainly try to disable LRO, but I think we should also be carrying these patches, since I think I'm seeing reports of a panic with GRO and bridging too:

http://people.redhat.com/agospoda/rhel5/0049-lro-add-check-to-warn-if-forwarding-on-devices-that.patch
http://people.redhat.com/agospoda/rhel5/0131-tun-fix-LRO-crash.patch
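The first patch's title describes warning when packets are forwarded from a device that has LRO enabled. A minimal sketch of what such a check could look like, assuming (as the debug output in the description suggests) that an LRO-merged packet can be recognized by a non-zero gso_size combined with gso_type 0. The names (`fwd_pkt`, `warn_if_lro`, `may_forward`) and the warning text are hypothetical, and refusing to forward is an additional assumption of this sketch; it is not the contents of the linked patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical view of the packet metadata seen on the forwarding path. */
struct fwd_pkt {
    const char *in_dev; /* name of the receiving device */
    uint16_t gso_size;  /* non-zero for merged (GSO/GRO/LRO) packets */
    uint32_t gso_type;  /* set by GSO/GRO; left 0 by LRO (assumption) */
};

/* Returns true, after printing a warning, if the packet looks LRO-merged. */
static bool warn_if_lro(const struct fwd_pkt *p)
{
    if (p->gso_size != 0 && p->gso_type == 0) {
        fprintf(stderr, "%s: LRO-merged packet on forwarding path; "
                        "disable LRO on this device\n", p->in_dev);
        return true;
    }
    return false;
}

/* Hypothetical forwarding decision: refuse instead of passing it to tun. */
static bool may_forward(const struct fwd_pkt *p)
{
    return !warn_if_lro(p);
}
```

Doing the check where packets are forwarded, rather than in tun, would cover any output port behind the bridge; that placement is a design assumption here, not something taken from the patch.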
1. I certainly am not against carrying those patches if they're already upstream.
2. Can you point me to the GRO crashes that you saw?
3. If I give you GRO patches for enic, benet, and s2io, could you test them for me?

Thanks!
(In reply to comment #4)
> 1. I certainly am not against carrying those patches if they're already
> upstream.

Of course they are. Look at them -- they will be quite familiar. :)

> 2. Can you point me to the GRO crashes that you saw?
>

It was in Issue-Tracker last week, but I told the person to open a bug and it looks like someone did:

https://bugzilla.redhat.com/show_bug.cgi?id=507189

> 3. If I give you GRO patches for enic, benet, s2io could you test them for me?

I was planning to mail the be2net-based card to Westford today, but there is a chance it could be ready in a day or two for testing up there.

You do know that none of those drivers support GRO upstream, right?
(In reply to comment #5)
>
> Of course they are. Look at them -- they will be quite familiar. :)

Please post them for RHEL5 then.

> > 2. Can you point me to the GRO crashes that you saw?
> >
>
> It was in Issue-Tracker last week, but I told the person to open a bug and it
> looks like someone did:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=507189

Which is already fixed in RHEL5.

> > 3. If I give you GRO patches for enic, benet, s2io could you test them for me?
>
> I was planning to mail the be2net-based card to Westford today, but there is a
> chance it could be ready in a day or two for testing up there.
>
> You do know that none of those drivers support GRO upstream, right?

Well you could test the upstream patches too if you have the time :)
(In reply to comment #6)
> (In reply to comment #5)
> >
> > Of course they are. Look at them -- they will be quite familiar. :)
>
> Please post them for RHEL5 then.
>

If you are going to get rid of all LRO, then there is really no need (especially since 507189 isn't what I thought it might be).

> > > 2. Can you point me to the GRO crashes that you saw?
> > >
> >
> > It was in Issue-Tracker last week, but I told the person to open a bug and it
> > looks like someone did:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=507189
>
> Which is already fixed in RHEL5.
>

I see that now -- I didn't look at the bug before I sent it to you, since I only had the Issue-Tracker ticket open from Friday and it was not linked to the BZ at that point. Glad it is resolved.

> > > 3. If I give you GRO patches for enic, benet, s2io could you test them for me?
> >
> > I was planning to mail the be2net-based card to Westford today, but there is a
> > chance it could be ready in a day or two for testing up there.
> >
> > You do know that none of those drivers support GRO upstream, right?
>
> Well you could test the upstream patches too if you have the time :)

I can probably try them, but I would like to put these cards in the mail today, so be2net will probably not get tested for a day or two.
Mark, can you give my test kernels a try? They have two patches which will probably help with cards still stuck using LRO. http://people.redhat.com/agospoda/#rhel5
be2net has no GRO support upstream yet. If Herbert has a patch, I can test it immediately. If no one has a GRO patch yet, we can work on that.

Thanks.
Subbu
Andy, the only card on the list that I have access to is the s2io. My card is PCI-e x4, so I won't be able to measure full GRO performance with it. Also, given our machine configs, it will take a week or two to get the systems reconfigured to do this.
We have a GRO port of be2net ready now. Would it help if we submitted a GRO patch?
Yes please submit it upstream. Thanks!
Created attachment 354179
Patch to use GRO instead of LRO in be2net

The upstream patch could not be tested today due to a disk failure; I will test it shortly. The patch against the el5.158 driver source is attached. Limited testing has been done, and more testing will follow.

Subbu
The upstream patch to replace LRO with GRO in be2net was submitted today.

Subbu
As we already have the following patch in RHEL5, I think we can close this bug.

commit d6543abe29bb59e0a6109d7f4c13384bfdf96d21
Author: Andy Gospodarek <gospo>
Date: Thu Sep 10 16:26:48 2009 -0400

    [net] bridge: fix LRO crash with tun

*** This bug has been marked as a duplicate of bug 483646 ***