Description of problem: While trying to test some network performance to a guest, we switched to a bnx2x based card. When trying to run a high load from an external box to the guest we can make the kernel panic every time. The crashes disappear when we disable the TPA functionality of the bnx2x driver. Version-Release number of selected component (if applicable): This is "stock" RHEL5.3 How reproducible: Everytime Steps to Reproduce: 1. Start guest 2. Run netperf to guest from external box over bnx2x device 3. Actual results: Expected results: Additional info: ----------- [cut here ] --------- [please bite here ] --------- Kernel BUG at drivers/net/tun.c:444 invalid opcode: 0000 [1] SMP last sysfs file: /class/net/lo/ifindex CPU 0 Modules linked in: tun ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter d Pid: 6912, comm: qemu-kvm Tainted: G 2.6.18-128.el5 #1 RIP: 0010:[<ffffffff886f57b0>] [<ffffffff886f57b0>] :tun:tun_chr_readv+0x2b1/0x3a6 RSP: 0018:ffff8102202c5e48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8102202c5e98 RCX: 0000000004010000 RDX: ffff810227063680 RSI: ffff8102202c5e9e RDI: ffff8102202c5e92 RBP: 0000000000010ff6 R08: 0000000000000000 R09: 0000000000000001 R10: ffff8102202c5e94 R11: 0000000000000202 R12: ffff8102275357c0 R13: ffff81022755e500 R14: 0000000000000000 R15: ffff8102202c5ef8 FS: 00002ae4398db980(0000) GS:ffffffff803ac000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002ae4ab514000 CR3: 0000000221344000 CR4: 00000000000026e0 Process qemu-kvm (pid: 6912, threadinfo ffff8102202c4000, task ffff81022e58d820) Stack: 00000000498735cb ffff810229d1a3c0 0000000000000000 ffff81022e58d820 ffffffff8008a461 ffff81022755e528 ffff81022755e528 ffffffff8009f925 000005ea05ea0000 ffff8102209d0000 00001051143e1600 ffffffff8003c00e Call Trace: [<ffffffff8008a461>] default_wake_function+0x0/0xe [<ffffffff8009f925>] enqueue_hrtimer+0x55/0x70 [<ffffffff8003c00e>] hrtimer_start+0xbc/0xce [<ffffffff886f58bf>] :tun:tun_chr_read+0x1a/0x1f [<ffffffff8000b3f3>] vfs_read+0xcb/0x171 [<ffffffff800117d4>] sys_read+0x45/0x6e [<ffffffff8005d116>] system_call+0x7e/0x83 Code: 0f 0b 68 40 62 6f 88 c2 bc 01 f6 42 0a 08 74 0c 80 4c 24 41 RIP [<ffffffff886f57b0>] :tun:tun_chr_readv+0x2b1/0x3a6 RSP <ffff8102202c5e48> <0>Kernel panic - not syncing: Fatal exception
The code is: if (sinfo->gso_type & SKB_GSO_TCPV4) gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4; else if (sinfo->gso_type & SKB_GSO_TCPV6) gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6; else BUG(); Since this seems to be bnx2x rx path specific, one potential theory is that the card is overrunning the linear data buffer supplied to it and overwriting skb_shared_info which lives at the end of the linear data.
My test kernels have been updated to include a patch for this bugzilla. http://people.redhat.com/agospoda/#rhel5 Please test them and report back your results. Without immediate feedback there is a good chance this or any other fix for this driver will not be included in the upcoming update.
Eilon, We are noticing BUG-halts when using bnx2x inside a bridging interface with TPA enabled. When I remove the place where gso_size is set: diff --git a/drivers/net/bnx2x_main.c b/drivers/net/bnx2x_main.c index 19ddcc2..2fec4a9 100644 --- a/drivers/net/bnx2x_main.c +++ b/drivers/net/bnx2x_main.c @@ -1214,11 +1214,6 @@ static int bnx2x_fill_frag_skb(struct bnx2x *bp, struct bnx2x_fastpath *fp, frag_size = le16_to_cpu(fp_cqe->pkt_len) - len_on_bd; pages = BCM_PAGE_ALIGN(frag_size) >> BCM_PAGE_SHIFT; - /* This is needed in order to enable forwarding support */ - if (frag_size) - skb_shinfo(skb)->gso_size = min((u32)BCM_PAGE_SIZE, - max(frag_size, (u32)len_on_bd)); - #ifdef BNX2X_STOP_ON_ERROR if (pages > 8*PAGES_PER_SGE) { BNX2X_ERR("SGL length is too long: %d. CQE index is %d\n", the box doesn't hit the same bug-halt, but I'm not sure we should use TPA when the bnx2x cards are part of a bridge. Performance is terrible with the above patch too. The upstream driver does not allow TPA to be enabled when the device is a member of a bridge group since the ethtool call to disable LRO (TPA) is made whenever the device is inserted into a bridge (or when routing is enabled). Since we don't have the same ethtool options available in RHEL5, should we just advise customers to disable TPA when using these cards with virtualization? If you can elaborate on the purpose of the chunk of code I'm removing with the patch above that would be helpful too. The comment indicates it was there for forwarding purposes, but it sure does introduce a panic when we are forwarding. Thanks!
Hi, As you wrote, TPA (or any other LRO) should be disabled when working with forwarding/bridging. The function skb_warn_if_lro should catch the case where the user mistakenly turns TPA (LRO) on when using forwarding/bridging – it does so by checking the gso_size != 0 while gso_type == 0. This is why we need those lines in the driver. The problem is that not all devices can support fragmented SKB and so we cannot create such SKBs in forwarding. If we set the gso_type as well as the gso_size then we get good performance when using only the bnx2x as both the receiver and the transmitter – but this is not an acceptable solution since we cannot be sure which device will be used as the transmitter. I think that the only current alternative is to set the gso_size and to document this limitation (no TPA when using forwarding/bridging). Do you agree? Eilon
Please see the patch I just posted to netdev (bridge: Fix LRO crash with tun).
Inclusion of skb_warn_lro_forwarding and friends should probably be added to RHEL5 along with the patch you just posted to netdev. That will resolve the panic. If we did that I would feel OK leaving TPA enabled by default. If not TPA should probably be disabled by default.
Created attachment 331312 [details] rhel5-lro-test.patch This is a patch against rhel5 that would give us the functions that would look out for LRO/TPA usage with forwarding. This would be a nice workaround since we can't use ethtool to disable it automatically. Backport of this: commit 4497b0763cb1afae463f5e144c28b5d806e28b60 Author: Ben Hutchings <bhutchings> Date: Thu Jun 19 16:22:28 2008 -0700 net: Discard and warn about LRO'd skbs received for forwarding Add skb_warn_if_lro() to test whether an skb was received with LRO and warn if so. Change br_forward(), ip_forward() and ip6_forward() to call it) and discard the skb if it returns true. and this: http://marc.info/?l=linux-netdev&m=123417379122404&w=2
Created attachment 331318 [details] rhel5-lro-test2.patch Sorry, that last patch applied, but didn't compile. Here is an updated one.
(In reply to comment #4) > Hi, > > As you wrote, TPA (or any other LRO) should be disabled when working with > forwarding/bridging. The function skb_warn_if_lro should catch the case where > the user mistakenly turns TPA (LRO) on when using forwarding/bridging – it does > so by checking the gso_size != 0 while gso_type == 0. This is why we need those > lines in the driver. > > The problem is that not all devices can support fragmented SKB and so we cannot > create such SKBs in forwarding. If we set the gso_type as well as the gso_size > then we get good performance when using only the bnx2x as both the receiver and > the transmitter – but this is not an acceptable solution since we cannot be > sure which device will be used as the transmitter. > > I think that the only current alternative is to set the gso_size and to > document this limitation (no TPA when using forwarding/bridging). > > Do you agree? > > Eilon I do agree, Eilon. I will try and get the patch in comment #8 included in RHEL5 at some point. In the interim we will have to make sure that customers know to disable LRO/TPA when using virtualization of any kind since the use of bridging or routing for networking can obviously cause problems.
Created attachment 360568 [details] rhel5-tun-fix.patch I am proposing the following patch. Testing feedback is appreciated.
(In reply to comment #12) > Created an attachment (id=360568) [details] > rhel5-tun-fix.patch > I am proposing the following patch. Testing feedback is appreciated. HP verified the patch successfully.
(In reply to comment #16) > (In reply to comment #12) > > Created an attachment (id=360568) [details] [details] > > rhel5-tun-fix.patch > > I am proposing the following patch. Testing feedback is appreciated. > HP verified the patch successfully. The patch appears to fix the panic issue, but doesn't address the second step of the fix which is how to disable LRO. The patch will drop the packet and log a message "received packets cannot be forwarded while LRO is enabled". Do we need have a knowledgebase to let customers know how to disable LRO for specific servers and NIC's?
in kernel-2.6.18-166.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified.
Hushan I was testing guest performance so the I was bridging the interface to the guest. If you are running netperf TCP_STREAM to a guest successfully from an external box then you aren't hitting the problem.
*** Bug 522458 has been marked as a duplicate of this bug. ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0178.html
*** Bug 503851 has been marked as a duplicate of this bug. ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days