Bug 523873 - Use of HTB for traffic shaping causes crashes when using common Dell ethernet hardware/drivers (i.e. tg3, bnx2), includes patch
Summary: Use of HTB for traffic shaping causes crashes when using common Dell ethernet...
Keywords:
Status: CLOSED DUPLICATE of bug 526481
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: x86_64
OS: Linux
low
high
Target Milestone: rc
: ---
Assignee: Jiri Pirko
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-09-16 23:37 UTC by davidkwood
Modified: 2015-05-05 01:17 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-02-16 08:24:35 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Adapted linux-netdev HTB bugfix patch to 2.6.18. Tested extensively, working. (3.48 KB, patch)
2009-09-16 23:37 UTC, davidkwood
no flags Details | Diff
Updated linux-netdev HTB bugfix patch to kernel-2.6.18-164.11.1.el5 (3.46 KB, patch)
2010-01-25 23:26 UTC, davidkwood
no flags Details | Diff
Script for setting up simple HTB on eth0 (1.80 KB, application/x-sh)
2010-01-26 22:43 UTC, davidkwood
no flags Details

Description davidkwood 2009-09-16 23:37:16 UTC
Created attachment 361398 [details]
Adapted linux-netdev HTB bugfix patch to 2.6.18. Tested extensively, working. 

Description of problem:

Use of HTB for traffic shaping causes crashes on Redhat 5.3 using certain ethernet drivers (i.e. tg3, bnx2). 

Version-Release number of selected component (if applicable):

Linux kernel 2.6.18-4.1.el5, 2.6.18-7.1.el5, and 2.6.18-164.el5. Only reproduced on 64-bit hardware (I am not running 32-bit), but I believe this bug has the potential to manifest across architectures.

How reproducible:

On Redhat 5.3 x86_64 on a Redhat-certified Dell PowerEdge T105, as soon as HTB queueing disciplines are enabled on a network interface that's handling traffic, the machine will hard-lock within minutes. The longest we ever saw this last without crashing was in an hour, and that was only once. The problem is 100% reproducible.

This crash was only occuring on machines with certain ethernet drivers, and at first I explored this as a potential ethernet driver problem - albeit one that was difficult to believe, since Dell supports Redhat on that hardware. However, research revealed this is actually a bug within HTB itself, that is only revealed when traffic shaping via HTB is used in concert with certain ethernet drivers.

http://marc.info/?l=linux-netdev&m=121607054411649&w=2

The link above points to the critical email in a long thread on linux-netdev regarding this bug. In the end, this bug was fixed in mainline somewhere after 2.6.25.6. Unfortunately this is not much help when running a 2.6.18 derivative. In fact, the patch posted by Jarek Poplawski on linux-netdev doesn't even apply against the vintage HTB in Redhat 5. 

I back-ported the referenced patch to the current kernel and rebuilt from source (by now I have done this 3 times, to 2.6.18-4.1.el5, 2.6.18-7.1.el5, and 2.6.18-164.el5). The changes were not significant. I ran this patched kernel on a large number of servers over a period of 6 weeks. This cured the crash. No other adverse effects were observed. Exact steps to reproduce, and the patch, are included below.

Steps to Reproduce:

1) Use hardware with an affected ethernet driver. Known affected: tg3, bnx2

2) Make any use of HTB. For example: 

# Here's a simple HTB shaping regime with a 50kbit ceiling.

# clear any existing 
/sbin/tc qdisc del dev eth0 root
/sbin/tc qdisc add dev eth0 root handle 1: htb default 11
/sbin/tc class add dev eth0 parent 1: classid 1:1 htb rate 50kbit ceil 50kbit
# Class 1:10, SSH, DNS, ICMP, SYN packets
/sbin/tc class add dev eth0 parent 1:1 classid 1:10 htb rate 40kbit ceil 50kbit prio 0
# Class 1:11, Everything Else
/sbin/tc class add dev eth0 parent 1:1 classid 1:11 htb rate 10kbit ceil 50kbit prio 2
# Following examples, setting low prio class to SFQ
/sbin/tc qdisc add dev eth0 parent 1:11 handle 110: sfq perturb 10
# Set up for iptables filtering via FWMARK
/sbin/tc filter add dev eth0 parent 1:0 protocol ip prio 1 handle 1 fw classid 1:10
/sbin/tc filter add dev eth0 parent 1:0 protocol ip prio 2 handle 2 fw classid 1:11
# Create iptables rules to mark packets
/sbin/iptables -t mangle -A OUTPUT -p tcp -m tcp --tcp-flags SYN,RST,ACK SYN -j MARK --set-mark 0x1
/sbin/iptables -t mangle -A OUTPUT -p tcp -m tcp --tcp-flags SYN,RST,ACK SYN -j RETURN
# DNS
/sbin/iptables -t mangle -A OUTPUT -p udp --dport 53 -j MARK --set-mark 0x1
/sbin/iptables -t mangle -A OUTPUT -p udp --dport 53 -j RETURN
# SSH
/sbin/iptables -t mangle -A OUTPUT -p tcp -m tcp --sport 22 -j MARK --set-mark 0x1
/sbin/iptables -t mangle -A OUTPUT -p tcp -m tcp --sport 22 -j RETURN
/sbin/iptables -t mangle -A OUTPUT -p tcp -m tcp --dport 22 -j MARK --set-mark 0x1
/sbin/iptables -t mangle -A OUTPUT -p tcp -m tcp --dport 22 -j RETURN
# ICMP
/sbin/iptables -t mangle -A OUTPUT -p icmp -j MARK --set-mark 0x1
/sbin/iptables -t mangle -A OUTPUT -p icmp -j RETURN
# Mark everything else
/sbin/iptables -t mangle -A OUTPUT -j MARK --set-mark 0x3

3) Now exercise the network interface. Kaboom.

Actual results:

Crash.

Expected results:

OS functions normally.

Additional info:

Patch attached.

Comment 1 davidkwood 2009-12-21 20:52:22 UTC
Hi all. It's been 3 months... is there any chance of movement on this bug?

Not that I don't love patching every new kernel by hand, but this patch fixes a kernel crash bug caused by the use of a basic feature on vendor-supported and widely-used hardware. 

I'm sure I'm not the only one affected. But it's likely most people, when they come up against this, simply go away saying "Redhat (and/or Linux) just isn't stable" and you never hear from them again. 

I don't think it would take someone very long to look at this tiny patch, as well as its provenance in netdev, and validate and apply it?

Comment 2 davidkwood 2010-01-25 21:54:32 UTC
OK, the latest kernel:

kernel-2.6.18-164.11.1.el5.x86_64.rpm

Has now broken this patch. 

Come on guys. You're in there making changes all around this. Can someone please at least say something? Consider changing the status of the bug?

Comment 3 davidkwood 2010-01-25 23:26:33 UTC
Created attachment 386726 [details]
Updated linux-netdev HTB bugfix patch to kernel-2.6.18-164.11.1.el5

The previous patch conflicted with the patch from John Feeney ("linux-2.6-net-sched-fix-panic-in-bnx2_poll_work.patch"), which seems to be associated with Bug #539686 and #526481. 

This patch is updated to work with kernel-2.6.18-164.11.1.el5 and later.

Comment 5 Jiri Pirko 2010-01-26 19:54:57 UTC
Hi David. Looking at the patch and comparing the code with upstream, I do not see it was fixed there. Did you try to reproduce the issue with an upstream kernel? Please correct me if I'm wrong.

Comment 6 davidkwood 2010-01-26 20:57:34 UTC
An interesting question. I had read David Miller's comment and assumed it was committed. But I, too, cannot see the fix in mainline. So this is confusing. 

We should reach out to David or Jared and ask them directly. I might suggest, any questions could get better answers coming from a @redhat.com email than from me? But I am happy to do it if that would help.

I will also construct a specific test using the exact same hardware and Ubuntu 9.10 (2.6.31). I think anecdotally the world would know it if HTB was broken on all this hardware, but who knows? 

I will report back once I have some results.

Comment 7 Andy Gospodarek 2010-01-26 21:06:32 UTC
You could also use the latest Fedora, F12!

http://fedoraproject.org/en/get-fedora

Support for booting from a livecd or liveusb image in Fedora will probably be much quicker and less destructive than a full install anyway.

Comment 9 davidkwood 2010-01-26 22:43:52 UTC
Created attachment 386938 [details]
Script for setting up simple HTB on eth0

Figured I'd add the HTB lines as a script for easy testing.

Comment 10 David Miller 2010-01-27 05:39:54 UTC
Jiri, the fix did go in, see:

commit 69747650c814a8a79fef412c7416adf823293a3e
Author: David S. Miller <davem>
Date:   Sun Aug 17 23:55:36 2008 -0700

    pkt_sched: Fix return value corruption in HTB and TBF.
    
    Based upon a bug report by Josip Rodin.
    
    Packet schedulers should only return NET_XMIT_DROP iff
    the packet really was dropped.  If the packet does reach
    the device after we return NET_XMIT_DROP then TCP can
    crash because it depends upon the enqueue path return
    values being accurate.
    
    Signed-off-by: David S. Miller <davem>

Only a much simpler fix was required because first we fixed
the bypass signalling with the patch posted in:

http://marc.info/?l=linux-netdev&m=121783222306069&w=2

That eliminated the need for a lot of the special logic we used
in the original fix attempt in HTB.  In fact the original fix
is wrong because it has some problems with qlen accounting etc.

Unfortunately, getting this into our tree is going to be messy.
The original patch has known bugs, so we can't use just that.
You'd need to backport the NET_XMIT_{STOLEN,BYPASS} masking
changes, then put my final HTB fix on top.

But that's a KABI breaker, and thus something we really can't
put into an enterprise kernel release.  Unfortunately, it's
the only way to fix the bug properly and handle all known
cases.

Comment 11 David Miller 2010-01-27 05:46:40 UTC
Disregard some of what I just said, take a look at:

http://marc.info/?l=linux-netdev&m=121904171303074&w=2

We did two different versions of the fix, one that went into
2.6.26-stable and the one I mentioned in my previous comment for
the mainline.

Comment 12 Jiri Pirko 2010-01-27 11:57:34 UTC
Dave, thanks for a quick response.

Looking at rhel5.5 tree, we have upstream fix "69747650c814a8a79fef412c7416adf823293a3e" in:
http://git.engineering.redhat.com/?p=users/jwilson/rhel5/kernel;a=commitdiff;h=5c075c4fccbcb8023784ef6d3ca48e058535fb90

But if I understand that correctly we need rather the following:
http://git.kernel.org/?p=linux/kernel/git/hpa/linux-2.6-allstable.git;a=commitdiff;h=685f605a498b73759cbcbc816089e804710fcc48

Comment 13 David Miller 2010-01-27 12:20:25 UTC
Yeah if you look at what we applied, it doesn't update the 'ret'
value so it won't return the right thing, unlike the upstream fix.

What we did was a poor backport, as the upstream patch referenced
in our change depended upon other changes that were made to these
files.

Comment 14 Jiri Pirko 2010-01-27 15:09:35 UTC
Trying to reproduce this with 2.6.18-164.el5 kernel on:
1)
dell380-2.rhts.bos.redhat.com
Ethernet controller: Broadcom Corporation NetXtreme BCM5751 Gigabit Ethernet
PCI Express (rev 01) (14e4:1677)
driver: tg3
version: 3.96-1
firmware-version: 5751-v3.44a
bus-info: 0000:04:00.0

2)
amd-toonie2-02.lab.bos.redhat.com
Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet
(rev 12) (14e4:1639)
driver: bnx2
version: 1.9.3
firmware-version: 4.6.2
bus-info: 0000:01:00.0

I'm using "htb.sh" script from comment #9 to setup HTB. Then I execute "iperf
-c" on these hosts against another host acting as "iperf -s". After ~3hours
it's still running. davidkwood, what am I doing wrong?

Comment 15 Jiri Pirko 2010-01-27 21:43:26 UTC
davidkwood, I made a test build including patch from 2.6.26 stable tree. RPMs are here:

http://people.redhat.com/jpirko/test/bz523873/

Can you please test this on your machine/machines?

Thanks

Comment 16 davidkwood 2010-01-27 22:56:08 UTC
This is interesting. In our first test with the latest RHEL kernel we had trouble reproducing the issue! Trust me, when we initially encountered the fix, this was not a difficult problem to observe, so something may be wrong with our test case (which is of course necessarily different from the production system this issue repeatedly brought down).

We will widen our tests here, Jiri, and both make a more determined attempt to reproduce the problem, as well as run your custom kernel for comparison.

One other possibility. David, your comments are very interesting; I want to make sure I've followed. Are you saying that you've addressed this issue post 2.6.18-128.4.1.el5, but via another approach?

Comment 17 David Miller 2010-01-28 00:12:05 UTC
Yes, there was a long discussion about how the return value of
->enqueue() methods is handled.

As a result, some pieces of state are passed as boolean state bits
which are OR'd into the ->enqueue() return value.

The higher level code that interfaces with the generic transmit
layer in net/core/dev.c, masks out those state bits, so that the
code above them still sees the same kinds of values.

As a result, this made the final fix to sch_htb.c much simpler,
as less cases had to be handled explicitly.

Comment 18 davidkwood 2010-01-28 17:44:47 UTC
Fascinating. Thank you, David. So, you're saying this issue is fixed in RHEL... sometime after I originally opened this bug, and before the current kernel? 

Do you happen to have the Issue #? I'm sure Jiri will be glad to mark this as a duplicate.  :)

Comment 19 Jiri Pirko 2010-02-03 12:37:19 UTC
Hi davidkwood.

Any news in testing this issue (e.g. with kernel from comment#15) ?

Thanks.

Jirka

Comment 20 davidkwood 2010-02-09 22:25:12 UTC
Jiri - I was actually still waiting on clarification from Mr. Miller before proceeding, since this may explain my existing difficulty with duplicating this in 2.6.18-164.11.1.el5, despite how bad this problem was in i.e. 2.6.18-128.4.1.el5. It sounds like he's saying this bug was already fixed in a different way somewhere in between these two versions. Do I have it right?

Comment 21 Jiri Pirko 2010-02-10 09:50:41 UTC
(In reply to comment #20)
> Jiri - I was actually still waiting on clarification from Mr. Miller before
> proceeding, since this may explain my existing difficulty with duplicating this
> in 2.6.18-164.11.1.el5, despite how bad this problem was in i.e.
> 2.6.18-128.4.1.el5. It sounds like he's saying this bug was already fixed in a
> different way somewhere in between these two versions. Do I have it right?    

So if I understand you correctly, you are unable to reproduce the issue with 2.6.18-164.11.1.el5? The fix I mentioned in comment #12 went in in this release. If this resolved the problem for you I would close this as duplicate.

Comment 22 davidkwood 2010-02-16 00:53:48 UTC
That's correct. Since writing last I've expanded my tests running unpatched 2.6.18-164.11.1.el5. So far, so good. 

I'd close it on up. If we see anything over here we'll holler. 

Thank you very much, all!

Comment 23 Jiri Pirko 2010-02-16 08:24:35 UTC
ok - closing this as duplicate. Feel free to reopen if the problem reappears. Thanks.

*** This bug has been marked as a duplicate of bug 526481 ***


Note You need to log in before you can comment on or make changes to this bug.