Created attachment 361398 [details] Adapted linux-netdev HTB bugfix patch to 2.6.18. Tested extensively, working. Description of problem: Use of HTB for traffic shaping causes crashes on Redhat 5.3 using certain ethernet drivers (i.e. tg3, bnx2). Version-Release number of selected component (if applicable): Linux kernel 2.6.18-4.1.el5, 2.6.18-7.1.el5, and 2.6.18-164.el5. Only reproduced on 64-bit hardware (I am not running 32-bit), but I believe this bug has the potential to manifest across architectures. How reproducible: On Redhat 5.3 x86_64 on a Redhat-certified Dell PowerEdge T105, as soon as HTB queueing disciplines are enabled on a network interface that's handling traffic, the machine will hard-lock within minutes. The longest we ever saw this last without crashing was in an hour, and that was only once. The problem is 100% reproducible. This crash was only occuring on machines with certain ethernet drivers, and at first I explored this as a potential ethernet driver problem - albeit one that was difficult to believe, since Dell supports Redhat on that hardware. However, research revealed this is actually a bug within HTB itself, that is only revealed when traffic shaping via HTB is used in concert with certain ethernet drivers. http://marc.info/?l=linux-netdev&m=121607054411649&w=2 The link above points to the critical email in a long thread on linux-netdev regarding this bug. In the end, this bug was fixed in mainline somewhere after 2.6.25.6. Unfortunately this is not much help when running a 2.6.18 derivative. In fact, the patch posted by Jarek Poplawski on linux-netdev doesn't even apply against the vintage HTB in Redhat 5. I back-ported the referenced patch to the current kernel and rebuilt from source (by now I have done this 3 times, to 2.6.18-4.1.el5, 2.6.18-7.1.el5, and 2.6.18-164.el5). The changes were not significant. I ran this patched kernel on a large number of servers over a period of 6 weeks. This cured the crash. No other adverse effects were observed. Exact steps to reproduce, and the patch, are included below. Steps to Reproduce: 1) Use hardware with an affected ethernet driver. Known affected: tg3, bnx2 2) Make any use of HTB. For example: # Here's a simple HTB shaping regime with a 50kbit ceiling. # clear any existing /sbin/tc qdisc del dev eth0 root /sbin/tc qdisc add dev eth0 root handle 1: htb default 11 /sbin/tc class add dev eth0 parent 1: classid 1:1 htb rate 50kbit ceil 50kbit # Class 1:10, SSH, DNS, ICMP, SYN packets /sbin/tc class add dev eth0 parent 1:1 classid 1:10 htb rate 40kbit ceil 50kbit prio 0 # Class 1:11, Everything Else /sbin/tc class add dev eth0 parent 1:1 classid 1:11 htb rate 10kbit ceil 50kbit prio 2 # Following examples, setting low prio class to SFQ /sbin/tc qdisc add dev eth0 parent 1:11 handle 110: sfq perturb 10 # Set up for iptables filtering via FWMARK /sbin/tc filter add dev eth0 parent 1:0 protocol ip prio 1 handle 1 fw classid 1:10 /sbin/tc filter add dev eth0 parent 1:0 protocol ip prio 2 handle 2 fw classid 1:11 # Create iptables rules to mark packets /sbin/iptables -t mangle -A OUTPUT -p tcp -m tcp --tcp-flags SYN,RST,ACK SYN -j MARK --set-mark 0x1 /sbin/iptables -t mangle -A OUTPUT -p tcp -m tcp --tcp-flags SYN,RST,ACK SYN -j RETURN # DNS /sbin/iptables -t mangle -A OUTPUT -p udp --dport 53 -j MARK --set-mark 0x1 /sbin/iptables -t mangle -A OUTPUT -p udp --dport 53 -j RETURN # SSH /sbin/iptables -t mangle -A OUTPUT -p tcp -m tcp --sport 22 -j MARK --set-mark 0x1 /sbin/iptables -t mangle -A OUTPUT -p tcp -m tcp --sport 22 -j RETURN /sbin/iptables -t mangle -A OUTPUT -p tcp -m tcp --dport 22 -j MARK --set-mark 0x1 /sbin/iptables -t mangle -A OUTPUT -p tcp -m tcp --dport 22 -j RETURN # ICMP /sbin/iptables -t mangle -A OUTPUT -p icmp -j MARK --set-mark 0x1 /sbin/iptables -t mangle -A OUTPUT -p icmp -j RETURN # Mark everything else /sbin/iptables -t mangle -A OUTPUT -j MARK --set-mark 0x3 3) Now exercise the network interface. Kaboom. Actual results: Crash. Expected results: OS functions normally. Additional info: Patch attached.
Hi all. It's been 3 months... is there any chance of movement on this bug? Not that I don't love patching every new kernel by hand, but this patch fixes a kernel crash bug caused by the use of a basic feature on vendor-supported and widely-used hardware. I'm sure I'm not the only one affected. But it's likely most people, when they come up against this, simply go away saying "Redhat (and/or Linux) just isn't stable" and you never hear from them again. I don't think it would take someone very long to look at this tiny patch, as well as its provenance in netdev, and validate and apply it?
OK, the latest kernel: kernel-2.6.18-164.11.1.el5.x86_64.rpm Has now broken this patch. Come on guys. You're in there making changes all around this. Can someone please at least say something? Consider changing the status of the bug?
Created attachment 386726 [details] Updated linux-netdev HTB bugfix patch to kernel-2.6.18-164.11.1.el5 The previous patch conflicted with the patch from John Feeney ("linux-2.6-net-sched-fix-panic-in-bnx2_poll_work.patch"), which seems to be associated with Bug #539686 and #526481. This patch is updated to work with kernel-2.6.18-164.11.1.el5 and later.
Hi David. Looking at the patch and comparing the code with upstream, I do not see it was fixed there. Did you try to reproduce the issue with an upstream kernel? Please correct me if I'm wrong.
An interesting question. I had read David Miller's comment and assumed it was committed. But I, too, cannot see the fix in mainline. So this is confusing. We should reach out to David or Jared and ask them directly. I might suggest, any questions could get better answers coming from a @redhat.com email than from me? But I am happy to do it if that would help. I will also construct a specific test using the exact same hardware and Ubuntu 9.10 (2.6.31). I think anecdotally the world would know it if HTB was broken on all this hardware, but who knows? I will report back once I have some results.
You could also use the latest Fedora, F12! http://fedoraproject.org/en/get-fedora Support for booting from a livecd or liveusb image in Fedora will probably be much quicker and less destructive than a full install anyway.
Created attachment 386938 [details] Script for setting up simple HTB on eth0 Figured I'd add the HTB lines as a script for easy testing.
Jiri, the fix did go in, see: commit 69747650c814a8a79fef412c7416adf823293a3e Author: David S. Miller <davem> Date: Sun Aug 17 23:55:36 2008 -0700 pkt_sched: Fix return value corruption in HTB and TBF. Based upon a bug report by Josip Rodin. Packet schedulers should only return NET_XMIT_DROP iff the packet really was dropped. If the packet does reach the device after we return NET_XMIT_DROP then TCP can crash because it depends upon the enqueue path return values being accurate. Signed-off-by: David S. Miller <davem> Only a much simpler fix was required because first we fixed the bypass signalling with the patch posted in: http://marc.info/?l=linux-netdev&m=121783222306069&w=2 That eliminated the need for a lot of the special logic we used in the original fix attempt in HTB. In fact the original fix is wrong because it has some problems with qlen accounting etc. Unfortunately, getting this into our tree is going to be messy. The original patch has known bugs, so we can't use just that. You'd need to backport the NET_XMIT_{STOLEN,BYPASS} masking changes, then put my final HTB fix on top. But that's a KABI breaker, and thus something we really can't put into an enterprise kernel release. Unfortunately, it's the only way to fix the bug properly and handle all known cases.
Disregard some of what I just said, take a look at: http://marc.info/?l=linux-netdev&m=121904171303074&w=2 We did two different versions of the fix, one that went into 2.6.26-stable and the one I mentioned in my previous comment for the mainline.
Dave, thanks for a quick response. Looking at rhel5.5 tree, we have upstream fix "69747650c814a8a79fef412c7416adf823293a3e" in: http://git.engineering.redhat.com/?p=users/jwilson/rhel5/kernel;a=commitdiff;h=5c075c4fccbcb8023784ef6d3ca48e058535fb90 But if I understand that correctly we need rather the following: http://git.kernel.org/?p=linux/kernel/git/hpa/linux-2.6-allstable.git;a=commitdiff;h=685f605a498b73759cbcbc816089e804710fcc48
Yeah if you look at what we applied, it doesn't update the 'ret' value so it won't return the right thing, unlike the upstream fix. What we did was a poor backport, as the upstream patch referenced in our change depended upon other changes that were made to these files.
Trying to reproduce this with 2.6.18-164.el5 kernel on: 1) dell380-2.rhts.bos.redhat.com Ethernet controller: Broadcom Corporation NetXtreme BCM5751 Gigabit Ethernet PCI Express (rev 01) (14e4:1677) driver: tg3 version: 3.96-1 firmware-version: 5751-v3.44a bus-info: 0000:04:00.0 2) amd-toonie2-02.lab.bos.redhat.com Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 12) (14e4:1639) driver: bnx2 version: 1.9.3 firmware-version: 4.6.2 bus-info: 0000:01:00.0 I'm using "htb.sh" script from comment #9 to setup HTB. Then I execute "iperf -c" on these hosts against another host acting as "iperf -s". After ~3hours it's still running. davidkwood, what am I doing wrong?
davidkwood, I made a test build including patch from 2.6.26 stable tree. RPMs are here: http://people.redhat.com/jpirko/test/bz523873/ Can you please test this on your machine/machines? Thanks
This is interesting. In our first test with the latest RHEL kernel we had trouble reproducing the issue! Trust me, when we initially encountered the fix, this was not a difficult problem to observe, so something may be wrong with our test case (which is of course necessarily different from the production system this issue repeatedly brought down). We will widen our tests here, Jiri, and both make a more determined attempt to reproduce the problem, as well as run your custom kernel for comparison. One other possibility. David, your comments are very interesting; I want to make sure I've followed. Are you saying that you've addressed this issue post 2.6.18-128.4.1.el5, but via another approach?
Yes, there was a long discussion about how the return value of ->enqueue() methods is handled. As a result, some pieces of state are passed as boolean state bits which are OR'd into the ->enqueue() return value. The higher level code that interfaces with the generic transmit layer in net/core/dev.c, masks out those state bits, so that the code above them still sees the same kinds of values. As a result, this made the final fix to sch_htb.c much simpler, as less cases had to be handled explicitly.
Fascinating. Thank you, David. So, you're saying this issue is fixed in RHEL... sometime after I originally opened this bug, and before the current kernel? Do you happen to have the Issue #? I'm sure Jiri will be glad to mark this as a duplicate. :)
Hi davidkwood. Any news in testing this issue (e.g. with kernel from comment#15) ? Thanks. Jirka
Jiri - I was actually still waiting on clarification from Mr. Miller before proceeding, since this may explain my existing difficulty with duplicating this in 2.6.18-164.11.1.el5, despite how bad this problem was in i.e. 2.6.18-128.4.1.el5. It sounds like he's saying this bug was already fixed in a different way somewhere in between these two versions. Do I have it right?
(In reply to comment #20) > Jiri - I was actually still waiting on clarification from Mr. Miller before > proceeding, since this may explain my existing difficulty with duplicating this > in 2.6.18-164.11.1.el5, despite how bad this problem was in i.e. > 2.6.18-128.4.1.el5. It sounds like he's saying this bug was already fixed in a > different way somewhere in between these two versions. Do I have it right? So if I understand you correctly, you are unable to reproduce the issue with 2.6.18-164.11.1.el5? The fix I mentioned in comment #12 went in in this release. If this resolved the problem for you I would close this as duplicate.
That's correct. Since writing last I've expanded my tests running unpatched 2.6.18-164.11.1.el5. So far, so good. I'd close it on up. If we see anything over here we'll holler. Thank you very much, all!
ok - closing this as duplicate. Feel free to reopen if the problem reappears. Thanks. *** This bug has been marked as a duplicate of bug 526481 ***