Bug 130172 - HTB QoS traffic shaping crashes kernel panic on VPN/SWAN
Summary: HTB QoS traffic shaping crashes kernel panic on VPN/SWAN
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: iproute
Version: 1
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Radek Vokál
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-08-17 18:22 UTC by Trevor Cordes
Modified: 2007-11-30 22:10 UTC (History)
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-10-31 10:58:57 UTC
Type: ---
Embargoed:


Attachments
my setup script that causes the panic (2.46 KB, text/plain)
2004-08-17 18:23 UTC, Trevor Cordes
manual entry of screen panic debug info (1.54 KB, text/plain)
2004-08-17 18:24 UTC, Trevor Cordes

Description Trevor Cordes 2004-08-17 18:22:15 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1)
Gecko/20031114

Description of problem:
[Simultaneously posted to LARTC mailing list -- will keep both sides
updated]

I need to set up QoS on a Linux router/firewall I maintain.  I spent 10
hours reading everything I could find on QoS/HTB/iproute2 and came up
with what I thought made sense for my situation.  So I deployed it and
BOOM!  KERNEL PANIC!  Not what I was expecting... now the debugging
begins.

I reproduced the panic twice on two different (yet almost identically
configured) machines.  I can reproduce the panic on demand by doing a
specific set of actions.
 
First, my setup:
 
I have 2 machines at different locations connected via the internet.  Both
machines run the stock Fedora Core 1 kernel 2.4.22-1.2179.nptl.  I run
FreeS/WAN (stock FC binary RPMs) between the 2 machines for the IPsec
VPN.  I run VoIP, VNC and all other inter-office traffic through the
VPN.  The internet connection is ADSL with 400 kbit/s up and 1500 kbit/s
or so down.  VoIP is routed but not MASQ'd.  VNC is MASQ'd (neither the
originating nor destination machines are the Linux boxes themselves).

Second, my goals:
 
Give a fixed minimum bandwidth and high priority to VoIP through VPN.
Same, but less so, for VNC through VPN.  Give the VPN high enough
allocation for VoIP and VNC to get through ok.  Less important little
tweaks for rarely-used outside (non IPSEC) VNC and ssh access.
 
My situation seems different from the examples I've seen because *I
believe* I need to have 2 completely separate qdiscs, 1 for ppp0 (the
DSL) and 1 for ipsec0 (the FreeS/WAN VPN).  Yet ipsec0 eventually goes
over ppp0 so they are intertwined.  I have a funny feeling this is
where the crash is coming from.
 
See my setup script and panic screen dump (attached).
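
The attachment itself is not reproduced here.  As a rough illustration
only of the layout described above (one HTB tree per interface, with
classes selected by firewall mark), a minimal sketch might look like the
following.  The rates, class IDs, mark values and the setup_htb helper
are all assumptions made for illustration; none of them are taken from
the attached script.

```shell
#!/bin/sh
# Illustrative sketch only; NOT the attached script.
# All rates, class IDs and mark values below are assumptions.
TC=${TC:-tc}    # set TC="echo tc" to print the commands instead of running them

# setup_htb DEV TOTAL VOIP VNC REST  (hypothetical helper)
setup_htb() {
    dev=$1; total=$2; voip=$3; vnc=$4; rest=$5

    # Root HTB qdisc; unmarked traffic falls into class 1:30
    $TC qdisc add dev "$dev" root handle 1: htb default 30
    # Parent class capped at the usable link rate
    $TC class add dev "$dev" parent 1: classid 1:1 htb rate "$total"
    # Guaranteed rate per class; each may borrow up to the parent rate
    $TC class add dev "$dev" parent 1:1 classid 1:10 htb rate "$voip" ceil "$total" prio 0
    $TC class add dev "$dev" parent 1:1 classid 1:20 htb rate "$vnc" ceil "$total" prio 1
    $TC class add dev "$dev" parent 1:1 classid 1:30 htb rate "$rest" ceil "$total" prio 2
    # Classify by firewall mark (set in the iptables mangle table)
    $TC filter add dev "$dev" parent 1: protocol ip prio 1 handle 1 fw flowid 1:10
    $TC filter add dev "$dev" parent 1: protocol ip prio 2 handle 2 fw flowid 1:20
}
```

Under these assumptions it would be called once per interface, e.g.
"setup_htb ppp0 380kbit 100kbit 150kbit 130kbit" and then again for
ipsec0 with its own rates.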
 
Everything seemed to go great until I tried VNC'ing in from one office
to the other.  The VNC screen would pop up, do a first draw, then
completely freeze.  From that point on the remote Linux router is
frozen -- kernel panic.  Strange that the bug would only trigger AFTER
sending the 100-200kB of the initial VNC screen.
 
Looking at my config, I will note a couple of questions I had while
writing it that weren't answered in the docs I found:
 
1. The "tc filter add ... protocol ip" thing confused me.  What
exactly is the "protocol ip" for?  I originally thought that it should
read "protocol 50" for the ipsec stuff, but that didn't seem to catch
the packets, so I switched it back to "ip".  Oddly, while testing with
it set to 50 (with no packets matching the rule) there were no crashes.
 
2. In the case of VNC and ssh *over VPN*, a packet will match two of
the iptables mangle rules.  I *assume* the last executing MARK will
overwrite the previous MARK.  If for some reason the marks are ANDed
or something, perhaps that is causing the crash (filtering 1 packet
into 2 buckets?).
 
3. As I mentioned above, the fact that one qdisc will feed a separate
qdisc, because ipsec0 eventually goes out over ppp0, may be a problem?
I wish I had seen some examples of this type of setup.
 
4. I chose HTB instead of CBQ as it seemed simpler (always a good
thing) and more suited to my exact needs.  Not sure if the bug is in
HTB itself or the general QoS stuff.
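
On questions 1 and 2, a short sketch may help.  In "tc filter", the
"protocol" keyword names the link-layer protocol of the frames the
filter sees (here IPv4), not the IP protocol number, which would explain
why "protocol 50" matched nothing; ESP (IP protocol 50) can instead be
tested inside the IP header with a u32 match.  MARK --set-mark replaces
the packet's mark, so when two rules match, the last one executed wins.
The interface names, the VNC port (5900), the mark values, the chain and
both helper functions below are assumptions for illustration, not taken
from the attached script.

```shell
#!/bin/sh
# Illustrative sketch; interface, port, chain and mark values are assumptions.
IPT=${IPT:-iptables}   # set IPT="echo iptables" and TC="echo tc" for a dry run
TC=${TC:-tc}

# mark_vpn_traffic DEV  (hypothetical helper)
mark_vpn_traffic() {
    dev=$1
    # MARK is a non-terminating target, so a VNC packet leaving via DEV
    # hits both rules.  --set-mark replaces the mark: the packet ends up
    # with mark 2, not a combination of 1 and 2.
    $IPT -t mangle -A POSTROUTING -o "$dev" -j MARK --set-mark 1
    $IPT -t mangle -A POSTROUTING -o "$dev" -p tcp --dport 5900 -j MARK --set-mark 2
}

# classify_esp DEV FLOWID  (hypothetical helper)
classify_esp() {
    dev=$1; flowid=$2
    # "protocol ip" selects IPv4 frames; the u32 match then tests the
    # protocol byte of the IP header for 50 (ESP).
    $TC filter add dev "$dev" parent 1: protocol ip prio 3 u32 \
        match ip protocol 50 0xff flowid "$flowid"
}
```

For example, "mark_vpn_traffic ipsec0" followed by "classify_esp ppp0
1:10" would mark traffic entering the tunnel and steer the encrypted ESP
packets on the DSL link into class 1:10, assuming such a class exists.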


Version-Release number of selected component (if applicable):
iproute-2.4.7-13.2

How reproducible:
Always

Steps to Reproduce:
1. Configure QoS and iptables as per my setup script
2. Run VNC from one office to the other, through the two routers
3. After the VNC screen does its initial draw, the remote Linux box panics
    

Actual Results:  kernel panic!

Expected Results:  everything should have run properly without a
panic, and/or a nice error should have been logged instead of a panic.

Additional info:

Comment 1 Trevor Cordes 2004-08-17 18:23:17 UTC
Created attachment 102804 [details]
my setup script that causes the panic

Comment 2 Trevor Cordes 2004-08-17 18:24:37 UTC
Created attachment 102805 [details]
manual entry of screen panic debug info

Comment 3 Trevor Cordes 2004-08-29 22:19:53 UTC
More details from more testing:

1. Tried HTB qdisc only on ppp0 (no qdisc on the ipsec0) -- no crash.

2. Tried HTB qdisc only on ipsec0 (no qdisc on ppp0) -- CRASH!

3. Tried PRIO qdisc on ipsec0, HTB on ppp0 -- no crash

4. Tried CBQ qdisc on ipsec0, HTB on ppp0 -- no crash!

Therefore it seems the crash must be caused by a conflict when using
HTB to shape a FreeS/WAN ipsec interface.  Interesting how CBQ, which
is more complex than HTB, does not have a problem with FreeS/WAN.

That should help narrow the scope of the problem.
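
For anyone wanting to try the non-crashing combination from test 3, a
minimal sketch of a PRIO qdisc on the ipsec interface might look like
this.  It is not the exact set of commands used in the tests above; the
mark values, the band mapping and the setup_prio helper are assumptions
for illustration.

```shell
#!/bin/sh
# Illustrative sketch; mark values and band mapping are assumptions.
TC=${TC:-tc}    # set TC="echo tc" for a dry run

# setup_prio DEV  (hypothetical helper)
setup_prio() {
    dev=$1
    # PRIO creates three bands by default, reachable as classes 1:1 to 1:3
    $TC qdisc add dev "$dev" root handle 1: prio
    # Mark 1 (e.g. VoIP) goes to the highest-priority band,
    # mark 2 (e.g. VNC) to the next one
    $TC filter add dev "$dev" parent 1: protocol ip prio 1 handle 1 fw flowid 1:1
    $TC filter add dev "$dev" parent 1: protocol ip prio 2 handle 2 fw flowid 1:2
}
```

Note that PRIO only orders packets and does not guarantee or limit
rates, so in this arrangement the actual rate shaping is left to the
HTB qdisc on ppp0.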

Comment 4 Trevor Cordes 2004-12-21 09:52:24 UTC
I'm now running FC3 on the servers in question and this bug does NOT
occur with HTB using NATIVE 2.6 ipsec (i.e. setkey-based).  So, that
means the bug is specific to HTB combined with FreeS/WAN.  It *may* also
affect Openswan, now part of stock FC, but I have not tested that and
will leave it for someone else.

Now that I'm using FC3 and native ipsec, I won't be able to contribute
further information to this bug.  However, I will still watch
it and answer any questions from memory as needed.

Note to ipsec/QoS users: native ipsec (using setkey) is __easy__ to
add QoS to!  I doubt QoS on FreeS/WAN would ever have worked as
desired... I tried everything!


Comment 5 Radek Vokál 2005-10-31 10:58:57 UTC
Closing as WONTFIX; according to comment #4, it appears to be gone in the
latest release.

