From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1)
Description of problem:
[Simultaneously posted to LARTC mailing list; will keep both sides updated.]
I need to set up QoS on a Linux router/firewall I maintain. I spent 10
hours reading everything I could find on QoS/HTB/iproute2 and came up
with what I thought made sense for my situation. So I deployed it and
BOOM! KERNEL PANIC! Not what I was expecting... now the debugging begins.
I reproduced the panic twice on two different (yet almost identically
configured) machines. I can reproduce the panic on demand by doing a
specific set of actions.
First, my setup:
I have 2 machines at different locations connected via the internet. Both
machines run the stock Fedora Core 1 kernel, 2.4.22-1.2179.nptl. I run
FreeS/WAN (stock FC binary RPMs) between the 2 machines for an IPsec
VPN. I run VoIP, VNC and all other inter-office traffic through the
VPN. The internet connection is ADSL with 400 kbit/s up and roughly
1500 kbit/s down. VoIP is routed but not MASQ'd. VNC is MASQ'd (neither the
originating nor the destination machines are the Linux boxes themselves).
Second, my goals:
Give a fixed minimum bandwidth and high priority to VoIP through the VPN.
Same, but to a lesser degree, for VNC through the VPN. Give the VPN a high
enough allocation for VoIP and VNC to get through OK. Less important: minor
tweaks for rarely used outside (non-IPsec) VNC and SSH access.
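These goals map fairly naturally onto HTB leaf classes. A guaranteed `rate` covers the fixed minimum, a `ceil` sets how far a class may borrow, and a `prio` decides which class is offered spare bandwidth first. The Python sketch below is a simplified steady-state model built on those assumptions. It is not the kernel's packet-by-packet scheduler. The class names, rates and demands are hypothetical; only the 400 kbit/s uplink comes from the setup above.

```python
"""Simplified steady-state model of HTB bandwidth sharing (illustration only).

Assumptions, not taken from the report: each leaf class has a guaranteed
`rate` > 0 and a `ceil` >= rate, and the rates sum to no more than the link.
Spare bandwidth goes first to classes with the lowest `prio` value; classes
with equal prio split it in proportion to their rate. The real kernel HTB
schedules packet by packet with token buckets and quantums.
"""
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List

EPS = 1e-9  # tolerance for floating-point comparisons


@dataclass
class LeafClass:
    name: str
    rate: float    # guaranteed kbit/s (must be > 0)
    ceil: float    # upper limit in kbit/s, including borrowed bandwidth
    prio: int      # lower value is offered spare bandwidth first
    demand: float  # kbit/s this traffic is trying to send


def share(link_kbit: float, classes: List[LeafClass]) -> Dict[str, float]:
    # Step 1: each class gets up to its guaranteed rate.
    alloc = {c.name: min(c.demand, c.rate) for c in classes}
    spare = link_kbit - sum(alloc.values())
    # Step 2: lend what is left over, lowest prio value first.
    for prio in sorted({c.prio for c in classes}):
        group = [c for c in classes if c.prio == prio]
        # Repeat: a class that hits its ceil or demand frees up the rest.
        while spare > EPS:
            hungry = [c for c in group
                      if alloc[c.name] < min(c.demand, c.ceil) - EPS]
            if not hungry:
                break
            total_rate = sum(c.rate for c in hungry)
            given = 0.0
            for c in hungry:
                want = min(c.demand, c.ceil) - alloc[c.name]
                grant = min(want, spare * c.rate / total_rate)
                alloc[c.name] += grant
                given += grant
            spare -= given
    return alloc


if __name__ == "__main__":
    # Hypothetical classes on the 400 kbit/s uplink described above.
    classes = [
        LeafClass("voip", rate=100, ceil=400, prio=0, demand=80),
        LeafClass("vnc", rate=150, ceil=400, prio=1, demand=400),
        LeafClass("bulk", rate=50, ceil=400, prio=2, demand=400),
    ]
    print(share(400, classes))
```

In that example, VoIP needs only 80 kbit/s of its 100 kbit/s guarantee. The remaining 120 kbit/s is therefore offered to the VNC class (prio 1) before the bulk class (prio 2) gets any of it. VNC ends up at 270 kbit/s and bulk stays at its 50 kbit/s guarantee.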
My situation seems different from the examples I've seen because *I
believe* I need two completely separate qdiscs: one for ppp0 (the
DSL) and one for ipsec0 (the FreeS/WAN VPN). Yet ipsec0 traffic eventually
goes out over ppp0, so the two are intertwined. I have a funny feeling this is
where the crash is coming from.
See my setup script and panic screen dump (attached).
Everything seemed to go great until I tried VNC'ing in from one office
to the other. The VNC screen would pop up, do a first draw, then
completely freeze. From that point on, the remote Linux router is
frozen: kernel panic. Strangely, the bug only triggers AFTER the
100-200 kB of the initial VNC screen has been sent.
Looking at my config, here are a few questions I had while writing it
that the docs I found didn't answer:
1. The "tc filter add ... protocol ip" part confused me. What
exactly is "protocol ip" for? I originally thought that it should
read "protocol 50" for the IPsec traffic, but that didn't seem to catch
the packets, so I switched it back to "ip". Oddly, while testing with
it set to 50 (and no packets matching the rule), there were no crashes.
2. In the case of VNC and SSH *over the VPN*, packets will match two
iptables mangle rules. I *assume* the last MARK to execute will
overwrite the previous one. If for some reason the marks are ANDed
or otherwise combined, perhaps that is causing the crash (filtering one
packet into two buckets?).
3. As mentioned above, one qdisc feeds a separate qdisc, because
ipsec0 traffic eventually goes out over ppp0. Could that be a problem?
I wish I had seen some examples of this type of setup.
4. I chose HTB instead of CBQ as it seemed simpler (always a good
thing) and better suited to my exact needs. I'm not sure whether the bug
is in HTB itself or in the general QoS code.
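Questions 1 and 2 can be illustrated with a toy model. The `protocol` argument of `tc filter` is compared with the packet's link-layer protocol (the EtherType; `ip` means 0x0800). ESP's number 50 lives in the IPv4 header's protocol field instead, and with the u32 classifier that field is matched with a selector such as `match ip protocol 50 0xff`. Written as "protocol 50", the value is read as an EtherType that IPv4 packets never carry, which is consistent with nothing matching. Likewise, the iptables MARK target does not stop rule traversal. With `--set-mark`, a later matching rule overwrites the earlier mark rather than being combined with it. The Python sketch below is a minimal model under those assumptions, not kernel code. Its rules, ports and mark values are hypothetical.

```python
"""Toy model of tc's `protocol` check and iptables MARK ordering.

Illustration only, not kernel code. Assumes: (a) `tc filter ... protocol X`
compares X with the link-layer protocol (EtherType), and (b) MARK --set-mark
is non-terminating, so the last matching rule's mark is the one kept.
Ports, marks and rules are hypothetical.
"""
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

ETH_P_IP = 0x0800   # EtherType meant by `protocol ip`
IPPROTO_ESP = 50    # ESP's IP protocol number, found in the IPv4 header
IPPROTO_TCP = 6


@dataclass
class Packet:
    ethertype: int               # link-layer protocol
    ip_proto: int                # protocol field of the IPv4 header
    dport: Optional[int] = None  # destination port, if TCP/UDP
    mark: int = 0                # netfilter mark (nfmark)


Rule = Tuple[Callable[[Packet], bool], int]


def mangle(pkt: Packet, rules: List[Rule]) -> Packet:
    # Each matching MARK rule overwrites the mark and traversal continues,
    # so the last match in chain order wins; marks are never ANDed together.
    for matches, mark in rules:
        if matches(pkt):
            pkt.mark = mark
    return pkt


def tc_protocol_matches(pkt: Packet, protocol: int) -> bool:
    # The filter's `protocol` only looks at the link layer, never at ip_proto.
    return pkt.ethertype == protocol


if __name__ == "__main__":
    esp = Packet(ethertype=ETH_P_IP, ip_proto=IPPROTO_ESP)
    print(tc_protocol_matches(esp, ETH_P_IP))  # True: the `protocol ip` case
    print(tc_protocol_matches(esp, 50))        # False: 50 read as an EtherType

    rules: List[Rule] = [
        (lambda p: p.dport == 5900, 20),            # hypothetical VNC rule
        (lambda p: p.ip_proto == IPPROTO_TCP, 10),  # hypothetical broader rule
    ]
    vnc = Packet(ethertype=ETH_P_IP, ip_proto=IPPROTO_TCP, dport=5900)
    print(mangle(vnc, rules).mark)  # 10: the later, broader rule wins
```

Swapping the two rules so the more specific VNC rule comes last would leave the VNC packet with mark 20 instead. In this model, rule order alone decides which class a doubly matched packet lands in.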
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Configure QoS and iptables as per my setup script.
2. Run VNC through the VPN from behind one router to a machine behind the other.
3. After the VNC screen does its initial draw, the remote Linux box panics.
Actual Results: kernel panic!
Expected Results: everything should have run properly without a
panic, or at least a clear error should have been logged instead of a panic.
Created attachment 102804
my setup script that causes the panic
Created attachment 102805
panic debug info from the screen, transcribed by hand
More details from more testing:
1. Tried HTB qdisc only on ppp0 (no qdisc on ipsec0): no crash.
2. Tried HTB qdisc only on ipsec0 (no qdisc on ppp0): CRASH!
3. Tried PRIO qdisc on ipsec0, HTB on ppp0: no crash.
4. Tried CBQ qdisc on ipsec0, HTB on ppp0: no crash!
Therefore the crash appears to be caused by a conflict when using
HTB to shape a FreeS/WAN ipsec interface. Interestingly, CBQ, which
is more complex than HTB, has no problem with FreeS/WAN.
That should help narrow the scope of the problem.
I'm now running FC3 on the servers in question, and this bug does NOT
occur with HTB using NATIVE 2.6 IPsec (i.e., setkey-based). So the bug
is specific to HTB combined with FreeS/WAN. It *may* also
affect Openswan, now part of stock FC, but I have not tested that and
will leave it for someone else.
Now that I'm using FC3 and native IPsec, I won't be able to contribute
further information to this bug. However, I will still watch
it and answer any questions from memory as needed.
Note to IPsec/QoS users: native IPsec (using setkey) is __easy__ to
add QoS to! I doubt QoS on FreeS/WAN would ever have worked as
desired... I tried everything!
Closing as WONTFIX: according to comment #4, it appears to be gone in the latest release.