From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031114

Description of problem:
[Simultaneously posted to LARTC mailing list -- will keep both sides updated]

I need to set up QoS on a Linux router/firewall I maintain. I spent 10 hours reading everything I could find on QoS/HTB/iproute2 and came up with what I thought made sense for my situation. So I deployed it and BOOM! KERNEL PANIC! Not what I was expecting... now the debugging begins.

I reproduced the panic twice on two different (yet almost identically configured) machines. I can reproduce the panic on demand by doing a specific set of actions.

First, my setup:
I have 2 machines at different locations connected via the internet. Both machines run the stock Fedora Core 1 kernel 2.4.22-1.2179.nptl. I run FreeS/WAN (stock FC binary RPMs) between the 2 machines for ipsec VPN. I run VoIP, VNC and all other inter-office traffic through the VPN. The internet connection is ADSL with 400kbits/s up and 1500 or so down. VoIP is routed but not MASQ'd. VNC is MASQ'd (neither the originating nor the destination machines are the Linux boxes themselves).

Second, my goals:
Give a fixed minimum bandwidth and high priority to VoIP through the VPN. Same, but less so, for VNC through the VPN. Give the VPN a high enough allocation for VoIP and VNC to get through ok. Less important little tweaks for rarely-used outside (non-IPSEC) VNC and ssh access.

My situation seems different from the examples I've seen because *I believe* I need to have 2 completely separate qdiscs, 1 for ppp0 (the DSL) and 1 for ipsec0 (the FreeS/WAN VPN). Yet ipsec0 eventually goes over ppp0, so they are intertwined. I have a funny feeling this is where the crash is coming from.

See my setup script and panic screen dump (attached).

Everything seemed to go great until I tried VNC'ing in from one office to the other. The VNC screen would pop up, do a first draw, then completely freeze.
From that point on the remote Linux router is frozen -- kernel panic. Strange that the bug would only trigger AFTER sending the 100-200kB of the initial VNC screen.

Looking at my config, I will note a few questions I had while writing it that weren't answered in the docs I found:

1. The "tc filter add ... protocol ip" thing confused me. What exactly is the "protocol ip" for? I originally thought that it should read "protocol 50" for the ipsec stuff, but that didn't seem to catch the packets, so I switched it back to "ip". Weird: while testing with it set to 50 (and having no packets match the rule) there were no crashes.

2. The iptables mangle rules will, in the case of VNC and ssh *over VPN*, match two rules. I *assume* the last executing MARK will overwrite the previous MARK. If for some reason the marks are ANDed or something, perhaps that is causing the crash (filtering 1 packet into 2 buckets?).

3. As I mentioned above, the fact that one qdisc will feed a separate qdisc, because ipsec0 eventually goes out over ppp0, may be a problem. I wish I had seen some examples of this type of setup.

4. I chose HTB instead of CBQ as it seemed simpler (always a good thing) and more suited to my exact needs. Not sure if the bug is in HTB itself or the general QoS stuff.

Version-Release number of selected component (if applicable):
iproute-2.4.7-13.2

How reproducible:
Always

Steps to Reproduce:
1. Configure QoS and iptables as per my setup script.
2. Run VNC over one router to the other.
3. After the VNC screen does its initial draw, the remote Linux box panics.

Actual Results: kernel panic!

Expected Results: everything should have run properly without a panic, and/or a nice error should have been logged instead of a panic.

Additional info:
Created attachment 102804: my setup script that causes the panic
Created attachment 102805: manual entry of screen panic debug info
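For readers who cannot open the attachments, the kind of setup the report describes can be sketched roughly as follows. This is a minimal sketch and NOT the attached script: the rates, class IDs, mark values and the VoIP port are hypothetical placeholders, and the commands are only printed unless TC and IPT are overridden. It relies on two points that bear on the questions above. In `tc filter`, `protocol` names the link-layer protocol of the packet (`ip` meaning IPv4), not an IP protocol number such as 50 for ESP, which would be consistent with "protocol 50" matching nothing. And `MARK --set-mark` replaces the whole mark, so when two mangle rules match the same packet the later one wins; the marks are not combined.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# To apply for real (as root), run with TC=tc IPT=iptables.
TC="${TC:-echo tc}"
IPT="${IPT:-echo iptables}"

# Hypothetical values for illustration; none come from the attached script.
DEV=ppp0        # ADSL uplink named in the report
UPLINK=380      # kbit/s, a little under the 400 kbit/s upstream
VOIP_MARK=1
VNC_MARK=2

mark_traffic() {
    # 5060/udp is only a placeholder for the VoIP traffic; 5900/tcp is VNC's default.
    # MARK does not stop rule traversal, so a later matching rule overwrites the mark.
    $IPT -t mangle -A FORWARD -p udp --dport 5060 -j MARK --set-mark "$VOIP_MARK"
    $IPT -t mangle -A FORWARD -p tcp --dport 5900 -j MARK --set-mark "$VNC_MARK"
}

shape_uplink() {
    # Root HTB qdisc; unclassified traffic falls into class 1:30.
    $TC qdisc add dev "$DEV" root handle 1: htb default 30
    $TC class add dev "$DEV" parent 1: classid 1:1 htb rate "${UPLINK}kbit"
    # Each leaf gets a guaranteed rate and may borrow up to the full uplink.
    $TC class add dev "$DEV" parent 1:1 classid 1:10 htb rate 128kbit ceil "${UPLINK}kbit" prio 0
    $TC class add dev "$DEV" parent 1:1 classid 1:20 htb rate 128kbit ceil "${UPLINK}kbit" prio 1
    $TC class add dev "$DEV" parent 1:1 classid 1:30 htb rate 124kbit ceil "${UPLINK}kbit" prio 2
    # "protocol ip" selects IPv4 packets; the fw classifier then matches the firewall mark.
    $TC filter add dev "$DEV" parent 1: protocol ip prio 1 handle "$VOIP_MARK" fw flowid 1:10
    $TC filter add dev "$DEV" parent 1: protocol ip prio 2 handle "$VNC_MARK" fw flowid 1:20
}

mark_traffic
shape_uplink
```

Run as is, the script only prints the tc and iptables commands it would issue, which makes it safe to inspect on a machine that exhibits the panic.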
More details from more testing:

1. Tried HTB qdisc only on ppp0 (no qdisc on ipsec0) -- no crash.
2. Tried HTB qdisc only on ipsec0 (no qdisc on ppp0) -- CRASH!
3. Tried PRIO qdisc on ipsec0, HTB on ppp0 -- no crash.
4. Tried CBQ qdisc on ipsec0, HTB on ppp0 -- no crash!

Therefore it seems the crash must be caused by a conflict when using HTB to shape a FreeS/WAN ipsec interface. Interesting how CBQ, which is more complex than HTB, does not have a problem with FreeS/WAN. That should help narrow the scope of the problem.
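The test cases above differ mainly in which root qdisc is attached to ipsec0. A minimal sketch of switching between them is shown here, assuming illustrative rates and a hypothetical helper named `set_root_qdisc`; the reporter's actual commands are not shown in this report. As written it is a dry run that only prints the tc commands.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# To apply for real (as root), run with TC=tc.
TC="${TC:-echo tc}"
DEV="${DEV:-ipsec0}"   # FreeS/WAN virtual interface named in the report

# Hypothetical helper: replace the root qdisc on $DEV with the given kind.
set_root_qdisc() {
    kind="$1"
    # Validate first so an unknown kind leaves the current qdisc untouched.
    case "$kind" in
        htb|prio|cbq|none) ;;
        *) echo "unknown qdisc: $kind" >&2; return 1 ;;
    esac
    # Remove whatever root qdisc is present; ignore the error if there is none.
    $TC qdisc del dev "$DEV" root 2>/dev/null || true
    case "$kind" in
        htb)
            # Illustrative rate only.
            $TC qdisc add dev "$DEV" root handle 1: htb default 10
            $TC class add dev "$DEV" parent 1: classid 1:10 htb rate 300kbit
            ;;
        prio)
            $TC qdisc add dev "$DEV" root handle 1: prio
            ;;
        cbq)
            # Illustrative bandwidth and average packet size.
            $TC qdisc add dev "$DEV" root handle 1: cbq bandwidth 400kbit avpkt 1000
            ;;
        none)
            ;;
    esac
}

set_root_qdisc cbq
```

Calling `set_root_qdisc htb`, `set_root_qdisc prio` and `set_root_qdisc cbq` in turn, then repeating the VNC session after each, would mirror the comparison made in this comment.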
I'm now running FC3 on the servers in question and this bug does NOT occur with HTB using NATIVE 2.6 ipsec (i.e. setkey-based). So that means the bug is specific to HTB with FreeS/WAN. It *may* also affect Openswan, now part of stock FC, but I have not tested that and will leave it for someone else.

Now that I'm using FC3 and native ipsec, I won't be able to contribute further information to this bug. However, I will still watch it and answer any questions from memory as needed.

Note to ipsec/QoS users: native ipsec (using setkey) is __easy__ to add QoS to! I doubt QoS on FreeS/WAN would ever have worked as desired... I tried everything!
Closing as WONTFIX: according to comment #4, the problem appears to be gone in the latest release.