Bug 103177
Summary: | /etc/init.d/iptables stop hangs after upgrade to iptables-1.2.8-8.72.3
---|---
Product: | [Retired] Red Hat Linux
Reporter: | Brendan Kelly <bkelly>
Component: | kernel
Assignee: | Dave Jones <davej>
Status: | CLOSED WONTFIX
Severity: | medium
Priority: | medium
Version: | 7.3
CC: | anton.rops, anvil, bishop, bugs.michael, cefrodrigues, coen, jason-bz, joey, kajtzu, lartc, mark, me, nicolas.mailhot, nphilipp, pfrields, ronald, skarkkai-redhat-bugzilla, twoerner
Hardware: | i586
OS: | Linux
Doc Type: | Bug Fix
Last Closed: | 2004-01-05 19:19:27 UTC
Bug Blocks: | 100644

Description
Brendan Kelly
2003-08-27 12:22:08 UTC
Can you send me the lsmod output, when the error occurs. Is the modprobe hanging or the rmmod_r? Is the modprobe process id changing in this loop?

Info forwarded to email twoerner. It is the modprobe that hangs (loops) that the rmmod_r procedure kicks off. The process ID does not change. Also, I can not kill this process. Here is some information from the user:

11:43am up 13:46, 1 user, load average: 0.71, 0.23, 0.07
111 processes: 108 sleeping, 3 running, 0 zombie, 0 stopped
CPU states: 0.9% user, 99.0% system, 0.0% nice, 0.0% idle
Mem: 29524K av, 29140K used, 384K free, 0K shrd, 6360K buff
Swap: 192740K av, 40668K used, 152072K free 10820K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
3451 root 20 0 728 728 412 R 96.2 2.4 1:08 modprobe
3453 root 19 0 1032 1032 792 R 3.2 3.4 0:00 top
3150 root 9 0 1168 900 772 S 0.3 3.0 0:07 sshd
1275 root 9 0 1148 704 532 S 0.1 2.3 0:11 nmbd
1 root 9 0 152 112 92 S 0.0 0.3 0:04 init

[root@p120 up2date]# lsmod
Module Size Used by Not tainted
nls_iso8859-1 3488 0 (autoclean)
binfmt_misc 7076 1
parport_pc 17316 1 (autoclean)
lp 8640 0 (autoclean)
parport 34112 1 (autoclean) [parport_pc lp]
autofs 11716 1 (autoclean)
eexpress 13888 0 (unused)
ne 7968 1
8390 8100 0 [ne]
natsemi 18432 1
ip_conntrack 0 0 (deleted)
ide-cd 32576 0 (autoclean)
cdrom 32224 0 (autoclean) [ide-cd]
ext3 66880 7
jbd 47020 7 [ext3]
[root@p120 up2date]# modprobe -r ip_conntrack
ip_conntrack: No such file or directory
ip_conntrack: No such file or directory
[root@p120 up2date]# insmod ip_conntrack
Using /lib/modules/2.4.20-20.7/kernel/net/ipv4/netfilter/ip_conntrack.o
insmod: a module named ip_conntrack already exists

*** Bug 103573 has been marked as a duplicate of this bug. ***

I have also noticed this exact same problem. I am using a stateful iptables firewall and stopping iptables causes modprobe to hang while removing one of the conntrack modules. I have also noticed that ftp connection tracking does not appear to be working anymore, maybe because the ip_conntrack_ftp module never gets loaded, therefore blocking active ftp transfers, allowing only passive to work. Actually, I am not sure if the ip_conntrack_ftp module was ever loaded automatically when starting iptables, I might have been doing that myself manually. ~Jason

ip_conntrack_ftp: With the new iptables package, you need to add it to the IPTABLES_MODULES="" variable in the /etc/sysconfig/iptables-config file and uncomment the line. Loading the module manually won't work anymore, because even "service iptables start" unloads all modules.

See bug 103573 on how I work around the kernel bug. Would be interesting to know whether that works also for the other reporters.

i see the hang on stop as well. in my case it seems to hang when trying to remove ip_conntrack_ftp. high cpu usage and no ability to strace or kill -9 it. i had been loading ip_nat_ftp and ip_conntrack_ftp via rc.local, but i'll try the iptables-config method just in case that has any useful impact.

after looking at the iptables init script, i think that rmmod_r has a bug. it looks like the mod=$1 line should be marked as local, otherwise the recursive call is stepping on mod and that could be messing things up.

adding the local for mod didn't help much. i'm not sure if $i should be local as well for the for loop. in any case i have just commented out the code to unload the modules and now at least it will not hang.

I updated my iptables init script as per suggestion in bug 103573 and it circumvents the problem for me, ie all mods are successfully unloaded and reloaded by the script.

Please have a look at
http://people.redhat.com/twoerner/RPMS/7.x/iptables-1.2.8-8.72.4.i386.rpm
http://people.redhat.com/twoerner/RPMS/7.x/iptables-ipv6-1.2.8-8.72.4.i386.rpm
http://people.redhat.com/twoerner/SRPMS/7.x/iptables-1.2.8-8.72.4.src.rpm
The init script is updated with Michael Schwendt's patch.
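
The scoping concern raised above about rmmod_r (a non-local mod being overwritten by the recursive call) can be illustrated with a toy sketch. Nothing here is taken from the real init script: refs_of and its dependency table are hypothetical stand-ins for parsing lsmod, and echo stands in for modprobe -r. The commenter reports that adding local did not cure the hang, so this only shows the shell behaviour, not the cause of the bug.

```shell
#!/bin/bash
# Toy illustration only; not the real /etc/init.d/iptables code.

# Hypothetical dependency table: prints the modules that refer to $1.
refs_of() {
    case "$1" in
        ip_conntrack) echo "ip_conntrack_ftp ip_nat" ;;
        ip_nat)       echo "ip_nat_ftp" ;;
    esac
}

# Variant with a global "mod": each recursive call overwrites it.
unload_global() {
    mod=$1
    for i in $(refs_of "$mod"); do
        unload_global "$i"
    done
    echo "$mod"    # stands in for "modprobe -r $mod"
}

# Variant with "local": every call keeps its own copy of mod and i.
unload_local() {
    local mod=$1
    local i
    for i in $(refs_of "$mod"); do
        unload_local "$i"
    done
    echo "$mod"    # stands in for "modprobe -r $mod"
}

echo "global:" $(unload_global ip_conntrack)
echo "local: " $(unload_local ip_conntrack)
```

In this sketch the global variant names ip_nat_ftp three times and never reaches ip_nat or ip_conntrack, while the local variant walks the referring modules first and finishes with ip_conntrack. The loop variable i matters less, because the word list of a for loop is expanded once before the first iteration.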

In reply to comment 8: That's not a problem, because for the recursive loop, $ref is evaluated only once. But the global variable $ret is set to 0 upon every call of rmmod(). That means, only the return value of the last call of rmmod() is taken into account and one could drop some of the "let ret+=$?" in several places. If one renamed rmmod_r()'s $ret, it would be important to check only the return value of the two last calls of rmmod_r() in stop(). They are crucial.

I might have been a little hasty. After further testing (with Michael Schwendt's init script patch) it appears the problem still exists. Just seemed to work ok the first time. After that I tested a few more times and have had the exact same problem. Unsure exactly why this is.

Good to know. Makes the problem even worse. The primary goal of the patch was to fix "case 2" as reported in bug 103573. The patch would not fix any kernel/modutils bug, of course, and would have an effect only for additional modules listed in $IPTABLES_MODULES.

*** Bug 103943 has been marked as a duplicate of this bug. ***

When I replay the iptables-initscript manually I get lots of "Device or resource busy" messages when I enter "modprobe -r ...". The corresponding modules are not unloaded. Is this normal? Is this related?

Test-case: Set up a masquerading router: eth1 intranet (e.g. 172.31.1.0/24), eth0 external net. Connect another host at eth1 and establish on this host a passive ftp connection to a ftp server in the external net. Restart firewall and it hangs while unloading a netfilter module (ip_conntrack or other).

/etc/sysconfig/iptables:
*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A POSTROUTING -o eth0 -j MASQUERADE
COMMIT
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -i eth1 -j ACCEPT
-A OUTPUT -s 172.31.1.0/24 -d 0/0 -j ACCEPT
COMMIT

/etc/sysconfig/iptables-config:
IPTABLES_MODULES="ip_conntrack_ftp ip_nat_ftp"

The problem occurs on RH 7.3 to RH 9 with the newer kernel errata. Taroon kernels seem not to have this problem.

The problem also occurs on RH 7.1. I am using this version and have a case very similar to comment #17.

Please have a look at the following: https://bugzilla.netfilter.org/cgi-bin/bugzilla/show_bug.cgi?id=91 A similar problem is described and solved there. It contains some suggestions for changes in the netfilter kernel code. I have applied these changes in a custom kernel (derived from RH standard) and now my problems are solved. Maybe someone else can confirm?

*** Bug 99057 has been marked as a duplicate of this bug. ***

Will be merged in the next errata kernel. Thanks for chasing this one..

BTW, it seems that bug #102561 is a duplicate of this one.

*** Bug 102561 has been marked as a duplicate of this bug. ***

Can anyone tell me what this error means ......

Sep 29 11:13:52 rg-hosting kernel: ------------[ cut here ]------------
Sep 29 11:13:52 rg-hosting kernel: kernel BUG at page_alloc.c:131!
Sep 29 11:13:52 rg-hosting kernel: invalid operand: 0000
Sep 29 11:13:52 rg-hosting kernel: iptable_filter iptable_mangle iptable_nat ip_conntrack ip_tables binfmt_misc autofs tulip appletalk ipx ext3 jbd
Sep 29 11:13:52 rg-hosting kernel: CPU: 0
Sep 29 11:13:52 rg-hosting kernel: EIP: 0010:[__free_pages_ok+258/864] Not tainted
Sep 29 11:13:52 rg-hosting kernel: EIP: 0010:[<c0134332>] Not tainted
Sep 29 11:13:52 rg-hosting kernel: EFLAGS: 00010202
Sep 29 11:13:52 rg-hosting kernel:
Sep 29 11:13:52 rg-hosting kernel: EIP is at __free_pages_ok [kernel] 0x102 (2.4.20-20.7)
Sep 29 11:13:52 rg-hosting kernel: eax: 0f229370 ebx: c10d5088 ecx: 00000000 edx: 00000000
Sep 29 11:13:52 rg-hosting kernel: esi: 00000040 edi: 00000000 ebp: 03cdd067 esp: df2a7ea0
Sep 29 11:13:52 rg-hosting kernel: ds: 0018 es: 0018 ss: 0018
Sep 29 11:13:52 rg-hosting kernel: Process named (pid: 835, stackpage=df2a7000)
Sep 29 11:13:52 rg-hosting kernel: Stack: c01359c0 c0345c80 c02dac48 c1038030 c02dae54 00000217 ffffffff 00001677
Sep 29 11:13:52 rg-hosting kernel: c10d5088 00000040 0006d000 03cdd067 c01251ca c10d5088 00044000 df229370
Sep 29 11:13:52 rg-hosting kernel: c0125960 dfc325c0 08044000 df229370 08000000 0000003b 00000000 08105000
Sep 29 11:13:52 rg-hosting kernel: Call Trace: [remove_exclusive_swap_page+176/192] remove_exclusive_swap_page [kernel] 0xb0 (0xdf2a7ea0))
Sep 29 11:13:52 rg-hosting kernel: Call Trace: [<c01359c0>] remove_exclusive_swap_page [kernel] 0xb0 (0xdf2a7ea0))
Sep 29 11:13:52 rg-hosting kernel: [__free_pte+74/80] __free_pte [kernel] 0x4a (0xdf2a7ed0))
Sep 29 11:13:52 rg-hosting kernel: [<c01251ca>] __free_pte [kernel] 0x4a (0xdf2a7ed0))
Sep 29 11:13:52 rg-hosting kernel: [zap_page_range+544/768] zap_page_range [kernel] 0x220 (0xdf2a7ee0))
Sep 29 11:13:52 rg-hosting kernel: [<c0125960>] zap_page_range [kernel] 0x220 (0xdf2a7ee0))
Sep 29 11:13:52 rg-hosting kernel: [do_munmap+459/592] do_munmap [kernel] 0x1cb (0xdf2a7f50))
Sep 29 11:13:52 rg-hosting kernel: [<c012816b>] do_munmap [kernel] 0x1cb (0xdf2a7f50))
Sep 29 11:13:52 rg-hosting kernel: [path_release+15/48] path_release [kernel] 0xf (0xdf2a7f70))
Sep 29 11:13:52 rg-hosting kernel: [<c0144f7f>] path_release [kernel] 0xf (0xdf2a7f70))
Sep 29 11:13:52 rg-hosting kernel: [sys_brk+96/240] sys_brk [kernel] 0x60 (0xdf2a7f90))
Sep 29 11:13:52 rg-hosting kernel: [<c01272f0>] sys_brk [kernel] 0x60 (0xdf2a7f90))
Sep 29 11:13:52 rg-hosting kernel: [system_call+51/56] system_call [kernel] 0x33 (0xdf2a7fc0))
Sep 29 11:13:52 rg-hosting kernel: [<c0108813>] system_call [kernel] 0x33 (0xdf2a7fc0))
Sep 29 11:13:52 rg-hosting kernel:
Sep 29 11:13:52 rg-hosting kernel:
Sep 29 11:13:52 rg-hosting kernel: Code: 0f 0b 83 00 16 05 23 c0 b8 02 00 00 00 0f b3 43 18 b8 04 00

VM related oops. Please file a separate bugzilla report for it.

*** Bug 106589 has been marked as a duplicate of this bug. ***

*** Bug 107105 has been marked as a duplicate of this bug. ***

This still seems to plague 2.4.22-1.2093.nptlsmp? At least I'm able to reproduce modprobe/rmmod rather reliably.

% rpm -q kernel-smp iptables
kernel-smp-2.4.22-1.2093.nptl
iptables-1.2.8-12.1

*** Bug 105757 has been marked as a duplicate of this bug. ***

*** Bug 108113 has been marked as a duplicate of this bug. ***

Please release this errata soon!! My PowerEdge hangs on shutdown and I have to smack the BRS (big red switch) and it has to rebuild the raid array upon booting... Or is RH planning to wait this one out until 12/31 so that RH doesn't have to fix it? Enquiring minds want to know!

A quick workaround to this problem is putting these lines:

# work around stupid modprobe -r problem
/bin/rm -f /var/lock/subsys/iptables

in /etc/rc.local and running them once on the command line. It will prevent stopping iptables (which shouldn't matter) on reboot/halt. Just remember to remove the lockfile as well when doing any "service iptables (re)start"s.

According to comment #21 this bug would be fixed in the next errata kernel.
Well, on the first of this month an errata kernel was released: RHSA-2003:392-00. Note that the fix for the bug is NOT included. What went wrong?

What went wrong? Red Hat's commitment to you getting value from your $60. How about someone adding the patch to this bug as an attachment so that those of us who grow weary of remembering to "remove the lockfile before restarting iptables" can fix it ourselves and be done with it...

> Note that the fix for the bug is NOT included. What went wrong?
maybe, because RHSA-2003:392-00 was an emergency errata.
I hope to see a new errata, and _the last_, this month for 7.x kernel :-)
Created attachment 96322 [details]
fix for netfilter hang
I recompiled the previous errata kernel with this patch applied and it has been
working.

Forgot to mention, I've been using the above patch with the errata kernel for 8.0.

As an update to this bug: The kernel update with this fix is in QA, and will be available soon.

After installing the errata kernel the problem still exists.
Version: kernel-2.4.20-27.7
Steps to reproduce: Iptables is started with this included in iptables-config: IPTABLES_MODULES="ip_conntrack_ftp ip_nat_ftp"
FreeSWan is also used. The Linux server is used as a router to the internet. I do a non-passive ftp transfer. Shutting down the server stalls at the point where iptables is stopped.

We are now passing the time limit where no more maintenance will be done for these RH versions. This fix has been known for months now. Will there be no good kernel in the end?

I checked the source code of the new errata kernel. The patch mentioned in comment #36 has not been applied for RH 7.1. I thought I was clear enough in comment #19 that the bug also existed in 7.1. Has the patch been applied in other versions?

It is included in the Red Hat 7.3 kernel update as 'linux-2.4.1-netfilter-addons.patch', but it didn't fix this bug.

That patch file is included, but not used in the spec-file.

Why has another kernel been released, which is probably going to be the last in the 7.x series, and this known fix has still NOT been applied? What kind of QA does RedHat have??? Do you not care about RedHat-7.x anymore since it has reached its EOL? Don't you want to release a kernel that does not loop forever when iptables is stopped for all of your customers that use iptables?

For what it's worth and for the record: Also in the brand-new linux-2.4.20-28.7 the bug has not been fixed.

Due to an oversight, Patch5040 isn't applied. Adding a ..

%patch5040 -p1

to line 1090 or so and rebuilding from the SRPM will fix this. I'll fix this for RHL9, but as RHL7/8 are now EOL, we won't be doing further updates, sorry..

This is unacceptable!
From what I read, your EOL statement says no guarantee of support is made after 2003-12-31, that doesn't mean you have to drop it completely, especially when you admit that your own oversight left it out and you neglected to reply to this bug until after your own self made EOL. You were the ones that waited until the last second to release these updated kernels! If you read the bug report and look at the appropriate patch then it becomes obvious that this fix was made by RedHat on 2003-09-18! Three kernel updates were released after that date (2.4.20-24.x, 2.4.20-27.x & 2.4.20-28.x) but none of them included this fix. Also, several followups were made to this bug report recently, including a RH employee (You - Dave Jones) in mid-December claiming this fix would be in the next kernel. Then after that kernel was released a few people immediately followed up several days before the EOL saying this patch was left out and you even released yet another kernel and left it out yet again!!!

Patch5040 will need more than a corresponding addition to the spec file, because it doesn't apply unmodified. Looks a bit like the ip_conntrack fix was appended to Patch 5040 which is out-of-date or even obsolete.
All the other diffs in it are from 2001 (the conntrack fix is at the bottom):

$ grep '+++' linux-2.4.1-netfilter-addons.patch
+++ linux/Documentation/Configure.help Mon Nov 5 21:42:00 2001
+++ linux/include/linux/netfilter_ipv4/ip_conntrack.h Fri Jun 1 15:15:32 2001
+++ linux/include/linux/netfilter_ipv4/ip_conntrack_irc.h Sat Apr 21 16:39:09 2001
+++ linux/include/linux/netfilter_ipv4/ip_nat_irc.h Sat Apr 21 16:39:09 2001
+++ linux/net/ipv4/netfilter/Config.in Sat Apr 21 16:39:10 2001
+++ linux/net/ipv4/netfilter/Makefile Thu Apr 26 12:36:56 2001
+++ linux/net/ipv4/netfilter/ip_conntrack_ftp.c Sat Sep 29 10:40:34 2001
+++ linux/net/ipv4/netfilter/ip_conntrack_irc.c Sun Apr 22 13:10:48 2001
+++ linux/net/ipv4/netfilter/ip_nat_ftp.c Sat Sep 29 10:40:34 2001
+++ linux/net/ipv4/netfilter/ip_nat_irc.c Sun Apr 22 13:10:47 2001
+++ linux-2.4.20-dj/net/ipv4/netfilter/ip_conntrack_core.c 2003-09-18 18:35:52.000000000 +0100

Delete all the hunks apart from the final one touching ip_conntrack_core.c

the bottom patch is kinda munged. Attached is a fixed one.

Created attachment 96775 [details]
patch corrects munging in patch 5040
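
The suggestion above to delete every hunk except the one touching ip_conntrack_core.c could be done by hand in an editor, or roughly as in the sketch below. This is only an illustration under assumptions: keep_file_diff is a hypothetical helper name, it assumes a plain unified diff with "---" and "+++" file headers, and it would misread a removed line that itself begins with "-- ". It also does nothing about the munging mentioned above, which the attached patch addresses.

```shell
#!/bin/sh
# keep_file_diff PATTERN
# Reads a unified diff on stdin and prints only the per-file sections
# whose "+++" header matches PATTERN (an awk regular expression).
# Simplistic sketch: file headers are recognised purely by their prefix.
keep_file_diff() {
    awk -v pat="$1" '
        /^--- / { held = $0; next }      # hold the old-file header for now
        /^[+][+][+] / {                  # the new-file header decides keep or drop
            keep = ($0 ~ pat)
            if (keep) { print held; print }
            held = ""
            next
        }
        keep { print }                   # hunk lines of a kept file section
    '
}
```

A possible invocation, with a hypothetical output file name, would be: keep_file_diff 'ip_conntrack_core[.]c' < linux-2.4.1-netfilter-addons.patch > conntrack-only.patch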

http://linux.duke.edu/~skvidal/RPMS/kernel/ Those are the kernels built on 7.3 - I only built i686 and athlon.

Not enough. The ip_conntrack_core patch doesn't fix it.

Odd, the patch in #36 is very similar to the one I applied that was at the bottom of patch 5040. Not sure what the difference would be for it functioning.

The only meaningful difference that I can see between #36 and #52 is the call to ip_conntrack_put. Maybe that was a mistake?

In reply to comment #56: The ip_conntrack_put line is also part of the official netfilter fix for a bug that is believed to be this one.
=======================
I've reopened the corresponding netfilter bug report (#91) because I still can reproduce this problem with Linux kernel 2.4.24 on rh73. I can also reproduce it on rh73 with the most recent Fedora Core 1 kernel (which includes the fixed netfilter code) as well as the previous RHEL 3 kernel 2.4.21-4.0.1.EL. I've transferred more netfilter fixes into most recent rh73 kernel without fixing it either. And I can reproduce it on rh9, but only after upgrading its stock iptables package to at least the most recent one from rh80 (which introduces Thomas Woerner's recursive "Unloading of modules" upon "service iptables start").
===============
Symptoms summary: lsmod shows "ip_conntrack (deleted)" as being the last netfilter module in the list and ps output shows "modprobe -r ip_conntrack_ftp" taking ~99% processor time.
===============
Interestingly, I *cannot* reproduce it on Fedora Core 1. Ideas anyone?

Above have been success reports about the advertised ip_conntrack_core patch. I wish I could confirm that the fix works. The problem is 100% reproducible here, however.

I've encountered this with kernel 2.6 but it isn't 100% reproducible. (bug #112630)

Re: #57 I saw the call to ip_conntrack_put in the netfilter fix, but in that patch the ip_conntrack_put call was there before the fix also (just not inside the if(flag) statement). Have you tried the patch without ip_conntrack_put?
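
The symptom reported earlier in this thread, lsmod listing ip_conntrack as "(deleted)", can be checked for mechanically. The sketch below is an assumption-laden illustration: deleted_modules is a hypothetical helper name, and it relies only on the 2.4-style lsmod layout quoted in this report, reading the text on stdin so it can be tried on saved output.

```shell
#!/bin/sh
# Print the names of modules that lsmod-style text marks as "(deleted)".
# Assumes the 2.4 lsmod layout shown in this report (name in column 1).
deleted_modules() {
    awk '/\(deleted\)/ { print $1 }'
}

# Sample lines taken from the lsmod listing quoted in this report.
printf '%s\n' \
    'Module                  Size  Used by    Not tainted' \
    'natsemi                18432   1' \
    'ip_conntrack               0   0 (deleted)' \
    'ext3                   66880   7' | deleted_modules
```

On an affected machine the same function could be fed live data with: lsmod | deleted_modules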

<Emily Litella>Never mind!</Emily Litella> I tried building a kernel without the ip_conntrack_put() call and it didn't work. I wonder what official netfilter patch inserted the ip_conntrack_put call in the first place?

Double never mind. I didn't look high enough in the netfilter bugzilla thread. You said there was an order in which you could remove the modules that wouldn't cause a hang? What order is that? Does it work around the hang even without the ip_conntrack_core.c patch?

Well, one of the module removal strategies is documented in 103573, but unfortunately it is not sufficient for everyone. [...] Btw, 2.4.18-27.7.x also suffers from the ip_conntrack lock-up. And since its netfilter code looks pretty much different in many places, I think I won't go back further to find out whether any older kernel has worked flawlessly.

I have had total success by just commenting out the section that unloads the modules from the init script. the block that starts:

echo -n $"Unloading $IPTABLES modules: "

i haven't had any hangs on any of my systems with that fix.

*** Bug 107169 has been marked as a duplicate of this bug. ***

I have the same problem on Fedora Core 2 (with the latest kernel version, 2.4.7-rc3 smp). So, what can I do?

Sorin, take a look at bug #112630, I filed it a while ago for the 2.6 kernels.

I've got exactly the same problem on a fedora 3 with iptables v 1.2.11-3.1, the bug isn't fixed yet REOPEN REOPEN