Description of problem:
After upgrading iptables to 1.2.8-8.72.3 as per the latest errata, running
"/etc/init.d/iptables stop" hangs.
Looks like the "modprobe -r ipt_REDIRECT" from the recursive rmmod_r procedure
loops as a "top" shows it using 90% CPU.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. /etc/init.d/iptables stop
The command should result in the iptables rules being flushed.
Running kernel-2.4.20-20.7, but the same problem occurs on kernel-2.4.20-19.7.
Can you send me the lsmod output from when the error occurs? Is it the modprobe
that hangs, or rmmod_r? Is the modprobe process ID changing in this loop?
Info forwarded to email firstname.lastname@example.org. It is the modprobe
kicked off by the rmmod_r procedure that hangs (loops). The process ID does not
change. Also, I cannot kill this process.
Here is some information from the user:
 11:43am  up 13:46,  1 user,  load average: 0.71, 0.23, 0.07
111 processes: 108 sleeping, 3 running, 0 zombie, 0 stopped
CPU states:  0.9% user, 99.0% system,  0.0% nice,  0.0% idle
Mem:    29524K av,   29140K used,     384K free,       0K shrd,    6360K buff
Swap:  192740K av,   40668K used,  152072K free                   10820K cached
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 3451 root      20   0   728  728   412 R    96.2  2.4   1:08 modprobe
 3453 root      19   0  1032 1032   792 R     3.2  3.4   0:00 top
 3150 root       9   0  1168  900   772 S     0.3  3.0   0:07 sshd
 1275 root       9   0  1148  704   532 S     0.1  2.3   0:11 nmbd
    1 root       9   0   152  112    92 S     0.0  0.3   0:04 init
[root@p120 up2date]# lsmod
Module                  Size  Used by    Not tainted
nls_iso8859-1           3488   0  (autoclean)
binfmt_misc             7076   1
parport_pc             17316   1  (autoclean)
lp                      8640   0  (autoclean)
parport                34112   1  (autoclean) [parport_pc lp]
autofs                 11716   1  (autoclean)
eexpress               13888   0  (unused)
ne                      7968   1
8390                    8100   0  [ne]
natsemi                18432   1
ip_conntrack               0   0  (deleted)
ide-cd                 32576   0  (autoclean)
cdrom                  32224   0  (autoclean) [ide-cd]
ext3                   66880   7
jbd                    47020   7  [ext3]
[root@p120 up2date]# modprobe -r ip_conntrack
ip_conntrack: No such file or directory
ip_conntrack: No such file or directory
[root@p120 up2date]# insmod ip_conntrack
insmod: a module named ip_conntrack already exists
*** Bug 103573 has been marked as a duplicate of this bug. ***
I have also noticed this exact same problem. I am using a stateful iptables
firewall and stopping iptables causes modprobe to hang while removing one of the
conntrack modules. I have also noticed that ftp connection tracking does not
appear to be working anymore, maybe because the ip_conntrack_ftp module never
gets loaded, therefore blocking active ftp transfers, allowing only passive to
work. Actually, I am not sure if the ip_conntrack_ftp module was ever loaded
automatically when starting iptables, I might have been doing that myself manually.
ip_conntrack_ftp: With the new iptables package, you need to add it to the
IPTABLES_MODULES="" variable in the /etc/sysconfig/iptables-config file and
uncomment the line. Loading the module manually won't work anymore, because even
"service iptables start" unloads all modules.
See bug 103573 on how I work around the kernel bug. Would be interesting to know
whether that works also for the other reporters.
i see the hang on stop as well. in my case it seems to hang when trying to
remove ip_conntrack_ftp. high cpu usage and no ability to strace or kill -9 it.
i had been loading ip_nat_ftp and ip_conntrack_ftp via rc.local, but i'll try
the iptables-config method just in case that has any useful impact.
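For anyone else trying the iptables-config method, here is a minimal sketch of the relevant line. It assumes you want the two FTP helpers mentioned above; the module list is only an example, and ip_nat_ftp is presumably only useful if you do NAT.

```shell
# /etc/sysconfig/iptables-config (fragment)
# Extra modules for the init script to load, space-separated.
# Example values only; list the helpers your own setup needs.
IPTABLES_MODULES="ip_conntrack_ftp ip_nat_ftp"
```

The init script reads this file itself, so nothing needs to go into rc.local. Note that restarting iptables to pick up the change can trigger the hang discussed in this bug on affected kernels.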
after looking at the iptables init script, i think that rmmod_r has a bug. it
looks like the mod=$1 line should be marked as local, otherwise the recursive
call is stepping on mod and that could be messing things up.
adding the local for mod didn't help much. i'm not sure if $i should be local
as well for the for loop. in any case i have just commented out the code to
unload the modules and now at least it will not hang.
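To illustrate the concern about the missing "local", here is a toy sketch. It is not the real rmmod_r: deps_of, walk_global and walk_local are made-up names, and the dependency table is hard-coded. It only shows what bash does to a non-local variable across a recursive call.

```shell
#!/bin/bash
# Hypothetical dependency table: module "a" is referred to by "b" and "c".
deps_of() {
    case "$1" in
        a) echo "b c" ;;
        *) echo "" ;;
    esac
}

# Without 'local': the recursive call overwrites the caller's $mod.
walk_global() {
    mod=$1
    for i in $(deps_of "$mod"); do
        walk_global "$i"
    done
    echo "unload $mod"
}

# With 'local': every call level keeps its own $mod and $i.
walk_local() {
    local mod=$1 i
    for i in $(deps_of "$mod"); do
        walk_local "$i"
    done
    echo "unload $mod"
}

walk_global a   # prints: unload b, unload c, unload c  ("a" is never unloaded)
walk_local a    # prints: unload b, unload c, unload a
```

The word list of the for loop is expanded once before the loop runs, so the iteration itself is not disturbed. Only uses of $mod after the recursive call see the clobbered value.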
I updated my iptables init script as per the suggestion in bug 103573 and it
circumvents the problem for me, i.e. all modules are successfully unloaded and
reloaded by the script.
Please have a look at
The init script is updated with Michael Schwendt's patch.
In reply to comment 8:
That's not a problem, because for the recursive loop, $ref is evaluated only once.
But the global variable $ret is set to 0 upon every call of rmmod(). That means,
only the return value of the last call of rmmod() is taken into account and one
could drop some of the "let ret+=$?" in several places. If one renamed
rmmod_r()'s $ret, it would be important to check only the return value of the
two last calls of rmmod_r() in stop(). They are crucial.
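A toy sketch of the $ret effect described here, under stated assumptions: fake_unload is a made-up stand-in for "modprobe -r" that fails only for module "b", and the function names are hypothetical, not taken from the init script.

```shell
#!/bin/bash
# Hypothetical stand-in for "modprobe -r": fails only for module "b".
fake_unload() { [ "$1" != "b" ]; }

# Shared (global) ret: every call resets it to 0 before doing its work.
unload_global() {
    ret=0
    fake_unload "$1"
    ret=$((ret + $?))
    return $ret
}

stop_global() {
    ret=0
    for m in a b c; do
        unload_global "$m"
        ret=$((ret + $?))
    done
    return $ret
}

# Local ret: each function accumulates its own failures.
unload_local() {
    local ret=0
    fake_unload "$1"
    ret=$((ret + $?))
    return $ret
}

stop_local() {
    local ret=0 m
    for m in a b c; do
        unload_local "$m"
        ret=$((ret + $?))
    done
    return $ret
}

stop_global && echo "global: reports success although b failed"
stop_local  || echo "local: reports failure, status $?"
```

With the shared variable, the failure for "b" is wiped out when the call for "c" resets ret to 0, so only the last call counts. With local variables the failure survives to the caller.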
I might have been a little hasty. After further testing (with Michael
Schwendt's init script patch) it appears the problem still exists. Just seemed
to work ok the first time. After that I tested a few more times and have had
the exact same problem. Unsure exactly why this is.
Good to know. Makes the problem even worse. The primary goal of the patch was to
fix "case 2" as reported in bug 103573. The patch would not fix any
kernel/modutils bug, of course, and would have an effect only for additional
modules listed in $IPTABLES_MODULES.
*** Bug 103943 has been marked as a duplicate of this bug. ***
When I replay the iptables-initscript manually I get lots of "Device or
resource busy" messages when I enter "modprobe -r ...". The corresponding
modules are not unloaded.
Is this normal?
Is this related?
Set up a masquerading router: eth1 intranet (e.g. 172.31.1.0/24), eth0 external
net. Connect another host to eth1 and, from this host, establish a passive FTP
connection to an FTP server in the external net. Restart the firewall and it
hangs while unloading a netfilter module (ip_conntrack or other).
*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A POSTROUTING -o eth0 -j MASQUERADE
COMMIT
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -i eth1 -j ACCEPT
-A OUTPUT -s 172.31.1.0/24 -d 0/0 -j ACCEPT
COMMIT
The problem occurs on RH 7.3 to RH 9 with the newer kernel errata. Taroon
kernels seem not to have this problem.
The problem also occurs on RH 7.1. I am using this version and have a case
very similar to comment #17.
Please have a look at the following:
A similar problem is described and solved there. It includes some suggestions
for changes in the netfilter kernel code. I have applied these changes in a
custom kernel (derived from the RH standard one) and now my problems are solved.
Maybe someone else can confirm?
*** Bug 99057 has been marked as a duplicate of this bug. ***
Will be merged in the next errata kernel. Thanks for chasing this one..
BTW, it seems that bug #102561 is a duplicate of this one.
*** Bug 102561 has been marked as a duplicate of this bug. ***
Can anyone tell me what this error means ......
Sep 29 11:13:52 rg-hosting kernel: ------------[ cut here ]------------
Sep 29 11:13:52 rg-hosting kernel: kernel BUG at page_alloc.c:131!
Sep 29 11:13:52 rg-hosting kernel: invalid operand: 0000
Sep 29 11:13:52 rg-hosting kernel: iptable_filter iptable_mangle iptable_nat
ip_conntrack ip_tables binfmt_misc autofs tulip appletalk ipx ext3 jbd
Sep 29 11:13:52 rg-hosting kernel: CPU: 0
Sep 29 11:13:52 rg-hosting kernel: EIP: 0010:[__free_pages_ok+258/864]
Sep 29 11:13:52 rg-hosting kernel: EIP: 0010:[<c0134332>] Not tainted
Sep 29 11:13:52 rg-hosting kernel: EFLAGS: 00010202
Sep 29 11:13:52 rg-hosting kernel:
Sep 29 11:13:52 rg-hosting kernel: EIP is at __free_pages_ok [kernel] 0x102
Sep 29 11:13:52 rg-hosting kernel: eax: 0f229370 ebx: c10d5088 ecx:
00000000 edx: 00000000
Sep 29 11:13:52 rg-hosting kernel: esi: 00000040 edi: 00000000 ebp:
03cdd067 esp: df2a7ea0
Sep 29 11:13:52 rg-hosting kernel: ds: 0018 es: 0018 ss: 0018
Sep 29 11:13:52 rg-hosting kernel: Process named (pid: 835, stackpage=df2a7000)
Sep 29 11:13:52 rg-hosting kernel: Stack: c01359c0 c0345c80 c02dac48 c1038030
c02dae54 00000217 ffffffff 00001677
Sep 29 11:13:52 rg-hosting kernel: c10d5088 00000040 0006d000 03cdd067
c01251ca c10d5088 00044000 df229370
Sep 29 11:13:52 rg-hosting kernel: c0125960 dfc325c0 08044000 df229370
08000000 0000003b 00000000 08105000
Sep 29 11:13:52 rg-hosting kernel: Call Trace:
[remove_exclusive_swap_page+176/192] remove_exclusive_swap_page [kernel] 0xb0
Sep 29 11:13:52 rg-hosting kernel: Call Trace: [<c01359c0>]
remove_exclusive_swap_page [kernel] 0xb0 (0xdf2a7ea0))
Sep 29 11:13:52 rg-hosting kernel: [__free_pte+74/80] __free_pte [kernel] 0x4a
Sep 29 11:13:52 rg-hosting kernel: [<c01251ca>] __free_pte [kernel] 0x4a
Sep 29 11:13:52 rg-hosting kernel: [zap_page_range+544/768] zap_page_range
[kernel] 0x220 (0xdf2a7ee0))
Sep 29 11:13:52 rg-hosting kernel: [<c0125960>] zap_page_range [kernel] 0x220
Sep 29 11:13:52 rg-hosting kernel: [do_munmap+459/592] do_munmap [kernel] 0x1cb
Sep 29 11:13:52 rg-hosting kernel: [<c012816b>] do_munmap [kernel] 0x1cb
Sep 29 11:13:52 rg-hosting kernel: [path_release+15/48] path_release [kernel]
Sep 29 11:13:52 rg-hosting kernel: [<c0144f7f>] path_release [kernel] 0xf
Sep 29 11:13:52 rg-hosting kernel: [sys_brk+96/240] sys_brk [kernel] 0x60
Sep 29 11:13:52 rg-hosting kernel: [<c01272f0>] sys_brk [kernel] 0x60
Sep 29 11:13:52 rg-hosting kernel: [system_call+51/56] system_call [kernel]
Sep 29 11:13:52 rg-hosting kernel: [<c0108813>] system_call [kernel] 0x33
Sep 29 11:13:52 rg-hosting kernel:
Sep 29 11:13:52 rg-hosting kernel:
Sep 29 11:13:52 rg-hosting kernel: Code: 0f 0b 83 00 16 05 23 c0 b8 02 00 00 00
0f b3 43 18 b8 04 00
VM-related oops. Please file a separate bugzilla report for it.
*** Bug 106589 has been marked as a duplicate of this bug. ***
*** Bug 107105 has been marked as a duplicate of this bug. ***
This still seems to plague 2.4.22-1.2093.nptlsmp? At least I'm able to reproduce
the modprobe/rmmod hang rather reliably.
% rpm -q kernel-smp iptables
*** Bug 105757 has been marked as a duplicate of this bug. ***
*** Bug 108113 has been marked as a duplicate of this bug. ***
Please release this errata soon!! My PowerEdge hangs on shutdown and
I have to smack the BRS (big red switch) and it has to rebuild the
raid array upon booting... Or is RH planning to wait this one out
until 12/31 so that RH doesn't have to fix it? Enquiring minds want
A quick workaround to this problem is putting these lines:
# work around stupid modprobe -r problem
/bin/rm -f /var/lock/subsys/iptables
in /etc/rc.local and running them once on the command line. It will
prevent stopping iptables (which shouldn't matter) on reboot/halt.
Just remember to remove the lockfile as well when doing any "service
According to comment #21 this bug would be fixed in the next errata
kernel. Well, on the first of this month an errata kernel was
Note that the fix for the bug is NOT included. What went wrong?
What went wrong? Red Hat's commitment to you getting value from your $60.
How about someone adding the patch to this bug as an attachment so
that those of us who grow weary of remembering to "remove the lockfile
before restarting iptables" can fix it ourselves and be done with it...
> Note that the fix for the bug is NOT included. What went wrong?
Maybe because RHSA-2003:392-00 was an emergency errata.
I hope to see a new errata, and _the last_, this month for 7.x kernel :-)
Created attachment 96322 [details]
fix for netfilter hang
I recompiled the previous errata kernel with this patch applied and it has been
Forgot to mention, I've been using the above patch with the errata
kernel for 8.0.
As an update to this bug: the kernel update with this fix is in QA,
and will be available soon.
After installing the errata kernel the problem still exists.
Steps to reproduce:
Iptables is started with this included in iptables-config:
FreeSWan is also used
The Linux server is used as a router to the internet. I do a
non-passive ftp transfer.
Shutting down the server stalls at the point where iptables is
We are now passing the time limit where no more maintenance will be
done for these RH versions. This fix has been known for months now.
Will there be no good kernel in the end?
I checked the source code of the new errata kernel. The patch
mentioned in comment #36 has not been applied for RH 7.1. I thought I
was clear enough in comment #19 that the bug also existed in 7.1.
Has the patch been applied in other versions?
It is included in the Red Hat 7.3 kernel update as
But it didn't fix this bug.
That patch file is included, but not used in the spec-file.
Why has another kernel been released, which is probably going to be
the last in the 7.x series, and this known fix has still NOT been
applied? What kind of QA does RedHat have??? Do you not care about
RedHat-7.x anymore since it has reached its EOL? Don't you want to
release a kernel that does not loop forever when iptables is stopped
for all of your customers that use iptables?
For what it's worth and for the record:
The bug has not been fixed in the brand-new linux-2.4.20-28.7 either.
Due to an oversight, Patch5040 isn't applied.
Adding a ..
to line 1090 or so and rebuilding from the SRPM will fix this.
I'll fix this for RHL9, but as RHL7/8 are now EOL, we won't be doing
further updates, sorry..
This is unacceptable! From what I read, your EOL statement says no
guarantee of support is made after 2003-12-31, that doesn't mean you
have to drop it completely, especially when you admit that your own
oversight left it out and you neglected to reply to this bug until
after your own self made EOL. You were the ones that waited until the
last second to release these updated kernels!
If you read the bug report and look at the appropriate patch then it
becomes obvious that this fix was made by RedHat on 2003-09-18! Three
kernel updates were released after that date (2.4.20-24.x, 2.4.20-27.x
& 2.4.20-28.x) but none of them included this fix. Also, several
followups were made to this bug report recently, including a RH
employee (You - Dave Jones) in mid-December claiming this fix would be
in the next kernel. Then after that kernel was released a few people
immediately followed up several days before the EOL saying this patch
was left out and you even released yet another kernel and left it out
Patch5040 will need more than a corresponding addition to the spec
file, because it doesn't apply unmodified.
Looks a bit like the ip_conntrack fix was appended to Patch 5040 which
is out-of-date or even obsolete. All the other diffs in it are from
2001 (the conntrack fix is at the bottom):
$ grep '+++' linux-2.4.1-netfilter-addons.patch
+++ linux/Documentation/Configure.help Mon Nov 5 21:42:00 2001
+++ linux/include/linux/netfilter_ipv4/ip_conntrack.h Fri Jun 1
+++ linux/include/linux/netfilter_ipv4/ip_conntrack_irc.h Sat
Apr 21 16:39:09 2001
+++ linux/include/linux/netfilter_ipv4/ip_nat_irc.h Sat Apr 21
+++ linux/net/ipv4/netfilter/Config.in Sat Apr 21 16:39:10 2001
+++ linux/net/ipv4/netfilter/Makefile Thu Apr 26 12:36:56 2001
+++ linux/net/ipv4/netfilter/ip_conntrack_ftp.c Sat Sep 29 10:40:34 2001
+++ linux/net/ipv4/netfilter/ip_conntrack_irc.c Sun Apr 22 13:10:48 2001
+++ linux/net/ipv4/netfilter/ip_nat_ftp.c Sat Sep 29 10:40:34 2001
+++ linux/net/ipv4/netfilter/ip_nat_irc.c Sun Apr 22 13:10:47 2001
2003-09-18 18:35:52.000000000 +0100
Delete all the hunks apart from the final one touching
the bottom patch is kinda munged.
Attached is a fixed one.
Created attachment 96775 [details]
patch corrects munging in patch 5040
Those are the kernels built on 7.3 - I only built i686 and athlon.
Not enough. The ip_conntrack_core patch doesn't fix it.
Odd, the patch in #36 is very similar to the one I applied that was at
the bottom of patch 5040. Not sure what the difference would be for it
The only meaningful difference that I can see between #36 and #52 is
the call to ip_conntrack_put. Maybe that was a mistake?
In reply to comment #56: The ip_conntrack_put line is also part of the
official netfilter fix for a bug that is believed to be this one.
I've reopened the corresponding netfilter bug report (#91) because I
still can reproduce this problem with Linux kernel 2.4.24 on rh73.
I can also reproduce it on rh73 with the most recent Fedora Core 1
kernel (which includes the fixed netfilter code) as well as the
previous RHEL 3 kernel 2.4.21-4.0.1.EL. I've transferred more
netfilter fixes into most recent rh73 kernel without fixing it either.
And I can reproduce it on rh9, but only after upgrading its stock
iptables package to at least the most recent one from rh80 (which
introduces Thomas Woerner's recursive "Unloading of modules" upon
"service iptables start").
Symptoms summary: lsmod shows "ip_conntrack (deleted)" as being the
last netfilter module in the list and ps output shows "modprobe -r
ip_conntrack_ftp" taking ~99% processor time.
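For those who want to check for these symptoms, a small sketch follows. deleted_modules is a made-up helper name, and it assumes the 2.4-style lsmod format shown earlier in this report.

```shell
#!/bin/bash
# Print the names of modules that lsmod (2.4 format) marks as "(deleted)".
# Reads lsmod output on stdin, so it also works on saved output.
deleted_modules() {
    awk '/\(deleted\)/ { print $1 }'
}

# Example using lines from the lsmod output posted earlier in this report.
printf '%s\n' \
    'natsemi 18432 1' \
    'ip_conntrack 0 0 (deleted)' \
    'ide-cd 32576 0 (autoclean)' | deleted_modules
# prints: ip_conntrack

# On a live system (not run here):
#   lsmod | deleted_modules
#   ps -C modprobe -o pid,pcpu,stat,args
```

If ip_conntrack shows up as deleted while a modprobe process sits in state R at close to 100% CPU, that matches the hang described in this bug.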
Interestingly, I *cannot* reproduce it on Fedora Core 1.
Ideas, anyone? There have been success reports above about the advertised
ip_conntrack_core patch. I wish I could confirm that the fix works.
The problem is 100% reproducible here, however.
I've encountered this with kernel 2.6 but it isn't 100% reproducible.
I saw the call to ip_conntrack_put in the netfilter fix, but in that
patch the ip_conntrack_put call was there before the fix also (just
not inside the if(flag) statement). Have you tried the patch without
<Emily Litella>Never mind!</Emily Litella>
I tried building a kernel without the ip_conntrack_put() call and it
didn't work. I wonder what official netfilter patch inserted the
ip_conntrack_put call in the first place?
Double never mind. I didn't look high enough in the netfilter
bugzilla thread. You said there was an order in which you could
remove the modules that wouldn't cause a hang? What order is that?
Does it work around the hang even without the ip_conntrack_core.c patch?
Well, one of the module removal strategies is documented in 103573,
but unfortunately it is not sufficient for everyone.
Btw, 2.4.18-27.7.x also suffers from the ip_conntrack lock-up. And
since its netfilter code looks pretty much different in many places, I
think I won't go back further to find out whether any older kernel has
I have had total success by just commenting out the section that unloads the modules from the init script. the block that starts:
echo -n $"Unloading $IPTABLES modules: "
i haven't had any hangs on any of my systems with that fix.
*** Bug 107169 has been marked as a duplicate of this bug. ***
I have the same problem on Fedora Core 2 (with the latest kernel
version, 2.4.7-rc3 smp). So, what can I do?
Sorin, take a look at bug #112630, I filed it a while ago for the 2.6
I've got exactly the same problem on Fedora 3 with iptables v 1.2.11-3.1.
The bug isn't fixed yet. REOPEN