Bug 456664 - Kernel panic when unloading ip conntrack modules
Summary: Kernel panic when unloading ip conntrack modules
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.7
Hardware: i386
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Neil Horman
QA Contact: Martin Jenner
URL:
Whiteboard:
Duplicates: 460967 466345 467156 480033
Depends On:
Blocks: 461297 477147
 
Reported: 2008-07-25 13:20 UTC by Graeme Fowler
Modified: 2018-10-20 03:11 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-05-18 19:09:50 UTC


Attachments
patch to be more restrictive in matching ops to delete (535 bytes, patch)
2008-08-11 18:10 UTC, Neil Horman
Testcase (493 bytes, text/plain)
2009-02-11 12:51 UTC, Jan Tluka


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2009:1024 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 4.8 kernel security and bug fix update 2009-05-18 14:57:26 UTC

Description Graeme Fowler 2008-07-25 13:20:32 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1

Description of problem:
Updated system to Red Hat Enterprise Linux AS release 4 (Nahant Update 7) this morning including kernels:

kernel-smp-2.6.9-78.EL
kernel-hugemem-2.6.9-78.EL

Machine booted into -smp kernel; changed to -hugemem version by editing grub.conf. Remotely rebooted but didn't come back up.

Console showed oops, server panicked and halted.



Version-Release number of selected component (if applicable):
kernel-hugemem-2.6.9-78.EL

How reproducible:
Always


Steps to Reproduce:
1. Boot into kernel family 2.6.9-78.EL (SMP or hugemem).
2. Allow to settle
3. Initiate reboot or "service iptables stop"

Actual Results:
Oops, panic, halt

Expected Results:
Graceful shutdown and restart or successful stop of iptables (including unload of modules).

Additional info:
After several tests we've switched panic_on_oops to 0 and caught the following:

Jul 25 14:06:16 muttley kernel: Unable to handle kernel paging request at virtual address f91696c8
Jul 25 14:06:16 muttley kernel:  printing eip:
Jul 25 14:06:16 muttley kernel: 0228d80e
Jul 25 14:06:16 muttley kernel: *pde = 00000000
Jul 25 14:06:16 muttley kernel: Oops: 0000 [#1]
Jul 25 14:06:16 muttley kernel: SMP 
Jul 25 14:06:16 muttley kernel: Modules linked in: sg ip_queue md5 ipv6 autofs4 i2c_dev i2c_core nfs lockd nfs_acl ip_vs sunrpc ip_tables cpufreq_powersave dm_mirror dm_mod button battery ac hw_random tg3 floppy ata_piix libata ext3 jbd cciss sd_mod scsi_mod
Jul 25 14:06:16 muttley kernel: CPU:    0
Jul 25 14:06:16 muttley kernel: EIP:    0060:[<0228d80e>]    Not tainted VLI
Jul 25 14:06:16 muttley kernel: EFLAGS: 00010292   (2.6.9-78.ELhugemem) 
Jul 25 14:06:16 muttley kernel: EIP is at nf_unregister_sockopt+0x48/0x83
Jul 25 14:06:16 muttley kernel: eax: 00000002   ebx: 02355784   ecx: e12b24a0   edx: f91696c0
Jul 25 14:06:16 muttley kernel: esi: f88932c0   edi: 00000000   ebp: e0156000   esp: e0156f5c
Jul 25 14:06:16 muttley kernel: ds: 007b   es: 007b   ss: 0068
Jul 25 14:06:16 muttley kernel: Process modprobe (pid: 9389, threadinfo=e0156000 task=e029f930)
Jul 25 14:06:16 muttley kernel: Stack: 00000000 0232eba8 f8891555 f8893400 0213615a 00000000 745f7069 656c6261
Jul 25 14:06:16 muttley kernel:        e08c0073 02150c3e e08cf6c4 e0242ac4 02150f9b f6fa9000 f6faa000 f6faa000
Jul 25 14:06:16 muttley kernel:        f6faa000 e025ad84 e08cf680 e08cf6b0 00000000 e0156000 e0156fc4 00000000
Jul 25 14:06:16 muttley kernel: Call Trace:
Jul 25 14:06:16 muttley iptables:  failed
Jul 25 14:06:16 muttley kernel:  [<f8891555>] fini+0xd/0x35 [ip_tables]
Jul 25 14:06:16 muttley kernel:  [<0213615a>] sys_delete_module+0x13b/0x184
Jul 25 14:06:16 muttley kernel:  [<02150c3e>] unmap_vma_list+0xe/0x17
Jul 25 14:06:16 muttley kernel:  [<02150f9b>] do_munmap+0x135/0x143
Jul 25 14:06:16 muttley kernel: Code: 04 00 89 d9 f0 ff 0d 84 57 35 02 0f 88 93 0c 00 00 8b 0d 9c 57 35 02 8b 01 0f 18 00 90 81 f9 9c 57 35 02 74 2f 8b 51 08 8b 46 08 <39> 42 08 8b 11 75 1e 8b 41 04 89 42 04 89 10 89 c8 c7 01 00 01

Comment 1 Graeme Fowler 2008-07-25 13:24:36 UTC
Modules are removed from list after panic:

[root@muttley ~]# lsmod
Module                  Size  Used by
sg                     37345  0 
ip_queue               14553  0 
md5                     8129  1 
ipv6                  241121  18 
autofs4                25669  0 
i2c_dev                13377  0 
i2c_core               26305  1 i2c_dev
nfs                   222249  0 
lockd                  66249  1 nfs
nfs_acl                 7745  1 nfs
ip_vs                  83649  0 
sunrpc                143525  4 nfs,lockd,nfs_acl
ip_tables              22337  0 
cpufreq_powersave       5953  0 
dm_mirror              32581  0 
dm_mod                 67049  1 dm_mirror
button                 10705  0 
battery                12741  0 
ac                      8901  0 
hw_random               9429  0 
tg3                   111045  0 
floppy                 57553  0 
ata_piix               19781  0 
libata                105629  1 ata_piix
ext3                  119497  6 
jbd                    59865  1 ext3
cciss                  68013  16 
sd_mod                 20417  0 
scsi_mod              119757  4 sg,libata,cciss,sd_mod

Trying to modprobe returns silently without loading:

[root@muttley ~]# modprobe -v ip_tables
[root@muttley ~]# 

...and trying to rmmod returns:

[root@muttley ~]# modprobe -r ip_tables
FATAL: Error removing ip_tables (/lib/modules/2.6.9-78.ELhugemem/kernel/net/ipv4/netfilter/ip_tables.ko): Device or resource busy


Backing out to kernel-hugemem-2.6.9-67.0.22.EL gives expected behaviour.

Comment 2 Graeme Fowler 2008-07-25 13:26:28 UTC
SELinux is disabled here, by the way.

Comment 3 Graeme Fowler 2008-07-25 13:27:40 UTC
Running strace on "iptables -L -n" hangs at the following point:

old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf6f8b000
mprotect(0x8a3000, 8192, PROT_READ)     = 0
mprotect(0xc55000, 4096, PROT_READ)     = 0
mprotect(0x5a4000, 4096, PROT_READ)     = 0
set_thread_area({entry_number:-1 -> 6, base_addr:0xf6f8b6c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
munmap(0xf6f8d000, 36429)               = 0
socket(PF_INET, SOCK_RAW, IPPROTO_RAW)  = 3
getsockopt(3, SOL_IP, 0x40 /* IP_??? */

Process can, however, be interrupted with CTRL-C

Comment 4 Randolph Faux 2008-08-01 00:35:35 UTC
Same behavior here on a running system (kernel-smp-2.6.9-78.EL). This issue seems to
occur for us only when ip_conntrack_* modules are loaded in addition to
ip_conntrack. When only ip_conntrack is loaded, /sbin/service iptables appears to
work as expected.

Comment 5 Graeme Fowler 2008-08-06 16:20:11 UTC
Exactly the same issue manifests itself in kernel family 2.6.9-78.0.1.EL - the machine oopses when unloading iptables modules.

Comment 6 Jimmy Cho 2008-08-08 08:13:56 UTC
We experienced a similar kernel panic immediately after the iptables script was restarted. The log below clearly indicates that the ip_conntrack module was being unloaded when the kernel oops occurred.

This panic occurred with a 2.6.9-78.0.1 smp kernel. A kernel panic was also observed on another system with the same kernel.

==============================================================================
Aug  8 10:10:56 VENUS1 iptables:  succeeded
Aug  8 10:10:56 VENUS1 iptables:  succeeded
Aug  8 10:10:56 VENUS1 kernel: Unable to handle kernel paging request at virtual address f8a7f5e8
Aug  8 10:10:56 VENUS1 kernel:  printing eip:
Aug  8 10:10:56 VENUS1 kernel: c0294b1a
Aug  8 10:10:56 VENUS1 kernel: *pde = 00000000
Aug  8 10:10:56 VENUS1 kernel: Oops: 0000 [#1]
Aug  8 10:10:56 VENUS1 kernel: SMP 
Aug  8 10:10:56 VENUS1 kernel: Modules linked in: loop md5 ipv6 i2c_dev i2c_core ip_conntrack cpufreq_powersave joydev button battery ac uhci_hcd ehci_hcd i5000_edac edac_mc hw_random bnx2 ata_piix libata dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod megaraid_sas sd_mod scsi_mod
Aug  8 10:10:56 VENUS1 kernel: CPU:    0
Aug  8 10:10:56 VENUS1 kernel: EIP:    0060:[<c0294b1a>]    Not tainted VLI
Aug  8 10:10:56 VENUS1 kernel: EFLAGS: 00010216   (2.6.9-78.0.1.ELsmp) 
Aug  8 10:10:56 VENUS1 kernel: EIP is at nf_unregister_sockopt+0x48/0x83
Aug  8 10:10:56 VENUS1 kernel: eax: 00000002   ebx: c035e784   ecx: f7125c80   edx: f8a7f5e0
Aug  8 10:10:56 VENUS1 kernel: esi: f8aa97e0   edi: 00000000   ebp: ccace000   esp: ccacef5c
Aug  8 10:10:56 VENUS1 kernel: ds: 007b   es: 007b   ss: 0068
Aug  8 10:10:56 VENUS1 kernel: Process modprobe (pid: 6314, threadinfo=ccace000 task=f5d1e130)
Aug  8 10:10:56 VENUS1 kernel: Stack: 00000000 c0337ba8 f8aa0769 f8aa9c80 c01373ce 00000000 635f7069 746e6e6f
Aug  8 10:10:56 VENUS1 kernel:        6b636172 c0151e00 c39753c4 f7382964 c01521f2 b7fb4000 b7fb5000 b7fb5000
Aug  8 10:10:56 VENUS1 kernel:        b7fb5000 f6dd2c24 c3975380 c39753b0 00000000 ccace000 089a5840 00000000
Aug  8 10:10:56 VENUS1 kernel: Call Trace:
Aug  8 10:10:56 VENUS1 kernel:  [<f8aa0769>] init_or_cleanup+0x1e6/0x1ea [ip_conntrack]
Aug  8 10:10:56 VENUS1 kernel:  [<c01373ce>] sys_delete_module+0x13b/0x184
Aug  8 10:10:56 VENUS1 kernel:  [<c0151e00>] free_pgtables+0x12/0x7b
Aug  8 10:10:56 VENUS1 kernel:  [<c01521f2>] do_munmap+0x108/0x116
Aug  8 10:10:56 VENUS1 kernel:  [<c02e09d7>] syscall_call+0x7/0xb
Aug  8 10:10:56 VENUS1 kernel: Code: 04 00 89 d9 f0 ff 0d 84 e7 35 c0 0f 88 93 0c 00 00 8b 0d 9c e7 35 c0 8b 01 0f 18 00 90 81 f9 9c e7 35 c0 74 2f 8b 51 08 8b 46 08 <39> 42 08 8b 11 75 1e 8b 41 04 89 42 04 89 10 89 c8 c7 01 00 01
Aug  8 10:10:56 VENUS1 kernel:  <0>Fatal exception: panic in 5 seconds

Comment 7 Nigel Jewell 2008-08-08 18:53:28 UTC
Similarly, I was finding that every time I tried to shut down or restart one of our RHEL machines, it panicked in ip_conntrack.

Removing some iptables modules from /etc/sysconfig/iptables-config has proved to be a temporary fix:

IPTABLES_MODULES="ip_conntrack_ftp ip_nat_ftp ip_conntrack_irc ip_nat_irc"

to

IPTABLES_MODULES=""

Comment 8 Neil Horman 2008-08-11 18:10:21 UTC
Created attachment 313991
patch to be more restrictive in matching ops to delete

Hey, I think I see the problem.  It appears that on unregister, we assume we've found the right entry to remove if wrapper->ops->pf matches the passed in reg->pf value.  If you have ip_conntrack and ip_conntrack_* loaded, you probably have several ops ranges registered against the same protocol family.  As such, when you unregister, you remove the first matching pf entry that you come to, which (unless you unload in the reverse order that you loaded in) will lead to NULL list pointers that you don't expect.  I've not tested it yet, but I've attached a patch above which should be more restrictive on what entry we delete, matching on pointer values (since nf_sockopt_register_owner assigns wrapper->ops to the passed in reg value).  This should remove the specific entry we are searching for.  If someone could give it a test and let me know the results, I'd appreciate it.  Thanks!

Comment 9 Graeme Fowler 2008-08-22 10:32:28 UTC
Unfortunately I'm on leave at the moment and won't be back in a position to test this until the first week in September...

Comment 10 Neil Horman 2008-08-22 14:17:31 UTC
OK, let me know how this works when you get back.

Comment 14 Fabio Olive Leite 2008-08-26 14:16:52 UTC
Hi Neil,

We have internal confirmation that your patch fixes the problem:

-----8<-----8<-----
Hello Fabio,

<snip>
a) reproduce the problem with unpatched kernels;
</snip>

Yes I am able to reproduce the issue with unpatched kernels every time.

<snip>
b) demonstrate Neil's patch fixes the issue by loading and unloading all ip_conntrack_* modules several times?
</snip>

When I am using patched kernel - 

# uname -a
Linux dhcp6-182.XXX.com 2.6.9-78.0.1.TEST.ELsmp #1 SMP Fri Aug 22 07:19:19 EDT 2008 i686 i686 i386 GNU/Linux


# grep IPTABLES_MODULES /etc/sysconfig/iptables-config
IPTABLES_MODULES="ip_conntrack_ftp ip_nat_ftp ip_conntrack_irc ip_nat_irc"
IPTABLES_MODULES_UNLOAD="yes"


# /etc/init.d/iptables start
Flushing firewall rules:                                   [  OK  ]
Setting chains to policy ACCEPT: nat filter                [  OK  ]
Unloading iptables modules:                                [  OK  ]
Applying iptables firewall rules:                          [  OK  ]
Loading additional iptables modules: ip_conntrack_ftp ip_nat_ftp ip_conntrack_irc ip_nat_irc                                               [  OK  ]

# /etc/init.d/iptables stop
Flushing firewall rules:                                   [  OK  ]
Setting chains to policy ACCEPT: nat filter                [  OK  ]
Unloading iptables modules:                                [  OK  ]
Applying iptables firewall rules:                          [  OK  ]
Loading additional iptables modules: ip_conntrack_ftp ip_nat_ftp ip_conntrack_irc ip_nat_irc                                               [  OK  ]

I have tried loading and unloading the modules several times and it is working fine.

Regards,
Minto
-----8<-----8<-----

I'm working on getting a test kernel to another customer as well.

Cheers,
Fabio Olive

Comment 17 RHEL Product and Program Management 2008-09-03 13:01:21 UTC
Updating PM score.

Comment 18 Graeme Fowler 2008-09-05 18:38:09 UTC
Just building myself a test kernel on my dev box (yeah, on a Friday night, I know) so will see what this gets me...

Comment 19 Graeme Fowler 2008-09-05 19:22:01 UTC
...and the answer is, nothing much - unfortunately the bastion host I use for access to work has just decided to stop responding! What a pain in the rear end.

Comment 20 Neil Horman 2008-09-05 19:36:13 UTC
No worries, we had some other interested parties in this bug.  They tested the fix and reported success.

Comment 23 Graeme Fowler 2008-09-12 09:31:36 UTC
OK, testing done.

iptables stops via init script without throwing any oopses.
machine reboots without throwing an oops

Confirmed: patch fixes problem when unloading iptables modules. Thanks, guys.

Comment 24 Greg Bock 2008-09-14 09:17:00 UTC
Changing the order for the calls to rmmod_r in the init script worked for test cases I ran into with customers running APF:

sed -i.bak -e 's/_tables$/_CONNTRACKNEW/;s/_conntrack$/_tables/;s/_CONNTRACKNEW$/_conntrack/' /etc/init.d/iptables

While not the best solution for this issue it does appear to avoid the trigger without having to roll a new kernel rpm. 

Is there an ETA for a GA release kernel with the patch from comment 8? It seems to me that a change to the iptables rpm for the above would probably get QA'd and pushed quicker than a kernel update, and wouldn't require a reboot to get immediate relief from the bug.

Comment 27 Vivek Goyal 2008-10-07 21:02:41 UTC
Committed in 78.13.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 29 Ivan Vecera 2008-10-10 14:08:03 UTC
*** Bug 466345 has been marked as a duplicate of this bug. ***

Comment 31 jonathan auerbach 2008-11-05 16:17:35 UTC
We have discovered that when the following is in the iptables-config:

IPTABLES_MODULES="ip_conntrack_ftp ip_nat_ftp iptable_nat ipt_REJECT iptable_mangle"

and the system is running the 2.6.9-78.0.1 (or .5) RHEL4 kernel, the kernel panic does not occur when iptables is stopped.

Comment 32 Thomas Graf 2008-11-18 00:57:55 UTC
*** Bug 460967 has been marked as a duplicate of this bug. ***

Comment 36 Vitaly Mayatskikh 2009-01-14 17:22:11 UTC
*** Bug 480033 has been marked as a duplicate of this bug. ***

Comment 37 Scott Mohnkern 2009-01-21 21:32:13 UTC
I applied the patch to update to the SMP kernel 2.6.9-78.0.13 and am still encountering the problem.  However I did not apply the recommended changes in Comment #24.

Shouldn't the new kernel (updated via the Redhat Site) be enough?

Also have not applied changes listed in Comment #31.

Comment 38 Greg Bock 2009-01-21 22:11:38 UTC
The changes I discussed in Comment #24 were not kernel dependent and simply changed the order of the module unloading in an attempt to avoid the bug. Regardless, errata for this bug was released earlier today as noted in bug 477147.

Comment 39 Randy Brown 2009-01-27 15:29:51 UTC
Re: comment #31 - FYI

I only needed to add ip_nat_ftp to the IPTABLES_MODULES= line in iptables-config to make this work. This worked for the 2.6.9-78.0.8.ELsmp kernel and 2.6.9-78.0.13.ELsmp kernel on RHEL 4.

Comment 42 Jan Tluka 2009-02-11 12:51:02 UTC
Created attachment 331558
Testcase

Attached correct reproducer for this issue. Tested on RHEL 4.8

Comment 43 Cott Lang 2009-02-12 05:40:08 UTC
This is still a problem in 2.6.9-78.0.13 for us.

Comment 44 chrishickey 2009-03-02 05:43:06 UTC
Comment #31 fixed the issue for us. We are using 2.6.9-78.0.13elSMP.

Thanks so much!

Comment 49 Linda Wang 2009-04-07 19:15:32 UTC
*** Bug 467156 has been marked as a duplicate of this bug. ***

Comment 51 errata-xmlrpc 2009-05-18 19:09:50 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html

