From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1

Description of problem:
Updated system to Red Hat Enterprise Linux AS release 4 (Nahant Update 7) this morning including kernels:
kernel-smp-2.6.9-78.EL
kernel-hugemem-2.6.9-78.EL
Machine booted into -smp kernel; changed to -hugemem version by editing grub.conf. Remotely rebooted but didn't come back up. Console showed oops, server panicked and halted.

Version-Release number of selected component (if applicable):
kernel-hugemem-2.6.9-78.EL

How reproducible:
Always

Steps to Reproduce:
1. Boot into kernel family 2.6.9-78.EL (SMP or hugemem).
2. Allow to settle
3. Initiate reboot or "service iptables stop"

Actual Results:
Oops, panic, halt

Expected Results:
Graceful shutdown and restart or successful stop of iptables (including unload of modules).

Additional info:
After several tests we've switched panic_on_oops to 0 and caught the following:

Jul 25 14:06:16 muttley kernel: Unable to handle kernel paging request at virtual address f91696c8
Jul 25 14:06:16 muttley kernel: printing eip:
Jul 25 14:06:16 muttley kernel: 0228d80e
Jul 25 14:06:16 muttley kernel: *pde = 00000000
Jul 25 14:06:16 muttley kernel: Oops: 0000 [#1]
Jul 25 14:06:16 muttley kernel: SMP
Jul 25 14:06:16 muttley kernel: Modules linked in: sg ip_queue md5 ipv6 autofs4 i2c_dev i2c_core nfs lockd nfs_acl ip_vs sunrpc ip_tables cpufreq_powersave dm_mirror dm_mod button battery ac hw_random tg3 floppy ata_piix libata ext3 jbd cciss sd_mod scsi_mod
Jul 25 14:06:16 muttley kernel: CPU: 0
Jul 25 14:06:16 muttley kernel: EIP: 0060:[<0228d80e>] Not tainted VLI
Jul 25 14:06:16 muttley kernel: EFLAGS: 00010292 (2.6.9-78.ELhugemem)
Jul 25 14:06:16 muttley kernel: EIP is at nf_unregister_sockopt+0x48/0x83
Jul 25 14:06:16 muttley kernel: eax: 00000002 ebx: 02355784 ecx: e12b24a0 edx: f91696c0
Jul 25 14:06:16 muttley kernel: esi: f88932c0 edi: 00000000 ebp: e0156000 esp: e0156f5c
Jul 25 14:06:16 muttley kernel: ds: 007b es: 007b ss: 0068
Jul 25 14:06:16 muttley kernel: Process modprobe (pid: 9389, threadinfo=e0156000 task=e029f930)
Jul 25 14:06:16 muttley kernel: Stack: 00000000 0232eba8 f8891555 f8893400 0213615a 00000000 745f7069 656c6261
Jul 25 14:06:16 muttley kernel: e08c0073 02150c3e e08cf6c4 e0242ac4 02150f9b f6fa9000 f6faa000 f6faa000
Jul 25 14:06:16 muttley kernel: f6faa000 e025ad84 e08cf680 e08cf6b0 00000000 e0156000 e0156fc4 00000000
Jul 25 14:06:16 muttley kernel: Call Trace:
Jul 25 14:06:16 muttley iptables: failed
Jul 25 14:06:16 muttley kernel: [<f8891555>] fini+0xd/0x35 [ip_tables]
Jul 25 14:06:16 muttley kernel: [<0213615a>] sys_delete_module+0x13b/0x184
Jul 25 14:06:16 muttley kernel: [<02150c3e>] unmap_vma_list+0xe/0x17
Jul 25 14:06:16 muttley kernel: [<02150f9b>] do_munmap+0x135/0x143
Jul 25 14:06:16 muttley kernel: Code: 04 00 89 d9 f0 ff 0d 84 57 35 02 0f 88 93 0c 00 00 8b 0d 9c 57 35 02 8b 01 0f 18 00 90 81 f9 9c 57 35 02 74 2f 8b 51 08 8b 46 08 <39> 42 08 8b 11 75 1e 8b 41 04 89 42 04 89 10 89 c8 c7 01 00 01
Modules are removed from list after panic:

[root@muttley ~]# lsmod
Module                  Size  Used by
sg                     37345  0
ip_queue               14553  0
md5                     8129  1
ipv6                  241121  18
autofs4                25669  0
i2c_dev                13377  0
i2c_core               26305  1 i2c_dev
nfs                   222249  0
lockd                  66249  1 nfs
nfs_acl                 7745  1 nfs
ip_vs                  83649  0
sunrpc                143525  4 nfs,lockd,nfs_acl
ip_tables              22337  0
cpufreq_powersave       5953  0
dm_mirror              32581  0
dm_mod                 67049  1 dm_mirror
button                 10705  0
battery                12741  0
ac                      8901  0
hw_random               9429  0
tg3                   111045  0
floppy                 57553  0
ata_piix               19781  0
libata                105629  1 ata_piix
ext3                  119497  6
jbd                    59865  1 ext3
cciss                  68013  16
sd_mod                 20417  0
scsi_mod              119757  4 sg,libata,cciss,sd_mod

Trying to modprobe returns silently without loading:

[root@muttley ~]# modprobe -v ip_tables
[root@muttley ~]#

...and trying to rmmod returns:

[root@muttley ~]# modprobe -r ip_tables
FATAL: Error removing ip_tables (/lib/modules/2.6.9-78.ELhugemem/kernel/net/ipv4/netfilter/ip_tables.ko): Device or resource busy

Backing out to kernel-hugemem-2.6.9-67.0.22.EL gives expected behaviour.
SELinux is disabled here, by the way.
Running strace on "iptables -L -n" hangs at the following point:

old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf6f8b000
mprotect(0x8a3000, 8192, PROT_READ) = 0
mprotect(0xc55000, 4096, PROT_READ) = 0
mprotect(0x5a4000, 4096, PROT_READ) = 0
set_thread_area({entry_number:-1 -> 6, base_addr:0xf6f8b6c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
munmap(0xf6f8d000, 36429) = 0
socket(PF_INET, SOCK_RAW, IPPROTO_RAW) = 3
getsockopt(3, SOL_IP, 0x40 /* IP_??? */

The process can, however, be interrupted with CTRL-C.
Same behavior here. Running system (kernel-smp-2.6.9-78.EL). This issue seems to occur for us only when ip_conntrack_* modules in addition to ip_conntrack are loaded. When only ip_conntrack is loaded the /sbin/service iptables appears to work as expected.
Exactly the same issue manifests itself in kernel family 2.6.9-78.0.1.EL - the machine oopses when unloading iptables modules.
I have experienced a similar kernel panic immediately after the iptables script was restarted. The log below shows that the ip_conntrack module was being unloaded when the kernel oops occurred. This panic occurred with a 2.6.9-78.0.1 smp kernel. A kernel panic was also observed on another system with the same kernel.

==============================================================================
Aug 8 10:10:56 VENUS1 iptables: succeeded
Aug 8 10:10:56 VENUS1 iptables: succeeded
Aug 8 10:10:56 VENUS1 kernel: Unable to handle kernel paging request at virtual address f8a7f5e8
Aug 8 10:10:56 VENUS1 kernel: printing eip:
Aug 8 10:10:56 VENUS1 kernel: c0294b1a
Aug 8 10:10:56 VENUS1 kernel: *pde = 00000000
Aug 8 10:10:56 VENUS1 kernel: Oops: 0000 [#1]
Aug 8 10:10:56 VENUS1 kernel: SMP
Aug 8 10:10:56 VENUS1 kernel: Modules linked in: loop md5 ipv6 i2c_dev i2c_core ip_conntrack cpufreq_powersave joydev button battery ac uhci_hcd ehci_hcd i5000_edac edac_mc hw_random bnx2 ata_piix libata dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod megaraid_sas sd_mod scsi_mod
Aug 8 10:10:56 VENUS1 kernel: CPU: 0
Aug 8 10:10:56 VENUS1 kernel: EIP: 0060:[<c0294b1a>] Not tainted VLI
Aug 8 10:10:56 VENUS1 kernel: EFLAGS: 00010216 (2.6.9-78.0.1.ELsmp)
Aug 8 10:10:56 VENUS1 kernel: EIP is at nf_unregister_sockopt+0x48/0x83
Aug 8 10:10:56 VENUS1 kernel: eax: 00000002 ebx: c035e784 ecx: f7125c80 edx: f8a7f5e0
Aug 8 10:10:56 VENUS1 kernel: esi: f8aa97e0 edi: 00000000 ebp: ccace000 esp: ccacef5c
Aug 8 10:10:56 VENUS1 kernel: ds: 007b es: 007b ss: 0068
Aug 8 10:10:56 VENUS1 kernel: Process modprobe (pid: 6314, threadinfo=ccace000 task=f5d1e130)
Aug 8 10:10:56 VENUS1 kernel: Stack: 00000000 c0337ba8 f8aa0769 f8aa9c80 c01373ce 00000000 635f7069 746e6e6f
Aug 8 10:10:56 VENUS1 kernel: 6b636172 c0151e00 c39753c4 f7382964 c01521f2 b7fb4000 b7fb5000 b7fb5000
Aug 8 10:10:56 VENUS1 kernel: b7fb5000 f6dd2c24 c3975380 c39753b0 00000000 ccace000 089a5840 00000000
Aug 8 10:10:56 VENUS1 kernel: Call Trace:
Aug 8 10:10:56 VENUS1 kernel: [<f8aa0769>] init_or_cleanup+0x1e6/0x1ea [ip_conntrack]
Aug 8 10:10:56 VENUS1 kernel: [<c01373ce>] sys_delete_module+0x13b/0x184
Aug 8 10:10:56 VENUS1 kernel: [<c0151e00>] free_pgtables+0x12/0x7b
Aug 8 10:10:56 VENUS1 kernel: [<c01521f2>] do_munmap+0x108/0x116
Aug 8 10:10:56 VENUS1 kernel: [<c02e09d7>] syscall_call+0x7/0xb
Aug 8 10:10:56 VENUS1 kernel: Code: 04 00 89 d9 f0 ff 0d 84 e7 35 c0 0f 88 93 0c 00 00 8b 0d 9c e7 35 c0 8b 01 0f 18 00 90 81 f9 9c e7 35 c0 74 2f 8b 51 08 8b 46 08 <39> 42 08 8b 11 75 1e 8b 41 04 89 42 04 89 10 89 c8 c7 01 00 01
Aug 8 10:10:56 VENUS1 kernel: <0>Fatal exception: panic in 5 seconds
Similarly, I found that every time I tried to shut down or restart one of our RHEL machines, it panicked in ip_conntrack. Removing some iptables modules from /etc/sysconfig/iptables-config has proved to be a temporary fix. I changed:

IPTABLES_MODULES="ip_conntrack_ftp ip_nat_ftp ip_conntrack_irc ip_nat_irc"

to:

IPTABLES_MODULES=""
Created attachment 313991 [details]
patch to be more restrictive in matching ops to delete

Hey, I think I see the problem. It appears that on unregister, we assume we've found the right entry to remove if wrapper->ops->pf matches the passed-in reg->pf value. But if you have ip_conntrack and ip_conntrack_* loaded, you probably have several ops ranges registered against the same protocol family. As such, when you unregister, you remove the first matching pf entry that you come to, which (unless you unload in the reverse order that you loaded in) will lead to NULL list pointers that you don't expect.

I've not tested it yet, but I've attached a patch above which should be more restrictive about which entry we delete, matching on pointer values (since nf_sockopt_register_owner assigns wrapper->ops to the passed-in reg value). This should remove the specific entry we are searching for. If someone could give it a test and let me know the results, I'd appreciate it. Thanks!
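For anyone following along, here is a minimal userspace sketch of the mismatch described above. It is not the kernel code and not the attached patch: the struct layouts and the names sockopt_ops, ops_wrapper, unregister_by_pf and unregister_by_ptr are hypothetical, chosen only to show why matching on pf alone can unlink another module's entry while matching on the ops pointer cannot.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins only: these are NOT the kernel's definitions.
 * All type and function names in this sketch are hypothetical. */
struct sockopt_ops {
    int pf;            /* protocol family, e.g. 2 for PF_INET */
    const char *name;  /* label of the owning module, for the demo only */
};

struct ops_wrapper {
    struct sockopt_ops *ops;   /* set to the registered ops at register time */
    struct ops_wrapper *next;  /* singly linked list of registrations */
};

/* Behaviour described as the bug: unlink the FIRST wrapper whose
 * ops->pf equals reg->pf, regardless of who registered it.
 * Returns the ops that were actually unlinked, or NULL if none matched. */
struct sockopt_ops *unregister_by_pf(struct ops_wrapper **head,
                                     const struct sockopt_ops *reg)
{
    struct ops_wrapper **pp;

    assert(head != NULL && reg != NULL);
    for (pp = head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->ops->pf == reg->pf) {
            struct sockopt_ops *removed = (*pp)->ops;
            *pp = (*pp)->next;
            return removed;
        }
    }
    return NULL;
}

/* Behaviour described as the fix: unlink only the wrapper whose ops
 * pointer is the very object being unregistered. */
struct sockopt_ops *unregister_by_ptr(struct ops_wrapper **head,
                                      const struct sockopt_ops *reg)
{
    struct ops_wrapper **pp;

    assert(head != NULL && reg != NULL);
    for (pp = head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->ops == reg) {
            struct sockopt_ops *removed = (*pp)->ops;
            *pp = (*pp)->next;
            return removed;
        }
    }
    return NULL;
}
```

With two registrations for the same pf on the list, asking unregister_by_pf to remove the second one unlinks the first instead, leaving the entry of the module that is actually going away still on the list. unregister_by_ptr removes exactly the entry that was asked for, whatever the unload order.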
Unfortunately I'm on leave at the moment and won't be back in a position to test this until the first week in September...
Ok, let me know how this works when you get back
Hi Neil,

We have internal confirmation that your patch fixes the problem:

-----8<-----8<-----
Hello Fabio,

<snip>
a) reproduce the problem with unpatched kernels;
</snip>

Yes I am able to reproduce the issue with unpatched kernels every time.

<snip>
b) demonstrate Neil's patch fixes the issue by loading and unloading all ip_conntrack_* modules several times?
</snip>

When I am using patched kernel -

# uname -a
Linux dhcp6-182.XXX.com 2.6.9-78.0.1.TEST.ELsmp #1 SMP Fri Aug 22 07:19:19 EDT 2008 i686 i686 i386 GNU/Linux

# grep IPTABLES_MODULES /etc/sysconfig/iptables-config
IPTABLES_MODULES="ip_conntrack_ftp ip_nat_ftp ip_conntrack_irc ip_nat_irc"
IPTABLES_MODULES_UNLOAD="yes"

# /etc/init.d/iptables start
Flushing firewall rules: [ OK ]
Setting chains to policy ACCEPT: nat filter [ OK ]
Unloading iptables modules: [ OK ]
Applying iptables firewall rules: [ OK ]
Loading additional iptables modules: ip_conntrack_ftp ip_nat_ftp ip_conntrack_irc ip_nat_irc [ OK ]

# /etc/init.d/iptables stop
Flushing firewall rules: [ OK ]
Setting chains to policy ACCEPT: nat filter [ OK ]
Unloading iptables modules: [ OK ]
Applying iptables firewall rules: [ OK ]
Loading additional iptables modules: ip_conntrack_ftp ip_nat_ftp ip_conntrack_irc ip_nat_irc [ OK ]

I have tried loading and unloading the modules several times and it is working fine.

Regards,
Minto
-----8<-----8<-----

I'm working on getting a test kernel to another customer as well.

Cheers,
Fabio Olive
Updating PM score.
Just building myself a test kernel on my dev box (yeah, on a Friday night, I know) so will see what this gets me...
...and the answer is, nothing much - unfortunately the bastion host I use for access to work has just decided to stop responding! What a pain in the rear end.
no worries, we had some other interested parties in this bug. They tested the fix and reported success.
OK, testing done.

iptables stops via the init script without throwing any oopses.
The machine reboots without throwing an oops.

Confirmed: the patch fixes the problem when unloading iptables modules. Thanks, guys.
Changing the order of the calls to rmmod_r in the init script worked for the test cases I ran into with customers running APF:

sed -i.bak -e 's/_tables$/_CONNTRACKNEW/;s/_conntrack$/_tables/;s/_CONNTRACKNEW$/_conntrack/' /etc/init.d/iptables

While not the best solution for this issue, it does appear to avoid the trigger without having to roll a new kernel rpm. Is there an ETA for a GA release kernel with the patch from comment 8? It seems to me that a change to the iptables rpm along the lines above would probably get QA'd and pushed more quickly than a kernel update, and wouldn't require a reboot to get immediate relief from the bug.
Committed in 78.13.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/
*** Bug 466345 has been marked as a duplicate of this bug. ***
We have discovered that when the following is in iptables-config:

IPTABLES_MODULES="ip_conntrack_ftp ip_nat_ftp iptable_nat ipt_REJECT iptable_mangle"

on the 2.6.9-78.0.1 (or .5) RHEL4 kernel, the kernel panic does not occur when iptables is stopped.
*** Bug 460967 has been marked as a duplicate of this bug. ***
*** Bug 480033 has been marked as a duplicate of this bug. ***
I applied the patch to update to the SMP kernel 2.6.9-78.0.13 and am still encountering the problem. However, I did not apply the recommended changes in Comment #24. Shouldn't the new kernel (updated via the Red Hat site) be enough? I also have not applied the changes listed in Comment #31.
The changes I discussed in Comment #24 were not kernel dependent and simply changed the order of the module unloading in an attempt to avoid the bug. Regardless, errata for this bug was released earlier today as noted in bug 477147.
Re: comment #31 - FYI I only needed to add ip_nat_ftp to the IPTABLES_MODULES= line in iptables-config to make this work. This worked for the 2.6.9-78.0.8.ELsmp kernel and 2.6.9-78.0.13.ELsmp kernel on RHEL 4.
Created attachment 331558 [details] Testcase Attached correct reproducer for this issue. Tested on RHEL 4.8
This is still a problem in 2.6.9-78.0.13 for us.
Comment #31 fixed the issue for us. We are using 2.6.9-78.0.13elSMP. Thanks so much!
*** Bug 467156 has been marked as a duplicate of this bug. ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html