Bug 158710
Summary: | iptables rule causes hang on boot | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Tim <ack210t> |
Component: | kernel | Assignee: | David Miller <davem> |
Status: | CLOSED NOTABUG | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 3 | CC: | davej, thibault.lemeur, wtogami |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2005-08-11 03:26:27 UTC | Type: | --- |
Description
Tim
2005-05-25 02:17:58 UTC
I've got a quite similar issue on some servers.

I managed to boot (with remaining problems) by disabling the Red Hat graphical boot (option norhgb in grub.conf). However, even when the system boots, some network services are not available locally. I suspect an issue with iptables and the loopback interface in this new kernel.

Once booted, I can't "ssh localhost", and some errors appear in my /var/log/messages file:

May 25 15:13:17 SRVNAME kernel: RULE 17 -- DENY IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=75 TOS=0x00 PREC=0x00 TTL=64 ID=23803 DF PROTO=TCP SPT=22 DPT=34542 WINDOW=8192 RES=0x00 ACK PSH URGP=0

Note that this is the reply from my sshd server that is blocked on the loopback interface. However, my iptables script has the following lines:

$IPTABLES -A INPUT -i lo -m state --state NEW -j ACCEPT
$IPTABLES -A OUTPUT -o lo -m state --state NEW -j ACCEPT

Maybe there is something different in the way the loopback interface is handled in this new kernel?

If I disable the firewall with "/etc/init.d/iptables stop", I can ssh localhost again.

I tried that out, but aside from being able to easily see the boot messages (thanks very much for that!!) had no luck. On my system the boot process still hangs at

Starting NFS statd:

Looking at /var/log/boot.log, I see two different final entries whenever I try to boot .27. The log says either

May 26 10:38:27 pearl ifup: Determining IP information for eth0...
May 26 10:38:32 pearl NET: /sbin/dhclient-script : updated /etc/resolv.conf
May 26 10:38:32 pearl ifup: done.
May 26 10:38:32 pearl network: Bringing up interface eth0: succeeded
May 26 10:39:33 pearl nfslock: rpc.statd startup succeeded

or just

May 26 11:27:58 pearl ifup: Determining IP information for eth0...
May 26 11:28:02 pearl NET: /sbin/dhclient-script : updated /etc/resolv.conf
May 26 11:28:02 pearl ifup: done.
May 26 11:28:02 pearl network: Bringing up interface eth0: succeeded

I tried removing the "quiet" option from the kernel command line to boot .27 as well, but although this generated a lot of extra output early on, the log at the time of the freeze-up doesn't have any additional information.

If I go back to kernel 2.6.11-1.14 everything works fine. All packages on the system are up to date (per RHN).

Does NumLock work? If yes, we'll need Alt-SysRq-T. If not, what happens if you add "nmi_watchdog=1" in grub.conf?

Thanks for looking at this. Actually, in trying these things out I find that the system now boots up after a 60 second delay at

Starting NFS statd:

I can't say whether this has always been the case. In the logs, there are times when the delay is clearly longer than 60 seconds, but this is the behavior I see today.

At the point of the delay, NumLock works. I tried entering Alt-SysRq-T but there was no response. What should I have seen? I also added nmi_watchdog=1 to the grub configuration and didn't notice any additional information on the console or in the logs.

Running with .27, I noticed additional data points:

(1) Sometimes in the logs I see messages: rpc.statd[3038]: unable to register (statd, 1, udp).

(2) Although it appears that statd does eventually start (I see it in the ps output), when I shut down the machine, I get shutdown failure messages from both lockd and statd.

(3) When I choose "Log Out" from the Gnome "Actions" menu, the top menu freezes up. No dialog box appears and "Actions" remains highlighted. The only way out is to reboot the machine from an xterm (if one is still available) or by switching to a console screen OR (as I just discovered while writing this) if I wait some very long period of time -- maybe 5 minutes -- the "Log Out" dialog box will appear.

If Gnome uses RPC calls, this could all be related to a subtle incompatibility between the .27 kernel and the RPC code.
Tim

I confirm that I've got the same issues as you:
* A stall at "Starting NFS statd", sometimes with the "unable to register (statd, 1, udp)" log entry
* The Gnome "Log Out..." dialog box takes a very long time to appear

These issues only occur with both:
* a firewall startup script enabled (iptables rules generated by Fwbuilder)
* the .27 kernel

Once you've booted with your .27 kernel, could you try:
* "rpcinfo -p" and check if the portmapper is responding (mine is not)
* "/etc/init.d/iptables stop"
* "rpcinfo -p" again, to see if you can now contact the portmapper

Have you got any "firewall" script in your /etc/init.d directory, or are you only using the standard "/etc/init.d/iptables" script? In the second case, could you try the following:

# /etc/init.d/iptables stop
# cp /etc/sysconfig/iptables /etc/sysconfig/iptables.old
# /etc/init.d/iptables save

If you are in the first case, could you also disable your firewall script with something like:

# chkconfig firewall off

Then could you try to reboot your system with the .27 kernel and check if you still have the problem?

==> I've done this test and I was able to boot without any of the previously described problems.

I still suspect an iptables issue on the loopback interface, and I still have the strange log entries in my /var/log/messages file concerning blocked traffic on the loopback interface.

As in your setup, if I switch back to my old kernel everything is fine (even with my firewall script).

Thibault

Hi Thibault,

Thanks for that! I checked this and the problem seems to be that Netfilter is marking at least some previously acceptable packets from localhost as INVALID. I had a rule in my iptables setup that explicitly dropped any INVALID packet very early on in the filter table. Changing the setup so that all packets from localhost are accepted first, then dropping INVALID ones after that, appears to have solved the problem.

I'm not familiar with FwBuilder so I'm not sure how to modify its configuration.
What I have now in /etc/sysconfig/iptables is essentially this (minus some local stuff):

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT    <--- Rule that accepts from localhost
-A RH-Firewall-1-INPUT -j DROP -m state --state INVALID    <--- My DROP rule
-A RH-Firewall-1-INPUT -p icmp --icmp-type any -j ACCEPT
-A RH-Firewall-1-INPUT -p 50 -j ACCEPT
-A RH-Firewall-1-INPUT -p 51 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp --dport 5353 -d 224.0.0.251 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m udp --dport 631 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT

I'm not sure what happens now with this bug. Apparently the problem is in the way Netfilter classifies local packets. I'll poke around on the netfilter-devel mailing list and see if anybody else is screaming about this yet.

Thanks very much for your help!!!

Tim

We've found the iptables commands that are not correctly interpreted with the .27 kernel. Indeed, if we change the following lines in our firewall script

$IPTABLES -A INPUT -i lo -m state --state NEW -j ACCEPT
$IPTABLES -A OUTPUT -o lo -m state --state NEW -j ACCEPT

to

$IPTABLES -A INPUT -i lo -s 127.0.0.1 -d 127.0.0.1 -j ACCEPT
$IPTABLES -A OUTPUT -o lo -s 127.0.0.1 -d 127.0.0.1 -j ACCEPT

the system can boot with the new .27 kernel without problem!

I'm still wondering why the first two lines aren't working anymore with the new kernel.

Tim, can you confirm (or not) that iptables could be the cause of your problems, as is the case for me?

Thibault

Tim, you can forget my last question since our last two comments were apparently posted at the same time...

I wonder which development team is better placed to answer: the netfilter one or the kernel one?
Let me know if you have updates on this issue from the netfilter lists.

Thanks,
Thibault

Hi Thibault!

It turns out that a patch to fix this was added to the Netfilter code on May 4th. The problem description reads:

TCP connection tracking incorrectly tries to verify the checksum of CHECKSUM_UNNECESSARY packets. This causes packets on loopback to be tracked as INVALID since we now drop the conntrack reference on output and don't skip connection tracking on input anymore.

I'm not sure what kernel version they are working on right now, but I guess we will see this fix in an upcoming FC kernel update.

Thanks for all your help!!!

Tim

You're right, and I confirm that this is it (https://lists.netfilter.org/pipermail/netfilter-devel/2005-May/019543.html)

I hope the new kernel RPM with this fix will not take long to be released...

For those interested, I wrote a simple iptables script and got the following log lines while simply running "ssh localhost":

Jun 2 10:47:41 MYHOST kernel: lo_NEW_RULE - Accept IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=47020 DF PROTO=TCP SPT=52640 DPT=22 WINDOW=32767 RES=0x00 SYN URGP=0
Jun 2 10:47:41 MYHOST kernel: lo_NEW_RULE - Accept IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=47020 DF PROTO=TCP SPT=52640 DPT=22 WINDOW=32767 RES=0x00 SYN URGP=0
Jun 2 10:47:41 MYHOST kernel: lo_INVALID_RULE - Accept IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=75 TOS=0x00 PREC=0x00 TTL=64 ID=40002 DF PROTO=TCP SPT=22 DPT=52640 WINDOW=8192 RES=0x00 ACK PSH URGP=0
Jun 2 10:47:41 MYHOST kernel: lo_INVALID_RULE - Accept IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=74 TOS=0x00 PREC=0x00 TTL=64 ID=47026 DF PROTO=TCP SPT=52640 DPT=22 WINDOW=8192 RES=0x00 ACK PSH URGP=0

Here is the test-lo4INVALID.fw script:
----------------------------------------------------------------------------
#!/bin/sh
#
PATH="/sbin:/usr/sbin:/bin:/usr/bin:${PATH}"
export PATH

IPTABLES="/sbin/iptables"

# Default policy
$IPTABLES -P OUTPUT DROP
$IPTABLES -P INPUT DROP
$IPTABLES -P FORWARD DROP

# Flush tables
cat /proc/net/ip_tables_names | while read table; do
    test "X$table" = "Xmangle" && continue
    $IPTABLES -t $table -L -n | while read c chain rest; do
        if test "X$c" = "XChain" ; then
            $IPTABLES -t $table -F $chain
        fi
    done
    $IPTABLES -t $table -X
done

# Accept related, established
$IPTABLES -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
$IPTABLES -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
$IPTABLES -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT

echo "Rule 0 (lo)"

# Added to confirm the bug
$IPTABLES -N lo_INVALID_RULE
$IPTABLES -A INPUT -i lo -m state --state INVALID -j lo_INVALID_RULE
$IPTABLES -A OUTPUT -o lo -m state --state INVALID -j lo_INVALID_RULE
$IPTABLES -A lo_INVALID_RULE -j LOG --log-level info --log-prefix "lo_INVALID_RULE - Accept "
$IPTABLES -A lo_INVALID_RULE -j ACCEPT

# Standard loopback rule
$IPTABLES -N lo_NEW_RULE
$IPTABLES -A INPUT -i lo -m state --state NEW -j lo_NEW_RULE
$IPTABLES -A OUTPUT -o lo -m state --state NEW -j lo_NEW_RULE
$IPTABLES -A lo_NEW_RULE -j LOG --log-level info --log-prefix "lo_NEW_RULE - Accept "
$IPTABLES -A lo_NEW_RULE -j ACCEPT

# Simple test policy
$IPTABLES -A OUTPUT -s 160.228.120.129 -m state --state NEW -j ACCEPT
$IPTABLES -A INPUT -d 160.228.120.129 -p tcp -m tcp --dport 22 -m state --state NEW -j ACCEPT
----------------------------------------------------------------------------

This fix will appear in 2.6.11-1.33_FC3 (and higher versions), which will be available at http://people.redhat.com/davej/kernels/Fedora/FC3 in an hour or so.
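For anyone repeating the test with the script above, the resulting log lines can be tallied per prefix to see how much loopback traffic is classified as NEW versus INVALID. This is only a sketch: count_prefixes is a hypothetical helper name, and it assumes the log prefixes are exactly the "lo_NEW_RULE - Accept " and "lo_INVALID_RULE - Accept " strings used by that script.

```shell
#!/bin/sh
# Tally kernel log lines by netfilter log prefix.
# Assumption: prefixes have the form "lo_<NAME>_RULE - Accept ", as set by
# --log-prefix in the test script; count_prefixes is a hypothetical name.
count_prefixes() {
    # stdin: syslog lines; stdout: one "<prefix> <count>" line per prefix seen.
    sed -n 's/.*kernel: \(lo_[A-Z]*_RULE\) - Accept .*/\1/p' |
        sort | uniq -c | awk '{print $2, $1}'
}
```

Typical use would be something like `count_prefixes < /var/log/messages`. A non-zero lo_INVALID_RULE count during a plain "ssh localhost" suggests the kernel is affected.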
*** Bug 159388 has been marked as a duplicate of this bug. ***

An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'.

Thank you.

Sorry about the delay. I installed 2.6.12-1.1372_FC3 today (via up2date) and am surprised to find that the problem remains unchanged. It is easily reproduced by adding a rule to drop INVALID packets as the first rule of the INPUT chain:

Chain RH-Firewall-1-INPUT (2 references)
 pkts bytes target     prot opt in     out     source               destination
   36  3024 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0           state INVALID
   27  5640 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0

and fixed again by inverting the above order so that loopback traffic is accepted before the INVALID test:

Chain RH-Firewall-1-INPUT (2 references)
 pkts bytes target     prot opt in     out     source               destination
   37  6196 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0           state INVALID

I also noted the same long delay during boot when starting NFS statd, and the trouble with Gnome.

Tim

Then this is a different bug from the one fixed by the patch above (which is included in the latest update).

Instead of making it DROP them, can you do something like

iptables -A INPUT -m state --state INVALID -j LOG --log-prefix "in dropped: "

and see what's getting logged in dmesg output?
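Once a LOG rule like the one suggested above is in place, the logged packets can be grouped by protocol and destination port to see which local services are affected. The following is a rough sketch, not part of the original suggestion: summarize_invalid is a hypothetical helper, and the default prefix "in dropped: " is simply the one used in the command above.

```shell
#!/bin/sh
# Group logged packets by protocol and destination port.
# Assumptions: summarize_invalid is a hypothetical helper; lines carry the
# standard netfilter LOG fields PROTO= and DPT=; the prefix defaults to
# "in dropped: " and can be overridden with the first argument.
summarize_invalid() {
    prefix="${1:-in dropped: }"
    # Keep only lines with the prefix, extract "<PROTO> <DPT>", then count.
    # Packets without a DPT field (e.g. ICMP) are skipped.
    grep -F "$prefix" |
        sed -n 's/.*PROTO=\([A-Z]*\) .*DPT=\([0-9]*\).*/\1 \2/p' |
        sort | uniq -c | awk '{print $1, $2, "dport", $3}'
}
```

For example, `dmesg | summarize_invalid` prints one line per protocol and port with the number of logged packets, and `dmesg | summarize_invalid "Would Drop Invalid: "` does the same for a different prefix.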
These are log entries that appeared during the boot process:

Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=32768 DPT=111 LEN=64
Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=56 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=111 DPT=32768 LEN=36
Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=32768 DPT=111 LEN=64
Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=56 TOS=0x00 PREC=0x00 TTL=64 ID=1 DF PROTO=UDP SPT=111 DPT=32768 LEN=36
Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=32769 DPT=111 LEN=64
Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=56 TOS=0x00 PREC=0x00 TTL=64 ID=2 DF PROTO=UDP SPT=111 DPT=32769 LEN=36
Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=32769 DPT=111 LEN=64
Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=56 TOS=0x00 PREC=0x00 TTL=64 ID=3 DF PROTO=UDP SPT=111 DPT=32769 LEN=36

Plus one that occurred sometime after that:

Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=2 DF PROTO=TCP SPT=5680 DPT=42088 WINDOW=0 RES=0x00 ACK RST URGP=0

Port 111 is the RPC portmapper, while 5680 is Canna, for Japanese input.

Let me know if there is any additional data you'd like!

Tim

Does adding ...
iptables -A INPUT -i lo -s 127.0.0.1 -d 127.0.0.1 -j ACCEPT

before the DROP rule fix things?

Hello! And yes, as I guess you'd expect, that does work around the problem. What I don't get is why the loopback packets are being marked invalid in the first place. Odd.

Tim

davem, should we be dropping those packets? My first instinct was that this was because they're UDP packets, and we have no 'state' (invalid or otherwise) to examine. But surely they should just be passed through rather than flagged invalid?

I suspect the current 2.6.12.x stable kernel has a fix for this. There was a recent change in how we handle loopback packets in netfilter that had to be refined some more.

This discussion really belongs on netfilter-devel or similar, as I am far from an expert in this area.

Yes, I guess now that the root problem is identified it would be better to bring it up there (or find that it's already been solved).

I'm not sure what you want to do with the ticket. Leave it as-is until the next kernel update, or should I open a new one with a more accurate title if the problem persists?

Tim

FWIW, the new kernel, 2.6.12-1.1378_FC3, appears to have finally fixed this bug.

Tim
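For systems still running an affected kernel, the rule ordering at the heart of this report (an INVALID drop placed ahead of the loopback accept) can be checked without touching the live firewall. The sketch below is illustrative only: check_lo_order is a hypothetical helper, it assumes rules in the "-A CHAIN ..." form shown in the /etc/sysconfig/iptables excerpt earlier in this report, and it compares only the first matching rule of each kind without following jumps between chains.

```shell
#!/bin/sh
# Report whether an INVALID drop rule precedes the loopback accept rule.
# Assumptions: check_lo_order is a hypothetical helper; input is a list of
# rules in "-A CHAIN ..." form; chain jumps are not followed.
check_lo_order() {
    awk '
        # Remember the line number of the first loopback ACCEPT rule.
        /^-A / && /-i lo( |$)/ && /-j ACCEPT/ && !lo       { lo = NR }
        # Remember the line number of the first INVALID DROP rule.
        /^-A / && /--state INVALID/ && /-j DROP/ && !inv   { inv = NR }
        END {
            if (!inv)           print "no INVALID drop rule"
            else if (!lo)       print "INVALID dropped, no loopback accept"
            else if (lo < inv)  print "ok: loopback accepted first"
            else                print "risk: INVALID dropped before loopback accept"
        }'
}
```

It could be run as `check_lo_order < /etc/sysconfig/iptables`. A "risk" result corresponds to the ordering that reproduced the hang on the affected kernels.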