Bug 158710

Summary: iptables rule causes hang on boot
Product: [Fedora] Fedora
Component: kernel
Version: 3
Hardware: i686
OS: Linux
Status: CLOSED NOTABUG
Severity: high
Priority: medium
Reporter: Tim <ack210t>
Assignee: David Miller <davem>
QA Contact: Brian Brock <bbrock>
CC: davej, thibault.lemeur, wtogami
Doc Type: Bug Fix
Last Closed: 2005-08-11 03:26:27 UTC

Description Tim 2005-05-25 02:17:58 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.0.4-1.3.1 Firefox/1.0.4

Description of problem:
After upgrading a workstation installation from 2.6.11-1.14_FC3 to 2.6.11-1.27_FC3 via up2date, the system hangs during the bootstrap process. The last events recorded in the boot log were:

May 25 10:02:56 pearl ifup: Determining IP information for eth0...
May 25 10:03:00 pearl dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
May 25 10:03:00 pearl dhclient: DHCPACK from 192.168.0.1
May 25 10:03:00 pearl NET: /sbin/dhclient-script : updated /etc/resolv.conf
May 25 10:03:00 pearl dhclient: bound to 192.168.0.5 -- renewal in 1732974 seconds.
May 25 10:03:00 pearl ifup:  done.
May 25 10:03:00 pearl network: Bringing up interface eth0:  succeeded
May 25 10:04:01 pearl nfslock: rpc.statd startup succeeded

The previous version of the kernel was running fine for about a month before this upgrade. The only other packages upgraded at the same time as the kernel were netpbm, netpbm-devel, and netpbm-progs. Most other components are up to date (per RHN). After going back to 2.6.11-1.14_FC3 via the Grub menu, the system is fine, but all attempts to boot .27 fail.

The graphical startup program hides the console output (any way to turn that off forever?), and freezes at "Starting firewall," so I can't tell whether there are additional console messages.

Version-Release number of selected component (if applicable):
 kernel-2.6.11-1.27_FC3

How reproducible:
Always

Steps to Reproduce:
1. Boot the system

Actual Results:  As described above. Note, however, that the log does not always end with the startup of rpc.statd. It sometimes ends at the line before.

Additional info:

Comment 1 Thibault LE MEUR 2005-05-25 14:44:15 UTC
I've got a quite similar issue on some servers.
I managed to boot (with remaining problems) by disabling the Red Hat Graphical 
Boot (option norhgb in grub.conf). However, even though the system boots, some 
network services are not available locally.

I suspect an issue with iptables and the loopback interface with this new kernel.
Once booted, I can't "ssh localhost", and some errors appear in 
my /var/log/messages file:
May 25 15:13:17 SRVNAME kernel: RULE 17 -- DENY IN=lo OUT= 
MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 
LEN=75 TOS=0x00 PREC=0x00 TTL=64 ID=23803 DF PROTO=TCP SPT=22 DPT=34542 
WINDOW=8192 RES=0x00 ACK PSH URGP=0

Note that it is the reply from my sshd server that is blocked on the 
loopback interface. However, my iptables script has the following lines:
$IPTABLES -A INPUT  -i lo  -m state --state NEW  -j ACCEPT
$IPTABLES -A OUTPUT  -o lo  -m state --state NEW  -j ACCEPT


Maybe there is something different in the way the loopback interface is 
handled in this new kernel ?

If I disable the firewall with "/etc/init.d/iptables stop", I can ssh localhost 
again.

Comment 2 Tim 2005-05-26 02:50:20 UTC
I tried that out, but aside from being able to easily see the boot messages
(thanks very much for that!!) had no luck. On my system the boot process still
hangs at

Starting NFS statd:

Looking at /var/log/boot.log, I see two different final entries whenever I try
to boot .27. The log says either

May 26 10:38:27 pearl ifup: Determining IP information for eth0...
May 26 10:38:32 pearl NET: /sbin/dhclient-script : updated /etc/resolv.conf
May 26 10:38:32 pearl ifup:  done.
May 26 10:38:32 pearl network: Bringing up interface eth0:  succeeded
May 26 10:39:33 pearl nfslock: rpc.statd startup succeeded

or just

May 26 11:27:58 pearl ifup: Determining IP information for eth0...
May 26 11:28:02 pearl NET: /sbin/dhclient-script : updated /etc/resolv.conf
May 26 11:28:02 pearl ifup:  done.
May 26 11:28:02 pearl network: Bringing up interface eth0:  succeeded

I tried removing the "quiet" option from the kernel command to boot .27 as well,
but although this generated a lot of extra output early on, the log at the time
of the freeze-up doesn't have any additional information.

If I go back to kernel 2.6.11-1.14 everything works fine. All packages on the
system are up to date (per RHN).

Comment 3 Pete Zaitcev 2005-05-27 17:42:53 UTC
Does NumLock work? If yes, we'll need Alt-SysRq-T.
If not, what happens if you add "nmi_watchdog=1" in grub.conf?


Comment 4 Tim 2005-05-28 03:34:59 UTC
Thanks for looking at this. Actually in trying these things out I find that the
system now boots up after a 60 second delay at

Starting NFS statd:

I can't say whether this has always been the case. In the logs, there are times
when the delay is clearly longer than 60 seconds, but this is the behavior I see
today.

At the point of the delay, NumLock works. I tried entering Alt-SysRq-T but there
was no response. What should I have seen? I also added nmi_watchdog=1 to the
grub configuration and didn't notice any additional information on the console
or in the logs.

Running with .27, I noticed additional data points:

(1) Sometimes in the logs I see messages:

rpc.statd[3038]: unable to register (statd, 1, udp).

(2) Although it appears that statd does eventually start (I see it in the ps
output), when I shut down the machine, I get shutdown failure messages from both
lockd and statd.

(3) When I choose "Log Out" from the Gnome "Actions" menu, the top menu freezes
up. No dialog box appears and "Actions" remains highlighted. The only way out is
to reboot the machine from an xterm (if one is still available) or by switching
to a console screen OR (as I just discovered while writing this) if I wait some
very long period of time -- maybe 5 minutes -- the "Log Out" dialog box will appear.

If Gnome uses RPC calls, this could all be related to a subtle incompatibility
between the .27 kernel and the RPC code.

Tim

Comment 5 Thibault LE MEUR 2005-05-30 12:37:56 UTC
I confirm that I've got the same issues as you:
* A delay at "Starting NFS statd", sometimes with the "unable to register 
(statd, 1, udp)" log entry
* The Gnome "Log Out" dialog box takes a very long time to appear

These issues only occur with both:
* a firewall startup script (iptables rules generated by Fwbuilder) enabled
* the .27 kernel

Once you've booted with your .27 kernel, could you try:
* "rpcinfo -p" and check if the portmapper is responding (mine is not)
* "/etc/init.d/iptables stop"
* "rpcinfo -p" again, to see if you can now contact the portmapper
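
The three checks above can be sketched as a small script. This is only a rough sketch: diagnose_firewall, portmapper_up and the IPTABLES_INIT variable are hypothetical names chosen for illustration, it must be run as root, and it leaves the firewall stopped afterwards. Note also that "rpcinfo -p" may take a while to time out when the portmapper is unreachable.

```shell
#!/bin/sh
# Sketch only: the helper names and IPTABLES_INIT are illustrative assumptions.
# Path of the stock init script, as used elsewhere in this report.
IPTABLES_INIT="${IPTABLES_INIT:-/etc/init.d/iptables}"

# Succeeds if the local portmapper answers "rpcinfo -p".
portmapper_up() {
  rpcinfo -p >/dev/null 2>&1
}

# Probe the portmapper, stop the firewall if the probe fails, probe again.
# WARNING: the firewall is left stopped when this function returns.
diagnose_firewall() {
  if portmapper_up; then
    echo "portmapper already reachable"
    return 0
  fi
  "$IPTABLES_INIT" stop >/dev/null 2>&1
  if portmapper_up; then
    echo "portmapper reachable only with firewall stopped"
  else
    echo "portmapper unreachable even with firewall stopped"
  fi
}
```

Calling diagnose_firewall after booting the .27 kernel prints one of the three messages; the second one would point at the iptables rules rather than at the portmapper itself.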

Have you got any "firewall" script in your /etc/init.d directory, or are you 
only using the standard "/etc/init.d/iptables" script ?
In the second case could you try the following:
# /etc/init.d/iptables stop
# cp /etc/sysconfig/iptables /etc/sysconfig/iptables.old
# /etc/init.d/iptables save
If you are in the first case, could you also disable your firewall script by 
something like:
# chkconfig firewall off

Then could you try to reboot your system with the .27 kernel and check if you 
still have the problem?
==> I've done this test and I was able to boot without any of the previously 
described problems.

I still suspect an issue with iptables on the loopback interface, and still 
have the strange log entries in my /var/log/messages file concerning blocked 
traffic on the loopback interface.
As in your setup, if I switch back to my old kernel everything is fine (even 
with my firewall script).

Thibault


Comment 6 Tim 2005-06-01 10:23:31 UTC
Hi Thibault,

Thanks for that! I checked this and the problem seems to be that Netfilter is
marking at least some previously acceptable packets from localhost as INVALID. I
had a rule in my iptables setup that explicitly dropped any INVALID packet very
early on in the filter table. Changing the setup so that all packets from
localhost are accepted first, then dropping INVALID ones after that, appears to
have solved the problem.

I'm not familiar with FwBuilder so I'm not sure how to modify its configuration. What
I have now in /etc/sysconfig/iptables is essentially this (minus some local stuff):

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
# Rule that accepts from localhost (must come before the DROP rule)
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
# My DROP rule
-A RH-Firewall-1-INPUT -j DROP -m state --state INVALID
-A RH-Firewall-1-INPUT -p icmp --icmp-type any -j ACCEPT
-A RH-Firewall-1-INPUT -p 50 -j ACCEPT
-A RH-Firewall-1-INPUT -p 51 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp --dport 5353 -d 224.0.0.251 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m udp --dport 631 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
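
To check an existing ruleset for this ordering, something like the following might help. It is a minimal sketch that assumes rules in iptables-save format (one "-A" line per rule) on standard input; check_lo_order is a hypothetical helper name, not part of iptables, and it only compares the position of the first loopback ACCEPT with the first INVALID DROP within one chain.

```shell
#!/bin/sh
# check_lo_order CHAIN: hypothetical helper, not an iptables command.
# Reads iptables-save style rules on stdin and reports whether the
# loopback ACCEPT rule comes before the INVALID DROP rule in CHAIN.
check_lo_order() {
  chain="$1"
  awk -v chain="$chain" '
    $1 == "-A" && $2 == chain {
      n++                                   # position of this rule in the chain
      if (!lo  && /-i lo( |$)/ && /-j ACCEPT/)        lo  = n
      if (!inv && /--state INVALID/ && /-j DROP/)     inv = n
    }
    END {
      if (!inv)           print "no-invalid-drop"
      else if (!lo)       print "no-loopback-accept"
      else if (lo < inv)  print "ok"
      else                print "invalid-drop-first"
    }'
}
```

For example, "check_lo_order RH-Firewall-1-INPUT < /etc/sysconfig/iptables" should print "ok" for the ruleset above, and "invalid-drop-first" for the ordering that triggered the hang.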

I'm not sure what happens now with this bug. Apparently the problem is in the
way Netfilter classifies local packets. I'll poke around on the netfilter-devel
mailing list and see if anybody else is screaming about this yet.

Thanks very much for your help!!!

Tim

Comment 7 Thibault LE MEUR 2005-06-01 13:08:09 UTC
We've found the iptables commands that no longer behave as expected with 
the .27 kernel.
Indeed, if we replace the following lines in our firewall script
          $IPTABLES -A INPUT  -i lo  -m state --state NEW  -j ACCEPT
          $IPTABLES -A OUTPUT  -o lo  -m state --state NEW  -j ACCEPT
with
          $IPTABLES -A INPUT  -i lo   -s 127.0.0.1 -d 127.0.0.1 -j ACCEPT
          $IPTABLES -A OUTPUT  -o lo   -s 127.0.0.1 -d 127.0.0.1  -j ACCEPT
the system can boot with the new .27 kernel without problem!

I'm still wondering why the first two lines aren't working anymore with the 
new kernel??

Tim, can you confirm (or not) that iptables could be the cause of your 
problems as it is the case for me?

Thibault


Comment 8 Thibault LE MEUR 2005-06-01 13:20:25 UTC
Tim, you can forget my last question since our two last comments were 
posted "apparently" at the same time...

I wonder which devel team is better placed to answer: the netfilter one or the 
kernel one?

Let me know if you have updates on this issue from the netfilter lists.

Thanks,
Thibault

Comment 9 Tim 2005-06-02 01:52:40 UTC
Hi Thibault!

It turns out that a patch to fix this was added to the Netfilter code on May
4th. The problem description reads:

TCP connection tracking incorrectly tries to verify the checksum of
CHECKSUM_UNNECESSARY packets. This causes packets on loopback to
be tracked as INVALID since we now drop the conntrack reference on
output and don't skip connection tracking on input anymore.

I'm not sure what kernel version they are working on right now, but I guess we
will see this fix in an upcoming FC kernel update.

Thanks for all your help!!!

Tim

Comment 10 Thibault LE MEUR 2005-06-02 09:05:16 UTC
You're right, and I confirm that this is it 
(https://lists.netfilter.org/pipermail/netfilter-devel/2005-May/019543.html)

I hope the new kernel RPM with this fix will not take long to be released...


For those interested I wrote a simple iptables script and got the following 
log lines while simply running "ssh localhost":
Jun  2 10:47:41 MYHOST kernel: lo_NEW_RULE - Accept IN= OUT=lo SRC=127.0.0.1
DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=47020 DF PROTO=TCP
SPT=52640 DPT=22 WINDOW=32767 RES=0x00 SYN URGP=0
Jun  2 10:47:41 MYHOST kernel: lo_NEW_RULE - Accept IN=lo OUT=
MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1
LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=47020 DF PROTO=TCP SPT=52640 DPT=22
WINDOW=32767 RES=0x00 SYN URGP=0
Jun  2 10:47:41 MYHOST kernel: lo_INVALID_RULE - Accept IN=lo OUT=
MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1
LEN=75 TOS=0x00 PREC=0x00 TTL=64 ID=40002 DF PROTO=TCP SPT=22 DPT=52640
WINDOW=8192 RES=0x00 ACK PSH URGP=0
Jun  2 10:47:41 MYHOST kernel: lo_INVALID_RULE - Accept IN=lo OUT=
MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1
LEN=74 TOS=0x00 PREC=0x00 TTL=64 ID=47026 DF PROTO=TCP SPT=52640 DPT=22
WINDOW=8192 RES=0x00 ACK PSH URGP=0

Here is the test-lo4INVALID.fw script:
----------------------------------------------------------------------------
----------------------------------------------------------------------------
#!/bin/sh
#
PATH="/sbin:/usr/sbin:/bin:/usr/bin:${PATH}"
export PATH
IPTABLES="/sbin/iptables"

# Default policy
$IPTABLES -P OUTPUT  DROP
$IPTABLES -P INPUT   DROP
$IPTABLES -P FORWARD DROP

# Flush tables
cat /proc/net/ip_tables_names | while read table; do
  test "X$table" = "Xmangle" && continue
  $IPTABLES -t $table -L -n | while read c chain rest; do
      if test "X$c" = "XChain" ; then
        $IPTABLES -t $table -F $chain
      fi
  done
  $IPTABLES -t $table -X
done

# Accept related, established
$IPTABLES -A INPUT   -m state --state ESTABLISHED,RELATED -j ACCEPT
$IPTABLES -A OUTPUT  -m state --state ESTABLISHED,RELATED -j ACCEPT
$IPTABLES -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT

echo "Rule 0 (lo)"
# Added to confirm the bug
$IPTABLES -N lo_INVALID_RULE
$IPTABLES -A INPUT  -i lo  -m state --state INVALID  -j lo_INVALID_RULE
$IPTABLES -A OUTPUT  -o lo  -m state --state INVALID  -j lo_INVALID_RULE
$IPTABLES -A lo_INVALID_RULE -j LOG  --log-level info --log-prefix "lo_INVALID_RULE - Accept "
$IPTABLES -A lo_INVALID_RULE -j ACCEPT

# Standard loopback rule
$IPTABLES -N lo_NEW_RULE
$IPTABLES -A INPUT  -i lo  -m state --state NEW  -j lo_NEW_RULE
$IPTABLES -A OUTPUT  -o lo  -m state --state NEW  -j lo_NEW_RULE
$IPTABLES -A lo_NEW_RULE -j LOG  --log-level info --log-prefix "lo_NEW_RULE - Accept "
$IPTABLES -A lo_NEW_RULE -j ACCEPT

# Simple test policy
$IPTABLES -A OUTPUT  -s 160.228.120.129  -m state --state NEW  -j ACCEPT
$IPTABLES -A INPUT  -d 160.228.120.129  -p tcp -m tcp  --dport 22 -m state --state NEW  -j ACCEPT
----------------------------------------------------------------------------
----------------------------------------------------------------------------

Comment 11 Dave Jones 2005-06-03 17:53:32 UTC
This fix will appear in the 2.6.11-1.33_FC3 (and higher versions) which will
appear at http://people.redhat.com/davej/kernels/Fedora/FC3 in an hour or so.


Comment 12 Dave Jones 2005-06-04 05:48:06 UTC
*** Bug 159388 has been marked as a duplicate of this bug. ***

Comment 13 Dave Jones 2005-07-15 19:47:29 UTC
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 14 Tim 2005-07-23 02:36:10 UTC
Sorry about the delay. I installed 2.6.12-1.1372_FC3 today (via up2date) and am
surprised to find that the problem remains unchanged. It is easily reproduced by
adding a rule to drop INVALID packets as the first rule of the INPUT chain:

Chain RH-Firewall-1-INPUT (2 references)
 pkts bytes target     prot opt in     out     source               destination
   36  3024 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0           state INVALID
   27  5640 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0

and fixed again by inverting the above order so that loopback traffic is passed
through regardless before the INVALID test:

Chain RH-Firewall-1-INPUT (2 references)
 pkts bytes target     prot opt in     out     source               destination
   37  6196 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0           state INVALID

I also noted the same long delay during boot when starting NFS statd, and the
trouble with Gnome.

Tim


Comment 15 Dave Jones 2005-08-04 01:48:29 UTC
Then this is a different bug to the one fixed by the patch above (which is
included in the latest update).


Instead of making it DROP them, can you do something like:

iptables -A INPUT -m state --state INVALID -j LOG --log-prefix "in dropped: "

and see what's getting logged in dmesg output?


Comment 16 Tim 2005-08-05 04:05:13 UTC
These are log entries that appeared during the boot process:

Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=127.0.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP
SPT=32768 DPT=111 LEN=64
Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=127.0.0.1 LEN=56 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP
SPT=111 DPT=32768 LEN=36
Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=127.0.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP
SPT=32768 DPT=111 LEN=64
Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=127.0.0.1 LEN=56 TOS=0x00 PREC=0x00 TTL=64 ID=1 DF PROTO=UDP
SPT=111 DPT=32768 LEN=36
Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=127.0.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP
SPT=32769 DPT=111 LEN=64
Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=127.0.0.1 LEN=56 TOS=0x00 PREC=0x00 TTL=64 ID=2 DF PROTO=UDP
SPT=111 DPT=32769 LEN=36
Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=127.0.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP
SPT=32769 DPT=111 LEN=64
Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=127.0.0.1 LEN=56 TOS=0x00 PREC=0x00 TTL=64 ID=3 DF PROTO=UDP
SPT=111 DPT=32769 LEN=36

Plus one that occurred sometime after that:

Would Drop Invalid: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=127.0.0.1 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=2 DF PROTO=TCP
SPT=5680 DPT=42088 WINDOW=0 RES=0x00 ACK RST URGP=0

Port 111 is the RPC portmapper, while 5680 is Canna, for Japanese input.
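
To get a per-flow summary of entries like these, a rough sketch follows. summarize_invalid is a hypothetical helper name, not a netfilter tool; it assumes the LOG target's usual KEY=VALUE layout, with PROTO and SPT appearing before DPT in each entry as in the lines above.

```shell
#!/bin/sh
# summarize_invalid: hypothetical helper, not a netfilter tool.
# Reads kernel LOG entries on stdin and prints "count PROTO SPT->DPT"
# for each distinct flow. Splitting on spaces first means entries that
# were wrapped across several lines are handled as well.
summarize_invalid() {
  tr ' ' '\n' | awk -F= '
    $1 == "PROTO" { proto = $2 }            # remember protocol of this entry
    $1 == "SPT"   { spt = $2 }              # remember source port
    $1 == "DPT"   { count[proto " " spt "->" $2]++ }
    END { for (k in count) print count[k], k }
  ' | sort -k2
}
```

Assuming the same log prefix as above, it could be fed with: dmesg | grep 'Would Drop Invalid' | summarize_invalid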

Let me know if there is any additional data you'd like!

Tim

Comment 17 Dave Jones 2005-08-06 04:00:41 UTC
Does adding ...
iptables -A INPUT -i lo -s 127.0.0.1 -d 127.0.0.1 -j ACCEPT

before the DROP rule fix things ?


Comment 18 Tim 2005-08-10 01:38:01 UTC
Hello!

And yes, as I guess you'd expect, that does work around the problem. What I
don't get is why the loopback packets are being marked invalid in the first
place. Odd.

Tim

Comment 19 Dave Jones 2005-08-10 06:21:39 UTC
davem, should we be dropping those packets ? My first instinct was that this was
because they're udp packets, and we have no 'state' (invalid or otherwise) to
examine. But they should be just passed through rather than flagged invalid surely ?

Comment 20 David Miller 2005-08-10 18:20:27 UTC
I suspect the current 2.6.12.x stable kernel has a fix for this.
There was a recent change in how we handle loopback packets
in netfilter that had to be refined some more.

This discussion really belongs on netfilter-devel or similar,
as I am far from an expert in this area.


Comment 21 Tim 2005-08-11 03:26:27 UTC
Yes, I guess now that the root problem is identified it would be better to bring
it up there (or find that it's already been solved). I'm not sure what you want
to do with the ticket. Leave it as-is until the next kernel update, or I could
just open a new one with a more accurate title if the problem persists?

Tim

Comment 22 Tim 2005-09-23 00:21:14 UTC
FWIW, the new kernel, 2.6.12-1.1378_FC3, appears to have finally fixed this bug.

Tim