158637 – boot hang with 2.6.11-1.27_FC3 & DHCP (& ypbind)

Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Version:	3
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:	Brian Brock

Reported:	2005-05-24 14:26 UTC by Todd Allen
Modified:	2015-01-04 22:19 UTC
Doc Type:	Bug Fix
Last Closed:	2005-08-01 23:46:39 UTC

Description Todd Allen 2005-05-24 14:26:06 UTC

Description of problem:

I installed the kernel-2.6.11-1.27_FC3 rpm today, and ran into trouble.  The
boot hung while trying to bring up the eth0 network device.  The device
actually came up fine, as I could ping the machine from elsewhere, but it
hung doing some of the post-processing after bringing up the device.
 
The important thing is that this device was configured to use DHCP.
 
I was unable to reproduce the problem by running the individual
/etc/rc.d/init.d/* scripts in single-user mode, oddly enough.  So I had to
debug it with attempted reboots to level 5 several dozen times, and "bash -x"
and "exec > /dev/console 2>&1" in the relevant boot scripts.  Here are the
events leading up to the hang as I tracked it down:
 
   /etc/rc.d/init.d/network start   which runs:
   ifup eth0                        which runs:
   dhclient                         which runs:
   /sbin/dhclient-script            which runs:
   ypbind start                     which runs:
   rpcinfo -p | fgrep -q ypbind
 
Actually, the "ypbind start" tries the rpcinfo command 20 times and then
gives up.
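
The tracing described above can be set up with two lines near the top of a
boot script.  A minimal sketch, assuming a hypothetical helper named
enable_boot_trace (it is not part of any Fedora script), with "set -x" used as
the in-script equivalent of running the script under "bash -x":

```shell
# Hypothetical helper: send all further output of the calling script, plus an
# execution trace of every command, to a target (default: /dev/console).
enable_boot_trace() {
    # Redirect stdout and stderr of the current shell from here on.
    exec > "${1:-/dev/console}" 2>&1
    # Print each command before it runs; the trace goes to stderr, which now
    # points at the same target.
    set -x
}
```

Calling enable_boot_trace with no argument early in a script sends the trace
to the console, so the last command printed before a hang is the one to look
at.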
 
With the old 2.6.11-1.14_FC3, the rpcinfo command fails pretty quickly with:
   rpcinfo: can't contact portmapper: RPC: Remote system error - Connection refu
Presumably, networking isn't sufficiently "up" for this to work yet, and the
fact that it fails like this is crucial to the boot process.
 
But with the new 2.6.11-1.27_FC3, the rpcinfo just hangs.  And so the whole
boot process hangs.
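
Because a hung rpcinfo never returns, the retry count alone cannot bound the
wait; each attempt needs its own time limit.  A sketch of that idea, not the
actual ypbind init script: retry_with_timeout is a hypothetical helper, and it
assumes the coreutils "timeout" command is available, which may not be the
case on an FC3-era system.

```shell
# Hypothetical helper (not taken from the ypbind init script).
# Usage: retry_with_timeout ATTEMPTS LIMIT PAUSE COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, allows each attempt LIMIT seconds,
# and sleeps PAUSE seconds between failed attempts.
retry_with_timeout() {
    rwt_attempts=$1
    rwt_limit=$2
    rwt_pause=$3
    shift 3
    rwt_i=1
    while [ "$rwt_i" -le "$rwt_attempts" ]; do
        # timeout(1) stops the command once LIMIT seconds have passed, so an
        # attempt that hangs counts as a failure instead of blocking forever.
        if timeout "$rwt_limit" "$@"; then
            return 0
        fi
        rwt_i=$((rwt_i + 1))
        # Pause only if another attempt is still to come.
        if [ "$rwt_i" -le "$rwt_attempts" ]; then
            sleep "$rwt_pause"
        fi
    done
    return 1
}
```

With this helper, the check from the list above could be written as:

   retry_with_timeout 20 5 1 sh -c 'rpcinfo -p | fgrep -q ypbind'

The limits shown (20 attempts, 5 seconds each, 1 second pause) are
illustrative values, not ones taken from the original scripts.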

Version-Release number of selected component (if applicable): 2.6.11-1.27_FC3


How reproducible: Always


Steps to Reproduce:
1. Boot 2.6.11-1.27_FC3 kernel on a system with DHCP in an environment that has
a yp server.
  
Actual results:
Hang.

Expected results:
Not hang.  :-)

Additional info:

Comment 1 Todd Allen 2005-06-26 22:30:29 UTC

This problem persists in 2.6.11-1.35_FC3.

I experimented some more.  I'm getting a different hang on a different, desktop
machine that doesn't use DHCP.  It seems to be related to iptables.  I'm getting
unexpected errors logged from iptables on the desktop system.  If I remove
/etc/sysconfig/iptables, the hang goes away.  Of course, that isn't an
acceptable solution, but at least it's a clue about what's busted.  The content
of the desktop system's /etc/sysconfig/iptables isn't sensitive, so I'll include
it here:

# Generated by iptables-save v1.2.11 on Sun Jun 26 16:23:21 2005
*filter
:INPUT ACCEPT [66:4388]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [2618:555042]
:illicit - [0:0]
:trusted - [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT 
-A INPUT -i lo+ -m state --state NEW -j ACCEPT 
-A INPUT -i eth+ -m state --state NEW -j ACCEPT 
-A INPUT -i tun+ -m state --state NEW -j ACCEPT 
-A INPUT -j illicit 
-A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT 
-A FORWARD -i eth+ -o tun+ -m state --state NEW -j ACCEPT 
-A FORWARD -i tun+ -o eth+ -m state --state NEW -j ACCEPT 
-A FORWARD -i eth+ -o eth+ -m state --state NEW -j ACCEPT 
-A FORWARD -j illicit 
-A illicit -i ppp0 -m limit --limit 1/min -j LOG --log-prefix "iptables: in: " 
-A illicit -i eth0 -m limit --limit 1/min -j LOG --log-prefix "iptables: in: " 
-A illicit -i tun0 -m limit --limit 1/min -j LOG --log-prefix "iptables: in: " 
-A illicit -m limit --limit 1/min -j LOG --log-prefix "iptables: in: " 
COMMIT
# Completed on Sun Jun 26 16:23:21 2005

I removed that file, booted the machine successfully, then reinstated those
iptables rules, and then tried to restart ypbind by hand with
/etc/rc.d/init.d/ypbind restart.  When I did that, I got errors like the following:

iptables: in: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=127.0.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP
SPT=781 DPT=111 LEN=64 
iptables: in: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=127.0.0.1 LEN=56 TOS=0x00 PREC=0x00 TTL=64 ID=67 DF PROTO=UDP
SPT=111 DPT=781 LEN=36 
iptables: in: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=127.0.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP
SPT=782 DPT=111 LEN=64 
iptables: in: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=127.0.0.1 LEN=56 TOS=0x00 PREC=0x00 TTL=64 ID=68 DF PROTO=UDP
SPT=111 DPT=782 LEN=36 
iptables: in: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=127.0.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP
SPT=848 DPT=111 LEN=64 
iptables: in: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=127.0.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP
SPT=32779 DPT=111 LEN=64 

Now why should that be?  The lo+ rule should render those packets acceptable.
As an experiment, I tried adding another rule identical to the lo+ rule but
without the + sign.  That is:

-A INPUT -i lo -m state --state NEW -j ACCEPT 

It gets the same results.

So, my best guess now is that it's a problem with iptables, where it still is
refusing the loopback packets.  Because the packets are dropped, that probably
is why rpcinfo just hangs instead of returning an error.  :(

But I don't know why iptables is misbehaving.

I guess I'll stick with kernel-2.6.11-1.14_FC3 for the time being.  And I'll
have to postpone upgrading to Fedora Core 4, because presumably this bug is
there, too.

Is anybody out there paying attention?  I reported this a month ago and haven't
heard a peep.

Comment 2 Dave Jones 2005-07-15 19:25:08 UTC

An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 3 Todd Allen 2005-08-01 17:38:59 UTC

This is fixed in 2.6.12-1.1372_FC3.

Comment 4 Todd Allen 2005-08-01 19:26:04 UTC

I take it back.  It's true that the kernel no longer hangs on boot.  But there
still is something wrong with iptables.  With the iptables configuration that I
mentioned earlier, it still is complaining about packets like these (with IP
addresses changed to x.y.z.n), for no good reason that I can see:

kernel: iptables: in: IN=eth0 OUT= MAC= SRC=x.y.z.6 DST=x.y.z.255 LEN=112
TOS=0x00 PREC=0x00 TTL=64 ID=3 DF PROTO=UDP SPT=513 DPT=513 LEN=92
kernel: iptables: in: IN=eth0 OUT= MAC= SRC=x.y.z.6 DST=x.y.z.255 LEN=269
TOS=0x00 PREC=0x00 TTL=64 ID=14 DF PROTO=UDP SPT=138 DPT=138 LEN=249

And on this other machine, with the following rules:

# Generated by iptables-save v1.2.11 on Sun Jun 26 16:19:38 2005
*filter
:INPUT ACCEPT [96:7488]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [90:9795]
:illicit - [0:0]
:trusted - [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT 
-A INPUT -i lo -m state --state NEW -j ACCEPT 
-A INPUT -i lo+ -m state --state NEW -j ACCEPT 
-A INPUT -s w.v.0.0/255.255.0.0 -i eth+ -m state --state NEW -j ACCEPT 
-A INPUT -s x.y.0.0/255.255.0.0 -i eth+ -m state --state NEW -j ACCEPT 
-A INPUT -i tun+ -m state --state NEW -j ACCEPT 
-A INPUT -j illicit 
-A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT 
-A FORWARD -i eth+ -o tun+ -m state --state NEW -j ACCEPT 
-A FORWARD -i tun+ -o eth+ -m state --state NEW -j ACCEPT 
-A FORWARD -i eth+ -o eth+ -m state --state NEW -j ACCEPT 
-A FORWARD -j illicit 
-A illicit -i ppp0 -m limit --limit 1/min -j LOG --log-prefix "iptables: in: " 
-A illicit -i eth0 -m limit --limit 1/min -j LOG --log-prefix "iptables: in: " 
-A illicit -i tun0 -m limit --limit 1/min -j LOG --log-prefix "iptables: in: " 
-A illicit -m limit --limit 1/min -j LOG --log-prefix "iptables: in: " 
COMMIT
# Completed on Sun Jun 26 16:19:38 2005

... it's complaining about the same packets as above, but also:

kernel: iptables: in: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00
SRC=127.0.0.1 DST=127.0.0.1 LEN=58 TOS=0x00 PREC=0x00 TTL=64 ID=57515 DF
PROTO=UDP SPT=32789 DPT=53 LEN=38

It's just that these problems no longer hold up the boot.

Comment 5 Dave Jones 2005-08-01 23:46:39 UTC

you're logging everything which doesn't have state.  UDP is stateless.

you want something like iptables -A INPUT -i $interface -p udp -j DROP (or
ACCEPT or whatever) to make those go away.

Comment 6 Todd Allen 2005-08-18 20:59:11 UTC

Although UDP is stateless, the iptables state module still deals with UDP
packets.  Right now it's considering those UDP packets to have state INVALID. 
I'm pretty sure that's a change in behavior, and I consider it a surprising one.
Honestly, I don't know if they used to be considered NEW or ESTABLISHED, but
either one seems preferable to INVALID.
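
One way to confirm how the state module classifies these packets is to log
INVALID ones under their own prefix, separate from the catch-all "illicit"
logging.  A minimal sketch, under stated assumptions: log_invalid_packets is a
hypothetical helper, the "iptables: invalid: " prefix is made up for
illustration, and the IPTABLES variable exists only so the command can be
dry-run with IPTABLES=echo.

```shell
# Hypothetical helper; set IPTABLES=echo to print the command instead of
# running it (running it for real requires root).
log_invalid_packets() {
    ipt=${IPTABLES:-iptables}
    # Insert at position 1 of INPUT so the rule sees every packet before the
    # existing ACCEPT rules and before the jump to the illicit chain.
    "$ipt" -I INPUT 1 -m state --state INVALID \
        -m limit --limit 1/min -j LOG --log-prefix "iptables: invalid: "
}
```

If the loopback UDP packets then show up with the "iptables: invalid: "
prefix, the state module is indeed treating them as INVALID rather than NEW.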

Also, I'm noticing that, when attempting to connect to localhost on a port with
no listening daemon (e.g. telnet localhost 9107 or somesuch), the reset response that
comes back is considered to be INVALID by the state module, too.  I think that's
odd, too.

But if those are intentional changes, I'll change my iptables around to cope.

Comment 7 Dave Jones 2005-08-19 21:55:28 UTC

hmm, interesting.
I suggest bringing this up upstream on the netfilter list if it isn't already
fixed in the latest upstream kernels. (I'll take a look over the 2.6.12 ->
2.6.13rc changes later, but if there's some fix for this, it should have been
backported to the 2.6.12.y kernels already).
