103516 – ONBOOT=yes cipe channel halts boot if eth0 start fails

Summary: ONBOOT=yes cipe channel halts boot if eth0 start fails

Keywords:
Status:	CLOSED DUPLICATE of bug 107995
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	cipe
Sub Component:
Version:	3.0
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Nalin Dahyabhai
QA Contact:	David Lawrence
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-09-01 16:38 UTC by Alexandre Oliva
Modified:	2007-11-30 22:06 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-02-21 18:58:22 UTC
Target Upstream Version:
Embargoed:

Description Alexandre Oliva 2003-09-01 16:38:17 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030703

Description of problem:
If a cipe interface is configured to come up at boot time, but the network
interface that provides the default route fails to come up on boot (e.g., an
unplugged laptop or a failed ppp0 authentication with the ISP), the boot hangs:
bringing up the cipe interface never completes. The only way out is a reboot,
either after connecting the computer to a network, or into single-user mode to
disable the ONBOOT configuration of the cipe channels and then let the boot
complete.

Ideally, bringing up the cipe channel should time out just like other network
interfaces.
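
As a rough illustration of the kind of timeout requested here, the sketch below
waits for a child process for a bounded time and kills it if it does not finish.
It is not taken from ciped or the ifup scripts: run_with_timeout() and its
parameters are hypothetical names, the child merely sleeps to stand in for a
stuck helper, and the code assumes Linux behaviour for sigtimedwait() with a
single child process.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of cipe): fork a child that sleeps for
 * child_secs and wait at most timeout_secs for it to exit.
 * Returns 0 if the child finished in time, 1 if it timed out and was
 * killed, -1 on error.
 */
int run_with_timeout(unsigned child_secs, time_t timeout_secs)
{
    sigset_t chld, old;
    struct timespec ts = { .tv_sec = timeout_secs, .tv_nsec = 0 };
    pid_t pid;
    int timed_out = 0;

    /* Block SIGCHLD before forking so it stays pending until we ask for it. */
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &chld, &old) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0) {
        sleep(child_secs);      /* stand-in for a helper that may hang */
        _exit(0);
    }

    for (;;) {
        if (sigtimedwait(&chld, NULL, &ts) == SIGCHLD)
            break;              /* child exited within the timeout */
        if (errno == EAGAIN) {  /* timeout expired: give up on the child */
            timed_out = 1;
            kill(pid, SIGKILL);
            break;
        }
        if (errno != EINTR) {   /* unexpected failure: clean up and bail out */
            kill(pid, SIGKILL);
            waitpid(pid, NULL, 0);
            sigprocmask(SIG_SETMASK, &old, NULL);
            return -1;
        }
        /* EINTR: retry (this simple sketch restarts the full timeout). */
    }

    waitpid(pid, NULL, 0);      /* reap the child so no zombie is left */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return timed_out;
}
```

For example, run_with_timeout(0, 2) should return 0, while
run_with_timeout(5, 1) should return 1 after about one second.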

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Set up a cipe channel configured to come up at boot time
2. Unplug the host from the network
3. Reboot it


Actual Results:  It will fail to bring eth0 up, and then bringing up the cipe
channel will never complete

Expected Results:  It should time out in a few seconds

Additional info:

Comment 1 Alexandre Oliva 2003-09-03 23:10:49 UTC

FWIW, the cipe options file on all machines I've seen affected by this problem
looks like:
cttl 64
key (omitted)
maxerr -1
# mtu 1250
dynip yes
ping 60

I suppose dynip could be related to it.  I'll look into removing it to see
what happens.

Comment 2 Bill Nottingham 2003-09-03 23:12:19 UTC

It's probably just trying to talk to the cipe server on the other end, without
ever timing out. If you put an strace in ifup-cipcb where it runs ciped, is that
where it sticks?

Comment 3 Alexandre Oliva 2003-09-04 00:52:12 UTC

That's the tricky part.  strace makes the problem go away!

It seems to be a race condition in ciped.  It clone()s, changes some signal
masks, then pause()s.  Under strace, the cloned process gets control only after
the parent's pause(), but I guess without strace it gets control first and
exit()s before the parent blocks SIGCHLD to pause(), so the parent handles the
signal, then blocks it and pause()s forever.

Here's some supporting evidence:

[pid  4475] clone(Process 4477 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0xbf518778) = 4477
[pid  4475] rt_sigaction(SIGUSR1, {0x804a770, [HUP INT TERM CHLD], SA_RESTORER,
0x9a1d38}, NULL, 8) = 0
[pid  4475] rt_sigaction(SIGCHLD, {0x804a770, [HUP INT TERM CHLD], SA_RESTORER,
0x9a1d38}, NULL, 8) = 0
[pid  4475] pause( <unfinished ...>
[...]
[pid  4477] send(3, "<27>Sep  3 21:41:21 ciped-cb[447"..., 73, 0) = 73
[pid  4477] rt_sigaction(SIGPIPE, {0x804a770, [HUP INT TERM CHLD], SA_RESTORER,
0x9a1d38}, NULL, 8) = 0
[pid  4477] close(1)                    = 0
[pid  4477] exit_group(1)               = ?
Process 4477 detached
[pid  4475] <... pause resumed> )       = ? ERESTARTNOHAND (To be restarted)
[pid  4475] --- SIGCHLD (Child exited) @ 0 (0) ---

The send() is logging a `Network unreachable' message to syslog.
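
For reference, the usual way to avoid this kind of lost wakeup is to block
SIGCHLD before creating the child and then wait with sigsuspend(), which
unblocks the signal and sleeps atomically. The sketch below is only an
illustration of that pattern, not ciped's actual code or the eventual fix:
spawn_and_wait() and on_sigchld() are hypothetical names, and the child just
exits immediately to mimic the `Network unreachable' case.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t child_exited = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    child_exited = 1;
}

/*
 * Hypothetical helper (not from ciped): fork a child that exits at once
 * with `code`, then wait for it without the pause() race.
 * Returns the child's exit status, or -1 on error.
 */
int spawn_and_wait(int code)
{
    struct sigaction sa, old_sa;
    sigset_t block, old, wait_mask;
    pid_t pid;
    int status;

    child_exited = 0;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, &old_sa) < 0)
        return -1;

    /* Block SIGCHLD *before* fork(): if the child exits first, the signal
     * stays pending instead of being handled before we start waiting. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) < 0) {
        sigaction(SIGCHLD, &old_sa, NULL);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        sigaction(SIGCHLD, &old_sa, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(code);            /* child: fail immediately */

    /* sigsuspend() swaps in a mask with SIGCHLD unblocked and sleeps in one
     * atomic step, so there is no window in which the wakeup can be lost. */
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);
    while (!child_exited)
        sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_and_wait(1) returns 1 regardless of whether the child or the
parent is scheduled first after fork(), which is the property the pause()-based
sequence in the trace lacks.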

Comment 4 Bill Nottingham 2003-09-04 00:54:45 UTC

OK, sounds like a problem within ciped.

Comment 6 Alexandre Oliva 2004-10-05 16:37:59 UTC


*** This bug has been marked as a duplicate of 107995 ***

Comment 7 Red Hat Bugzilla 2006-02-21 18:58:22 UTC

Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.
