Bug 103516 - ONBOOT=yes cipe channel halts boot if eth0 start fails
Status: CLOSED DUPLICATE of bug 107995
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: cipe
Version: 3.0
Hardware: i386 Linux
Priority: medium Severity: medium
Assigned To: Nalin Dahyabhai
David Lawrence
Depends On:
Blocks:
Reported: 2003-09-01 12:38 EDT by Alexandre Oliva
Modified: 2007-11-30 17:06 EST
Doc Type: Bug Fix
Last Closed: 2006-02-21 13:58:22 EST
Description Alexandre Oliva 2003-09-01 12:38:17 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030703

Description of problem:
If a cipe interface is configured to come up at boot time, but the network
interface that provides the default route fails to come up on boot (e.g., an
unplugged laptop or failed ppp0 authentication with the ISP), the boot hangs:
bringing up the cipe interface never completes. The only way out is a reboot,
either after connecting the computer to a network, or into single-user mode to
disable the ONBOOT configuration of the cipe channels and then let the boot
complete.

Ideally, bringing up the cipe channel should time out just like other network
interfaces.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Set up a cipe channel configured to come up at boot time
2. Unplug the host from the network
3. Reboot it


Actual Results:  eth0 fails to come up, and then bringing up the cipe
channel never completes

Expected Results:  It should time out in a few seconds

Additional info:
Comment 1 Alexandre Oliva 2003-09-03 19:10:49 EDT
FWIW, the cipe options file on all machines I've seen affected by this problem
looks like:
cttl 64
key (omitted)
maxerr -1
# mtu 1250
dynip yes
ping 60

I suppose dynip could be related to it.  I'll look into removing it to see
what happens.
Comment 2 Bill Nottingham 2003-09-03 19:12:19 EDT
It's probably just trying to talk to the cipe server on the other end, without
ever timing out. If you put a strace in ifup-cipcb where it runs ciped, is that
where it sticks?
Comment 3 Alexandre Oliva 2003-09-03 20:52:12 EDT
That's the tricky part.  strace makes the problem go away!

It seems to be a race condition in ciped.  It clone()s, changes some signal
masks, then pause()s.  Within strace, the cloned process gets control only after
pause(), but I guess without strace, it gets control first, and exit()s before
the parent blocks SIGCHLD to pause(), so it handles the signal, then blocks it
and pause()s forever.

Here's some supporting evidence:

[pid  4475] clone(Process 4477 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0xbf518778) = 4477
[pid  4475] rt_sigaction(SIGUSR1, {0x804a770, [HUP INT TERM CHLD], SA_RESTORER,
0x9a1d38}, NULL, 8) = 0
[pid  4475] rt_sigaction(SIGCHLD, {0x804a770, [HUP INT TERM CHLD], SA_RESTORER,
0x9a1d38}, NULL, 8) = 0
[pid  4475] pause( <unfinished ...>
[...]
[pid  4477] send(3, "<27>Sep  3 21:41:21 ciped-cb[447"..., 73, 0) = 73
[pid  4477] rt_sigaction(SIGPIPE, {0x804a770, [HUP INT TERM CHLD], SA_RESTORER,
0x9a1d38}, NULL, 8) = 0
[pid  4477] close(1)                    = 0
[pid  4477] exit_group(1)               = ?
Process 4477 detached
[pid  4475] <... pause resumed> )       = ? ERESTARTNOHAND (To be restarted)
[pid  4475] --- SIGCHLD (Child exited) @ 0 (0) ---

The send() is logging a `Network unreachable' message to syslog.
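
The race described above is the familiar lost-wakeup pattern around pause():
if SIGCHLD is delivered before the parent reaches pause(), nothing is left to
wake it. Below is a minimal sketch of the usual remedy, which is to block
SIGCHLD before creating the child and then wait with sigsuspend(), so that
unblocking and sleeping happen atomically. It is not taken from the ciped
source; the names spawn_and_wait and on_sigchld are hypothetical, and the child
here simply exits at once to mimic the failing case.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler so the parent can tell SIGCHLD already arrived. */
static volatile sig_atomic_t child_exited = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    child_exited = 1;
}

/*
 * Hypothetical helper: fork a child that exits immediately with
 * child_status, then wait for it without losing SIGCHLD.
 * Returns the child's exit status, or -1 on error.
 */
int spawn_and_wait(int child_status)
{
    struct sigaction sa, old_sa;
    sigset_t block, old, wait_mask;
    int status;
    pid_t pid;

    child_exited = 0;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, &old_sa) < 0)
        return -1;

    /* Block SIGCHLD *before* the child exists, so it stays pending
       instead of being delivered ahead of the wait. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        sigaction(SIGCHLD, &old_sa, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(child_status);   /* child fails fast, as in the trace */

    /* sigsuspend() swaps in a mask with SIGCHLD unblocked and sleeps in
       one atomic step, so a pending SIGCHLD wakes it immediately. */
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);
    while (!child_exited)
        sigsuspend(&wait_mask);

    /* Restore the caller's signal mask and SIGCHLD disposition. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_and_wait(1) should return 1 no matter whether the child or the
parent runs first, which is the property the pause()-based sequence lacks.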
Comment 4 Bill Nottingham 2003-09-03 20:54:45 EDT
OK, sounds like a problem within ciped.
Comment 6 Alexandre Oliva 2004-10-05 12:37:59 EDT

*** This bug has been marked as a duplicate of 107995 ***
Comment 7 Red Hat Bugzilla 2006-02-21 13:58:22 EST
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.
