Bug 435978 - Rescue mode networking fails to start
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: anaconda
Version: 5.2
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: David Cantrell
QA Contact: Alexander Todorov
Keywords: Regression
Depends On:
Blocks:
Reported: 2008-03-04 13:55 EST by James Laska
Modified: 2013-09-02 02:24 EDT

See Also:
Fixed In Version: RHBA-2008-0397
Doc Type: Bug Fix
Last Closed: 2008-05-21 11:33:22 EDT


Attachments
gdb backtrace from isys.pumpNetDevice('eth0', None) (1.78 KB, text/plain)
2008-03-04 13:55 EST, James Laska
0001-Make-sure-DHCP-works-in-rescue-mode-435978.patch (4.45 KB, patch)
2008-03-19 19:48 EDT, David Cantrell
backtrace-dhcptest.log (5.77 KB, text/plain)
2008-04-07 19:33 EDT, David Cantrell
dhcptest.c (3.43 KB, text/plain)
2008-04-07 19:33 EDT, David Cantrell
backtrace-dhcptest2.log (12.85 KB, text/plain)
2008-04-07 20:54 EDT, David Cantrell
dhcptest2.c (2.18 KB, text/plain)
2008-04-07 20:55 EDT, David Cantrell
0001-Make-isys.dhcpNetDevice-work-in-rescue-mode-4359.patch (9.40 KB, patch)
2008-04-08 06:26 EDT, David Cantrell
dhcp-3.0.5-libdhcp4client.patch (36.80 KB, patch)
2008-04-08 06:28 EDT, David Cantrell

Description James Laska 2008-03-04 13:55:49 EST
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. linux rescue
2. When prompted to set up networking, say Yes
                    ┌────────┤ Setup Networking ├─────────┐                     
                    │                                     │                     
                    │ Do you want to start the network    │                     
                    │ interfaces on this system?          │                     
                    │                                     │                     
                    │      ┌─────┐           ┌────┐       │                     
                    │      │ Yes │           │ No │       │                     
                    │      └─────┘           └────┘       │                     
                    │                                     │                     
                    │                                     │                     
                    └─────────────────────────────────────┘                     

3. When prompted to start eth0, say yes

                   ┌────┤ Configure Network Interface ├─────┐                   
                   │                                        │                   
                   │ Would you like to configure the eth0   │                   
                   │ network interface in your system?      │                   
                   │                                        │                   
                   │       ┌─────┐            ┌────┐        │                   
                   │       │ Yes │            │ No │        │                   
                   │       └─────┘            └────┘        │                   
                   │                                        │                   
                   │                                        │                   
                   └────────────────────────────────────────┘                   
4. Configure device parameters

                      ┌┤ Network Configuration for eth0 ├┐                      
                      │                                  │                      
                      │     IBM Virtual Ethernet         │                      
                      │     52:0E:30:00:30:02            │                      
                      │                                  │                      
                      │     [*] Enable IPv4 support      │                      
                      │     [ ] Enable IPv6 support      │                      
                      │                                  │                      
                      │       ┌────┐    ┌──────┐         │                      
                      │       │ OK │    │ Back │         │                      
                      │       └────┘    └──────┘         │                      
                      │                                  │                      
                      │                                  │                      
                      └──────────────────────────────────┘                      
                  ┌─────┤ IPv4 Configuration for eth0 ├─────┐                   
                  │                                         │                   
                  │ IBM Virtual Ethernet                    │                   
                  │ 52:0E:30:00:30:02                       │                   
                  │                                         │                   
                  │ (*) Dynamic IP configuration (DHCP)     │                   
                  │ ( ) Manual address configuration        │                   
                  │                                         │                   
                  │     IP Address         Prefix (Netmask) │                   
                  │     ________________ / ________________ │                   
                  │                                         │                   
                  │       ┌────┐            ┌──────┐        │                   
                  │       │ OK │            │ Back │        │                   
                  │       └────┘            └──────┘        │                   
                  │                                         │                   
                  │                                         │                   
                  └─────────────────────────────────────────┘                   

Actual results:
the system sits at the following prompt forever:

                         ┌──┤ Starting Interface ├───┐                          
                         │                           │                          
                         │ Attempting to start eth0  │                          
                         │                           │                          
                         └───────────────────────────┘                          
 - tty2 shows that anaconda is maxing out the cpu

Expected results:
networking started

Additional info:
 - hitting this on i386, x86_64 and ppc so far ... so I don't believe this is
arch dependent

$ strace python -c "import sys; sys.path.append('/usr/lib/anaconda'); import isys; print isys.pumpNetDevice('eth0', None);"

socket(PF_PACKET, SOCK_RAW, 3)          = 13
ioctl(13, SIOCGIFINDEX, {ifr_name="eth0", ifr_index=3}) = 0
bind(13, {sa_family=AF_PACKET, proto=0000, if3, pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0
setsockopt(13, SOL_PACKET, 0x8 /* PACKET_??? */, [1], 4) = 0
setsockopt(13, SOL_SOCKET, SO_ATTACH_FILTER, "\0\v\0\0\17z`\204", 8) = 0
fcntl64(13, F_SETFD, FD_CLOEXEC)        = 0
close(12)                               = 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 12
setsockopt(12, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(12, {sa_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
fcntl64(12, F_SETFD, FD_CLOEXEC)        = 0
time(NULL)                              = 1204653121
write(13, "\377\377\377\377\377\377R\0160\0000\2\10\0E\20\1H\0\0\0\0\20\21\251\226\0\0\0\0\377\377"..., 342) = 342
select(14, [12 13], [], [], {6, 288597}) = 1 (in [13], left {6, 276000})
recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"R\0160\0000\2R\0160\0 \2\10\0E\20\1H\0\0\0\0\20\21\344\337\300\250!\262\300\250"..., 1536}], msg_controllen=28, {cmsg_len=28, cmsg_level=SOL_PACKET, cmsg_type=, ...}, msg_flags=0}, 0) = 342
time(NULL)                              = 1204653121
write(13, "\377\377\377\377\377\377R\0160\0000\2\10\0E\20\1H\0\0\0\0\20\21\251\226\0\0\0\0\377\377"..., 342) = 342
select(14, [12 13], [], [], {7, 274642}) = 1 (in [13], left {7, 254000})
recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"R\0160\0000\2R\0160\0 \2\10\0E\20\1H\0\0\0\0\20\21\344\337\300\250!\262\300\250"..., 1536}], msg_controllen=28, {cmsg_len=28, cmsg_level=SOL_PACKET, cmsg_type=, ...}, msg_flags=0}, 0) = 342
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
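
Because the reproducer above dies with SIGSEGV, it can be convenient to run it in a child process and inspect how it ended, rather than losing the calling script. The following is a minimal sketch in modern Python (the command above is Python 2), assuming a POSIX system. The helper name run_isolated and the default timeout are illustrative and are not part of anaconda.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run a Python snippet in a child process and describe how it ended.

    Returns "ok", "exit N", "killed by SIGNAME", or "timeout".
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Covers the "sits at the prompt forever" style of failure.
        return "timeout"
    if proc.returncode < 0:
        # On POSIX a negative return code means the child died from a signal.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "ok" if proc.returncode == 0 else "exit %d" % proc.returncode
```

On an affected system, passing the isys.pumpNetDevice call as the snippet would presumably report a SIGSEGV instead of taking the caller down with it.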
Comment 1 James Laska 2008-03-04 13:55:49 EST
Created attachment 296779 [details]
gdb backtrace from isys.pumpNetDevice('eth0', None)
Comment 2 James Laska 2008-03-04 14:03:14 EST
Note: this appears to happen only when starting rescue mode from a local
installation source (DVD or CDROM).  A remote installation source appears to
set up networking correctly when it pulls down stage2.img.
Comment 3 Jeremy Katz 2008-03-04 14:15:43 EST
segv is in libdhcp
Comment 4 James Laska 2008-03-05 07:42:11 EST
Tested in RHEL-5-Server/U1, and it proceeds with enabling networking on the same
system that U2 fails on.  Adding Regression keyword.

$ python -c "import sys; sys.path.append('/usr/lib/anaconda'); import isys; print isys.pumpNetDevice('eth0', None);"
192.168.32.2

Let me know if there is any additional information needed on the failing systems


Comment 6 David Cantrell 2008-03-19 19:47:59 EDT
I am able to reproduce this locally on RHEL5.2-Server-20080319.nightly on at least i386.

This is similar to bug #432011, which was handled by working around limitations in libdhcp since that 
library is very fragile and essentially broken.  Working on it any more seems like a waste of time.  
Changing the component of this bug to anaconda and attaching my patch.  Also giving it a devel-ack.
Comment 7 David Cantrell 2008-03-19 19:48:49 EDT
Created attachment 298612 [details]
0001-Make-sure-DHCP-works-in-rescue-mode-435978.patch
Comment 8 David Cantrell 2008-03-19 19:50:12 EDT
There's the patch.  If I had to take a guess as to which platform this might cause a regression on for 5.2, it 
would be s390x.  That platform is nothing but problems.

jlaska,
If you want to give it a qa-ack, I'll seek out the other magic flags (unless you do) to get it in to 5.2.
Comment 11 David Cantrell 2008-03-20 15:05:23 EDT
Committed, fix will be in anaconda-11.1.2.109-1.
Comment 14 David Cantrell 2008-04-03 23:42:35 EDT
Working on tracking this one down still.  The same code is run via rescue mode as is run during stage1 
when performing a network install, so something odd is happening with what is passed to libdhcp during 
rescue mode.

More info to follow.
Comment 15 David Cantrell 2008-04-07 19:31:06 EDT
Have written a local test program that simulates the doDhcpNetDevice() function from isys.c in 
anaconda.  It's far easier to test this way than to keep recreating test ISOs to test rescue mode.

Basically, all that rescue mode does to bring up networking is (1) ask you the device to use and (2) call 
isys.dhcpNetDevice() passing the device name (e.g., 'eth0').

The isys.dhcpNetDevice() function (and pumpNetDevice, for that matter) are passthrough Python 
functions in isys.py that call _isys.dhcpNetDevice.  The function in _isys is the C function that calls 
libdhcp, which is what I have reimplemented locally.

The SIGSEGV occurs after we are bound to an IPv4 address.  It happens during option parsing.  Still 
working on a solution, but I am attaching my test program and current findings to this bug.
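
The layering described above (a thin Python function forwarding to a C extension that calls libdhcp) can be sketched roughly as follows. This is an illustration in modern Python, not anaconda's actual isys.py: the backend parameter and the FakeIsys class are hypothetical stand-ins for the compiled _isys module, and the argument list is assumed from the isys.pumpNetDevice('eth0', None) call in the description.

```python
class FakeIsys:
    """Hypothetical stand-in for the compiled _isys extension module."""

    def __init__(self):
        self.calls = []

    def dhcpNetDevice(self, device, dhcp_class):
        # The real C function would call into libdhcp at this point.
        self.calls.append((device, dhcp_class))
        return "192.0.2.10"  # documentation-range address, not real output


def dhcpNetDevice(device, dhcp_class=None, backend=None):
    """Passthrough: forward the request to the backend unchanged."""
    if backend is None:
        raise RuntimeError("no _isys backend available in this sketch")
    return backend.dhcpNetDevice(device, dhcp_class)
```

Since a wrapper like this adds no logic of its own, a crash would have to originate in the C layer or below, which is consistent with reproducing the problem from a standalone C test program.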
Comment 16 David Cantrell 2008-04-07 19:33:23 EDT
Created attachment 301575 [details]
backtrace-dhcptest.log
Comment 17 David Cantrell 2008-04-07 19:33:53 EDT
Created attachment 301576 [details]
dhcptest.c
Comment 18 David Cantrell 2008-04-07 20:52:36 EDT
Further testing with a version of dhcptest.c that execs the DHCP client in the same process as dhcptest.  
Same crash and same cause (attaching new source and backtrace of that test).
Comment 19 David Cantrell 2008-04-07 20:54:29 EDT
Created attachment 301584 [details]
backtrace-dhcptest2.log
Comment 20 David Cantrell 2008-04-07 20:55:16 EDT
Created attachment 301585 [details]
dhcptest2.c
Comment 21 David Cantrell 2008-04-07 21:33:26 EDT
Numerous problems exist with this bug:

1) The fix for rhbz#216158 prevents the shared library for libdhcp4client from functioning correctly 
with the shared _isys.so module.  This explains why the same code for dhcp works in loader (static 
linked) and fails from _isys (dynamic linked).  This is a major problem with libdhcp and is just 
something we have to live with for now.

2) Problem #1 can be worked around by linking _isys.so with the .a versions of the dhcp libraries.  Not 
great, but it does the trick.

3) The shared libdhcp6client library is hopelessly broken at this point.  The static one fails with _isys.so 
still and the problem is that it's not honoring various LIBDHCP flags.  I've patched dhcpv6 for that and 
it's working now.

So what is the fix?  There are two things I need to do:

1) Patch anaconda to link _isys.so with the .a versions of the dhcp libraries.
2) Patch dhcpv6 to correctly honor the LIBDHCP flags so it doesn't SIGSEGV.

Patches coming next.
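
One way to check whether a module still resolves the dhcp libraries dynamically is to look at ldd output for it. The sketch below, in modern Python, only parses text in the usual "name => path (address)" layout. The sample text, including library file names, paths and addresses, is made up for illustration and is not taken from a real system.

```python
def dynamic_dhcp_libs(ldd_output):
    """Return names of shared libraries containing 'dhcp' from ldd-style text."""
    libs = []
    for line in ldd_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[0].rsplit("/", 1)[-1]  # drop any leading directory
        if "dhcp" in name and name not in libs:
            libs.append(name)
    return libs


# Made-up sample in the style of ldd output (hypothetical names and paths).
SAMPLE = """\
    libdhcp4client.so.0 => /usr/lib/libdhcp4client.so.0 (0x00110000)
    libc.so.6 => /lib/libc.so.6 (0x00a3b000)
"""
```

If _isys.so is linked against the .a archives as proposed above, one would expect this list to come back empty for that module.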
Comment 22 David Cantrell 2008-04-08 06:26:09 EDT
Created attachment 301618 [details]
0001-Make-isys.dhcpNetDevice-work-in-rescue-mode-4359.patch

Fix for anaconda to get isys.dhcpNetDevice() working in rescue mode.  Requires
a dhcp package with the following patch incorporated.

Note, this only gets DHCP working in rescue mode as DHCPv6 still crashes for
some reason.  Including dhclient and dhcp6c in the rescue image for users to
run manually as a temporary solution.
Comment 23 David Cantrell 2008-04-08 06:28:46 EDT
Created attachment 301619 [details]
dhcp-3.0.5-libdhcp4client.patch

Updated libdhcp4client visibility patch.
Comment 24 David Cantrell 2008-04-08 06:31:43 EDT
Went ahead and built a new dhcp package with the fix in comment #23; the new package will be 
dhcp-3.0.5-13.el5.

Change committed to anaconda, so the next anaconda build will have this fix in it.
Comment 27 David Cantrell 2008-04-11 17:42:12 EDT
(In reply to comment #26)
> David, which tree is the fix supposed to be in? 
> I have this failing with snap #5 which has:
> dhcp-3.0.5-13.el5
> anaconda-11.1.2.112-1
> 
> Fails if booting from CDROM and one or both of IPv4/IPv6 is enabled and DHCP is
> used. Is this FAILS_QA, or do I have to wait for another tree?
> 
> Thanks.

Those are the right versions.  I think I also need to do a libdhcp rebuild.  Let me do that.
Comment 28 David Cantrell 2008-04-11 17:47:00 EDT
I've rebuilt libdhcp and am updating the erratum for that.  We'll also need to rebuild anaconda.  So what 
you'll want in the tree is:

dhcp-3.0.5-13.el5
anaconda-11.1.2.113-1
libdhcp-1.20-5.el5
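
A tree can be checked against the three builds listed above with a small script. The sketch below, in modern Python, uses a simplified comparison that splits version-release strings into numeric and alphabetic runs. It is not rpm's real version comparison and ignores epochs. The function names and the dictionary layout are illustrative assumptions.

```python
import re

# Minimum builds named in the comment above.
REQUIRED = {
    "dhcp": "3.0.5-13.el5",
    "anaconda": "11.1.2.113-1",
    "libdhcp": "1.20-5.el5",
}


def _key(version_release):
    """Split into numeric and alphabetic runs so that 113 sorts after 112."""
    parts = re.findall(r"\d+|[A-Za-z]+", version_release)
    return [(1, int(p), "") if p.isdigit() else (0, 0, p) for p in parts]


def missing_fixes(tree):
    """Return package names in `tree` that are absent or older than REQUIRED."""
    return sorted(
        name
        for name, needed in REQUIRED.items()
        if name not in tree or _key(tree[name]) < _key(needed)
    )
```

For the snap #5 versions quoted in comment #27 (dhcp-3.0.5-13.el5 and anaconda-11.1.2.112-1, with no libdhcp build given), this flags anaconda and libdhcp as still needing an update.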
Comment 33 errata-xmlrpc 2008-05-21 11:33:22 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0397.html
