Bug 435978 - Rescue mode networking fails to start
Summary: Rescue mode networking fails to start
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: anaconda
Version: 5.2
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Assignee: David Cantrell
QA Contact: Alexander Todorov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-03-04 18:55 UTC by James Laska
Modified: 2013-09-02 06:24 UTC (History)

Fixed In Version: RHBA-2008-0397
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 15:33:22 UTC
Target Upstream Version:
Embargoed:


Attachments
gdb backtrace from isys.pumpNetDevice('eth0', None) (1.78 KB, text/plain)
2008-03-04 18:55 UTC, James Laska
0001-Make-sure-DHCP-works-in-rescue-mode-435978.patch (4.45 KB, patch)
2008-03-19 23:48 UTC, David Cantrell
backtrace-dhcptest.log (5.77 KB, text/plain)
2008-04-07 23:33 UTC, David Cantrell
dhcptest.c (3.43 KB, text/plain)
2008-04-07 23:33 UTC, David Cantrell
backtrace-dhcptest2.log (12.85 KB, text/plain)
2008-04-08 00:54 UTC, David Cantrell
dhcptest2.c (2.18 KB, text/plain)
2008-04-08 00:55 UTC, David Cantrell
0001-Make-isys.dhcpNetDevice-work-in-rescue-mode-4359.patch (9.40 KB, patch)
2008-04-08 10:26 UTC, David Cantrell
dhcp-3.0.5-libdhcp4client.patch (36.80 KB, patch)
2008-04-08 10:28 UTC, David Cantrell


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2008:0397 0 normal SHIPPED_LIVE anaconda bug fix and enhancement update 2008-05-19 23:11:23 UTC

Description James Laska 2008-03-04 18:55:49 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. linux rescue
2. When prompted to set up networking, say Yes
                    ┌────────┤ Setup Networking ├─────────┐                     
                    │                                     │                     
                    │ Do you want to start the network    │                     
                    │ interfaces on this system?          │                     
                    │                                     │                     
                    │      ┌─────┐           ┌────┐       │                     
                    │      │ Yes │           │ No │       │                     
                    │      └─────┘           └────┘       │                     
                    │                                     │                     
                    │                                     │                     
                    └─────────────────────────────────────┘                     

3. When prompted to start eth0, say yes

                   ┌────┤ Configure Network Interface ├─────┐                   
                   │                                        │                   
                   │ Would you like to configure the eth0   │                   
                   │ network interface in your system?      │                   
                   │                                        │                   
                   │       ┌─────┐            ┌────┐        │                   
                   │       │ Yes │            │ No │        │                   
                   │       └─────┘            └────┘        │                   
                   │                                        │                   
                   │                                        │                   
                   └────────────────────────────────────────┘                   
4. configure device parameters

                      ┌┤ Network Configuration for eth0 ├┐                      
                      │                                  │                      
                      │     IBM Virtual Ethernet         │                      
                      │     52:0E:30:00:30:02            │                      
                      │                                  │                      
                      │     [*] Enable IPv4 support      │                      
                      │     [ ] Enable IPv6 support      │                      
                      │                                  │                      
                      │       ┌────┐    ┌──────┐         │                      
                      │       │ OK │    │ Back │         │                      
                      │       └────┘    └──────┘         │                      
                      │                                  │                      
                      │                                  │                      
                      └──────────────────────────────────┘                      
                  ┌─────┤ IPv4 Configuration for eth0 ├─────┐                   
                  │                                         │                   
                  │ IBM Virtual Ethernet                    │                   
                  │ 52:0E:30:00:30:02                       │                   
                  │                                         │                   
                  │ (*) Dynamic IP configuration (DHCP)     │                   
                  │ ( ) Manual address configuration        │                   
                  │                                         │                   
                  │     IP Address         Prefix (Netmask) │                   
                  │     ________________ / ________________ │                   
                  │                                         │                   
                  │       ┌────┐            ┌──────┐        │                   
                  │       │ OK │            │ Back │        │                   
                  │       └────┘            └──────┘        │                   
                  │                                         │                   
                  │                                         │                   
                  └─────────────────────────────────────────┘                   

Actual results:
the system sits at the following prompt forever:

                         ┌──┤ Starting Interface ├───┐                          
                         │                           │                          
                         │ Attempting to start eth0  │                          
                         │                           │                          
                         └───────────────────────────┘                          
 - tty2 shows that anaconda is maxing out the cpu

Expected results:
networking started

Additional info:
 - hitting this on i386, x86_64 and ppc so far ... so I don't believe this is
arch dependent

$ strace python -c "import sys; sys.path.append('/usr/lib/anaconda'); import isys; print isys.pumpNetDevice('eth0', None);"

socket(PF_PACKET, SOCK_RAW, 3)          = 13
ioctl(13, SIOCGIFINDEX, {ifr_name="eth0", ifr_index=3}) = 0
bind(13, {sa_family=AF_PACKET, proto=0000, if3, pkttype=PACKET_HOST, addr(0)={0,
}, 20) = 0
setsockopt(13, SOL_PACKET, 0x8 /* PACKET_??? */, [1], 4) = 0
setsockopt(13, SOL_SOCKET, SO_ATTACH_FILTER, "\0\v\0\0\17z`\204", 8) = 0
fcntl64(13, F_SETFD, FD_CLOEXEC)        = 0
close(12)                               = 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 12
setsockopt(12, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(12, {sa_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("0.0.0.0")},
16) = 0
fcntl64(12, F_SETFD, FD_CLOEXEC)        = 0
time(NULL)                              = 1204653121
write(13,
"\377\377\377\377\377\377R\0160\0000\2\10\0E\20\1H\0\0\0\0\20\21\251\226\0\0\0\0\377\377"...,
342) = 342
select(14, [12 13], [], [], {6, 288597}) = 1 (in [13], left {6, 276000})
recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"R\0160\0000\2R\0160\0
\2\10\0E\20\1H\0\0\0\0\20\21\344\337\300\250!\262\300\250"..., 1536}],
msg_controllen=28, {cmsg_len=28, cmsg_level=SOL_PACKET, cmsg_type=, ...},
msg_flags=0}, 0) = 342
time(NULL)                              = 1204653121
write(13,
"\377\377\377\377\377\377R\0160\0000\2\10\0E\20\1H\0\0\0\0\20\21\251\226\0\0\0\0\377\377"...,
342) = 342
select(14, [12 13], [], [], {7, 274642}) = 1 (in [13], left {7, 254000})
recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"R\0160\0000\2R\0160\0
\2\10\0E\20\1H\0\0\0\0\20\21\344\337\300\250!\262\300\250"..., 1536}],
msg_controllen=28, {cmsg_len=28, cmsg_level=SOL_PACKET, cmsg_type=, ...},
msg_flags=0}, 0) = 342
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
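
For anyone reading the trace: the buffer written to fd 13 can be decoded by hand to confirm it is an ordinary broadcast IPv4 frame from eth0. The sketch below is only an illustration (it is not part of anaconda or libdhcp); it copies the first 18 bytes exactly as strace printed them and unpacks the Ethernet header and the start of the IPv4 header. The helper name mac() is made up for this example.

```python
import struct

# First 18 bytes of the frame written to fd 13, copied with the same
# octal escapes that strace printed in the trace above.
frame = b"\377\377\377\377\377\377R\0160\0000\2\10\0E\20\1H"

def mac(raw):
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join("%02X" % b for b in raw)

# Ethernet header: destination MAC, source MAC, EtherType.
dst, src = frame[0:6], frame[6:12]
(ethertype,) = struct.unpack("!H", frame[12:14])

# Start of the IPv4 header: version/IHL byte, TOS byte, total length.
version_ihl, tos, total_len = struct.unpack("!BBH", frame[14:18])
version = version_ihl >> 4
header_len = (version_ihl & 0x0F) * 4

print("destination MAC :", mac(dst))
print("source MAC      :", mac(src))
print("EtherType       : 0x%04X" % ethertype)
print("IP version      :", version)
print("IP header bytes :", header_len)
print("IP total length :", total_len)
print("frame length    :", 14 + total_len)
```

The destination is the broadcast address, the source matches the MAC shown in the configuration dialogs, and the 14-byte Ethernet header plus the IPv4 total length adds up to the 342 bytes reported by write().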

Comment 1 James Laska 2008-03-04 18:55:49 UTC
Created attachment 296779 [details]
gdb backtrace from isys.pumpNetDevice('eth0', None)

Comment 2 James Laska 2008-03-04 19:03:14 UTC
Note: this appears to happen only when starting rescue mode from a local
installation source (DVD or CDROM).  A remote installation source appears to
set up networking correctly when it pulls down stage2.img.

Comment 3 Jeremy Katz 2008-03-04 19:15:43 UTC
segv is in libdhcp

Comment 4 James Laska 2008-03-05 12:42:11 UTC
Tested in RHEL-5-Server/U1 and it proceeds with enabling networking on the same
system U2 fails with.  Adding Regression keyword.

$ python -c "import sys; sys.path.append('/usr/lib/anaconda'); import isys; print isys.pumpNetDevice('eth0', None);"
192.168.32.2

Let me know if there is any additional information needed on the failing systems




Comment 6 David Cantrell 2008-03-19 23:47:59 UTC
I am able to reproduce this locally on RHEL5.2-Server-20080319.nightly on at least i386.

This is similar to bug #432011, which was handled by working around limitations in libdhcp since that 
library is very fragile and essentially broken.  Working on it any more seems like a waste of time.  
Changing the component of this bug to anaconda and attaching my patch.  Also giving it a devel-ack.

Comment 7 David Cantrell 2008-03-19 23:48:49 UTC
Created attachment 298612 [details]
0001-Make-sure-DHCP-works-in-rescue-mode-435978.patch

Comment 8 David Cantrell 2008-03-19 23:50:12 UTC
There's the patch.  If I had to take a guess as to which platform this might cause a regression on for 5.2, it 
would be s390x.  That platform is nothing but problems.

jlaska,
If you want to give it a qa-ack, I'll seek out the other magic flags (unless you do) to get it in to 5.2.

Comment 11 David Cantrell 2008-03-20 19:05:23 UTC
Committed, fix will be in anaconda-11.1.2.109-1.

Comment 14 David Cantrell 2008-04-04 03:42:35 UTC
Working on tracking this one down still.  The same code is run via rescue mode as is run during stage1 
when performing a network install, so something odd is happening with what is passed to libdhcp during 
rescue mode.

More info to follow.

Comment 15 David Cantrell 2008-04-07 23:31:06 UTC
Have written a local test program that simulates the doDhcpNetDevice() function from isys.c in 
anaconda.  It's far easier to test this way than to keep recreating test ISOs to test rescue mode.

Basically, all that rescue mode does to bring up networking is (1) ask you the device to use and (2) call 
isys.dhcpNetDevice() passing the device name (e.g., 'eth0').

The isys.dhcpNetDevice() function (and pumpNetDevice, for that matter) are passthrough Python 
functions in isys.py that call _isys.dhcpNetDevice.  The function in _isys is the C function that calls 
libdhcp, which is what I have reimplemented locally.

The SIGSEGV occurs after we are bound to an IPv4 address.  It happens during option parsing.  Still 
working on a solution, but I am attaching my test program and current findings to this bug.

Comment 16 David Cantrell 2008-04-07 23:33:23 UTC
Created attachment 301575 [details]
backtrace-dhcptest.log

Comment 17 David Cantrell 2008-04-07 23:33:53 UTC
Created attachment 301576 [details]
dhcptest.c

Comment 18 David Cantrell 2008-04-08 00:52:36 UTC
Further testing with a version of dhcptest.c that execs the DHCP client in the same process as dhcptest.  
Same crash and same cause (attaching new source and backtrace of that test).


Comment 19 David Cantrell 2008-04-08 00:54:29 UTC
Created attachment 301584 [details]
backtrace-dhcptest2.log

Comment 20 David Cantrell 2008-04-08 00:55:16 UTC
Created attachment 301585 [details]
dhcptest2.c

Comment 21 David Cantrell 2008-04-08 01:33:26 UTC
Numerous problems exist with this bug:

1) The fix for rhbz#216158 prevents the shared library for libdhcp4client from functioning correctly 
with the shared _isys.so module.  This explains why the same code for dhcp works in loader (static 
linked) and fails from _isys (dynamic linked).  This is a major problem with libdhcp and is just 
something we have to live with for now.

2) Problem #1 can be worked around by linking _isys.so with the .a versions of the dhcp libraries.  Not 
great, but it does the trick.

3) The shared libdhcp6client library is hopelessly broken at this point.  The static one fails with _isys.so 
still and the problem is that it's not honoring various LIBDHCP flags.  I've patched dhcpv6 for that and 
it's working now.

So what is the fix?  There are two things I need to do:

1) Patch anaconda to link _isys.so with the .a versions of the dhcp libraries.
2) Patch dhcpv6 to correctly honor the LIBDHCP flags so it doesn't SIGSEGV.

Patches coming next.
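
One way to tell whether a given _isys.so build still pulls in the shared dhcp libraries is to look at its ldd output. The sketch below is a hypothetical helper, not something shipped with anaconda; it only parses text, and the sample ldd output (library file names, paths, and load addresses) is made up for illustration.

```python
def dhcp_shared_deps(ldd_output):
    """Return the shared libraries in ldd output whose names start with 'libdhcp'."""
    deps = []
    for line in ldd_output.splitlines():
        fields = line.split()
        # The first field of each ldd line is the library name.
        if fields and fields[0].startswith("libdhcp"):
            deps.append(fields[0])
    return deps


# Made-up sample resembling `ldd _isys.so` output for a dynamically linked build.
SAMPLE_DYNAMIC = """\
    libdhcp4client.so.0 => /usr/lib/libdhcp4client.so.0 (0x00111000)
    libc.so.6 => /lib/libc.so.6 (0x00222000)
"""

# Made-up sample for a build linked against the .a versions instead.
SAMPLE_STATIC = """\
    libc.so.6 => /lib/libc.so.6 (0x00222000)
"""

print(dhcp_shared_deps(SAMPLE_DYNAMIC))
print(dhcp_shared_deps(SAMPLE_STATIC))
```

An empty result would be consistent with the dhcp code having been linked in from the .a archives, as proposed in fix #1.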

Comment 22 David Cantrell 2008-04-08 10:26:09 UTC
Created attachment 301618 [details]
0001-Make-isys.dhcpNetDevice-work-in-rescue-mode-4359.patch

Fix for anaconda to get isys.dhcpNetDevice() working in rescue mode.  Requires
a dhcp package with the following patch incorporated.

Note, this only gets DHCP working in rescue mode as DHCPv6 still crashes for
some reason.  Including dhclient and dhcp6c in the rescue image for users to
run manually as a temporary solution.

Comment 23 David Cantrell 2008-04-08 10:28:46 UTC
Created attachment 301619 [details]
dhcp-3.0.5-libdhcp4client.patch

Updated libdhcp4client visibility patch.

Comment 24 David Cantrell 2008-04-08 10:31:43 UTC
Went ahead and built a new dhcp package with the fix in comment #23; the new package will be 
dhcp-3.0.5-13.el5.

Change committed to anaconda, so the next anaconda build will have this fix in it.

Comment 27 David Cantrell 2008-04-11 21:42:12 UTC
(In reply to comment #26)
> David, which tree is the fix supposed to be in? 
> I have this failing with snap #5 which has:
> dhcp-3.0.5-13.el5
> anaconda-11.1.2.112-1
> 
> Fails if booting from CDROM and one or both of IPv4/IPv6 is enabled and DHCP is
> used. Is this FAILS_QA, or do I have to wait for another tree?
> 
> Thanks.

Those are the right versions.  I think I also need to do a libdhcp rebuild.  Let me do that.

Comment 28 David Cantrell 2008-04-11 21:47:00 UTC
I've rebuilt libdhcp and am updating the erratum for that.  We'll also need to rebuild anaconda.  So what 
you'll want in the tree is:

dhcp-3.0.5-13.el5
anaconda-11.1.2.113-1
libdhcp-1.20-5.el5

Comment 33 errata-xmlrpc 2008-05-21 15:33:22 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0397.html


