Bug 435978

Summary: Rescue mode networking fails to start
Product: Red Hat Enterprise Linux 5
Reporter: James Laska <jlaska>
Component: anaconda
Assignee: David Cantrell <dcantrell>
Status: CLOSED ERRATA
QA Contact: Alexander Todorov <atodorov>
Severity: low
Priority: low
Version: 5.2
CC: atodorov, jturner
Target Milestone: rc
Keywords: Regression
Hardware: All
OS: Linux
Fixed In Version: RHBA-2008-0397
Doc Type: Bug Fix
Last Closed: 2008-05-21 15:33:22 UTC
Attachments:
  - gdb backtrace from isys.pumpNetDevice('eth0', None)
  - 0001-Make-sure-DHCP-works-in-rescue-mode-435978.patch
  - backtrace-dhcptest.log
  - dhcptest.c
  - backtrace-dhcptest2.log
  - dhcptest2.c
  - 0001-Make-isys.dhcpNetDevice-work-in-rescue-mode-4359.patch
  - dhcp-3.0.5-libdhcp4client.patch

Description James Laska 2008-03-04 18:55:49 UTC
Steps to Reproduce:
1. Boot with "linux rescue".
2. When prompted to set up networking, select Yes.
                    ┌────────┤ Setup Networking ├─────────┐                     
                    │                                     │                     
                    │ Do you want to start the network    │                     
                    │ interfaces on this system?          │                     
                    │                                     │                     
                    │      ┌─────┐           ┌────┐       │                     
                    │      │ Yes │           │ No │       │                     
                    │      └─────┘           └────┘       │                     
                    │                                     │                     
                    │                                     │                     
                    └─────────────────────────────────────┘                     

3. When prompted to configure eth0, select Yes.

                   ┌────┤ Configure Network Interface ├─────┐                   
                   │                                        │                   
                   │ Would you like to configure the eth0   │                   
                   │ network interface in your system?      │                   
                   │                                        │                   
                   │       ┌─────┐            ┌────┐        │                   
                   │       │ Yes │            │ No │        │                   
                   │       └─────┘            └────┘        │                   
                   │                                        │                   
                   │                                        │                   
                   └────────────────────────────────────────┘                   
4. Configure the device parameters (IPv4 enabled, DHCP selected):

                      ┌┤ Network Configuration for eth0 ├┐                      
                      │                                  │                      
                      │     IBM Virtual Ethernet         │                      
                      │     52:0E:30:00:30:02            │                      
                      │                                  │                      
                      │     [*] Enable IPv4 support      │                      
                      │     [ ] Enable IPv6 support      │                      
                      │                                  │                      
                      │       ┌────┐    ┌──────┐         │                      
                      │       │ OK │    │ Back │         │                      
                      │       └────┘    └──────┘         │                      
                      │                                  │                      
                      │                                  │                      
                      └──────────────────────────────────┘                      
                  ┌─────┤ IPv4 Configuration for eth0 ├─────┐                   
                  │                                         │                   
                  │ IBM Virtual Ethernet                    │                   
                  │ 52:0E:30:00:30:02                       │                   
                  │                                         │                   
                  │ (*) Dynamic IP configuration (DHCP)     │                   
                  │ ( ) Manual address configuration        │                   
                  │                                         │                   
                  │     IP Address         Prefix (Netmask) │                   
                  │     ________________ / ________________ │                   
                  │                                         │                   
                  │       ┌────┐            ┌──────┐        │                   
                  │       │ OK │            │ Back │        │                   
                  │       └────┘            └──────┘        │                   
                  │                                         │                   
                  │                                         │                   
                  └─────────────────────────────────────────┘                   

Actual results:
The system sits at the following prompt indefinitely:

                         ┌──┤ Starting Interface ├───┐                          
                         │                           │                          
                         │ Attempting to start eth0  │                          
                         │                           │                          
                         └───────────────────────────┘                          
 - tty2 shows anaconda consuming 100% of the CPU.

Expected results:
Networking starts.

Additional info:
 - Seen on i386, x86_64 and ppc so far, so this does not appear to be
   architecture-dependent.

$ strace python -c "import sys; sys.path.append('/usr/lib/anaconda'); import isys; print isys.pumpNetDevice('eth0', None);"

socket(PF_PACKET, SOCK_RAW, 3)          = 13
ioctl(13, SIOCGIFINDEX, {ifr_name="eth0", ifr_index=3}) = 0
bind(13, {sa_family=AF_PACKET, proto=0000, if3, pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0
setsockopt(13, SOL_PACKET, 0x8 /* PACKET_??? */, [1], 4) = 0
setsockopt(13, SOL_SOCKET, SO_ATTACH_FILTER, "\0\v\0\0\17z`\204", 8) = 0
fcntl64(13, F_SETFD, FD_CLOEXEC)        = 0
close(12)                               = 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 12
setsockopt(12, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(12, {sa_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
fcntl64(12, F_SETFD, FD_CLOEXEC)        = 0
time(NULL)                              = 1204653121
write(13, "\377\377\377\377\377\377R\0160\0000\2\10\0E\20\1H\0\0\0\0\20\21\251\226\0\0\0\0\377\377"..., 342) = 342
select(14, [12 13], [], [], {6, 288597}) = 1 (in [13], left {6, 276000})
recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"R\0160\0000\2R\0160\0 \2\10\0E\20\1H\0\0\0\0\20\21\344\337\300\250!\262\300\250"..., 1536}], msg_controllen=28, {cmsg_len=28, cmsg_level=SOL_PACKET, cmsg_type=, ...}, msg_flags=0}, 0) = 342
time(NULL)                              = 1204653121
write(13, "\377\377\377\377\377\377R\0160\0000\2\10\0E\20\1H\0\0\0\0\20\21\251\226\0\0\0\0\377\377"..., 342) = 342
select(14, [12 13], [], [], {7, 274642}) = 1 (in [13], left {7, 254000})
recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"R\0160\0000\2R\0160\0 \2\10\0E\20\1H\0\0\0\0\20\21\344\337\300\250!\262\300\250"..., 1536}], msg_controllen=28, {cmsg_len=28, cmsg_level=SOL_PACKET, cmsg_type=, ...}, msg_flags=0}, 0) = 342
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
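The two 342-byte frames in the trace can be checked by hand. strace prints only the first 32 bytes of each buffer, using C-style octal escapes that a Python bytes literal accepts unchanged, so those prefixes can be decoded directly. The sketch below (decode_prefix and mac are hypothetical helpers, not part of anaconda or libdhcp) reads the Ethernet header and the leading IPv4 fields. The byte right after R\0160\0 in the second MAC of the received buffer is a space (0x20), which line wrapping can hide. The DHCP payload itself is cut off by strace, so this says nothing about DHCP message types.

```python
import ipaddress
import struct

# 32-byte prefixes copied from the strace write() and recvmsg() lines.
# strace's octal escapes are valid Python bytes-literal escapes.
SENT = b"\377\377\377\377\377\377R\0160\0000\2\10\0E\20\1H\0\0\0\0\20\21\251\226\0\0\0\0\377\377"
RECEIVED = b"R\0160\0000\2R\0160\0 \2\10\0E\20\1H\0\0\0\0\20\21\344\337\300\250!\262\300\250"


def mac(raw: bytes) -> str:
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def decode_prefix(frame: bytes) -> dict:
    """Decode the Ethernet header and the IPv4 fields up to the source address."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    # Bytes 14..29: version/IHL, TOS, total length, id, flags/fragment,
    # TTL, protocol, header checksum, source address.
    ver_ihl, _tos, total_len, _ident, _frag, ttl, proto, _csum, src_ip = struct.unpack(
        "!BBHHHBBH4s", frame[14:30]
    )
    return {
        "eth_dst": mac(dst),
        "eth_src": mac(src),
        "ethertype": ethertype,       # 0x0800 = IPv4
        "ip_version": ver_ihl >> 4,
        "ip_total_len": total_len,
        "frame_len": 14 + total_len,  # Ethernet header + IPv4 datagram
        "ttl": ttl,
        "proto": proto,               # 17 = UDP
        "ip_src": str(ipaddress.IPv4Address(src_ip)),
    }


if __name__ == "__main__":
    for name, frame in (("sent", SENT), ("received", RECEIVED)):
        print(name, decode_prefix(frame))
```

On these prefixes, the sent frame is an Ethernet broadcast from the NIC's MAC 52:0E:30:00:30:02 with IPv4 source 0.0.0.0, and the received frame is a UDP packet addressed to that MAC from 192.168.33.178. In both, 14 bytes of Ethernet header plus an IPv4 total length of 328 gives exactly the 342 bytes strace reports, so replies were arriving before the crash.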

Comment 1 James Laska 2008-03-04 18:55:49 UTC
Created attachment 296779 [details]
gdb backtrace from isys.pumpNetDevice('eth0', None)

Comment 2 James Laska 2008-03-04 19:03:14 UTC
Note: this appears to happen only when starting rescue mode from a local
installation source (DVD or CDROM). With a remote installation source,
networking appears to be set up correctly when stage2.img is pulled down.

Comment 3 Jeremy Katz 2008-03-04 19:15:43 UTC
The SIGSEGV is in libdhcp.

Comment 4 James Laska 2008-03-05 12:42:11 UTC
Tested with RHEL-5-Server/U1: it brings up networking on the same system where
U2 fails. Adding the Regression keyword.

$ python -c "import sys; sys.path.append('/usr/lib/anaconda'); import isys; print isys.pumpNetDevice('eth0', None);"
192.168.32.2

Let me know if any additional information is needed on the failing systems.




Comment 6 David Cantrell 2008-03-19 23:47:59 UTC
I am able to reproduce this locally on RHEL5.2-Server-20080319.nightly on at least i386.

This is similar to bug #432011, which was handled by working around limitations in libdhcp, since that
library is very fragile and essentially broken. Working on libdhcp any further seems like a waste of time.
Changing the component of this bug to anaconda and attaching my patch. Also giving it a devel-ack.

Comment 7 David Cantrell 2008-03-19 23:48:49 UTC
Created attachment 298612 [details]
0001-Make-sure-DHCP-works-in-rescue-mode-435978.patch

Comment 8 David Cantrell 2008-03-19 23:50:12 UTC
There's the patch. If I had to guess which platform this might cause a regression on for 5.2, it
would be s390x. That platform is nothing but problems.

jlaska,
If you want to give it a qa-ack, I'll seek out the other magic flags (unless you do) to get it into 5.2.

Comment 11 David Cantrell 2008-03-20 19:05:23 UTC
Committed, fix will be in anaconda-11.1.2.109-1.

Comment 14 David Cantrell 2008-04-04 03:42:35 UTC
Still working on tracking this one down. The same code runs in rescue mode as in stage1
during a network install, so something odd is happening with what is passed to libdhcp during
rescue mode.

More info to follow.

Comment 15 David Cantrell 2008-04-07 23:31:06 UTC
I have written a local test program that simulates the doDhcpNetDevice() function from isys.c in
anaconda. It's far easier to test this way than to keep recreating test ISOs to test rescue mode.

Basically, all that rescue mode does to bring up networking is (1) ask which device to use and (2) call
isys.dhcpNetDevice(), passing the device name (e.g., 'eth0').

The isys.dhcpNetDevice() and isys.pumpNetDevice() functions are passthrough Python functions in
isys.py that call _isys.dhcpNetDevice. The function in _isys is the C function that calls libdhcp, and
that is what I have reimplemented locally.

The SIGSEGV occurs after we are bound to an IPv4 address, during option parsing. Still working on a
solution, but I am attaching my test program and current findings to this bug.
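A complementary way to reproduce without rebuilding ISOs is to run the isys call from the report in a child process and classify how it ends, since the failure shows up either as a hang (the rescue-mode dialog) or as a SIGSEGV (the strace run). This is a minimal sketch: classify_run is a hypothetical helper, not part of anaconda, and the harness itself assumes a Python 3.7+ interpreter (RHEL 5's own Python lacks subprocess.run), while the child command can be any program, including the Python 2 one-liner above.

```python
import signal
import subprocess
import sys


def classify_run(argv, timeout=30.0):
    """Run argv in a child process and report how it ended.

    Returns "ok", "exit <code>", "killed by <SIGNAME>", or "hang".
    """
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return "hang"
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a return code of -N means the child died from signal N.
        try:
            return "killed by " + signal.Signals(-proc.returncode).name
        except ValueError:
            return "killed by signal %d" % -proc.returncode
    return "exit %d" % proc.returncode


if __name__ == "__main__":
    # Usage: python3 classify_run.py <command> [args...]
    print(classify_run(sys.argv[1:]))
```

Given the strace run above, an affected tree should report "killed by SIGSEGV" for the pumpNetDevice one-liner, and a fixed tree should report "ok". Running the call out of process keeps a crash or hang from taking the test driver down with it.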

Comment 16 David Cantrell 2008-04-07 23:33:23 UTC
Created attachment 301575 [details]
backtrace-dhcptest.log

Comment 17 David Cantrell 2008-04-07 23:33:53 UTC
Created attachment 301576 [details]
dhcptest.c

Comment 18 David Cantrell 2008-04-08 00:52:36 UTC
Further testing with a version of dhcptest.c that execs the DHCP client in the same process as dhcptest
shows the same crash with the same cause (attaching the new source and a backtrace of that test).


Comment 19 David Cantrell 2008-04-08 00:54:29 UTC
Created attachment 301584 [details]
backtrace-dhcptest2.log

Comment 20 David Cantrell 2008-04-08 00:55:16 UTC
Created attachment 301585 [details]
dhcptest2.c

Comment 21 David Cantrell 2008-04-08 01:33:26 UTC
Numerous problems exist with this bug:

1) The fix for rhbz#216158 prevents the shared library for libdhcp4client from functioning correctly 
with the shared _isys.so module.  This explains why the same code for dhcp works in loader (static 
linked) and fails from _isys (dynamic linked).  This is a major problem with libdhcp and is just 
something we have to live with for now.

2) Problem #1 can be worked around by linking _isys.so with the .a versions of the dhcp libraries.  Not 
great, but it does the trick.

3) The shared libdhcp6client library is hopelessly broken at this point. The static one still fails with
_isys.so, because it does not honor various LIBDHCP flags. I've patched dhcpv6 for that and it's working now.

So what is the fix?  There are two things I need to do:

1) Patch anaconda to link _isys.so with the .a versions of the dhcp libraries.
2) Patch dhcpv6 to correctly honor the LIBDHCP flags so it doesn't SIGSEGV.

Patches coming next.
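Once _isys.so is linked against the static (.a) DHCP client libraries, as planned in the comment above, ldd on _isys.so should no longer list libdhcp4client or libdhcp6client as shared dependencies. A minimal check, assuming ldd is installed and that those shared libraries' names begin with those prefixes; the location of _isys.so is not stated in this report, so the path is taken as an argument. dynamic_dhcp_deps and check_module are hypothetical helpers.

```python
import subprocess
import sys

# Library name prefixes that should NOT appear as shared deps after the fix.
DHCP_PREFIXES = ("libdhcp4client", "libdhcp6client")


def dynamic_dhcp_deps(ldd_output: str) -> list:
    """Return the DHCP client libraries that ldd reports as shared deps."""
    hits = []
    for line in ldd_output.splitlines():
        fields = line.split()
        # The first field of each ldd line is the library name (or a path).
        if fields and fields[0].startswith(DHCP_PREFIXES):
            hits.append(fields[0])
    return hits


def check_module(path: str) -> list:
    """Run ldd on a shared object and return any dynamic DHCP client deps."""
    out = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    ).stdout
    return dynamic_dhcp_deps(out)


if __name__ == "__main__":
    # Usage: python3 check_isys.py /path/to/_isys.so
    deps = check_module(sys.argv[1])
    print("dynamically linked DHCP libs:", deps or "none")
```

An empty result is consistent with static linking of the DHCP client code; it does not by itself show that the LIBDHCP flag handling from item 2 is fixed.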

Comment 22 David Cantrell 2008-04-08 10:26:09 UTC
Created attachment 301618 [details]
0001-Make-isys.dhcpNetDevice-work-in-rescue-mode-4359.patch

Fix for anaconda to get isys.dhcpNetDevice() working in rescue mode. Requires
a dhcp package with the following patch incorporated.

Note: this only gets DHCP working in rescue mode, as DHCPv6 still crashes for
some reason. As a temporary solution, dhclient and dhcp6c are included in the
rescue image for users to run manually.

Comment 23 David Cantrell 2008-04-08 10:28:46 UTC
Created attachment 301619 [details]
dhcp-3.0.5-libdhcp4client.patch

Updated libdhcp4client visibility patch.

Comment 24 David Cantrell 2008-04-08 10:31:43 UTC
Went ahead and built a new dhcp package with the fix in comment #23; the new package will be
dhcp-3.0.5-13.el5.

Change committed to anaconda, so the next anaconda build will have this fix in it.

Comment 27 David Cantrell 2008-04-11 21:42:12 UTC
(In reply to comment #26)
> David, which tree are the fixes supposed to be in?
> I have this failing with snap #5, which has:
> dhcp-3.0.5-13.el5
> anaconda-11.1.2.112-1
> 
> Fails if booting from CDROM with one or both of IPv4/IPv6 enabled and DHCP
> used. Is this FAILS_QA, or do I have to wait for another tree?
> 
> Thanks.

Those are the right versions.  I think I also need to do a libdhcp rebuild.  Let me do that.

Comment 28 David Cantrell 2008-04-11 21:47:00 UTC
I've rebuilt libdhcp and am updating the erratum for that.  We'll also need to rebuild anaconda.  So what 
you'll want in the tree is:

dhcp-3.0.5-13.el5
anaconda-11.1.2.113-1
libdhcp-1.20-5.el5
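Comment 27 shows how easy it is to test a tree that carries only part of the fix. A quick way to check a tree against the three builds above is to compare version-release strings. The sketch below uses a simplified form of RPM's segment-by-segment comparison rule, not rpm's own code: it ignores epochs and the ~/^ operators. REQUIRED, vercmp, evr_cmp and missing_fixes are hypothetical names, and the installed-version mapping is left to the caller (for example, collected with rpm -q).

```python
import re

# Minimum builds from comment 28 that together carry the fix.
REQUIRED = {
    "dhcp": "3.0.5-13.el5",
    "anaconda": "11.1.2.113-1",
    "libdhcp": "1.20-5.el5",
}


def vercmp(a: str, b: str) -> int:
    """Compare two version (or release) strings; return -1, 0 or 1."""
    # Split into runs of digits or letters; separators like '.' are dropped.
    sa = re.findall(r"\d+|[A-Za-z]+", a)
    sb = re.findall(r"\d+|[A-Za-z]+", b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return -1 if int(x) < int(y) else 1
        elif x.isdigit() != y.isdigit():
            # A numeric segment sorts as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return -1 if x < y else 1
    # All shared segments equal: the string with segments left over is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def evr_cmp(a: str, b: str) -> int:
    """Compare 'version-release' strings: version first, then release."""
    va, _, ra = a.rpartition("-")
    vb, _, rb = b.rpartition("-")
    return vercmp(va, vb) or vercmp(ra, rb)


def missing_fixes(installed: dict) -> list:
    """Return packages that are absent or older than REQUIRED, sorted by name."""
    return sorted(
        name
        for name, need in REQUIRED.items()
        if name not in installed or evr_cmp(installed[name], need) < 0
    )
```

Fed the two versions quoted from snap #5 in comment 27, {"dhcp": "3.0.5-13.el5", "anaconda": "11.1.2.112-1"} (libdhcp's version is not given there, so it is left out), missing_fixes returns ["anaconda", "libdhcp"]. Version and release are compared separately because a combined comparison would rank 1.0-2 above 1.0.1-1.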

Comment 33 errata-xmlrpc 2008-05-21 15:33:22 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0397.html