Bug 500775 - anaconda crashes in loader when getting IP address
Summary: anaconda crashes in loader when getting IP address
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libdhcp
Version: 5.4
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
: ---
Assignee: David Cantrell
QA Contact: Release Test Team
URL:
Whiteboard:
Duplicates: 500848 501005
Depends On:
Blocks: 444919
 
Reported: 2009-05-14 08:06 UTC by Alexander Todorov
Modified: 2009-09-02 09:53 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 09:53:00 UTC
Target Upstream Version:
Embargoed:


Attachments
PATCH: non working attempt to fix this. (3.29 KB, patch)
2009-05-14 20:43 UTC, Hans de Goede
Patch to avoid sigsev. (on top of hans' patch) (460 bytes, patch)
2009-05-15 19:15 UTC, Joel Andres Granados
libdhcp-1.20-use-libnl.patch (33.31 KB, patch)
2009-05-16 03:45 UTC, David Cantrell
libdhcp-1.20-use-libnl.patch (36.25 KB, patch)
2009-05-19 03:42 UTC, David Cantrell
libdhcp-1.20-use-libnl.patch (17.43 KB, patch)
2009-05-22 22:22 UTC, David Cantrell


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2009:1306 | 0 | normal | SHIPPED_LIVE | anaconda bug fix and enhancement update | 2009-09-01 10:20:11 UTC
Red Hat Product Errata | RHBA-2009:1333 | 0 | normal | SHIPPED_LIVE | libdhcp bug fix update | 2009-09-01 10:39:17 UTC

Description Alexander Todorov 2009-05-14 08:06:44 UTC
Description of problem:
Anaconda crashes in loader after issuing DHCP request for the IP address. The console log is:

+--------+ Loading SCSI driver +---------+
|                                        | 
| Loading lpfc driver...                 | 
|                                        | 
+----------------------------------------+

+-------------------------------------------------------+
|                                                       | 
| Sending request for IP information for eth0...        | 
|                                                       | 
+-------------------------------------------------------+ 
                                                         
*** glibc detected *** /sbin/loader: free(): invalid next size (fast): 0x0933aad8 ***
======= Backtrace: =========
[0x817a7c2]
[0x817e03b]
[0x80a5161]
[0x80a5a10]
[0x809a9d4]
[0x809b2aa]
[0x809c025]
[0x8063cce]
[0x8067f08]
[0x8069c35]
[0x80613ec]
[0x8061900]
[0x80578b1]
[0x804b8d9]
[0x8164108]
[0x8048131]
======= Memory map: ========
00876000-00877000 r-xp 00876000 00:00 0          [vdso]
08048000-08275000 r-xp 00000000 00:01 45         /sbin/loader
08275000-08280000 rw-p 0022d000 00:01 45         /sbin/loader
08280000-082ae000 rw-p 08280000 00:00 0 
092a0000-09362000 rw-p 092a0000 00:00 0          [heap]
b7e00000-b7e25000 rw-p b7e00000 00:00 0 
b7e25000-b7f00000 ---p b7e25000 00:00 0 
b7f0a000-b7f0b000 rw-s 00000000 00:09 0          /SYSV5012031d (deleted)
b7f0b000-b7f0f000 rw-p b7f0b000 00:00 0 
bfc11000-bfc26000 rw-p bffea000 00:00 0          [stack]


Version-Release number of selected component (if applicable):
anaconda-11.1.2.174-1

How reproducible:
not sure (always)

Steps to Reproduce:
1. Initiate an install of the latest 5.4 compose.

Actual results:
crash

Expected results:
install completes

Additional info:

Comment 2 Hans de Goede 2009-05-14 09:13:35 UTC
I can reproduce, this is definitely a blocker, digging into it now.

Comment 4 RHEL Program Management 2009-05-14 09:20:18 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 Hans de Goede 2009-05-14 11:34:20 UTC
This is caused by the new libdhcp: an anaconda build with libdhcp reverted to 1.20-6 fixes the crash.

The likely cause is the patch in 1.20-8. This line is clearly wrong, because it allocates zero elements:
"    if ((nic = calloc(0, sizeof(*nic))) == NULL) {"

So is this one: ipbuf == NULL at this point, but it should point to a buffer
of at least INET6_ADDRSTRLEN bytes.
"    if ((ipbuf = nl_addr2str(addr, ipbuf, INET6_ADDRSTRLEN)) == NULL) {"
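For illustration, here is a minimal sketch of what the two corrected patterns might look like. It is not the actual libdhcp fix: struct example_nic and both function names are hypothetical, and inet_ntop() stands in for libnl's nl_addr2str() so that the snippet compiles without libnl. The points it shows are that calloc() must be asked for at least one element, and that the output buffer must exist before a formatting call writes into it.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for libdhcp's NIC structure; the real layout differs. */
struct example_nic {
    char name[16];
    int  index;
};

/*
 * Allocate ONE zeroed element. calloc(0, size) requests zero bytes: it may
 * return NULL or a pointer that must not be written through, so filling in
 * the structure afterwards would overflow the heap chunk.
 */
struct example_nic *example_nic_new(const char *name, int index)
{
    assert(name != NULL);

    struct example_nic *nic = calloc(1, sizeof(*nic));
    if (nic == NULL)
        return NULL;

    /* The buffer is zero-filled, so copying size-1 bytes keeps the NUL. */
    strncpy(nic->name, name, sizeof(nic->name) - 1);
    nic->index = index;
    return nic;
}

/*
 * Format an IPv6 address as text. The buffer is allocated BEFORE the
 * formatting call, with room for INET6_ADDRSTRLEN bytes. inet_ntop() is
 * used here in place of nl_addr2str() (an assumption for portability).
 * The caller frees the returned string.
 */
char *example_addr_to_str(const struct in6_addr *addr)
{
    char *ipbuf = malloc(INET6_ADDRSTRLEN);
    if (ipbuf == NULL)
        return NULL;

    if (inet_ntop(AF_INET6, addr, ipbuf, INET6_ADDRSTRLEN) == NULL) {
        free(ipbuf);
        return NULL;
    }
    return ipbuf;
}
```

Writing past a zero-sized allocation typically corrupts the bookkeeping of the neighbouring heap chunk, which would be consistent with the "free(): invalid next size" abort in the description, although the thread does not confirm which of the two lines triggered it.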

Comment 6 Hans de Goede 2009-05-14 11:35:52 UTC
dcantrell, adding you to the CC as this seems to be caused by libdhcp.

Comment 7 Hans de Goede 2009-05-14 20:42:18 UTC
OK, I've made a patch that fixes a few issues I found, but unfortunately it does not fix the crash.

Here is a backtrace of a locally built loader, linked against a locally built libdhcp-1.20-9 *with my patch applied*:

[root@localhost ~]# eu-addr2line -e t/sbin/loader 
0x527796
/usr/src/redhat/BUILD/anaconda-11.1.2.174/stubs/unicode-lite.c:28
0x52b2d7
/usr/src/redhat/BUILD/anaconda-11.1.2.174/stubs/unicode-lite.c:28
0x451e7e
/usr/src/redhat/BUILD/libdhcp-1.20.new/nic.c:340
0x451fee
/usr/src/redhat/BUILD/libdhcp-1.20.new/nic.c:381
0x44c02d
/usr/src/redhat/BUILD/libdhcp-1.20/dhcp_nic.c:106


Note that this is with libdhcp's nic.c compiled without -O2 in an attempt to
get a more sensible backtrace. When compiled with -O2, nic.c:381 becomes
nic.c:379, which makes more sense.

I'll attach my patch.

Note that my test system has 3 NICs: eth0 and eth1 are 2 identical
Realtek NICs (with different MACs, of course) and eth2 is a gigabit Intel
server NIC. eth2 is the one I'm trying to do DHCP on.

P.S.

We also have bug 500848 open, which might be the same issue, but I'm not sure.
It should be retested after this one is fixed; I'll re-assign it to you.

Comment 8 Hans de Goede 2009-05-14 20:43:48 UTC
Created attachment 344046 [details]
PATCH: non working attempt to fix this.

Comment 9 Joel Andres Granados 2009-05-15 13:59:55 UTC
*** Bug 501005 has been marked as a duplicate of this bug. ***

Comment 10 Joel Andres Granados 2009-05-15 19:15:41 UTC
Created attachment 344218 [details]
Patch to avoid sigsev. (on top of hans' patch)

After applying the patch Hans posted, I further patched libdhcp to avoid the SIGSEGV.  Unfortunately, in my tests, the darn thing does not want to continue past the DHCP screen.  It searches for an IP, does not find one, and goes back to the DHCP screen (the one where you select whether you want IPv4 and/or IPv6).

Comment 11 David Cantrell 2009-05-16 03:45:46 UTC
Created attachment 344251 [details]
libdhcp-1.20-use-libnl.patch

Latest patch, but still not working.  Still working on a netlink error.

Comment 12 Hans de Goede 2009-05-16 07:02:00 UTC
(In reply to comment #10)
> Created an attachment (id=344218) [details]
> Patch to avoid sigsev. (on top of hans' patch)
>

Dang, I missed that one free, and addr2line pointed me to the wrong line. Good catch, Joel!

Comment 13 Denise Dumas 2009-05-18 15:56:23 UTC
Temporarily backed out libdhcp and adjusted anaconda. Tonight's compose should avoid this problem. Work on the fix continues.

Comment 14 David Cantrell 2009-05-19 03:41:50 UTC
Wanted to bring everyone up to speed on the progress on this bug.  Hans,
Joel...thank you for your help with the earlier patches.  As you've probably
noticed, libdhcp is absolutely annoying.  My patience for it is very limited.

The latest patch is configuring the interface, but installation does not
continue.  There are a couple of problems I'm working on narrowing down:

1) DNS lookups are not working after the interface is configured (easily
solved).

2) Static configuration works, DHCP does not.  Loader crashes as soon as it tries
to mount the NFS source on my local network.

The core problem that we were seeing earlier had to do with rtnl_link_change()
not working as described (my fault for trusting the API docs).  The problem is
still there as of libnl-1.1, so I'm filing a bug upstream for that (and
hopefully a patch, but we'll see if I have time).  I can use
rtnl_link_change_build_request() and nl_send_auto_complete() to do the same
thing.

So that's where I am.  Hoping to have this wrapped up either tonight or
tomorrow, just wanted to let everyone know my current status.

Comment 15 David Cantrell 2009-05-19 03:42:21 UTC
Created attachment 344559 [details]
libdhcp-1.20-use-libnl.patch

Comment 17 David Cantrell 2009-05-22 22:22:08 UTC
Problem corrected in libdhcp-1.20-10.el5.  Attaching latest patch since I've been attaching patches to this bug.

Comment 18 David Cantrell 2009-05-22 22:22:54 UTC
Created attachment 345170 [details]
libdhcp-1.20-use-libnl.patch

Comment 21 Alexander Todorov 2009-05-27 13:22:25 UTC
QE Note: latest tree has libdhcp-1.20-6.el5. Need to test once libdhcp-1.20-10.el5 gets pulled in.

Comment 24 Denise Dumas 2009-06-05 14:48:33 UTC
*** Bug 500848 has been marked as a duplicate of this bug. ***

Comment 26 errata-xmlrpc 2009-09-02 09:53:00 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1306.html

