Description of problem:
Anaconda crashes in loader after issuing a DHCP request for the IP address. The console log is:

+--------+ Loading SCSI driver +---------+
|                                        |
| Loading lpfc driver...                 |
|                                        |
+----------------------------------------+

+-------------------------------------------------------+
|                                                       |
| Sending request for IP information for eth0...        |
|                                                       |
+-------------------------------------------------------+

*** glibc detected *** /sbin/loader: free(): invalid next size (fast): 0x0933aad8 ***
======= Backtrace: =========
[0x817a7c2]
[0x817e03b]
[0x80a5161]
[0x80a5a10]
[0x809a9d4]
[0x809b2aa]
[0x809c025]
[0x8063cce]
[0x8067f08]
[0x8069c35]
[0x80613ec]
[0x8061900]
[0x80578b1]
[0x804b8d9]
[0x8164108]
[0x8048131]
======= Memory map: ========
00876000-00877000 r-xp 00876000 00:00 0 [vdso]
08048000-08275000 r-xp 00000000 00:01 45 /sbin/loader
08275000-08280000 rw-p 0022d000 00:01 45 /sbin/loader
08280000-082ae000 rw-p 08280000 00:00 0
092a0000-09362000 rw-p 092a0000 00:00 0 [heap]
b7e00000-b7e25000 rw-p b7e00000 00:00 0
b7e25000-b7f00000 ---p b7e25000 00:00 0
b7f0a000-b7f0b000 rw-s 00000000 00:09 0 /SYSV5012031d (deleted)
b7f0b000-b7f0f000 rw-p b7f0b000 00:00 0
bfc11000-bfc26000 rw-p bffea000 00:00 0 [stack]

Version-Release number of selected component (if applicable):
anaconda-11.1.2.174-1

How reproducible:
not sure (always)

Steps to Reproduce:
1. initiate install of latest 5.4 compose
2.
3.

Actual results:
crash

Expected results:
install completes

Additional info:
I can reproduce this. It is definitely a blocker; digging into it now.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
This is caused by the new libdhcp; an anaconda build with libdhcp reverted to 1.20-6 fixes it. The likely cause is the patch in 1.20-8. This is clearly wrong, since it allocates zero elements:

" if ((nic = calloc(0, sizeof(*nic))) == NULL) {"

As is this, because ipbuf == NULL at this point but should point to a buffer of at least INET6_ADDRSTRLEN bytes:

" if ((ipbuf = nl_addr2str(addr, ipbuf, INET6_ADDRSTRLEN)) == NULL) {"
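For illustration, here is a minimal sketch of what the two corrected patterns could look like. It is not the actual libdhcp fix: struct demo_nic, demo_nic_new() and demo_addr_to_str() are hypothetical names, and inet_ntop() is used as a stand-in for nl_addr2str() so that the snippet builds without libnl.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for libdhcp's NIC handle; the real struct differs. */
struct demo_nic {
    char name[16];
    int  index;
};

/*
 * Allocate ONE zeroed element. calloc(0, size) requests zero bytes, so any
 * later write through the returned pointer overruns the allocation and
 * corrupts the heap, which glibc may only report at a later free().
 */
struct demo_nic *demo_nic_new(const char *name, int index)
{
    struct demo_nic *nic;

    if (name == NULL)
        return NULL;

    nic = calloc(1, sizeof(*nic));
    if (nic == NULL)
        return NULL;

    /* The struct is zero-filled, so copying at most size-1 bytes keeps
     * the name NUL-terminated. */
    strncpy(nic->name, name, sizeof(nic->name) - 1);
    nic->index = index;
    return nic;
}

/*
 * Format an IPv6 address into a caller-supplied buffer of at least
 * INET6_ADDRSTRLEN bytes. inet_ntop() is an assumption standing in for
 * nl_addr2str(); the point is that the buffer must exist before the call.
 */
const char *demo_addr_to_str(const struct in6_addr *addr, char *buf, size_t len)
{
    if (addr == NULL || buf == NULL || len < INET6_ADDRSTRLEN)
        return NULL;

    return inet_ntop(AF_INET6, addr, buf, (socklen_t)len);
}
```

A caller would declare "char ipbuf[INET6_ADDRSTRLEN];" (or allocate that many bytes) before the conversion and release the NIC handle with free() when done.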
dcantrell, adding you to the CC as this seems to be caused by libdhcp.
OK, I've made a patch which fixes a few issues I've found, but unfortunately it does not fix the crash. Here is a backtrace of a locally built loader, linked against a locally built libdhcp-1.20-9 *with my patch applied*:

[root@localhost ~]# eu-addr2line -e t/sbin/loader
0x527796
/usr/src/redhat/BUILD/anaconda-11.1.2.174/stubs/unicode-lite.c:28
0x52b2d7
/usr/src/redhat/BUILD/anaconda-11.1.2.174/stubs/unicode-lite.c:28
0x451e7e
/usr/src/redhat/BUILD/libdhcp-1.20.new/nic.c:340
0x451fee
/usr/src/redhat/BUILD/libdhcp-1.20.new/nic.c:381
0x44c02d
/usr/src/redhat/BUILD/libdhcp-1.20/dhcp_nic.c:106

Note that this is with libdhcp's nic.c compiled without -O2 in an attempt to get a more sensible backtrace. When compiled with -O2, nic.c:381 becomes nic.c:379, which makes more sense. I'll attach my patch.

Note that my test system has 3 NICs: eth0 and eth1 are 2 identical Realtek NICs (with different MACs, of course) and eth2 is a gigabit Intel server NIC. eth2 is the one I'm trying to do DHCP on.

P.S. We also have bug 500848 open, which might be the same, but I'm not sure. That one should be retested after this one is fixed; I'll re-assign it to you.
Created attachment 344046 [details] PATCH: non-working attempt to fix this.
*** Bug 501005 has been marked as a duplicate of this bug. ***
Created attachment 344218 [details] Patch to avoid sigsev. (on top of hans' patch)

After applying the patch Hans posted, I further patched libdhcp to avoid the SIGSEGV. Unfortunately, in my tests, it does not continue past the DHCP screen. It searches for an IP, does not find one, and goes back to the DHCP screen (the one where you select whether you want IPv4 and/or IPv6).
Created attachment 344251 [details] libdhcp-1.20-use-libnl.patch Latest patch, but still not working. Still working on a netlink error.
(In reply to comment #10)
> Created an attachment (id=344218) [details]
> Patch to avoid sigsev. (on top of hans' patch)
>

Dang, I missed that one free, and addr2line pointed me to the wrong line. Good catch, Joel!
Temporarily backed out libdhcp and adjusted anaconda. Tonight's compose should avoid this problem. Work on the fix continues.
Wanted to bring everyone up to speed on the progress on this bug. Hans, Joel... thank you for your help with the earlier patches. As you've probably noticed, libdhcp is absolutely annoying. My patience for it is very limited.

The latest patch configures the interface, but installation does not continue. There are a couple of problems I'm working on narrowing down:

1) DNS lookups are not working after the interface is configured (easily solved).
2) Static configuration works, DHCP does not. Loader crashes as soon as it tries to mount the NFS source on my local network.

The core problem that we were seeing earlier had to do with rtnl_link_change() not working as described (my fault for trusting the API docs). The problem is still there as of libnl-1.1, so I'm filing a bug upstream for that (and hopefully a patch, but we'll see if I have time). I can use rtnl_link_build_change_request() and nl_send_auto_complete() to do the same thing.

So that's where I am. Hoping to have this wrapped up either tonight or tomorrow; just wanted to let everyone know my current status.
Created attachment 344559 [details] libdhcp-1.20-use-libnl.patch
Problem corrected in libdhcp-1.20-10.el5. Attaching latest patch since I've been attaching patches to this bug.
Created attachment 345170 [details] libdhcp-1.20-use-libnl.patch
QE Note: latest tree has libdhcp-1.20-6.el5. Need to test once libdhcp-1.20-10.el5 gets pulled in.
*** Bug 500848 has been marked as a duplicate of this bug. ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1306.html