Bug 435978
Summary: Rescue mode networking fails to start
Product: Red Hat Enterprise Linux 5
Component: anaconda
Version: 5.2
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: low
Priority: low
Reporter: James Laska <jlaska>
Assignee: David Cantrell <dcantrell>
QA Contact: Alexander Todorov <atodorov>
CC: atodorov, jturner
Keywords: Regression
Target Milestone: rc
Fixed In Version: RHBA-2008-0397
Doc Type: Bug Fix
Last Closed: 2008-05-21 15:33:22 UTC

Description
James Laska
2008-03-04 18:55:49 UTC
Created attachment 296779 [details]
gdb backtrace from isys.pumpNetDevice('eth0', None)
Note: this appears to happen only when starting rescue mode from a local installation source (DVD or CDROM). A remote installation source appears to set up networking correctly when it pulls down stage2.img.

The segv is in libdhcp.

Tested in RHEL-5-Server/U1, and it proceeds with enabling networking on the same system that U2 fails on. Adding Regression keyword.

$ python -c "import sys; sys.path.append('/usr/lib/anaconda'); import isys; print isys.pumpNetDevice('eth0', None);"
192.168.32.2

Let me know if there is any additional information needed on the failing systems.

I am able to reproduce this locally on RHEL5.2-Server-20080319.nightly on at least i386.

This is similar to bug #432011, which was handled by working around limitations in libdhcp, since that library is very fragile and essentially broken. Working on it any more seems like a waste of time. Changing the component of this bug to anaconda and attaching my patch. Also giving it a devel-ack.

Created attachment 298612 [details]
0001-Make-sure-DHCP-works-in-rescue-mode-435978.patch
There's the patch. If I had to take a guess as to which platform this might cause a regression on for 5.2, it would be s390x. That platform is nothing but problems.

jlaska, if you want to give it a qa-ack, I'll seek out the other magic flags (unless you do) to get it into 5.2.

Committed, fix will be in anaconda-11.1.2.109-1.

Working on tracking this one down still. The same code is run via rescue mode as is run during stage1 when performing a network install, so something odd is happening with what is passed to libdhcp during rescue mode. More info to follow.

Have written a local test program that simulates the doDhcpNetDevice() function from isys.c in anaconda. It's far easier to test this way than to keep recreating test ISOs to test rescue mode. Basically, all that rescue mode does to bring up networking is (1) ask you which device to use and (2) call isys.dhcpNetDevice(), passing the device name (e.g., 'eth0'). The isys.dhcpNetDevice() function (and pumpNetDevice, for that matter) is a passthrough Python function in isys.py that calls _isys.dhcpNetDevice. The function in _isys is the C function that calls libdhcp, which is what I have reimplemented locally.

The SIGSEGV occurs after we are bound to an IPv4 address. It happens during option parsing. Still working on a solution, but I am attaching my test program and current findings to this bug.

Created attachment 301575 [details]
backtrace-dhcptest.log
Created attachment 301576 [details]
dhcptest.c
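The approach described above, testing the crashing call outside the installer, can also be sketched in Python by running the call in a child interpreter so that a SIGSEGV kills only the child. This is a minimal sketch under stated assumptions: run_isolated is a hypothetical helper name, it targets a modern Python 3 on a POSIX system (not the Python shipped with RHEL 5), and the snippet passed to it is whatever one-liner is being tested.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run a Python snippet in a child interpreter and report how it ended.

    Returns ("signal", <signal name>, stdout) if the child was killed by a
    signal, or ("exit", <exit status>, stdout) if it exited normally.
    Hypothetical helper for illustration; not part of anaconda.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code means "terminated by signal N".
        return ("signal", signal.Signals(-proc.returncode).name, proc.stdout)
    return ("exit", proc.returncode, proc.stdout)
```

On a system where the isys module is importable, the snippet would be the reproducer from the description; a "signal" result then distinguishes a crash in the C library from an ordinary Python exception, which shows up as a non-zero "exit" status.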
Further testing with a version of dhcptest.c that execs the DHCP client in the same process as dhcptest. Same crash and same cause (attaching new source and backtrace of that test).

Created attachment 301584 [details]
backtrace-dhcptest2.log
Created attachment 301585 [details]
dhcptest2.c
Numerous problems exist with this bug:

1) The fix for rhbz#216158 prevents the shared library for libdhcp4client from functioning correctly with the shared _isys.so module. This explains why the same code for dhcp works in loader (statically linked) and fails from _isys (dynamically linked). This is a major problem with libdhcp and is just something we have to live with for now.

2) Problem #1 can be worked around by linking _isys.so with the .a versions of the dhcp libraries. Not great, but it does the trick.

3) The shared libdhcp6client library is hopelessly broken at this point. The static one still fails with _isys.so, and the problem is that it's not honoring various LIBDHCP flags. I've patched dhcpv6 for that and it's working now.

So what is the fix? There are two things I need to do:

1) Patch anaconda to link _isys.so with the .a versions of the dhcp libraries.

2) Patch dhcpv6 to correctly honor the LIBDHCP flags so it doesn't SIGSEGV.

Patches coming next.

Created attachment 301618 [details]
0001-Make-isys.dhcpNetDevice-work-in-rescue-mode-4359.patch
Fix for anaconda to get isys.dhcpNetDevice() working in rescue mode. Requires
a dhcp package with the following patch incorporated.
Note, this only gets DHCP working in rescue mode as DHCPv6 still crashes for
some reason. Including dhclient and dhcp6c in the rescue image for users to
run manually as a temporary solution.
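
One way to confirm that a rebuilt _isys.so no longer pulls in the shared DHCP client libraries is to inspect its ldd output. The following is a minimal sketch, not part of the patch: shared_dhcp_deps is a hypothetical helper, and the sample ldd text, including the soname versions and load addresses, is made up for illustration.

```python
def shared_dhcp_deps(ldd_output, names=("libdhcp4client", "libdhcp6client")):
    """Return the sonames in ldd output that match the given library names."""
    deps = []
    for line in ldd_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        soname = fields[0]  # first column of an ldd line is the soname or path
        if any(name in soname for name in names):
            deps.append(soname)
    return deps


# Illustrative ldd output only; real sonames and addresses will differ.
SAMPLE_DYNAMIC = """\
    libdhcp4client.so.0 => /usr/lib/libdhcp4client.so.0 (0x00110000)
    libc.so.6 => /lib/libc.so.6 (0x00a3e000)
"""
SAMPLE_STATIC = """\
    libc.so.6 => /lib/libc.so.6 (0x00a3e000)
    /lib/ld-linux.so.2 (0x00a1f000)
"""
```

An empty result for the rebuilt module would suggest the .a versions were linked in; a non-empty result means the module still resolves those libraries at run time.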
Created attachment 301619 [details]
dhcp-3.0.5-libdhcp4client.patch
Updated libdhcp4client visibility patch.
Went ahead and built a new dhcp package with the fix in comment #23; the new package will be dhcp-3.0.5-13.el5. Change committed to anaconda, so the next anaconda build will have this fix in it.

(In reply to comment #26)
> David which tree are supposed the fixed to be in?
> I have this failing with snap #5 which has:
> dhcp-3.0.5-13.el5
> anaconda-11.1.2.112-1
>
> Fails if booting from CDROM and one or both of IPv4/IPv6 is enabled and DHCP is
> used. If this FAILS_QA or I have to wait for another tree?
>
> Thanks.

Those are the right versions. I think I also need to do a libdhcp rebuild. Let me do that.

I've rebuilt libdhcp and am updating the erratum for that. We'll also need to rebuild anaconda. So what you'll want in the tree is:

dhcp-3.0.5-13.el5
anaconda-11.1.2.113-1
libdhcp-1.20-5.el5

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0397.html
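
When verifying a tree against the three package versions named in this report, a small comparison helper can flag builds that are too old. This is a rough sketch only: at_least and its helpers are hypothetical names, and the comparison is a simplified stand-in that handles plain name-version-release strings like the ones in this bug, not a reimplementation of RPM's own version comparison.

```python
import re


def split_nvr(nvr):
    """Split 'name-version-release' into its three parts."""
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release


def _key(text):
    # Numeric runs compare as integers and sort above alphabetic runs.
    parts = re.findall(r"\d+|[A-Za-z]+", text)
    return [(1, int(p), "") if p.isdigit() else (0, 0, p) for p in parts]


def at_least(installed, required):
    """True if 'installed' is the same package at 'required' or newer."""
    iname, iver, irel = split_nvr(installed)
    rname, rver, rrel = split_nvr(required)
    if iname != rname:
        raise ValueError("different packages: %s vs %s" % (iname, rname))
    return (_key(iver), _key(irel)) >= (_key(rver), _key(rrel))


# Versions taken from this report.
REQUIRED = ["dhcp-3.0.5-13.el5", "anaconda-11.1.2.113-1", "libdhcp-1.20-5.el5"]
```

For example, at_least("anaconda-11.1.2.112-1", "anaconda-11.1.2.113-1") is False, which matches the failing snapshot quoted in the thread. On a real system, rpm's own comparison should be preferred over this simplification.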