Bug 195766 - Installer crashes because networking fails
Summary: Installer crashes because networking fails
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: anaconda
Version: rawhide
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: David Cantrell
QA Contact: Mike McLean
URL:
Whiteboard:
Depends On:
Blocks: FC6Blocker
 
Reported: 2006-06-17 15:03 UTC by Joachim Frieben
Modified: 2007-11-30 22:11 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2006-07-13 18:38:40 UTC
Type: ---
Embargoed:


Attachments
Messages on virtual terminal 1 (8.19 KB, image/png)
2006-06-17 16:37 UTC, Markku Kolkka
Messages on virtual terminal 3 (14.38 KB, image/png)
2006-06-17 16:38 UTC, Markku Kolkka
Messages on virtual terminal 4 (13.58 KB, image/png)
2006-06-17 16:39 UTC, Markku Kolkka

Description Joachim Frieben 2006-06-17 15:03:58 UTC
Description of problem:
With the latest versions of "anaconda", network installs exit because
the installer fails to establish a working network connection. The
system is an "IBM ThinkPad T23", the network driver is "e100". An "HTTP"
install exits after entering server name and download directory.

Version-Release number of selected component (if applicable):
anaconda-11.1.0.45-1.i386.rpm

How reproducible:
Always.

Steps to Reproduce:
1. Boot from current "boot.iso" image.
2. Select "HTTP" install.
3. Choose "DHCP" protocol. 
4. Enter download repository and confirm.
 
Actual results:
Installer terminates and reports:

  "install exited abnormally -- received signal 6"

Expected results:

  Installer should proceed to downloading "stage2.img".

Additional info:

  vt3 is filled with some relevant messages:
  "ERROR: nic_configure: failed to configure resolver."
  "ERROR: DHCPv4 interface configuration failed."
  "INFO: result of setupInterface in DHCP configuration failed
     -1 Operation not permitted."

  It is of course possible to configure the network without "DHCP".
  In this case, the installer simply sits there after entering the
  repository parameters instead of reporting that it is retrieving
  "stage2.img".

  All of the above works again when a boot image of "FC5" is used
  instead.

  It is possible to boot from the rescue image, too. In this case,
  network support with "DHCP" protocol is also chosen and the
  repository parameters are entered. However, the following screen
  appears instantaneously - too fast to have carried out the request.
  Again there is an error message in a virtual console: "ERROR:
  Error trying to start eth0 in rescue.py::startNetworking()".
  A "ping" to some explicit IP number is rejected due to absence
  of an active network connection. 
  The issue first appeared a little over a week ago. Before that,
  everything worked flawlessly.

Comment 1 Joachim Frieben 2006-06-17 15:11:38 UTC
Sorry, when booting from the rescue image, network activation and
"DHCP" are checked, of course, and that's all (no download server
name entered, etc.).

Comment 2 Markku Kolkka 2006-06-17 16:36:11 UTC
I'm seeing a similar crash while trying to do an FTP install on a VMWare virtual
machine. The installer crashes immediately after hitting "OK" on the screen for
selecting the FTP server and directory. I'll send screen captures of VT1, 3 and
4 with the error messages.

Comment 3 Markku Kolkka 2006-06-17 16:37:20 UTC
Created attachment 131102
Messages on virtual terminal 1

Comment 4 Markku Kolkka 2006-06-17 16:38:56 UTC
Created attachment 131103
Messages on virtual terminal 3

Comment 5 Markku Kolkka 2006-06-17 16:39:49 UTC
Created attachment 131104
Messages on virtual terminal 4

Comment 6 Joachim Frieben 2006-06-17 18:02:56 UTC
Yes, that's exactly what happens to me. I could reproduce this in "qemu"
until a few days ago myself, but since Thursday, unfortunately, the
kernel would hang when booting. Your "VMWare" screenshots are very
useful!

Comment 7 David Nielsen 2006-06-18 19:57:11 UTC
Happens for me as well on x86_64 using the forcedeth driver doing an HTTP install.

Comment 8 Deji Akingunola 2006-06-18 20:48:52 UTC
Just want to add a 'me too' if that matters. Crashes on x86_64 using both
forcedeth and skge drivers with both ftp and http install attempts.

Comment 9 Paul W. Frields 2006-06-18 21:51:16 UTC
Another 'me too'.

Comment 10 Thomas J. Baker 2006-06-19 12:56:00 UTC
I have the same problem trying to do a network install. Essentially the same
errors as #3. Broadcom gigE nic.

Comment 11 Jeremy Katz 2006-06-19 14:09:06 UTC
Can people seeing this problem try booting with 'linux noipv6' and see if that
helps?

Also, please be sure to give *any* command line arguments you're passing, no
matter how inconsequential they may seem.  
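
When reporting boot arguments, it can help to list them in a normalized form. The following is a minimal sketch, not part of anaconda: `parse_boot_args` is a hypothetical helper that splits the text typed at the boot prompt into flags and `key=value` pairs, so that options such as `noipv6` can be checked unambiguously.

```python
import shlex


def parse_boot_args(cmdline):
    """Split a boot command line into a dict.

    Bare flags (e.g. "noipv6") map to None; "key=value" tokens map to value.
    This is an illustrative helper, not anaconda's own parser.
    """
    args = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        args[key] = value if sep else None
    return args


# Example using the arguments quoted in this report.
args = parse_boot_args("linux askmethod noipv6")
print(sorted(args))
print("noipv6" in args)
```

On a running Linux system, the arguments actually passed to the kernel can be read from /proc/cmdline and fed to the same function.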

Comment 12 David Nielsen 2006-06-19 14:19:25 UTC
Using 'linux noipv6' yields the same issue here. The DHCP probing seems to take
much less time, but the installer still crashes after giving anaconda the
mirror location.

Comment 13 Jeremy Katz 2006-06-19 14:46:24 UTC
Aha, finally reproduced by iterating through some DHCP options.

Is anyone actually getting back a DNS server in the DHCP server replies?  When I
have a DNS server, I can't get things to fail -- when I don't, I can hit the
segfault.  Debugging from there now.
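
The failure condition described here (a DHCP reply without any DNS server) is the kind of case a resolver setup step has to handle explicitly. The sketch below is only an illustration in Python, not the loader's code (the loader is written in C): `render_resolv_conf` is a hypothetical function that returns nothing to write when the lease carries no name servers, instead of assuming that at least one exists.

```python
def render_resolv_conf(nameservers, search_domains=()):
    """Return resolv.conf text, or None if the lease has no DNS servers.

    Hypothetical helper for illustration; the names are assumptions.
    """
    # Drop empty or whitespace-only entries before deciding.
    servers = [s.strip() for s in nameservers if s and s.strip()]
    if not servers:
        # No DNS server in the DHCP reply: skip resolver setup
        # rather than dereferencing a missing entry.
        return None
    lines = []
    domains = [d for d in search_domains if d]
    if domains:
        lines.append("search " + " ".join(domains))
    lines.extend("nameserver " + s for s in servers)
    return "\n".join(lines) + "\n"


print(render_resolv_conf([]))
print(render_resolv_conf(["192.0.2.1"], ["example.com"]))
```

The addresses and domain used in the example are documentation placeholders, not values from this report.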

Comment 14 Konrad Rzeszutek 2006-06-19 14:51:50 UTC
I am hitting this with an x232 IBM server (haven't tried other machines yet). I
tried: "linux askmethod noipv6" and have the same hang.

Regarding the DNS server in the DHCP server replies, I think I do get one, since
the hostname of the HTTP site (redhat.download.fedoraproject.org) gets resolved.
The log (VC 3) shows:

result of setupInterface is DHCP configuration failed - 1 Operation Not Permitted
reverse name lookup worked
starting to STEP_URL


Comment 15 Joachim Frieben 2006-06-19 15:17:29 UTC
Booting with "noipv6" behaves exactly as before. The error messages are
those posted in my initial report. The "DHCP" request is honored
successfully but then the error "ERROR: nic_configure: failed to
configure resolver." occurs, which indeed points to a "DNS" problem. As
pointed out before, entering the download server address numerically
avoids the termination, but then the system simply sits there.

Btw, the message "ERROR: DHCPv4 interface configuration failed." reported
earlier already shows that "IPV6" is not even used. I suppose that
explains why there is no difference when adding "noipv6".

Comment 16 Markku Kolkka 2006-06-19 16:18:44 UTC
Using "noipv6" makes no difference and the installer crashes at the same point
even if I define the IP information statically, so it's not DHCP related.

Comment 17 Jesse Keating 2006-06-19 18:21:31 UTC
This is being moved to a FC6 blocker.  We may be able to get out an updated
boot.iso after Test1 releases but this is not enough to hold up Test1.

Comment 18 Joachim Frieben 2006-06-19 19:04:58 UTC
I do agree. As long as installable media are provided for download,
which will of course be the case for FC6 test1, this is only a minor
loss of functionality. Thanks for inquiring.

Comment 19 Yanko Kaneti 2006-06-21 09:18:10 UTC
The beginning of that signal 6 crash backtrace reads something like

*** glibc-detected *** /sbin/loader: corrupted double-linked-list...

As for whether the dhcp server replies correctly, the same server works fine for
fc[12345] network installs, DNS servers and all.

What might be particular here is that the install images are loaded via pxe and
there are two network adapters. Using the first (eth0, 3c59x) one.

Comment 20 Joachim Frieben 2006-06-21 10:58:17 UTC
Crashes exactly as before for "anaconda-11.1.0.46-1".

Comment 21 David Cantrell 2006-06-27 20:44:41 UTC
These reports look exactly like all of the things I fixed today in rawhide. 
But, there's still probably some cases where loader will sigsegv.  I've fixed
HTTP installs in rawhide now (fixes the corrupted double-linked list, double
frees, etc that people are seeing).  Next anaconda build will have this stuff
included.

Closing this bug as RAWHIDE, but please feel free to reopen if the next anaconda
build is broken the same way.

For HTTP and FTP installs on an IPv4 network, there were two double free() calls
and one corrupted double linked list problem.  These happened after you'd see
"enter STEP_URL" on tty3.
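
For readers unfamiliar with the glibc message, the following is a simplified Python model of the consistency check behind "corrupted double-linked list". It is not the loader's code and not glibc's implementation; `Chunk`, `link_after` and `unlink` are hypothetical names. The idea is that removing a node from a doubly linked list is only safe if both neighbours still point back at it, so removing the same node twice (as a double free can do) is detected.

```python
class Chunk:
    """A node in a circular doubly linked list (fd = forward, bk = back)."""

    def __init__(self, name):
        self.name = name
        self.fd = self
        self.bk = self


def link_after(head, node):
    """Insert node directly after head."""
    node.fd = head.fd
    node.bk = head
    head.fd.bk = node
    head.fd = node


def unlink(node):
    """Remove node, refusing if its neighbours no longer point back at it."""
    if node.fd.bk is not node or node.bk.fd is not node:
        raise RuntimeError("corrupted double-linked list")
    node.fd.bk = node.bk
    node.bk.fd = node.fd


head, a, b = Chunk("head"), Chunk("a"), Chunk("b")
link_after(head, a)
link_after(a, b)      # head <-> a <-> b <-> head
unlink(a)             # fine: head <-> b <-> head
try:
    unlink(a)         # second removal: a's neighbours no longer reference a
except RuntimeError as exc:
    print(exc)
```

In this model the second `unlink(a)` fails because `a` still holds stale pointers to `b` and `head`, while those nodes now point at each other.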

Comment 22 Joachim Frieben 2006-07-01 10:07:41 UTC
Still broken in "anaconda-11.1.0.50-1".

Comment 23 Joachim Frieben 2006-07-03 18:27:52 UTC
Still broken in "anaconda-11.1.0.51-1". 

Comment 24 David Cantrell 2006-07-06 16:24:03 UTC
What about today's rawhide?  I'm not able to reproduce this problem under
anaconda 11.1.0.53.

Comment 25 Joachim Frieben 2006-07-06 20:42:57 UTC
Tested "anaconda-11.1.0.53-1" on 2 systems: "Dell GX280SF" and "IBM ThinkPad T23".

   Results:

   1. Installer crashes on both systems unless option "noipv6" is added.

   2. Installer crashes despite adding "noipv6" unless keyboard layout
      "us" is left unaltered. Changing the keyboard layout to
      "de-latin1-nodeadkeys" lets the installer crash at the same stage,
      namely after setting up the network connection.

   3. In all cases and regardless of any further option, the graphical
      installer does not launch properly. After the initial message of
      launching the "X" server, the screen turns black, and it is even
      impossible to switch to a text console. However, the system still
      responds to "ctrl-alt-del".

   4. Honoring 1. and 2. plus the "text" option allows installing the
      system via the network, albeit in text mode only :(

PS: Even in text mode there is a lot of junk popping up in the background.
    Looks like Armenian or some other exotic charset. At least, it does
    screw up the installation process.

Comment 26 Joachim Frieben 2006-07-06 20:46:04 UTC
Of course, I meant: "At least, it does *not* screw up the installation
process."

Comment 27 Joachim Frieben 2006-07-08 20:00:13 UTC
The text installer stops working in "anaconda-11.1.0.54-1". After
scanning the disk for existing FC installations, the following error
message by "anaconda" is displayed:

 'File "/usr/lib/anaconda/upgrade.py".
  line 68 in findRootParts
     for (dev, fs, meta) in
  anaconda.id.rootParts:
  ValueError: too many values to unpack'
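
This kind of ValueError arises when a loop unpacks each item into a fixed number of names but an item has more elements than that. The sketch below reproduces the mechanism only; `find_root_parts` and the tuple contents are hypothetical stand-ins, not the code in upgrade.py.

```python
def find_root_parts(root_parts):
    """Unpack each entry into exactly three names, like the quoted loop."""
    found = []
    for (dev, fs, meta) in root_parts:
        found.append((dev, fs, meta))
    return found


# Hypothetical entries: three fields unpack cleanly, four do not.
three_fields = [("sda1", "ext3", "meta")]
four_fields = [("sda1", "ext3", "meta", "extra")]

print(find_root_parts(three_fields))
try:
    find_root_parts(four_fields)
except ValueError as exc:
    print("ValueError:", exc)
```

An error like this usually means that the producer of the list and the loop consuming it disagree on the tuple layout.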

The graphical installer still gets stuck when the "X" server is launched.
This happens on the 2 systems mentioned above ("S3" and "ATI" chip).

However: under "qemu", the graphical installer starts up nicely. IPv6
  was not enabled, neither by adding the "noipv6" option nor by
  unchecking the related networking option. The disk image was empty
  without any partitions which might explain why the installer did
  not crash in "upgrade.py". Choosing a German keyboard layout still
  crashes the installer.

Comment 28 David Cantrell 2006-07-13 18:38:40 UTC
Too many bugs listed here.  Please open new bugs as you find different problems.

Regarding the original problem, the loader2 network failures that prevent moving
to stage2 have been fixed in rawhide.  They will be in the next build of
anaconda.  The fixes won't appear until at least anaconda-11.1.0.57.

The log message "result of setupInterface is DHCP configuration failed - 1
Operation Not Permitted" was a problem in libdhcp and has been fixed as of
libdhcp-1.8.

Comment 29 Joachim Frieben 2006-07-15 17:48:13 UTC
Ok, submitted a new report as bug 199015. "anaconda-11.1.0.57-1" crashes
exactly as reported in my initial report producing the same screen
output as submitted in comment 3 (default language/keyboard "English/US").
However, no "DHCP" errors are reported anymore. It seems that the origin
of this issue needs to be looked for somewhere else.

