Red Hat Bugzilla – Bug 433755
Crash when installing Xen paravirt guest with >=16 NICS
Last modified: 2008-04-16 07:59:55 EDT
Description of problem:
Anaconda will crash during stage 1 (loader) when you try to install a Xen
paravirt guest with 16 or more network cards. Up to 15 is fine. According to
David Cantrell the problem is in the underlying libnl code.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Start a Xen paravirt guest with 16 or more NICs, http installation.
2. In loader select language and keyboard layout
3. Right after that anaconda crashes and the installation is aborted. I think
this is just before the screen where you can select which NIC to use for
installation.
Expected results:
Installation proceeds as normal
Created attachment 295489 [details]
Xen console showing that vifs don't have drivers (shows up even with 10 nics)
Created attachment 295490 [details]
traceback (you need to be very quick to capture it)
Created attachment 295492 [details]
script used to start the virtual guest
* execute as root
* the script appends -m $MAC -b $bridge parameters to emulate a higher number of
  NICs
* MAC addresses do not conflict with others on the network or with other Xen guests
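For anyone without access to the attachment, here is a rough sketch of how such a parameter list could be built. It is not the attached script. The function name nic_args, the default bridge name xenbr0 and the use of the Xen OUI prefix 00:16:3e are assumptions made for illustration. The sketch only guarantees that the generated MACs differ from each other; it does not check them against the rest of the network.

```python
def nic_args(count, bridge="xenbr0", prefix="00:16:3e"):
    """Build a flat list of '-m MAC -b BRIDGE' pairs for `count` NICs.

    Hypothetical helper: the bridge name and MAC prefix are assumptions,
    not values taken from the attached script.
    """
    if not 0 <= count <= 0xFFFFFF:
        raise ValueError("count must fit in the last three MAC octets")
    args = []
    for i in range(count):
        # Spread the index over the last three octets so each MAC is unique
        # within this list.
        mac = "%s:%02x:%02x:%02x" % (
            prefix, (i >> 16) & 0xFF, (i >> 8) & 0xFF, i & 0xFF)
        args += ["-m", mac, "-b", bridge]
    return args


if __name__ == "__main__":
    # 16 NICs is the smallest count reported to trigger the crash.
    print(" ".join(nic_args(16)))
```

The resulting list can be appended to whatever command line starts the guest.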
According to David Cantrell it might be a bug in libnl not communicating with
the Xen kernel correctly.
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".
Add Thomas Graf...
The traceback in comment 2 is a kernel panic, right? If that's a kernel panic,
the kernel shouldn't be crashing based on anything a userspace program would do;
so a kernel patch would be in order here. Any thoughts Thomas?
Even if libnl is the trigger of the problem, it is definitely a kernel bug.
Would it be possible to define the environment variable NLCB=debug before the
program using libnl is invoked? The output would clarify what exactly is sent to
the kernel causing it to crash.
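As a minimal sketch of what is being asked for, assuming the libnl-based program can be launched from a wrapper: copy the current environment, add NLCB=debug, and start the program with it. The helper name run_with_nlcb_debug is hypothetical. This does not apply directly to the installer, where the variable has to be set inside the initrd's init itself.

```python
import os
import subprocess


def run_with_nlcb_debug(argv):
    """Run `argv` with NLCB=debug added to a copy of the current environment.

    Hypothetical wrapper for illustration; it captures stdout and stderr so
    the debug output can be attached to a report afterwards.
    """
    env = dict(os.environ)
    env["NLCB"] = "debug"
    return subprocess.run(
        argv,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
```

The returned object carries the captured text in its stdout and stderr attributes.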
Created attachment 299373 [details]
initrd.img for RHEL5.2 Server 20080320.0/x86_64
sets the requested environment variable as the first line in init.c
Created attachment 299374 [details]
boot.iso for the same tree with the updated initrd.img
Can't test personally at the moment. Xen is causing my machine to crash and I
still don't know why.
Created attachment 299446 [details]
screen dump of UI
screen dump of the UI with initrd.img including init binary that setenv("NLCB",
This screen dump differs slightly from the one in comment #2.
Created attachment 299449 [details]
Text output from 'xm console linux'
subsequent tests don't catch all the text.
Is the provided information enough, or do you need something else? Although
NLCB=debug is set, I don't know if this is providing the information you are
looking for or if it is set in the correct place.
Back to thomas; whenever this gets figured out I'm happy to patch up libnl for
5.2 and push through QE.
ping, thomas? thoughts?
This is not a libnl bug, it's a kernel bug entirely.
ok, over to kernel then... can you redirect as appropriate? Thanks!
It looks like a memory corruption issue. I'm currently away from home so is
there a test machine that I could use to test this until I get home next week?
Herbert, can we close this as not a bug for RHEL 5.2?
Yes, we can close this as a duplicate of #441390. Thanks!
Closing as a duplicate.
*** This bug has been marked as a duplicate of 441390 ***