Bug 433755 - Crash when installing Xen paravirt guest with >=16 NICS
Status: CLOSED DUPLICATE of bug 441390
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.2
Hardware: x86_64 Linux
Priority: high Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Herbert Xu
QA Contact: desktop-bugs@redhat.com
Reported: 2008-02-21 04:47 EST by Alexander Todorov
Modified: 2008-04-16 07:59 EDT

Doc Type: Bug Fix
Last Closed: 2008-04-16 07:59:55 EDT


Attachments
Xen console showing that vifs don't have drivers (shows up even with 10 nics) (31.64 KB, image/png)
2008-02-21 04:47 EST, Alexander Todorov
traceback (need be very quick to capture that) (33.87 KB, image/png)
2008-02-21 04:48 EST, Alexander Todorov
script used to start the virtual guest (447 bytes, text/plain)
2008-02-21 04:52 EST, Alexander Todorov
initrd.img for RHEL5.2 Server 20080320.0/x86_64 (5.48 MB, application/octet-stream)
2008-03-27 14:11 EDT, Alexander Todorov
boot.iso for the same tree with the updated initrd.img (34.00 KB, application/x-cd-image)
2008-03-27 14:12 EDT, Alexander Todorov
screen dump of UI (423.82 KB, image/png)
2008-03-28 06:43 EDT, Alexander Todorov
text output (2.83 KB, text/plain)
2008-03-28 07:01 EDT, Alexander Todorov

Description Alexander Todorov 2008-02-21 04:47:35 EST
Description of problem:
Anaconda will crash during stage 1 (loader) when you try to install a Xen
paravirt guest with 16 or more network cards. Up to 15 is fine. According to
David Cantrell the problem is in the underlying libnl code.

Version-Release number of selected component (if applicable):
libnl-1.0-0.10.pre5.4.x86_64.rpm
anaconda-11.1.2.101-1.x86_64.rpm


How reproducible:
100%

Steps to Reproduce:
1. Start a Xen paravirt guest with 16 or more NICs, using HTTP installation.
2. In the loader, select the language and keyboard layout.
3. Right after that anaconda crashes and the installation is aborted. I think
this is just before the screen where you can select which NIC to use for
installation.
  
Actual results:
Crash

Expected results:
Installation proceeds as normal

Additional info:
See attachments
Comment 1 Alexander Todorov 2008-02-21 04:47:35 EST
Created attachment 295489 [details]
Xen console showing that vifs don't have drivers (shows up even with 10 nics)
Comment 2 Alexander Todorov 2008-02-21 04:48:59 EST
Created attachment 295490 [details]
traceback (need be very quick to capture that)
Comment 3 Alexander Todorov 2008-02-21 04:52:18 EST
Created attachment 295492 [details]
script used to start the virtual guest

* execute as root
* the script appends -m $MAC -b $bridge parameters to emulate a higher number
of NICs
* MAC addresses do not conflict with others on the network or with other Xen
guests
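
The argument scheme described in the bullets above could be sketched roughly as
follows. This is an illustration only, not the attached script: the helper name
nic_args, the bridge name xenbr0 and the MAC numbering are assumptions. Only the
"-m $MAC -b $bridge" pair per NIC comes from the report. The 00:16:3e prefix is
the OUI reserved for Xen guests; uniqueness is guaranteed only within the
generated set, not against the rest of the network.

```python
def nic_args(count, bridge="xenbr0", start=0x100):
    """Build repeated '-m MAC -b BRIDGE' pairs, one per emulated NIC.

    Hypothetical helper: the bridge name and the MAC numbering are
    assumptions, not taken from the attached script.
    """
    args = []
    for i in range(count):
        n = start + i
        # 00:16:3e is the OUI reserved for Xen; the low three bytes count up
        mac = "00:16:3e:%02x:%02x:%02x" % ((n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF)
        args += ["-m", mac, "-b", bridge]
    return args


if __name__ == "__main__":
    # 16 NICs is the smallest count reported to crash the loader
    print(" ".join(nic_args(16)))
```

The resulting list would be appended to whatever command starts the guest; the
report does not name that command, so it is left out here.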
Comment 4 Alexander Todorov 2008-02-21 04:55:43 EST
CC'ing kernel-mgr@redhat.com.
According to David Cantrell it might be a bug in libnl not communicating with
the Xen kernel correctly.
Comment 5 RHEL Product and Program Management 2008-02-21 04:57:32 EST
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".
Comment 6 Dan Williams 2008-02-21 09:30:23 EST
Add Thomas Graf...

The traceback in comment 2 is a kernel panic, right?  If that's a kernel panic,
the kernel shouldn't be crashing based on anything a userspace program would do;
so a kernel patch would be in order here.  Any thoughts Thomas?
Comment 7 Thomas Graf 2008-02-21 15:06:01 EST
Even if libnl is the trigger of the problem, it is definitely a kernel bug.

Would it be possible to define the environment variable NLCB=debug before the
program using libnl is invoked? The output would clarify what exactly is sent to
the kernel causing it to crash.
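
One way to follow this request is to set the variable in the environment of the
process that uses libnl. The sketch below does that for a child process. The
helper name run_with_nlcb and the stand-in command are assumptions; NLCB=debug
is the value asked for above.

```python
import os
import subprocess
import sys


def run_with_nlcb(cmd, value="debug"):
    """Run cmd with NLCB set in its environment (hypothetical helper)."""
    env = dict(os.environ, NLCB=value)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in command: print the variable the child process actually sees
    child = [sys.executable, "-c", "import os; print(os.environ['NLCB'])"]
    print(run_with_nlcb(child).stdout.strip())
```

In practice the stand-in command would be replaced by the program that links
against libnl.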
Comment 8 Alexander Todorov 2008-03-27 14:11:32 EDT
Created attachment 299373 [details]
initrd.img for RHEL5.2 Server 20080320.0/x86_64 

sets the requested environment variable as the first line in init.c
Comment 9 Alexander Todorov 2008-03-27 14:12:29 EDT
Created attachment 299374 [details]
boot.iso for the same tree with the updated initrd.img

Can't test personally at the moment. Xen is causing my machine to crash and I
still don't know why.
Comment 10 Alexander Todorov 2008-03-28 06:43:58 EDT
Created attachment 299446 [details]
screen dump of UI

Screen dump of the UI with an initrd.img whose init binary calls
setenv("NLCB", "debug", 1).

This screen dump differs slightly from the one in comment #2.
Comment 11 Alexander Todorov 2008-03-28 07:01:31 EDT
Created attachment 299449 [details]
text output

Text output from 'xm console linux'.
Subsequent tests don't catch all the text.
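
Since the console output scrolls past quickly, it may help to stream it straight
into a file. A rough sketch follows. The helper name capture_to_file and the log
path are assumptions, and whether 'xm console' behaves well with its output
piped has not been verified here.

```python
import subprocess
import sys


def capture_to_file(cmd, log_path):
    """Copy everything cmd writes to stdout/stderr into log_path, line by line."""
    with open(log_path, "w") as log:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True)
        for line in proc.stdout:
            log.write(line)
            log.flush()  # keep the file current in case the guest dies mid-output
        return proc.wait()


if __name__ == "__main__":
    # Stand-in command; on the Xen host this might be ["xm", "console", "linux"]
    demo = [sys.executable, "-c", "print('demo line')"]
    capture_to_file(demo, "console.log")
```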
Comment 12 Alexander Todorov 2008-03-28 07:03:58 EDT
Dan,
is the provided information enough, or will you need something else? Although
NLCB=debug is set, I don't know if this is providing the information you are
looking for or if it is set in the correct place.
Comment 13 Dan Williams 2008-03-28 14:29:42 EDT
Back to Thomas; whenever this gets figured out I'm happy to patch up libnl for
5.2 and push it through QE.
Comment 17 Dan Williams 2008-04-04 14:01:00 EDT
ping, thomas?  thoughts?
Comment 18 Thomas Graf 2008-04-04 14:30:24 EDT
This is not a libnl bug, it's a kernel bug entirely.
Comment 19 Dan Williams 2008-04-04 14:40:07 EDT
ok, over to kernel then...  can you redirect as appropriate?  Thanks!
Comment 20 Herbert Xu 2008-04-10 10:19:27 EDT
It looks like a memory corruption issue.  I'm currently away from home, so is
there a test machine that I could use to test this until I get home next week?
Thanks!
Comment 27 Bill Burns 2008-04-15 11:03:10 EDT
Herbert, can we close this as not a bug for RHEL 5.2?
Comment 28 Herbert Xu 2008-04-15 21:23:44 EDT
Yes, we can close this as a duplicate of #441390.  Thanks!
Comment 29 Bill Burns 2008-04-16 07:59:55 EDT
Closing as a duplicate.


*** This bug has been marked as a duplicate of 441390 ***
