Bug 433755

Summary: Crash when installing Xen paravirt guest with >=16 NICs
Product: Red Hat Enterprise Linux 5
Reporter: Alexander Todorov <atodorov>
Component: kernel-xen
Assignee: Herbert Xu <herbert.xu>
Status: CLOSED DUPLICATE
QA Contact: desktop-bugs <desktop-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 5.2
CC: herbert.xu, jlaska, kernel-mgr, tgraf, xen-maint
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-04-16 11:59:55 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Xen console showing that vifs don't have drivers (shows up even with 10 nics)
none
traceback (need be very quick to capture that)
none
script used to start the virtual guest
none
initrd.img for RHEL5.2 Server 20080320.0/x86_64
none
boot.iso for the same tree with the updated initrd.img
none
screen dump of UI
none
text output
none

Description Alexander Todorov 2008-02-21 09:47:35 UTC
Description of problem:
Anaconda will crash during stage 1 (loader) when you try to install a Xen
paravirt guest with 16 or more network cards. Up to 15 is fine. According to
David Cantrell the problem is in the underlying libnl code.

Version-Release number of selected component (if applicable):
libnl-1.0-0.10.pre5.4.x86_64.rpm
anaconda-11.1.2.101-1.x86_64.rpm


How reproducible:
100%

Steps to Reproduce:
1. Start a Xen paravirt guest with 16 or more NICs, HTTP installation.
2. In loader select language and keyboard layout.
3. Right after that anaconda crashes and the installation is aborted. I think
this is just before the screen where you can select which NIC to use for
installation.
  
Actual results:
Crash

Expected results:
Installation proceeds as normal

Additional info:
See attachments

Comment 1 Alexander Todorov 2008-02-21 09:47:35 UTC
Created attachment 295489 [details]
Xen console showing that vifs don't have drivers (shows up even with 10 nics)

Comment 2 Alexander Todorov 2008-02-21 09:48:59 UTC
Created attachment 295490 [details]
traceback (need be very quick to capture that)

Comment 3 Alexander Todorov 2008-02-21 09:52:18 UTC
Created attachment 295492 [details]
script used to start the virtual guest

* execute as root
* the script appends -m $MAC -b $bridge parameters, one pair per NIC, to emulate
a higher number of NICs
* MAC addresses do not conflict with others on the network or with other Xen
guests
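
For readers without access to the attachment, here is a minimal sketch of how
such a list of -m $MAC -b $bridge pairs could be generated. It is not the
attached script. The function names, the way the MAC suffix is numbered, and
the bridge name passed in by the caller (for example "xenbr0") are assumptions
made for illustration; only the -m and -b parameters come from the description
above. The 00:16:3e prefix is the OUI reserved for Xen guests.

```c
#include <stdio.h>
#include <string.h>

/* Format one "-m <MAC> -b <bridge>" pair for NIC number `index`.
 * The MAC suffix scheme is hypothetical; only the low byte of index is used.
 * Returns what snprintf returns. */
int format_nic_args(char *buf, size_t len, unsigned index, const char *bridge)
{
    return snprintf(buf, len, "-m 00:16:3e:00:00:%02x -b %s",
                    index & 0xffu, bridge);
}

/* Build a space-separated list of `count` pairs in `out`.
 * Returns 0 on success, -1 if `out` is too small. */
int build_nic_args(char *out, size_t len, unsigned count, const char *bridge)
{
    size_t used = 0;

    if (len == 0)
        return -1;
    out[0] = '\0';

    for (unsigned i = 0; i < count; i++) {
        char one[64];
        int n = format_nic_args(one, sizeof one, i, bridge);

        if (n < 0 || (size_t)n >= sizeof one)
            return -1;              /* pair did not fit in the scratch buffer */

        size_t need = (size_t)n + (i > 0 ? 1 : 0);
        if (used + need >= len)
            return -1;              /* no room left for pair plus terminator */

        if (i > 0)
            out[used++] = ' ';
        memcpy(out + used, one, (size_t)n + 1);   /* copies the NUL as well */
        used += (size_t)n;
    }
    return 0;
}
```

Calling build_nic_args(out, sizeof out, 16, "xenbr0") yields 16 pairs, which is
the smallest NIC count reported to trigger the crash.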

Comment 4 Alexander Todorov 2008-02-21 09:55:43 UTC
CC'ing kernel-mgr.
According to David Cantrell it might be a bug in libnl not communicating with
the Xen kernel correctly.

Comment 5 RHEL Program Management 2008-02-21 09:57:32 UTC
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".

Comment 6 Dan Williams 2008-02-21 14:30:23 UTC
Add Thomas Graf...

The traceback in comment 2 is a kernel panic, right?  If that's a kernel panic,
the kernel shouldn't be crashing based on anything a userspace program would do;
so a kernel patch would be in order here.  Any thoughts Thomas?

Comment 7 Thomas Graf 2008-02-21 20:06:01 UTC
Even if libnl is the trigger of the problem, it is definitely a kernel bug.

Would it be possible to set the environment variable NLCB=debug before the
program using libnl is invoked? The output would clarify what exactly is sent to
the kernel, causing it to crash.
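
One way to do this from C, sketched here under assumptions: the helper names
below are hypothetical and this is not the loader's actual code. The idea is
to call setenv() before the libnl-using program starts, so that the program
and any children inherit NLCB=debug.

```c
#define _POSIX_C_SOURCE 200112L   /* for setenv() under strict C modes */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Set NLCB=debug in the current environment, overwriting any old value.
 * Child processes started afterwards inherit it. Returns 0 on success. */
int enable_libnl_debug(void)
{
    return setenv("NLCB", "debug", 1);
}

/* Replace the current process with the program named by argv[0],
 * with NLCB=debug set. Only returns (with -1) if something failed. */
int run_with_libnl_debug(char *const argv[])
{
    if (enable_libnl_debug() != 0) {
        perror("setenv");
        return -1;
    }
    execvp(argv[0], argv);
    perror("execvp");             /* reached only when execvp fails */
    return -1;
}
```

In an init-style program, calling enable_libnl_debug() as the very first
statement has the same effect for everything started later.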

Comment 8 Alexander Todorov 2008-03-27 18:11:32 UTC
Created attachment 299373 [details]
initrd.img for RHEL5.2 Server 20080320.0/x86_64 

sets the requested environment variable as the first line in init.c

Comment 9 Alexander Todorov 2008-03-27 18:12:29 UTC
Created attachment 299374 [details]
boot.iso for the same tree with the updated initrd.img

Can't test personally at the moment. Xen is causing my machine to crash and I
still don't know why.

Comment 10 Alexander Todorov 2008-03-28 10:43:58 UTC
Created attachment 299446 [details]
screen dump of UI

Screen dump of the UI, taken with an initrd.img whose init binary calls
setenv("NLCB", "debug", 1).

This screen dump differs slightly from the one in comment #2.

Comment 11 Alexander Todorov 2008-03-28 11:01:31 UTC
Created attachment 299449 [details]
text output

Text output from 'xm console linux'.
Subsequent tests don't catch all the text.

Comment 12 Alexander Todorov 2008-03-28 11:03:58 UTC
Dan,
is the provided information enough, or do you need something else? Although
NLCB=debug is set, I don't know if this is providing the information you are
looking for or if it is set in the correct place.

Comment 13 Dan Williams 2008-03-28 18:29:42 UTC
Back to thomas; whenever this gets figured out I'm happy to patch up libnl for
5.2 and push through QE.

Comment 17 Dan Williams 2008-04-04 18:01:00 UTC
ping, thomas?  thoughts?

Comment 18 Thomas Graf 2008-04-04 18:30:24 UTC
This is not a libnl bug, it's a kernel bug entirely.

Comment 19 Dan Williams 2008-04-04 18:40:07 UTC
ok, over to kernel then...  can you redirect as appropriate?  Thanks!

Comment 20 Herbert Xu 2008-04-10 14:19:27 UTC
It looks like a memory corruption issue.  I'm currently away from home, so is
there a test machine that I could use to test this until I get home next week?
Thanks!

Comment 27 Bill Burns 2008-04-15 15:03:10 UTC
Herbert, can we close this as not a bug for RHEL 5.2?

Comment 28 Herbert Xu 2008-04-16 01:23:44 UTC
Yes, we can close this as a duplicate of #441390.  Thanks!

Comment 29 Bill Burns 2008-04-16 11:59:55 UTC
Closing as a duplicate.


*** This bug has been marked as a duplicate of 441390 ***