Bug 151491

Summary: Xen boots, networking fails, tracebacks on console and in dmesg
Product: [Fedora] Fedora
Reporter: Keith Sharp <kms>
Component: xen
Assignee: Rik van Riel <riel>
Status: CLOSED UPSTREAM
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: 4
CC: jgarzik, kaf24
Target Milestone: ---
Target Release: ---
Hardware: i386
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-05-03 10:32:44 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Attachments (description, flags):
dmesg output (flags: none)
Output from lspci (flags: none)
Tracebacks from console booting with /lib/tls (flags: none)
Tracebacks from console booting with /lib/tls moved to /lib/tls.disabled (flags: none)
Console log from testing 2.6.11-1.1185_FC4xen0 (flags: none)
dmesg output when in runlevel 3 (network up) no xend running (flags: none)

Description Keith Sharp 2005-03-18 10:55:58 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050309 Epiphany/1.5.8

Description of problem:
It looks like Xen is not working with my ethernet card.  I boot in single-user mode and everything is fine.  I then try to change to runlevel 3 and I get periodic tracebacks on the console and the whole thing is VERY slow.  Once I finally get to runlevel 3 I can log in as root but the networking is not functioning.

The interface, eth0, appears to be up according to ifconfig and routing looks ok according to netstat, but I am unable to ping external addresses, nor are they able to ping me.  Looking in dmesg there are a bunch more tracebacks.

Version-Release number of selected component (if applicable):
kernel-xen0-2.6.11-1.1177_FC4 xen-2-20050308

How reproducible:
Always

Steps to Reproduce:
See description

Additional info:
Comment 1 Keith Sharp 2005-03-18 10:57:43 EST
Created attachment 112128 [details]
dmesg output

Running Xen dom0 kernel in runlevel 3 logged in as root
Comment 2 Keith Sharp 2005-03-18 10:58:57 EST
Created attachment 112129 [details]
Output from lspci
Comment 3 Keith Sharp 2005-03-18 10:59:33 EST
Created attachment 112130 [details]
Tracebacks from console booting with /lib/tls
Comment 4 Keith Sharp 2005-03-18 11:00:33 EST
Created attachment 112131 [details]
Tracebacks from console booting with /lib/tls moved to /lib/tls.disabled
Comment 5 Keith Sharp 2005-03-18 16:44:46 EST
Created attachment 112144 [details]
Console log from testing 2.6.11-1.1185_FC4xen0

Tried again with most recent kernel - looks like the same problem.
Comment 6 Rik van Riel 2005-04-26 14:00:26 EDT
Do things work when xend is not running?

I have never seen that bug on my systems, so ...
Comment 7 Keith Sharp 2005-04-28 09:48:19 EDT
Same result both with and without xend.  Tested with kernel 2.6.11-1.1268_FC4
and xen 2-20050424.
Comment 8 Keith Sharp 2005-04-28 09:50:17 EDT
Created attachment 113778 [details]
dmesg output when in runlevel 3 (network up) no xend running
Comment 9 Keith Sharp 2005-04-28 09:54:49 EDT
I also raised this on the fedora-test list, but got no response.  My email
includes more log files, etc:

https://www.redhat.com/archives/fedora-test-list/2005-April/msg00397.html
Comment 10 Rik van Riel 2005-04-28 09:57:14 EDT
Ohhh, looks like something (the e100 driver) is allocating memory with the wrong
flags. 

Jeff, does this look familiar to you?
Comment 11 Jeff Garzik 2005-04-28 12:23:18 EDT
The traceback is screwed, and therefore (unfortunately) not very useful.  If you
compare the e100 trace to the actual code, you see that the request_irq() code
path is never ever called from an interrupt.

Something weird and non-obvious is going on.

Comment 12 Keith Sharp 2005-04-28 12:37:53 EDT
As an additional data point I tried booting the Xen Live CD - I get what looks
like the same problem.
Comment 13 Rik van Riel 2005-04-28 12:59:36 EDT
In that case, your best bet would be to open a bug at http://bugzilla.xensource.com/

I get the bugs from that bugzilla too, but resolving the bug together with the
xen developers is probably going to get things along a bit faster - not to
mention they've only got 12 bugs in their bugzilla while I've got a few dozen
just on my own list ;)))
Comment 14 Keith Sharp 2005-04-28 15:08:33 EDT
Bug number 13 opened in the Xen bugzilla:

http://bugzilla.xensource.com/cgi-bin/bugzilla/show_bug.cgi?id=13
Comment 15 Keir Fraser 2005-04-29 05:06:48 EDT
(In reply to comment #11)
> The traceback is screwed, and therefore (unfortunately) not very useful.  If you
> compare the e100 trace to the actual code, you see that the request_irq() code
> path is never ever called from an interrupt.
> 
> Something weird and non-obvious is going on.

Something non-obvious is indeed going on: the root of all these problems is that
the dev_watchdog is firing because packets are not being transmitted on the wire.

*However*, it also highlights a bug in the e100 driver --- it is not valid to
call request_irq() from the tx_timeout handler. That handler is called in
softirq context but request_irq() can sleep. This is bad. :-)
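
For illustration only, the context problem described above can be sketched as a small userspace C model. This is not kernel code and not the e100 driver: every name in it (the model_* helpers, tx_timeout_buggy, tx_timeout_deferred) is hypothetical, and deferring the reset to process context is shown as one common way to avoid sleeping in softirq context, not as the fix that was actually applied.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: none of these names are real kernel APIs. */
enum ctx { CTX_PROCESS, CTX_SOFTIRQ };

static enum ctx current_ctx = CTX_PROCESS;
static int sleep_violations = 0; /* sleeping calls made in softirq context */
static int resets_done = 0;      /* completed device resets */

/* Stand-in for a call that may sleep, such as request_irq(). */
static void model_may_sleep_call(void)
{
    if (current_ctx == CTX_SOFTIRQ)
        sleep_violations++;
}

/* Device reset: re-registers the interrupt, so it may sleep. */
static void model_reset_device(void)
{
    model_may_sleep_call();
    resets_done++;
}

/* One-slot deferred work queue. */
typedef void (*work_fn)(void);
static work_fn pending_work = NULL;

static void model_schedule_work(work_fn fn)
{
    pending_work = fn;
}

/* Runs any queued work in process context, where sleeping is allowed. */
static void model_run_pending_work(void)
{
    enum ctx saved = current_ctx;
    current_ctx = CTX_PROCESS;
    if (pending_work != NULL) {
        work_fn fn = pending_work;
        pending_work = NULL;
        fn();
    }
    current_ctx = saved;
}

/* Problematic pattern: reset directly inside the timeout handler. */
static void tx_timeout_buggy(void)
{
    model_reset_device();
}

/* Alternative pattern: only queue the reset from the handler. */
static void tx_timeout_deferred(void)
{
    model_schedule_work(model_reset_device);
}

/* The watchdog invokes the timeout handler in softirq context. */
static void model_watchdog_fire(void (*handler)(void))
{
    enum ctx saved = current_ctx;
    current_ctx = CTX_SOFTIRQ;
    handler();
    current_ctx = saved;
}
```

In this model, firing the watchdog with tx_timeout_buggy records a violation because the sleeping call runs in softirq context, while tx_timeout_deferred records none: the reset only happens later, when model_run_pending_work() executes it in process context.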

Comment 16 Keith Sharp 2005-05-03 10:32:44 EDT
Keir Fraser has identified the problem as the lack of ACPI support in the Xen
kernels.  I have confirmed this by booting a standard Fedora kernel with ACPI
disabled and experiencing all sorts of problems related to the network card.

What do you want to do with this bug?  Leave it open until ACPI support lands in
the Xen tree and makes its way into the Fedora Xen Kernels?  Or close it as
being worked on upstream?
Comment 17 Rik van Riel 2005-05-03 10:45:52 EDT
You did the right thing by closing it here - cluttering this bugzilla with
issues that should be fixed upstream is just a distraction from the bugs that
should be fixed here.