Red Hat Bugzilla – Bug 151491
Xen boots, networking fails, tracebacks on console and in dmesg
Last modified: 2007-11-30 17:11:02 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050309 Epiphany/1.5.8
Description of problem:
It looks like Xen is not working with my ethernet card. I boot in single user mode and everything is fine. I then try to change to runlevel 3 and I get periodic tracebacks on the console and the whole thing is VERY slow. Once I finally get to runlevel 3 I can login as root but the networking is not functioning.
The interface, eth0, appears to be up according to ifconfig and routing looks OK according to netstat, but I am unable to ping external addresses, nor are they able to ping me. Looking in dmesg, there are many more tracebacks.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Created attachment 112128
Running Xen dom0 kernel in runlevel 3 logged in as root
Created attachment 112129
Output from lspci
Created attachment 112130
Tracebacks from console booting with /lib/tls
Created attachment 112131
Tracebacks from console booting with /lib/tls moved to /lib/tls.disabled
Created attachment 112144
Console log from testing 2.6.11-1.1185_FC4xen0
Tried again with most recent kernel - looks like the same problem.
Do things work when xend is not running?
I have never seen that bug on my systems, so ...
Same result both with and without xend. Tested with kernel 2.6.11-1.1268_FC4
and xen 2-20050424.
Created attachment 113778
dmesg output when in runlevel 3 (network up) no xend running
I also raised this on the fedora-test list, but got no response. My email includes
more log files, etc:
Ohhh, looks like something (the e100 driver) is allocating memory with the wrong
Jeff, does this look familiar to you?
The traceback is screwed, and therefore (unfortunately) not very useful. If you
compare the e100 trace to the actual code, you see that the request_irq() code
path is never ever called from an interrupt.
Something weird and non-obvious is going on.
As an additional data point, I tried booting the Xen Live CD. I get what looks
like the same problem.
In that case, your best bet would be to open a bug at http://bugzilla.xensource.com/
I get the bugs from that bugzilla too, but resolving the bug together with the
xen developers is probably going to get things along a bit faster - not to
mention they've only got 12 bugs in their bugzilla while I've got a few dozen
just on my own list ;)))
Bug number 13 opened in the Xen bugzilla:
(In reply to comment #11)
> The traceback is screwed, and therefore (unfortunately) not very useful. If you
> compare the e100 trace to the actual code, you see that the request_irq() code
> path is never ever called from an interrupt.
> Something weird and non-obvious is going on.
Something non-obvious is indeed going on: the root of all these problems is that the dev_watchdog is
firing because packets are not being transmitted on the wire.
*However*, it also highlights a bug in the e100 driver: it is not valid to call request_irq() from the
tx_timeout handler. That handler is called in softirq context, but request_irq() can sleep. This is bad. :-)
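To make the softirq point concrete, here is a toy user-space model in Python. It is not kernel code and not the e100 driver: every name in it (`Context`, `request_irq_model`, `tx_timeout_buggy`, `tx_timeout_deferred`, `run_softirq`, `run_pending_work`) is a hypothetical stand-in. It only sketches the general idea that a handler running in atomic context should queue the reset and let it run later, where sleeping is allowed. Whether the driver should be fixed in exactly this way is an assumption.

```python
from collections import deque


class Context:
    """Toy model of execution context (hypothetical, not a kernel API)."""

    def __init__(self):
        self.atomic = False   # True while we pretend to be in softirq context
        self.work = deque()   # deferred callables, run later in "process context"
        self.log = []         # record of what actually ran


ctx = Context()


def request_irq_model():
    # Stand-in for a call that may sleep, as request_irq() can.
    if ctx.atomic:
        raise RuntimeError("sleeping call made in atomic context")
    ctx.log.append("irq requested")


def tx_timeout_buggy():
    # The pattern described above: the may-sleep call is made
    # directly from the timeout handler.
    request_irq_model()


def tx_timeout_deferred():
    # Alternative sketch: only queue the work; nothing here may sleep.
    ctx.work.append(request_irq_model)


def run_softirq(handler):
    # Run a handler as if from softirq context.
    ctx.atomic = True
    try:
        handler()
    finally:
        ctx.atomic = False


def run_pending_work():
    # Drain the queue outside atomic context, where sleeping is allowed.
    while ctx.work:
        ctx.work.popleft()()
```

Calling `run_softirq(tx_timeout_buggy)` raises in this model, while `run_softirq(tx_timeout_deferred)` followed by `run_pending_work()` completes and records the request only once the queue is drained.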
Keir Fraser has identified the problem as the lack of ACPI support in the Xen
kernels. I have confirmed this by booting a standard Fedora kernel with ACPI
disabled and experiencing all sorts of problems related to the network card.
What do you want to do with this bug? Leave it open until ACPI support lands in
the Xen tree and makes its way into the Fedora Xen Kernels? Or close it as
being worked on upstream?
You did the right thing by closing it here - cluttering this bugzilla with
issues that should be fixed upstream is just a distraction from the bugs that
should be fixed here.