Bug 151491 - Xen boots, networking fails, tracebacks on console and in dmesg
Status: CLOSED UPSTREAM
Product: Fedora
Classification: Fedora
Component: xen
Version: 4
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Rik van Riel
Reported: 2005-03-18 10:55 EST by Keith Sharp
Modified: 2007-11-30 17:11 EST (History)
Doc Type: Bug Fix
Last Closed: 2005-05-03 10:32:44 EDT

Attachments
dmesg output (15.95 KB, text/plain)
2005-03-18 10:57 EST, Keith Sharp
Output from lspci (1.63 KB, text/plain)
2005-03-18 10:58 EST, Keith Sharp
Tracebacks from console booting with /lib/tls (6.32 KB, text/plain)
2005-03-18 10:59 EST, Keith Sharp
Tracebacks from console booting with /lib/tls moved to /lib/tls.disabled (4.27 KB, text/plain)
2005-03-18 11:00 EST, Keith Sharp
Console log from testing 2.6.11-1.1185_FC4xen0 (15.71 KB, text/plain)
2005-03-18 16:44 EST, Keith Sharp
dmesg output when in runlevel 3 (network up) no xend running (16.13 KB, text/plain)
2005-04-28 09:50 EDT, Keith Sharp

Description Keith Sharp 2005-03-18 10:55:58 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050309 Epiphany/1.5.8

Description of problem:
It looks like Xen is not working with my Ethernet card.  I boot in single-user mode and everything is fine.  When I then switch to runlevel 3, I get periodic tracebacks on the console and the whole boot is VERY slow.  Once I finally reach runlevel 3 I can log in as root, but networking does not function.

The interface, eth0, appears to be up according to ifconfig, and routing looks OK according to netstat, but I cannot ping external addresses, nor can they ping me.  dmesg shows a number of additional tracebacks.

Version-Release number of selected component (if applicable):
kernel-xen0-2.6.11-1.1177_FC4 xen-2-20050308

How reproducible:
Always

Steps to Reproduce:
See description

Additional info:
Comment 1 Keith Sharp 2005-03-18 10:57:43 EST
Created attachment 112128
dmesg output

Running Xen dom0 kernel in runlevel 3 logged in as root
Comment 2 Keith Sharp 2005-03-18 10:58:57 EST
Created attachment 112129
Output from lspci
Comment 3 Keith Sharp 2005-03-18 10:59:33 EST
Created attachment 112130
Tracebacks from console booting with /lib/tls
Comment 4 Keith Sharp 2005-03-18 11:00:33 EST
Created attachment 112131
Tracebacks from console booting with /lib/tls moved to /lib/tls.disabled
Comment 5 Keith Sharp 2005-03-18 16:44:46 EST
Created attachment 112144
Console log from testing 2.6.11-1.1185_FC4xen0

Tried again with most recent kernel - looks like the same problem.
Comment 6 Rik van Riel 2005-04-26 14:00:26 EDT
Do things work when xend is not running?

I have never seen that bug on my systems, so ...
Comment 7 Keith Sharp 2005-04-28 09:48:19 EDT
Same result both with and without xend.  Tested with kernel 2.6.11-1.1268_FC4
and xen 2-20050424.
Comment 8 Keith Sharp 2005-04-28 09:50:17 EDT
Created attachment 113778
dmesg output when in runlevel 3 (network up) no xend running
Comment 9 Keith Sharp 2005-04-28 09:54:49 EDT
I also raised this on the fedora-test list, but got no response.  My email includes
more log files, etc.:

https://www.redhat.com/archives/fedora-test-list/2005-April/msg00397.html
Comment 10 Rik van Riel 2005-04-28 09:57:14 EDT
Ohhh, looks like something (the e100 driver) is allocating memory with the wrong
flags. 

Jeff, does this look familiar to you?
Comment 11 Jeff Garzik 2005-04-28 12:23:18 EDT
The traceback is screwed, and therefore (unfortunately) not very useful.  If you
compare the e100 trace to the actual code, you see that the request_irq() code
path is never ever called from an interrupt.

Something weird and non-obvious is going on.

Comment 12 Keith Sharp 2005-04-28 12:37:53 EDT
As an additional data point, I tried booting the Xen Live CD and got what looks
like the same problem.
Comment 13 Rik van Riel 2005-04-28 12:59:36 EDT
In that case, your best bet would be to open a bug at http://bugzilla.xensource.com/

I get the bugs from that bugzilla too, but resolving the bug together with the
xen developers is probably going to move things along a bit faster - not to
mention they've only got 12 bugs in their bugzilla while I've got a few dozen
just on my own list ;)))
Comment 14 Keith Sharp 2005-04-28 15:08:33 EDT
Bug number 13 opened in the Xen bugzilla:

http://bugzilla.xensource.com/cgi-bin/bugzilla/show_bug.cgi?id=13
Comment 15 Keir Fraser 2005-04-29 05:06:48 EDT
(In reply to comment #11)
> The traceback is screwed, and therefore (unfortunately) not very useful.  If you
> compare the e100 trace to the actual code, you see that the request_irq() code
> path is never ever called from an interrupt.
> 
> Something weird and non-obvious is going on.

Something non-obvious is indeed going on: the root of all these problems is that the dev_watchdog is 
firing because packets are not being transmitted on the wire.

*However*, it also highlights a bug in the e100 driver --- it is not valid to call request_irq() from the 
tx_timeout handler. That handler is called in softirq context but request_irq() can sleep. This is bad. :-)
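
A minimal sketch of the point above, written as a userspace Python analogy rather than kernel code. Every name in it (Nic, may_sleep, run_in_softirq, SleepInAtomicError) is hypothetical and only models the situation: a handler that runs in a context that must not sleep can hand the blocking work to something running in process context (in the kernel that role is typically played by a workqueue) instead of calling it directly. It is not presented as the actual e100 fix.

```python
import queue
import threading


class SleepInAtomicError(RuntimeError):
    """Raised when a call that may block runs where blocking is not allowed."""


# Thread-local flag standing in for "this thread is in softirq context".
_ctx = threading.local()


def may_sleep(name):
    """Model of a call such as request_irq() that is allowed to block."""
    if getattr(_ctx, "atomic", False):
        raise SleepInAtomicError(f"{name} called from atomic context")
    return f"{name} ok"


def run_in_softirq(handler):
    """Invoke handler with the atomic flag set, as a timer callback would run."""
    _ctx.atomic = True
    try:
        return handler()
    finally:
        _ctx.atomic = False


class Nic:
    """Hypothetical driver model with a worker thread for deferred work."""

    def __init__(self):
        self.work = queue.Queue()
        self.results = []
        self.worker = threading.Thread(target=self._run_worker, daemon=True)
        self.worker.start()

    def _run_worker(self):
        # The worker models process context: blocking calls are fine here.
        while True:
            job = self.work.get()
            if job is None:  # sentinel: shut down
                break
            self.results.append(job())

    def reset(self):
        # Hypothetical reset path that re-registers the interrupt handler.
        return may_sleep("request_irq")

    def tx_timeout_buggy(self):
        # Calls the blocking path directly from the timeout handler.
        return self.reset()

    def tx_timeout_deferred(self):
        # Only queues the reset; the worker thread performs it later.
        self.work.put(self.reset)

    def stop(self):
        self.work.put(None)
        self.worker.join()
```

In this model, run_in_softirq(nic.tx_timeout_buggy) raises SleepInAtomicError, while run_in_softirq(nic.tx_timeout_deferred) returns immediately and the reset completes on the worker thread.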

Comment 16 Keith Sharp 2005-05-03 10:32:44 EDT
Keir Fraser has identified the problem as the lack of ACPI support in the Xen
kernels.  I have confirmed this by booting a standard Fedora kernel with ACPI
disabled and experiencing all sorts of problems related to the network card.
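
For anyone repeating this comparison, a small sketch for confirming that a running kernel was booted with ACPI turned off. It assumes ACPI was disabled with the acpi=off boot parameter (the report does not say which method was used) and that the kernel command line is readable from /proc/cmdline; the function names are made up for illustration.

```python
def acpi_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line contains acpi=off."""
    # Compare whole whitespace-separated tokens so that options such as
    # pci=noacpi are not mistaken for acpi=off.
    return "acpi=off" in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read().strip()
```

Calling acpi_disabled(read_cmdline()) on the test machine reports whether that boot actually ran without ACPI.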

What do you want to do with this bug?  Leave it open until ACPI support lands in
the Xen tree and makes its way into the Fedora Xen Kernels?  Or close it as
being worked on upstream?
Comment 17 Rik van Riel 2005-05-03 10:45:52 EDT
You did the right thing by closing it here - cluttering this bugzilla with
issues that should be fixed upstream is just a distraction from the bugs that
should be fixed here.
