Bug 223199 - T60p with XEN guest running a dedicated PCI device loses SATA disk when creating disk IO
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.0
Hardware: i386
OS: Linux
medium
high
Target Milestone: ---
Assignee: Xen Maintainance List
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks: 492570
Reported: 2007-01-18 12:13 UTC by Joachim Schröder
Modified: 2009-07-29 09:52 UTC
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-07-29 09:52:58 UTC
Target Upstream Version:
Embargoed:


Attachments
screenshot of the irq lost message (508.40 KB, image/jpeg)
2007-01-18 12:19 UTC, Joachim Schröder
dmesg (23.27 KB, text/plain)
2007-01-23 09:36 UTC, Joachim Schröder
xm dmesg (6.45 KB, text/plain)
2007-01-23 09:37 UTC, Joachim Schröder
interrupts being in use from /proc/interrupts (1.35 KB, text/plain)
2007-01-23 09:58 UTC, Joachim Schröder

Description Joachim Schröder 2007-01-18 12:13:20 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.4 (like Gecko)

Description of problem:
I have one XEN guest called "gateway" running; it owns the ipw3945 device and
routes WLAN traffic to the internal eth:
---snip ---
[root@t60p-jschrode ~]# xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     1756     2 r-----     61.3
gateway                                    3      256     1 -b----     18.2

[root@t60p-jschrode ~]# cat /data/xen/gateway.xencfg
# xen config file

# this is a paravirtualized guest
name = "gateway"
memory = "256"
uuid = "8223c51d-3059-07fd-7236-6c324403cbcb"

# installation time:
#kernel="/inst/images/xen/vmlinuz"
#ramdisk="/inst/images/xen/initrd.img"
# for installation while connected to a network with DHCP, remove the ip
# and netmask parts in the following line:
#extra="ks=http://192.168.60.1/ks/gateway/gateway-ks.cfg ip=192.168.60.254 netmask=255.255.255.0"
#extra="ks=http://gateway.demo.redhat.com/cobbler/kickstarts/rhel5s-xen/ks.cfg"
#extra="linux rescue"
#on_reboot   = 'destroy'

# installed system:
bootloader="/usr/bin/pygrub"
on_reboot   = 'restart'

# PCI hideback for LAN and WLAN
#pci = [ '02:00.0' , '03:00.0' ]
# WLAN only
pci = [ '03:00.0' ]

# installation time:
#disk = [ 'phy:/dev/VolGroup00/gateway.boot,xvda,w', 'phy:/dev/VolGroup00/gateway.root,xvdb,w', 'phy:/dev/VolGroup00/gateway.swap,xvdd,w', ]
# installed system:
disk = [ 'phy:/dev/VolGroup00/gateway.boot,xvda,w', 'phy:/dev/VolGroup00/gateway.root,xvdb1,w', 'phy:/dev/VolGroup00/gateway.swap,xvdb2,w', ]

vif = [ 'mac=00:16:3e:a8:3c:fe, bridge=xenbr0', ]

on_crash    = 'destroy'
vnc=1
vncunused=0
--- snap ---

The system is registered against webqa. If I try to run an update or create
any other heavy disk/net IO, the system loses the disk:
[root@t60p-jschrode ~]# yum update
[...]
irq21: nobody cared
[...]
Disabling IRQ #21

After that the system disk is no longer accessible and the journal cannot be
committed. I am attaching a screenshot with the complete message.
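The "nobody cared" / "Disabling IRQ #21" pair comes from the kernel's generic stuck-interrupt detection. As far as I understand the 2.6-era note_interrupt() logic in kernel/irq/spurious.c, the kernel counts interrupts per line in windows of 100,000 and disables the line when more than 99,900 of them were not claimed by any registered handler. The Python sketch below is a simplified model of that check under that assumption, not kernel code; the name disabled_at is hypothetical, and the irqpoll/irqfixup paths are left out.

```python
# Simplified model (assumption) of the stuck-IRQ check in 2.6-era
# kernel/irq/spurious.c: after every 100,000 interrupts on a line, the
# kernel checks how many went unclaimed and disables the line if > 99,900.

CHECK_WINDOW = 100_000      # interrupts per evaluation window
UNHANDLED_LIMIT = 99_900    # more unclaimed than this => line is "stuck"


def disabled_at(handled_flags):
    """Return the 1-based interrupt number at which the line would be
    disabled, or None if it survives. Each flag is True when some
    registered handler claimed that interrupt (IRQ_HANDLED)."""
    irq_count = 0
    unhandled = 0
    for n, handled in enumerate(handled_flags, start=1):
        if not handled:
            unhandled += 1
        irq_count += 1
        if irq_count < CHECK_WINDOW:
            continue
        # End of a window: evaluate, then reset both counters.
        irq_count = 0
        if unhandled > UNHANDLED_LIMIT:
            return n  # the kernel would print "Disabling IRQ #..." here
        unhandled = 0
    return None


if __name__ == "__main__":
    # No handler ever claims the interrupt: disabled after one window.
    print(disabled_at([False] * CHECK_WINDOW))
    # A handler claims 101 of 100,000: the line survives the window.
    print(disabled_at([False] * 99_899 + [True] * 101))
```

Once a line is disabled, no device whose interrupts arrive on it is serviced any more, so if the SATA controller is on IRQ 21, that would account for the disk becoming inaccessible.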


Version-Release number of selected component (if applicable):


How reproducible:
Always


Steps to Reproduce:
1. Start a XEN guest with a dedicated PCI (network) device on a T60p
2. Run "yum update" or produce heavy net/disk IO


Actual Results:
The IRQ is lost, the SATA disk is no longer accessible, and the system can no
longer be used.

Expected Results:
N/A

Additional info:

Comment 1 Joachim Schröder 2007-01-18 12:19:28 UTC
Created attachment 145906 [details]
screenshot of the irq lost message

Comment 2 Stephen Tweedie 2007-01-18 13:29:12 UTC
Can you please post full dom0 boot logs, both "xm dmesg" and kernel "dmesg"?


Comment 3 Joachim Schröder 2007-01-23 09:36:59 UTC
Created attachment 146276 [details]
dmesg

This is the dmesg; xm dmesg will follow.
Both were captured _before_ the system IO freezes, because I can no longer
access the system after the oops.

Comment 4 Joachim Schröder 2007-01-23 09:37:36 UTC
Created attachment 146277 [details]
xm dmesg

Comment 5 Joachim Schröder 2007-01-23 09:58:02 UTC
Created attachment 146278 [details]
interrupts being in use from /proc/interrupts
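To see which drivers share a given interrupt line, a /proc/interrupts listing like the one attached here can be scanned for rows naming more than one driver. The sketch below assumes a 2.6.18-style layout: a header row of CPU names, then one counter column per CPU, a single controller-type token (for example Phys-irq or IO-APIC-level), and a comma-separated driver list. The function name shared_irqs is hypothetical.

```python
def shared_irqs(text):
    """Map each numeric IRQ used by more than one driver to its driver list.

    Assumes rows of the form
        "<irq>: <one count per CPU> <controller type> <drv1>, <drv2>, ..."
    following a header row that lists the CPUs.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        # Split on the first ':' only; driver names may contain ':'.
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip summary rows such as NMI, LOC, ERR, MIS
        fields = rest.split()
        # Drop the per-CPU counters and the controller-type token.
        tail = " ".join(fields[ncpus + 1:])
        drivers = [d.strip() for d in tail.split(",") if d.strip()]
        if len(drivers) > 1:
            shared[int(label)] = drivers
    return shared
```

On the affected host this would be called as shared_irqs(open("/proc/interrupts").read()); lines with a single driver are omitted from the result.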

Comment 7 Stephen Tweedie 2007-01-24 23:52:30 UTC
There was a nasty bug in 1.3002 which might just be implicated here; can you
please try to reproduce with 2.6.18-4.el5?  Thanks.


Comment 8 Joachim Schröder 2007-01-25 12:40:03 UTC
Sorry for setting the bug to the wrong state!

I have now updated dom0 to the latest webqa kernel (2.6.18-5.el5xen), but the
problem still occurs.

FYI, in the meantime I also switched the SATA mode in the T60p's BIOS from AHCI
to Compatible (which uses the ata_piix module instead of ahci), but the problem
persisted, so I switched back to AHCI.

Comment 9 Joachim Schröder 2007-02-02 11:22:25 UTC
I just tried the latest kernel from webqa, 2.6.18-8.el5xen, with the same
behaviour: after downloading several megabytes via the ipw3945 in the guest,
the host loses IRQ #21 and is left unusable.
Is there anything I can do to help?

Comment 10 Stephen Tweedie 2007-02-02 12:13:52 UTC
It would be helpful to get the _full_ log of the error when it hits.  Can you
please try to set up serial console or netconsole?  For a laptop, serial console
likely implies a docking station these days, but netconsole should still work. 
(You'll need to modprobe netconsole manually; "modinfo netconsole" should show
you the required parameters, and you can redirect the output to syslogd on
another host.  If you set it up once xend is running, you'll need to run it on
peth0, not eth0, too.)
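The module parameter that "modinfo netconsole" describes follows the layout in the kernel's netconsole documentation, netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@tgt-ip/[tgt-macaddr], where empty fields fall back to defaults (the interface's own address and the broadcast MAC). The Python sketch below only builds that command line for the peth0 case; the helper name netconsole_modprobe is hypothetical, 192.0.2.10 is a placeholder address for the receiving host, and 514 is assumed because it is the conventional syslog UDP port (the remote syslogd must be configured to accept network messages).

```python
import shlex


def netconsole_modprobe(tgt_ip, dev="peth0", tgt_port=514,
                        src_port=6665, src_ip="", tgt_mac=""):
    """Build a 'modprobe netconsole' command line (hypothetical helper).

    Layout: netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@tgt-ip/[tgt-macaddr]
    Leaving src_ip or tgt_mac empty leaves that field blank so the kernel
    uses its default for it.
    """
    param = f"netconsole={src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"
    # shlex.quote only adds quotes if the string has shell-special characters.
    return "modprobe netconsole " + shlex.quote(param)


if __name__ == "__main__":
    # Placeholder target; replace with the host running syslogd.
    print(netconsole_modprobe("192.0.2.10"))
```

The default device is peth0 because, as noted above, once xend is running the physical interface has been renamed and eth0 is the bridged virtual interface.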


Comment 11 Red Hat Bugzilla 2007-07-25 01:23:24 UTC
change QA contact

Comment 12 Chris Lalancette 2009-07-29 09:52:58 UTC
This has been in NEEDINFO for over 2 years.  I'm going to close it out for now; if you are still having the problem, and want to pursue the problem, please feel free to re-open.

Chris Lalancette

