Bug 223199 - T60p with XEN guest running a dedicated PCI device loses SATA disk under heavy disk IO
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.0
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: Xen Maintainance List
QA Contact: Martin Jenner
Blocks: 492570
Reported: 2007-01-18 07:13 EST by Joachim Schröder
Modified: 2009-07-29 05:52 EDT
CC: 2 users

Doc Type: Bug Fix
Last Closed: 2009-07-29 05:52:58 EDT


Attachments
screenshot of the irq lost message (508.40 KB, image/jpeg), 2007-01-18 07:19 EST, Joachim Schröder
dmesg (23.27 KB, text/plain), 2007-01-23 04:36 EST, Joachim Schröder
xm dmesg (6.45 KB, text/plain), 2007-01-23 04:37 EST, Joachim Schröder
interrupts being in use from /proc/interrupts (1.35 KB, text/plain), 2007-01-23 04:58 EST, Joachim Schröder

Description Joachim Schröder 2007-01-18 07:13:20 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.4 (like Gecko)

Description of problem:
I have one XEN guest called "gateway" running; it owns the ipw3945 
device and routes WLAN traffic to the internal Ethernet interface:
--- snip ---
[root@t60p-jschrode ~]# xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     1756     2 r-----     61.3
gateway                                    3      256     1 -b----     18.2

[root@t60p-jschrode ~]# cat /data/xen/gateway.xencfg
# xen config file

# this is a paravirtualized guest
name = "gateway"
memory = "256"
uuid = "8223c51d-3059-07fd-7236-6c324403cbcb"

# installation time:
#kernel="/inst/images/xen/vmlinuz"
#ramdisk="/inst/images/xen/initrd.img"
# for installation while being connected to a network with DHCP, remove the ip
# and netmask parts in the following line:
#extra="ks=http://192.168.60.1/ks/gateway/gateway-ks.cfg ip=192.168.60.254 netmask=255.255.255.0"
#extra="ks=http://gateway.demo.redhat.com/cobbler/kickstarts/rhel5s-xen/ks.cfg"
#extra="linux rescue"
#on_reboot   = 'destroy'

# installed system:
bootloader="/usr/bin/pygrub"
on_reboot   = 'restart'

# PCI hideback for LAN and WLAN
#pci = [ '02:00.0' , '03:00.0' ]
# WLAN only
pci = [ '03:00.0' ]

# installation time:
#disk = [ 'phy:/dev/VolGroup00/gateway.boot,xvda,w', 'phy:/dev/VolGroup00/gateway.root,xvdb,w', 'phy:/dev/VolGroup00/gateway.swap,xvdd,w', ]
# installed system:
disk = [ 'phy:/dev/VolGroup00/gateway.boot,xvda,w', 'phy:/dev/VolGroup00/gateway.root,xvdb1,w', 'phy:/dev/VolGroup00/gateway.swap,xvdb2,w', ]

vif = [ 'mac=00:16:3e:a8:3c:fe, bridge=xenbr0', ]

on_crash    = 'destroy'
vnc=1
vncunused=0
--- snap ---

The system is registered against webqa. If I try to run an update or generate 
any other heavy disk/net IO, the system loses the disk:
[root@t60p-jschrode ~]# yum update
[...]
irq21: nobody cared
[...]
Disabling IRQ #21

After that, the system disk is no longer accessible and the journal cannot be committed.
I have attached a screenshot with the complete message.
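The "irq21: nobody cared" message means the kernel kept receiving interrupts on IRQ 21 that no registered handler claimed; after too many of them it disables the line, which also cuts off every other device on that IRQ (here, judging by the lost disk, apparently the SATA controller). To see which drivers dom0 has registered on IRQ 21, one can read that row of /proc/interrupts. A minimal sketch, using a hypothetical helper named `show_irq`:

```shell
#!/bin/sh
# Hypothetical helper: print the /proc/interrupts row(s) for one IRQ number.
# The last column of a row lists every driver registered on that line.
# Usage: show_irq IRQ [FILE]    (FILE defaults to /proc/interrupts)
show_irq() {
    irq=$1
    file=${2:-/proc/interrupts}
    # Field 1 of each row is the IRQ number plus a colon, e.g. "21:";
    # awk's default field splitting ignores the leading spaces.
    awk -v want="${irq}:" '$1 == want' "$file"
}
```

For example, running `show_irq 21` in dom0 prints the IRQ 21 row; if more than one driver name appears in the last column, the line is shared.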


Version-Release number of selected component (if applicable):


How reproducible:
Always


Steps to Reproduce:
1. Start a XEN guest with a dedicated PCI (network) device on a T60p
2. Run "yum update" or otherwise produce heavy net/disk IO


Actual Results:
IRQ is lost, the SATA disk is no longer accessible, and the system can no 
longer be used

Expected Results:
N/A

Additional info:
Comment 1 Joachim Schröder 2007-01-18 07:19:28 EST
Created attachment 145906 [details]
screenshot of the irq lost message
Comment 2 Stephen Tweedie 2007-01-18 08:29:12 EST
Can you please post full dom0 boot logs, both "xm dmesg" and kernel "dmesg"?
Comment 3 Joachim Schröder 2007-01-23 04:36:59 EST
Created attachment 146276 [details]
dmesg

This is the dmesg; xm dmesg will follow.
Both were captured _before_ the system IO freezes, because I can no longer
access the system after the oops.
Comment 4 Joachim Schröder 2007-01-23 04:37:36 EST
Created attachment 146277 [details]
xm dmesg
Comment 5 Joachim Schröder 2007-01-23 04:58:02 EST
Created attachment 146278 [details]
interrupts being in use from /proc/interrupts
Comment 7 Stephen Tweedie 2007-01-24 18:52:30 EST
There was a nasty bug in 1.3002 which might just be implicated here; can you
please try to reproduce with 2.6.18-4.el5?  Thanks.
Comment 8 Joachim Schröder 2007-01-25 07:40:03 EST
Sorry for setting the bug to the wrong state!

I have now updated dom0 to the latest webqa kernel (2.6.18-5.el5xen), but I 
still experience the problem.

FYI, in the meantime I also switched the SATA mode in the T60p's BIOS from AHCI 
to Compatible (so the ata_piix module is used instead of ahci), but the problem 
persisted, so I switched back to AHCI.
Comment 9 Joachim Schröder 2007-02-02 06:22:25 EST
I just tried the latest kernel from webqa, 2.6.18-8.el5xen, with the same 
behaviour: after downloading several megabytes via the ipw3945 in the virtual 
guest, the host loses IRQ #21 and is left unusable.
Is there anything I can do to help?
Comment 10 Stephen Tweedie 2007-02-02 07:13:52 EST
It would be helpful to get the _full_ log of the error when it hits.  Can you
please try to set up serial console or netconsole?  For a laptop, serial console
likely implies a docking station these days, but netconsole should still work. 
(You'll need to modprobe netconsole manually; "modinfo netconsole" should show
you the required parameters, and you can redirect the output to syslogd on
another host.  If you set it up once xend is running, you'll need to run it on
peth0, not eth0, too.)
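A minimal sketch of the netconsole setup described above, assuming a RHEL 5 dom0 with xend running (so the physical NIC has been renamed to peth0). The addresses are hypothetical placeholders from the 192.0.2.0/24 documentation range. The option string follows the kernel's netconsole= syntax, `[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]`; leaving the MAC field empty after the final "/" makes the module send to the Ethernet broadcast address, so the log host should be on the same LAN segment. Because peth0 normally carries no IP address of its own under the Xen bridge setup, the source IP is given explicitly. On the log host, syslogd must accept remote messages on UDP port 514 (for sysklogd on RHEL 5, the `-r` option).

```shell
#!/bin/sh
# Hypothetical addresses: replace with dom0's own IP and the log host's IP.
DOM0_IP=192.0.2.10
LOGHOST_IP=192.0.2.1
# With xend running, the physical NIC has been renamed to peth0.
DEV=peth0

# netconsole= syntax: [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# 6665 is the module's default source port; 514 is the syslog UDP port.
# The empty MAC field after the final "/" means Ethernet broadcast.
NETCONSOLE_OPT="netconsole=6665@${DOM0_IP}/${DEV},514@${LOGHOST_IP}/"

# Print the command first; uncomment the modprobe line and run it as root
# to actually load the module.
echo "modprobe netconsole ${NETCONSOLE_OPT}"
# modprobe netconsole "${NETCONSOLE_OPT}"
```

Checking the printed command against `modinfo netconsole` before loading the module is a cheap way to catch a typo in the option string.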
Comment 11 Red Hat Bugzilla 2007-07-24 21:23:24 EDT
change QA contact
Comment 12 Chris Lalancette 2009-07-29 05:52:58 EDT
This has been in NEEDINFO for over 2 years.  I'm going to close it out for now; if you are still having the problem, and want to pursue the problem, please feel free to re-open.

Chris Lalancette
