Bug 223199

Summary: T60p with XEN guest running a dedicated PCI device loses SATA disk when creating disk IO
Product: Red Hat Enterprise Linux 5 Reporter: Joachim Schröder <jschrode>
Component: kernel-xen Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Martin Jenner <mjenner>
Severity: high Docs Contact:
Priority: medium    
Version: 5.0 CC: clalance, xen-maint
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-07-29 09:52:58 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
Bug Depends On:    
Bug Blocks: 492570    
Attachments:
Description                                      Flags
screenshot of the irq lost message               none
dmesg                                            none
xm dmesg                                         none
interrupts being in use from /proc/interrupts    none

Description Joachim Schröder 2007-01-18 12:13:20 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.4 (like Gecko)

Description of problem:
I have got one XEN instance called "gateway" running; it owns the ipw3945 
device to route WLAN traffic to the internal eth:
---snip ---
[root@t60p-jschrode ~]# xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     1756     2 r-----     61.3
gateway                                    3      256     1 -b----     18.2

[root@t60p-jschrode ~]# cat /data/xen/gateway.xencfg
# xen config file

# this is a paravirtualized guest
name = "gateway"
memory = "256"
uuid = "8223c51d-3059-07fd-7236-6c324403cbcb"

# installation time:
#kernel="/inst/images/xen/vmlinuz"
#ramdisk="/inst/images/xen/initrd.img"
# for installation while being connected to a network with DHCP, remove the ip and netmask parts in the following line:
#extra="ks=http://192.168.60.1/ks/gateway/gateway-ks.cfg ip=192.168.60.254 netmask=255.255.255.0"
#extra="ks=http://gateway.demo.redhat.com/cobbler/kickstarts/rhel5s-xen/ks.cfg"
#extra="linux rescue"
#on_reboot   = 'destroy'

# installed system:
bootloader="/usr/bin/pygrub"
on_reboot   = 'restart'

# PCI hideback for LAN and WLAN
#pci = [ '02:00.0' , '03:00.0' ]
# WLAN only
pci = [ '03:00.0' ]

# installation time:
#disk = [ 'phy:/dev/VolGroup00/gateway.boot,xvda,w', 'phy:/dev/VolGroup00/gateway.root,xvdb,w', 'phy:/dev/VolGroup00/gateway.swap,xvdd,w', ]
# installed system:
disk = [ 'phy:/dev/VolGroup00/gateway.boot,xvda,w', 'phy:/dev/VolGroup00/gateway.root,xvdb1,w', 'phy:/dev/VolGroup00/gateway.swap,xvdb2,w', ]

vif = [ 'mac=00:16:3e:a8:3c:fe, bridge=xenbr0', ]

on_crash    = 'destroy'
vnc=1
vncunused=0
--- snap ---

The system is registered against webqa. If I try to run an update or create 
any other heavy disk/net IO, the system loses the disk:
[root@t60p-jschrode ~]# yum update
[...]
irq21: nobody cared
[...]
Disabling IRQ #21

The system disk is then no longer accessible and the journal cannot be committed.
I am attaching a screenshot with the complete message.


Version-Release number of selected component (if applicable):


How reproducible:
Always


Steps to Reproduce:
1. Start a XEN guest with a dedicated PCI (network) device on a T60p
2. run "yum update" or produce heavy net/disk IO


Actual Results:
IRQ is lost, the SATA disk is no longer accessible, and the system can no 
longer be used

Expected Results:
./.

Additional info:

Comment 1 Joachim Schröder 2007-01-18 12:19:28 UTC
Created attachment 145906 [details]
screenshot of the irq lost message

Comment 2 Stephen Tweedie 2007-01-18 13:29:12 UTC
Can you please post full dom0 boot logs, both "xm dmesg" and kernel "dmesg"?


Comment 3 Joachim Schröder 2007-01-23 09:36:59 UTC
Created attachment 146276 [details]
dmesg

This is the dmesg; xm dmesg will follow.
Both were captured _before_ the system IO freezes, because I can't access the
system after the oops.

Comment 4 Joachim Schröder 2007-01-23 09:37:36 UTC
Created attachment 146277 [details]
xm dmesg

Comment 5 Joachim Schröder 2007-01-23 09:58:02 UTC
Created attachment 146278 [details]
interrupts being in use from /proc/interrupts
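
For context: "irqN: nobody cared" is printed when no registered handler claims the interrupts arriving on that line, and the kernel then disables the line for every device sharing it. That would explain the lost disk if the SATA controller sits on IRQ 21. A minimal sketch for checking which handlers share a line follows; the helper name irq_users is made up for illustration and is not part of any tool mentioned in this report.

```shell
#!/bin/sh
# Sketch only: irq_users is a hypothetical helper, not an existing command.

# irq_users IRQ
# Reads /proc/interrupts-style text on stdin and prints the row whose first
# column is "IRQ:", so the handlers sharing that line can be read off.
irq_users() {
    awk -v irq="$1" '{ n = $1; sub(/:$/, "", n) } n == irq { print }'
}

# Only query the live table where it exists (Linux); prints nothing if the
# IRQ number is not present.
if [ -r /proc/interrupts ]; then
    irq_users 21 < /proc/interrupts
fi
```

If the row for IRQ 21 lists both the disk controller and another device, the two share the line and a disabled IRQ affects both.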

Comment 7 Stephen Tweedie 2007-01-24 23:52:30 UTC
There was a nasty bug in 1.3002 which might just be implicated here; can you
please try to reproduce with 2.6.18-4.el5?  Thanks.


Comment 8 Joachim Schröder 2007-01-25 12:40:03 UTC
Sorry for putting the bug into the wrong state!

I have now updated dom0 to the latest webqa kernel (2.6.18-5.el5xen) but 
still experience the problem.

FYI, in between I also switched the SATA mode in the T60p's BIOS from AHCI to 
Compatible (using module ata_piix instead of ahci) but the problem still 
existed, so I switched back to AHCI.

Comment 9 Joachim Schröder 2007-02-02 11:22:25 UTC
I just tried with the latest kernel from webqa, 2.6.18-8.el5xen, and still see 
the same behaviour: after downloading several megabytes via ipw3945 in the 
virtual guest, the host loses IRQ #21 and is left unusable.
Is there anything I can do to help?

Comment 10 Stephen Tweedie 2007-02-02 12:13:52 UTC
It would be helpful to get the _full_ log of the error when it hits.  Can you
please try to set up serial console or netconsole?  For a laptop, serial console
likely implies a docking station these days, but netconsole should still work. 
(You'll need to modprobe netconsole manually; "modinfo netconsole" should show
you the required parameters, and you can redirect the output to syslogd on
another host.  If you set it up once xend is running, you'll need to run it on
peth0, not eth0, too.)
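
A minimal sketch of the netconsole setup described above, assuming the kernel's documented parameter format [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]. The IP addresses and the source port are placeholders chosen for illustration, and the helper name netconsole_param is made up. Port 514 is used as the target on the assumption that the remote syslogd is listening for UDP messages there. The script only prints the modprobe line so it can be checked before being run as root.

```shell
#!/bin/sh
# Sketch only: all addresses and ports below are hypothetical placeholders.

# netconsole_param SRC_PORT SRC_IP DEV TGT_PORT TGT_IP [TGT_MAC]
# Builds the value for the module's "netconsole=" parameter. An empty
# target MAC is allowed by the format; the trailing "/" is kept.
netconsole_param() {
    printf '%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "${6:-}"
}

# With xend already running, the device is peth0 rather than eth0.
param=$(netconsole_param 6665 192.168.60.10 peth0 514 192.168.60.20)

# Print the command instead of executing it.
echo "modprobe netconsole netconsole=$param"
```

Running "modinfo netconsole" on the affected host remains the way to confirm the parameters the installed module actually accepts.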


Comment 11 Red Hat Bugzilla 2007-07-25 01:23:24 UTC
change QA contact

Comment 12 Chris Lalancette 2009-07-29 09:52:58 UTC
This has been in NEEDINFO for over 2 years.  I'm going to close it out for now; if you are still having the problem, and want to pursue the problem, please feel free to re-open.

Chris Lalancette