Bug 202822 - sata_via timeouts under heavy I/O load _IF_ irqbalance running
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2006-08-16 16:32 UTC by JuanJo Ciarlante
Modified: 2009-12-26 19:40 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-02-05 13:34:51 UTC
Type: ---
Embargoed:



Description JuanJo Ciarlante 2006-08-16 16:32:43 UTC
Description of problem:
While copying a whole installed FC5 tree plus data (approx. 60 GB) from a PATA
disk to a SATA disk driven by sata_via, the SATA disk stops responding (timeout):
   ata1: command 0xca timeout, stat 0x50 host_stat 0x24
   ata1: status=0x50 { DriveReady SeekComplete }
   sda: Current: sense key: No Sense
     Additional sense: No additional sense information
     Info fld=0x5aa8240

I replaced both the SATA cable _and_ the SATA HD (same model: Hitachi
HDT722516DLA380): same problem.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.17-1.2157_FC5

How reproducible:
Steps to Reproduce:
1. Be sure that irqbalance is running (it was enabled by my FC5 install)
   /etc/init.d/irqbalance start
2. switch to text console (for kernel msgs)
3. do some heavy I/O to the SATA disk
   in my case it was something like: rsync --sparse -HXavx /home/ /mnt/sata/home/
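
While reproducing, it may help to watch how the sata_via interrupt is spread across CPUs, since the problem only shows up with irqbalance running. A minimal sketch, assuming the standard /proc/interrupts layout; the function names are hypothetical and not part of any tool:

```shell
# Print the /proc/interrupts line(s) that mention sata_via.
# $1 is an optional file to read instead of /proc/interrupts (useful for testing).
sata_via_irq_lines() {
    grep 'sata_via' "${1:-/proc/interrupts}"
}

# Print only the IRQ number(s) that sata_via is attached to.
sata_via_irq_numbers() {
    sata_via_irq_lines "$1" | sed 's/^ *\([0-9]*\):.*/\1/'
}
```

Running sata_via_irq_lines every few seconds during the copy shows the per-CPU counters; the CPU mask currently assigned to an IRQ can be read from /proc/irq/<number>/smp_affinity.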
  
Actual results:
Stuck sata_via controller, dmesg shows:
  ata1: command 0xca timeout, stat 0x50 host_stat 0x24
  ata1: status=0x50 { DriveReady SeekComplete }
  sda: Current: sense key: No Sense
     Additional sense: No additional sense information
     Info fld=0x5aa8240
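
To check whether a test run hit the problem, the kernel log can be searched for the timeout line shown above. A small sketch, assuming the message format stays as in this report; count_ata_timeouts is a hypothetical helper name:

```shell
# Count "ataN: command 0x.. timeout" lines in kernel log text read from stdin.
count_ata_timeouts() {
    grep -c 'ata[0-9][0-9]*: command 0x[0-9a-f][0-9a-f]* timeout'
}

# Typical use:
#   dmesg | count_ata_timeouts
```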

Expected results:
Finished copy, normal I/O operations.

IMPORTANT NOTE (workaround): if I stop irqbalance I cannot reproduce the problem
(after about 2 hours of trying so far), tip from:
   http://lkml.org/lkml/2006/7/29/2
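
The workaround could be applied roughly as follows. This is only a sketch: it assumes the init script path used in the steps above and that chkconfig manages boot-time services on this install. The disable_irqbalance wrapper is hypothetical and prints the commands unless called with "apply":

```shell
# Stop irqbalance now and keep it from starting at boot.
# $1 = "apply" to really run the commands (as root); otherwise only print them.
disable_irqbalance() {
    if [ "$1" = "apply" ]; then run=""; else run="echo"; fi
    $run /etc/init.d/irqbalance stop
    $run chkconfig irqbalance off
}
```

Calling disable_irqbalance with no argument is a dry run, so the commands can be reviewed before they are executed.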

Additional info:
# dmesg | egrep -i via (see "FIXUP" msgs)
PCI: Bypassing VIA 8237 APIC De-Assert Message
agpgart: Detected VIA P4M800CE chipset
PCI: VIA IRQ fixup for 0000:00:0f.1, from 255 to 9
VP_IDE: VIA vt8237 (rev 00) IDE UDMA133 controller on pci0000:00:0f.1
sata_via 0000:00:0f.0: version 1.1
sata_via 0000:00:0f.0: routed to hard irq line 11
scsi0 : sata_via
scsi1 : sata_via
via-rhine.c:v1.10-LK1.2.0-2.6 June-10-2004 Written by Donald Becker
eth2: VIA Rhine II at 0xf8122000, 00:16:ec:67:5d:bd, IRQ 193.
PCI: VIA IRQ fixup for 0000:00:10.0, from 10 to 1
PCI: VIA IRQ fixup for 0000:00:10.1, from 10 to 1
PCI: VIA IRQ fixup for 0000:00:10.2, from 11 to 1
PCI: VIA IRQ fixup for 0000:00:10.3, from 11 to 1
Netfilter messages via NETLINK v0.30.
via-rhine.c:v1.10-LK1.2.0-2.6 June-10-2004 Written by Donald Becker
eth0: VIA Rhine II at 0xf8122000, 00:16:ec:67:5d:bd, IRQ 193.
sata_via 0000:00:0f.0: version 1.1
sata_via 0000:00:0f.0: routed to hard irq line 11
scsi2 : sata_via
scsi3 : sata_via

# lspci
00:00.0 Host bridge: VIA Technologies, Inc. P4M800CE Host Bridge
00:00.1 Host bridge: VIA Technologies, Inc. P4M800CE Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. P4M800CE Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. PT890 Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. P4M800CE Host Bridge
00:00.7 Host bridge: VIA Technologies, Inc. P4M800CE Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:08.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
00:09.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
00:0f.0 IDE interface: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 78)
01:00.0 VGA compatible controller: ATI Technologies Inc RV280 [Radeon 9200 PRO] (rev 01)
01:00.1 Display controller: ATI Technologies Inc RV280 [Radeon 9200 PRO] (Secondary) (rev 01)

Comment 1 JuanJo Ciarlante 2006-08-16 16:37:44 UTC
I forgot to mention:
Given that my primary disk is PATA /dev/hda, I could "revive" SATA access
(even before applying the irqbalance workaround) by:
   echo "scsi remove-single-device 0 0 0 0" > /proc/scsi/scsi
   dmsetup remove_all
   umount -f /mnt/sata/ ...
   rmmod sata_via
   modprobe sata_via
   vgchange -ay
   mount ... useit...
   

Comment 2 Dave Jones 2006-10-16 21:51:01 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 3 Dave Jones 2006-11-24 22:18:15 UTC
This bug has been mass-closed along with all other bugs that
have been in NEEDINFO state for several months.

Due to the large volume of inactive bugs in bugzilla, this
is the only method we have of cleaning out stale bug reports
where the reporter has disappeared.

If you can reproduce this bug after installing all the
current updates, please reopen this bug.

If you are not the reporter, you can add a comment requesting
it be reopened, and someone will get to it asap.

Thank you.

Comment 4 Klaus Franken 2006-11-30 07:36:19 UTC
(In reply to comment #2)
> A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
> based upon a new upstream kernel release.
> 
> Please retest against this new kernel, as a large number of patches
> go into each upstream release, possibly including changes that
> may address this problem.
> 

It is the same problem with kernel 2.6.18-1.2849.fc6.
I tested several kernel versions (i386 / x86_64, smp / nosmp) for FC5 and
FC6. Always the same trouble.

After deactivating irqbalance everything is OK.

Comment 5 Jon Stanley 2008-02-05 13:34:51 UTC
Closing, since an error in the previous mass-close left these bugs in
NEEDINFO.

Comment 6 Slawomir Czarko 2009-12-26 19:40:48 UTC
I have the same kind of problem after installing Fedora 12 on a laptop with VIA chipset (A5440[4261] from one.de).

I installed Fedora 12 and then I'd get these time-outs after a few minutes when running initial "yum update" after the installation.

Just disabling irqbalance significantly reduced the frequency of the time-outs. I got the same timeouts again though when running "yum install" to add additional software.

On the same system, with irqbalance disabled, I later also added irqpoll to the kernel command line. This seems to have removed the time-outs completely(?). So far I have been able to run "yum update" and "yum install". I also tested the laptop by running 10 copies of bonnie++ and haven't seen these time-outs any more.

Let me know what data I should provide to help fix this bug.
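
For anyone trying the same combination, a quick way to confirm that the running kernel was actually booted with irqpoll is to look at /proc/cmdline. A minimal sketch; has_irqpoll is a hypothetical helper name, and how the option gets added to the boot loader entry is not shown here:

```shell
# Succeed (exit status 0) if "irqpoll" appears as a word on the kernel command line.
# $1 is an optional file to read instead of /proc/cmdline (useful for testing).
has_irqpoll() {
    grep -qw 'irqpoll' "${1:-/proc/cmdline}"
}

# Typical use:
#   has_irqpoll && echo "irqpoll is active"
```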

