Bug 101727 - (IDE CSB6 IDE_TAPE)Hard lockups on Servers with ServerWorks CSB6 SouthBridge
Summary: (IDE CSB6 IDE_TAPE)Hard lockups on Servers with ServerWorks CSB6 SouthBridge
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 9
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2003-08-06 09:03 UTC by Peter Robinson
Modified: 2007-04-18 16:56 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-08-12 04:09:52 UTC
Embargoed:



Description Peter Robinson 2003-08-06 09:03:22 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b)
Gecko/20030516 Mozilla Firebird/0.6

Description of problem:
The problem is that when the IDE bus is under load, the machine locks solid and a
power cycle is required to recover it. This is especially noticeable when the
server is backed up, because of the amount of disk activity on the IDE bus.

This is reproducible on machines with SCSI disks and IDE AIT tape drives, on
machines with IDE disks and SCSI (AIT and other) tape drives, and on machines with
both IDE disks and IDE AIT drives.

The machines on which this has been seen most often are the Acer Altos G300 and
G301 servers (the G301 is a slightly newer revision of the hardware).

This problem has been around for quite some time. I have had it on Red Hat Linux
7.3 through 9, with both the original kernels and all errata applied.

I have been able to partially work around the problem by placing the kernel
parameters 'ide0=noautotune ide1=noautotune ide2=noautotune' in the kernel boot
string and then tuning the kernel not to use UltraDMA but only multiword DMA
mode 2. This reduces the number of lockups, but they still happen from time to
time. Judging from the Linux kernel, I believe the problem also existed for the
CSB5 chipset.
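A quick way to confirm that the noautotune workaround actually reached the running kernel is to check its command line. The following is a minimal Python 3 sketch, assuming the standard Linux /proc/cmdline file; the helper names 'missing_noautotune' and 'read_cmdline' are hypothetical, and the check covers only the three boot parameters quoted above, not the later DMA tuning done with hdparm.

```python
from pathlib import Path

# The three boot parameters used as a partial workaround in this report.
WORKAROUND_PARAMS = ("ide0=noautotune", "ide1=noautotune", "ide2=noautotune")


def missing_noautotune(cmdline: str) -> list:
    """Return the workaround parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in WORKAROUND_PARAMS if param not in present]


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with (hypothetical helper)."""
    return Path(path).read_text().strip()
```

For example, 'missing_noautotune(read_cmdline())' returning an empty list means all three parameters were passed at boot; any names it returns were dropped from the boot string.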

The boot process produces the following output:
Uniform Multi-Platform E-IDE driver Revision: 7.00beta3-.2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SvrWks CSB6: IDE controller at PCI slot 00:0e.0
PCI: Guessed IRQ 11 for device 00:0e.0
SvrWks CSB6: chipset revision 160
SvrWks CSB6: 100% native mode on irq 11
    ide2: BM-DMA at 0x1400-0x1407, BIOS settings: hde:DMA, hdf:DMA
SvrWks CSB6: IDE controller at PCI slot 00:0f.1
SvrWks CSB6: chipset revision 160
SvrWks CSB6: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio, hdb:pio
    ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
hdc: FX54++W, ATAPI CD/DVD-ROM drive
hde: SONY SDX-420C, ATAPI TAPE drive
ide1 at 0x170-0x177,0x376 on irq 15
ide2 at 0x1e8-0x1ef,0x3ee on irq 11
ide-floppy driver 0.99.newide
ide-floppy driver 0.99.newide

And when a backup is started the following is logged to the syslog before it
locks up:
Aug  5 22:00:00 engineering kernel: hde: attached ide-tape driver.
Aug  5 22:00:00 engineering kernel: ide-tape: hde <-> ht0: SONY SDX-420C rev 0103
Aug  5 22:00:00 engineering kernel: ide-tape: hde: overriding capabilities->speed (assuming 650KB/sec)
Aug  5 22:00:00 engineering kernel: ide-tape: hde: overriding capabilities->max_speed (assuming 650KB/sec)
Aug  5 22:00:00 engineering kernel: ide-tape: decreasing stage size
Aug  5 22:00:00 engineering last message repeated 2 times
Aug  5 22:00:00 engineering kernel: ide-tape: hde <-> ht0: 650KBps, 62*32kB buffer, 6336kB pipeline, 100ms tDSC, DMA
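When comparing runs with and without the hdparm tuning, it helps to pull the ide-tape summary line out of syslog mechanically. The Python sketch below parses that line; the field names are our own labels, and reading "62*32kB buffer" as 62 stages of 32kB each is an assumption based only on the log wording. The trailing ", DMA" appears to indicate that the driver had DMA enabled on hde for that session, which is the setting the hdparm workaround changes.

```python
import re
from typing import NamedTuple, Optional


class IdeTapeSummary(NamedTuple):
    device: str       # IDE device name, e.g. "hde"
    tape: str         # ide-tape device name, e.g. "ht0"
    speed_kbps: int   # reported speed in KB/s
    stages: int       # first factor of the "N*MkB buffer" field (assumed stage count)
    stage_kb: int     # second factor of the "N*MkB buffer" field (assumed stage size)
    pipeline_kb: int  # pipeline size in kB
    tdsc_ms: int      # tDSC interval in ms
    dma: bool         # True if the line ends with ", DMA"


_SUMMARY_RE = re.compile(
    r"ide-tape: (?P<dev>\w+) <-> (?P<tape>\w+): (?P<speed>\d+)KBps, "
    r"(?P<stages>\d+)\*(?P<stage_kb>\d+)kB buffer, (?P<pipe>\d+)kB pipeline, "
    r"(?P<tdsc>\d+)ms tDSC(?P<dma>, DMA)?"
)


def parse_ide_tape_summary(line: str) -> Optional[IdeTapeSummary]:
    """Parse an ide-tape summary line; return None for any other line."""
    m = _SUMMARY_RE.search(line)
    if m is None:
        return None
    return IdeTapeSummary(
        device=m["dev"],
        tape=m["tape"],
        speed_kbps=int(m["speed"]),
        stages=int(m["stages"]),
        stage_kb=int(m["stage_kb"]),
        pipeline_kb=int(m["pipe"]),
        tdsc_ms=int(m["tdsc"]),
        dma=m["dma"] is not None,
    )
```

Lines such as the "hde <-> ht0: SONY SDX-420C rev 0103" identification line do not match the pattern, so the function returns None for them and can simply be run over every line of the log.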

The command that seems to give the most stability without killing the machine's
performance is 'hdparm -A1 -c3 -d0 -u1 -Xmdma2 /dev/hde'.

The IDE AIT tape drives also don't appear to support setting hardware
compression with the mt command, as you can with the SCSI equivalent.

Version-Release number of selected component (if applicable):
kernel-2.4.20-19.9

How reproducible:
Always

Steps to Reproduce:
1. Boot
2. Try to backup machine to tape using 'tar cvf /dev/ht0 /'
3. Crash!
    

Actual Results:  The machine locked solid: no kernel panic, just totally
unresponsive, and it never recovers (even after being left for 10 hours overnight).

Expected Results:  The system should have been backed up to tape

Additional info:

See Description

Comment 1 Alan Cox 2003-08-08 10:38:14 UTC
It doesn't explain the crash, but I believe Sony recommend the use of the
ide-scsi driver for their tapes (and in general we do for the later tape
drives). If you use ide-scsi, do you see the same problems?


Comment 2 Peter Robinson 2003-08-09 06:24:38 UTC
I haven't tried it, but will on Monday when I'm on site. It still wouldn't explain
the same crashes on one setup that has IDE disks in a RAID 1 configuration with a
SCSI tape drive, though.

The machine just locks solid with no real output to the screen/logs.

Comment 3 Peter Robinson 2003-08-12 04:09:52 UTC
Having replaced the ide-tape driver with ide-scsi and used the SCSI tape driver,
the boxes now appear to be much more stable and don't lock up on a backup (at
least initially).

Using ide-scsi and the st tape driver, I still can't use 'mt -f /dev/st0
compression' to enable hardware compression on the drive. This means software
compression is the only way to get more than 35GB onto the tape, and the CPU load
increases hugely for the duration of the backup. I take it this is because the
required interface isn't supported by the ide-scsi driver?


Comment 4 Alan Cox 2003-08-12 11:12:51 UTC
Tapes use ATAPI, if I remember rightly, so ide-scsi should be able to send any
command to the drive.


Comment 5 Peter Robinson 2003-08-13 15:13:40 UTC
Yes, all the IDE tape drives I've seen are ATAPI, but they don't seem to support
everything (such as hardware compression) under Linux. If I run the mt command, I
get an I/O error (see below) and nothing appears to happen, whereas the same
command on a basically identical SCSI drive completes as expected.

[root@mail root]# mt -f /dev/st0 compression
/dev/st0: Input/output error
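To reproduce the failing step without mt, the Python sketch below issues what we assume is the same request mt-st sends for 'compression': an MTIOCTOP ioctl carrying the MTCOMPRESSION operation on the st device. The constants are taken from <linux/mtio.h> assuming the generic (x86) ioctl encoding, and 'set_hw_compression' is a hypothetical helper name. If the drive or the ide-scsi path rejects the mode change, the ioctl should raise OSError with errno EIO, matching the "Input/output error" above. Note that /dev/st0 rewinds on close; /dev/nst0 is the non-rewinding node.

```python
import fcntl
import os
import struct

# From <linux/mtio.h>, assuming the generic (x86) ioctl encoding:
#   struct mtop { short mt_op; int mt_count; };
#   #define MTIOCTOP _IOW('m', 1, struct mtop)
#   #define MTCOMPRESSION 32
MTCOMPRESSION = 32
MTOP_FORMAT = "hi"  # short + int with native alignment (8 bytes on x86)


def _iow(type_char: str, nr: int, size: int) -> int:
    """Build an _IOW ioctl number using the generic Linux encoding."""
    ioc_write = 1
    return (ioc_write << 30) | (size << 16) | (ord(type_char) << 8) | nr


MTIOCTOP = _iow("m", 1, struct.calcsize(MTOP_FORMAT))


def set_hw_compression(device: str, enable: bool = True) -> None:
    """Ask the st driver to enable/disable drive compression (raises OSError on failure)."""
    op = struct.pack(MTOP_FORMAT, MTCOMPRESSION, 1 if enable else 0)
    fd = os.open(device, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, MTIOCTOP, op)
    finally:
        os.close(fd)
```

Calling 'set_hw_compression("/dev/nst0")' on the affected box and catching OSError would show whether the error comes back from the kernel for the raw ioctl as well, independently of the mt binary.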


