Bug 179711 - RAID with SATA fails on drive un-plug
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 4
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Dave Jones
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2006-02-02 10:28 UTC by Terry Barnaby
Modified: 2015-01-04 22:24 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-05-05 13:00:26 UTC
Type: ---
Embargoed:



Description Terry Barnaby 2006-02-02 10:28:46 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050923 Fedora/1.7.12-1.5.1

Description of problem:
I have just set up a Raid 5 disk array using 4 SATA disks on Fedora 4.
To test the setup I unplugged the SATA cable from one of the disk drives.
I was expecting the system to carry on with messages from the Raid system
indicating that there was a disk drive down and an email to root indicating a
problem.

However, the Raid 5 partition became completely inaccessible after unplugging
the drive. The kernel reported disk errors, but there were no error messages
from the Raid system, and "mdadm -Q --detail /dev/md2" reported that there
were no problems with the Raid array.
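
As a cross-check on what mdadm reports, the array state can also be read from /proc/mdstat. The following is a minimal sketch, not part of the original report: it assumes the usual /proc/mdstat layout, where each array's status line ends in a pair such as "[4/3] [UUU_]", and the function name degraded_arrays is hypothetical. On a system showing the behaviour described here, it would presumably also report a healthy array, because the md layer has not registered the failure.

```python
import re

# Matches the status tail of an array's detail line, e.g. "[4/3] [UUU_]".
_STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, active)} for arrays with missing members."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            # Remember which array the following status line belongs to.
            current = header.group(1)
            continue
        status = _STATUS.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded[current] = (expected, active)
            current = None
    return degraded
```

It could be called as degraded_arrays(open("/proc/mdstat").read()); an empty result means every array claims to have all of its members.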

Even worse, if I access a file there is a long delay and then the program
returns with no error but no data. For example,
"cat /data/test-file" will delay and then exit with a status of "0", but no file
contents are displayed. This is VERY VERY BAD!
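
A silent empty read like this can be caught by comparing the number of bytes actually read with the size the filesystem reports. The sketch below is illustrative only and was not part of the original test: the helper name read_matches_size is hypothetical, and it assumes that stat still returns the file's real size while the array is in the failed state. On a failing device either call may also block for a long time or raise OSError.

```python
import os


def read_matches_size(path):
    """Return (ok, expected, actual) for a full read of the file at path.

    expected is the size reported by stat; actual is the number of bytes
    a complete read returned. ok is True only when the two agree.
    """
    expected = os.stat(path).st_size
    with open(path, "rb") as f:
        actual = len(f.read())
    return actual == expected, expected, actual
```

For the example above, read_matches_size("/data/test-file") returning an actual count of 0 for a non-empty file would confirm that the read succeeded without delivering any data.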

When I rebooted the system (needed a reset) the Raid system reported that
one disk was down and the partition became readable again. This was the expected
behaviour.

I have tried the same test with a SCSI based Raid system and this works fine
as expected.

It appears that there is a bug in the SATA driver: it does not react correctly
to the loss of a drive.

The SATA chip set being used is a:
"Intel Corporation 82801FB/FW (ICH6/ICH6W) SATA Controller (rev 04)"

The kernel error messages when a disk is removed look like:
ata2: command 0x35 timeout, stat 0x0 host_stat 0x61
ata2: command 0x25 timeout, stat 0x0 host_stat 0x61
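
For reference, the two opcodes in these messages are the ATA commands WRITE DMA EXT (0x35) and READ DMA EXT (0x25), so both writes and reads to the unplugged drive were timing out. The following is a small sketch for pulling the fields out of such lines; the function name parse_timeout and the returned dictionary layout are assumptions made for illustration, and the opcode table covers only the two commands seen in this report.

```python
import re

# Names for the two opcodes seen in this report, per the ATA command set.
ATA_COMMANDS = {0x25: "READ DMA EXT", 0x35: "WRITE DMA EXT"}

_TIMEOUT = re.compile(
    r"^(ata\d+): command (0x[0-9a-fA-F]+) timeout, "
    r"stat (0x[0-9a-fA-F]+) host_stat (0x[0-9a-fA-F]+)$"
)


def parse_timeout(line):
    """Parse one timeout message into a dict, or return None if it does not match."""
    m = _TIMEOUT.match(line.strip())
    if not m:
        return None
    command = int(m.group(2), 16)
    return {
        "port": m.group(1),
        "command": command,
        "command_name": ATA_COMMANDS.get(command, "unknown"),
        "stat": int(m.group(3), 16),
        "host_stat": int(m.group(4), 16),
    }
```

The stat and host_stat values are returned as plain integers and are deliberately left undecoded here.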




Version-Release number of selected component (if applicable):
kernel-2.6.14-1.1656_FC4smp

How reproducible:
Always

Steps to Reproduce:
1. Set up a Raid 1 or 5 array using SATA disks
2. Unplug the SATA cable from a disk
3. Try to access a file on the raid partition
  

Additional info:

Comment 1 Dave Jones 2006-02-03 06:19:57 UTC
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.


Comment 2 John Thacker 2006-05-05 13:00:26 UTC
Closing due to lack of response.

