Bug 361061 - SUN/Hitachi SATA NCQ drives currently need to be black listed as NCQ doesn't seem to work properly.
Status: CLOSED DUPLICATE of bug 430293
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.5
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assignee: David Milburn
QA Contact: Martin Jenner
Reported: 2007-10-31 22:16 UTC by Issue Tracker
Modified: 2010-03-19 17:21 UTC (History)

Doc Type: Bug Fix
Last Closed: 2010-03-19 17:21:23 UTC

Comment 8 David Milburn 2007-11-30 16:12:15 UTC
Christine,

Currently in RHEL4 U6 we are blacklisting a couple of families of Hitachi drives
that SUN uses:

        { "HITACHI HDS7250SASUN500G*", NULL,    ATA_HORKAGE_NONCQ },
        { "HITACHI HDS7225SBSUN250G*", NULL,    ATA_HORKAGE_NONCQ },

If these drives are detected, NCQ is disabled. This is consistent with the
upstream Linux kernel, where several drives with NCQ problems are blacklisted
in the libata module. RHEL4 U6 currently blacklists other drives with NCQ
problems as well:

        { "WDC WD740ADFD-00",   NULL,           ATA_HORKAGE_NONCQ },
        /* http://thread.gmane.org/gmane.linux.ide/14907 */
        { "FUJITSU MHT2060BH",  NULL,           ATA_HORKAGE_NONCQ },
        /* NCQ is broken */
        { "Maxtor *",           "BANC*",        ATA_HORKAGE_NONCQ },
        { "HITACHI HDS7250SASUN500G*", NULL,    ATA_HORKAGE_NONCQ },
        { "HITACHI HDS7225SBSUN250G*", NULL,    ATA_HORKAGE_NONCQ },

        /* Blacklist entries taken from Silicon Image 3124/3132
           Windows driver .inf file - also several Linux problem reports */
        { "HTS541060G9SA00",    "MB3OC60D",     ATA_HORKAGE_NONCQ, },
        { "HTS541080G9SA00",    "MB4OC60D",     ATA_HORKAGE_NONCQ, },
        { "HTS541010G9SA00",    "MBZOC60D",     ATA_HORKAGE_NONCQ, },
        /* Drives which do spurious command completion */
        { "HTS541680J9SA00",    "SB2IC7EP",     ATA_HORKAGE_NONCQ, },
        { "HTS541612J9SA00",    "SBDIC7JP",     ATA_HORKAGE_NONCQ, },
        { "Hitachi HTS541616J9SA00", "SB4OC70P", ATA_HORKAGE_NONCQ, },
        { "WDC WD740ADFD-00NLR1", NULL,         ATA_HORKAGE_NONCQ, },
        { "FUJITSU MHV2080BH",  "00840028",     ATA_HORKAGE_NONCQ, },
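
For readers unfamiliar with how a table like this is consulted, here is a
minimal, self-contained sketch of the lookup. It is not the libata code: the
names prefixed with sketch_ and the flag value are hypothetical, and the
matching rule (a trailing '*' matches any suffix, a NULL firmware field matches
any revision) is an assumption inferred from the entries quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag value for this sketch; the kernel's constant may differ. */
#define SKETCH_HORKAGE_NONCQ (1UL << 2)

struct sketch_blacklist_entry {
    const char *model_num;   /* model pattern, optional trailing '*' */
    const char *model_rev;   /* firmware pattern, or NULL for any revision */
    unsigned long horkage;   /* quirk flags to apply on a match */
};

/* A few entries copied from the table above, for illustration only. */
static const struct sketch_blacklist_entry sketch_blacklist[] = {
    { "HITACHI HDS7250SASUN500G*", NULL,       SKETCH_HORKAGE_NONCQ },
    { "HITACHI HDS7225SBSUN250G*", NULL,       SKETCH_HORKAGE_NONCQ },
    { "Maxtor *",                  "BANC*",    SKETCH_HORKAGE_NONCQ },
    { "FUJITSU MHV2080BH",         "00840028", SKETCH_HORKAGE_NONCQ },
};

/* Return 1 if name matches pattern; a trailing '*' matches any suffix. */
static int sketch_pattern_match(const char *pattern, const char *name)
{
    size_t len = strlen(pattern);

    if (len > 0 && pattern[len - 1] == '*')
        return strncmp(pattern, name, len - 1) == 0;
    return strcmp(pattern, name) == 0;
}

/* Return the quirk flags for a drive, or 0 if no entry matches. */
static unsigned long sketch_lookup_horkage(const char *model, const char *fw)
{
    size_t i;
    size_t n = sizeof(sketch_blacklist) / sizeof(sketch_blacklist[0]);

    assert(model != NULL && fw != NULL);
    for (i = 0; i < n; i++) {
        const struct sketch_blacklist_entry *e = &sketch_blacklist[i];

        if (!sketch_pattern_match(e->model_num, model))
            continue;
        if (e->model_rev == NULL || sketch_pattern_match(e->model_rev, fw))
            return e->horkage;
    }
    return 0;
}
```

A caller would pass the model and firmware strings reported by the drive and
disable NCQ when the returned flags include SKETCH_HORKAGE_NONCQ.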

My understanding is that Hitachi would eventually like these drives to be
un-blacklisted and NCQ enabled. That is why I am opening a discussion here,
hoping to get all the facts straight.

Recently, there has been some information coming up through support that Hitachi
believes that duplicate tags are being sent to the HITACHI *SUN* drives
resulting in an abort leading to the hung state. Can you confirm this?

It was also mentioned that other drives/firmware were better able to deal with a
duplicate tag situation, and that the sata_nv driver may be partly at fault for
allowing duplicate tags to be sent to the drive.
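
To make "duplicate tags" concrete: each NCQ command in flight carries a tag
(up to 32 per device), and the host must not reuse a tag until the command
holding it has completed. The sketch below shows one way a host could enforce
that with a bitmap. It is only an illustration; the sketch_ names are
hypothetical, it is not taken from sata_nv or libata, and it ignores the
locking a real driver would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NCQ_TAGS 32   /* NCQ allows up to 32 outstanding commands */

struct sketch_tag_map {
    uint32_t in_flight;      /* bit n set: tag n belongs to a pending command */
};

/* Claim the lowest free tag; return -1 if every tag is in flight. */
static int sketch_tag_alloc(struct sketch_tag_map *map)
{
    int tag;

    assert(map != NULL);
    for (tag = 0; tag < SKETCH_NCQ_TAGS; tag++) {
        uint32_t bit = (uint32_t)1 << tag;

        if (!(map->in_flight & bit)) {
            map->in_flight |= bit;
            return tag;
        }
    }
    return -1;
}

/* Release a tag on completion; return -1 if it was not in flight. */
static int sketch_tag_free(struct sketch_tag_map *map, int tag)
{
    uint32_t bit;

    assert(map != NULL);
    if (tag < 0 || tag >= SKETCH_NCQ_TAGS)
        return -1;
    bit = (uint32_t)1 << tag;
    if (!(map->in_flight & bit))
        return -1;           /* completion for a tag we never issued */
    map->in_flight &= ~bit;
    return 0;
}
```

With this scheme a tag can only be handed out again after sketch_tag_free()
has cleared it, so two outstanding commands never share a tag.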

Can you comment?

Thank you,
David

Comment 9 Christine Reid 2008-01-16 23:12:28 UTC
David,

I have been working with the Hitachi team as well as our partners. Apparently,
support for the nVidia controller found in the SUN systems exhibiting the
problem was not released until kernel 2.6.20. The duplicate tag issue has
also been corrected by a newer device driver.

As for the Hitachi drives in the blacklist posted above, there were known
issues with NCQ in these products that may have been corrected with firmware.
I could not get approval to disclose firmware-level details. If other Hitachi
model numbers have been blacklisted, I am willing to review them to determine
whether there are known issues. On some occasions the issue is the SATA
controller on the host, and a driver update may be available to correct the
behavior. In other cases a decision may have been made not to fix the
behavior, for example for parts that are no longer being manufactured.

Thank you,
Christine

Comment 10 Jeff Garzik 2008-01-16 23:27:33 UTC
Thanks for that info.

For what it's worth, the general policy in Linux is to

(a) fix our software, and NOT blacklist the device, if software is buggy

(b) blacklist, if buggy device firmware is encountered in the field.

In our experience, Linux users will not always upgrade ATA device firmware as
needed -- therefore in order to ensure 100% data-safe operation, Linux drivers
MUST be aware of problematic device firmware, even if it is for a part no longer
manufactured.

In addition, our software infrastructure regularly delivers kernel updates to
our users.

Given those two factors, we usually update the kernel ATA device errata list
whenever an issue occurs in the field that is worked-around by, e.g., turning
off the NCQ feature for that device.

Our top priority is ensuring that our users' data is stored safely and reliably.
Performance is secondary. Without correctness and fault tolerance, we don't
have users :)


Comment 11 Alan Cox 2008-01-17 01:57:15 UTC
Some of the blacklist triggers we now know are false (when we saw a problem we
checked the tags to see what was busy, but that is racy at the hardware level).
If there are devices which have specific NCQ problems then it would be great to
know which ones to match, and to update the blacklist table accordingly. If we
can't get the info from Hitachi we'll have to guess and extract it from the
Windows drivers. That is likely to blacklist more drives than needed.


Comment 13 David Milburn 2010-03-19 17:21:23 UTC

*** This bug has been marked as a duplicate of bug 430293 ***

