Christine,

Currently in RHEL4 U6 we are blacklisting a couple of families of Hitachi drives that SUN uses:

        { "HITACHI HDS7250SASUN500G*", NULL, ATA_HORKAGE_NONCQ },
        { "HITACHI HDS7225SBSUN250G*", NULL, ATA_HORKAGE_NONCQ },

If these drives are detected then NCQ will be disabled. This is consistent with the upstream Linux kernel, where several drives with NCQ problems are blacklisted in the libata module.

Currently RHEL4 U6 blacklists other drives with NCQ problems as well:

        { "WDC WD740ADFD-00", NULL, ATA_HORKAGE_NONCQ },
        /* http://thread.gmane.org/gmane.linux.ide/14907 */
        { "FUJITSU MHT2060BH", NULL, ATA_HORKAGE_NONCQ },
        /* NCQ is broken */
        { "Maxtor *", "BANC*", ATA_HORKAGE_NONCQ },
        { "HITACHI HDS7250SASUN500G*", NULL, ATA_HORKAGE_NONCQ },
        { "HITACHI HDS7225SBSUN250G*", NULL, ATA_HORKAGE_NONCQ },
        /* Blacklist entries taken from Silicon Image 3124/3132
           Windows driver .inf file - also several Linux problem reports */
        { "HTS541060G9SA00", "MB3OC60D", ATA_HORKAGE_NONCQ, },
        { "HTS541080G9SA00", "MB4OC60D", ATA_HORKAGE_NONCQ, },
        { "HTS541010G9SA00", "MBZOC60D", ATA_HORKAGE_NONCQ, },
        /* Drives which do spurious command completion */
        { "HTS541680J9SA00", "SB2IC7EP", ATA_HORKAGE_NONCQ, },
        { "HTS541612J9SA00", "SBDIC7JP", ATA_HORKAGE_NONCQ, },
        { "Hitachi HTS541616J9SA00", "SB4OC70P", ATA_HORKAGE_NONCQ, },
        { "WDC WD740ADFD-00NLR1", NULL, ATA_HORKAGE_NONCQ, },
        { "FUJITSU MHV2080BH", "00840028", ATA_HORKAGE_NONCQ, },

My understanding is that Hitachi would eventually like the Hitachi drives to be un-blacklisted and NCQ enabled, which is why I am opening a discussion here, hoping to get all the facts straight.

Recently, information has come up through support that Hitachi believes duplicate tags are being sent to the HITACHI *SUN* drives, resulting in an abort that leads to the hung state. Can you confirm this?
It was also mentioned that other drives/firmware were better able to deal with a duplicate tag situation, and that the sata_nv driver may be partly at fault for allowing duplicate tags to be sent to the drive. Can you comment?

Thank you,
David
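For readers following the table quoted above, here is a minimal sketch of how such a blacklist might be consulted at probe time. It is not the libata code: the names `blacklist_entry`, `pattern_match`, `lookup_horkage`, `ncq_allowed`, and `HORKAGE_NONCQ` are hypothetical, and it assumes that a trailing `*` in a pattern means "match any suffix" and that a NULL firmware field means "any revision".

```c
#include <stddef.h>
#include <string.h>

#define HORKAGE_NONCQ (1u << 0)  /* hypothetical flag: do not use NCQ */

struct blacklist_entry {
    const char *model;    /* model pattern; trailing '*' = prefix match (assumed) */
    const char *fw_rev;   /* firmware pattern, or NULL for any revision (assumed) */
    unsigned int horkage; /* quirk flags to apply when the entry matches */
};

/* A few entries copied from the table quoted in the comment above. */
static const struct blacklist_entry blacklist[] = {
    { "HITACHI HDS7250SASUN500G*", NULL,       HORKAGE_NONCQ },
    { "HITACHI HDS7225SBSUN250G*", NULL,       HORKAGE_NONCQ },
    { "Maxtor *",                  "BANC*",    HORKAGE_NONCQ },
    { "FUJITSU MHV2080BH",         "00840028", HORKAGE_NONCQ },
    { NULL, NULL, 0 } /* end-of-table sentinel */
};

/* Return nonzero if name matches pattern; a trailing '*' matches any suffix. */
static int pattern_match(const char *pattern, const char *name)
{
    size_t len = strlen(pattern);

    if (len > 0 && pattern[len - 1] == '*')
        return strncmp(pattern, name, len - 1) == 0;
    return strcmp(pattern, name) == 0;
}

/* Return the quirk flags of the first matching entry, or 0 if none match. */
unsigned int lookup_horkage(const char *model, const char *fw_rev)
{
    const struct blacklist_entry *e;

    for (e = blacklist; e->model != NULL; e++) {
        if (!pattern_match(e->model, model))
            continue;
        if (e->fw_rev != NULL && !pattern_match(e->fw_rev, fw_rev))
            continue;
        return e->horkage;
    }
    return 0;
}

/* NCQ is allowed unless a matching entry carries the NONCQ flag. */
int ncq_allowed(const char *model, const char *fw_rev)
{
    return !(lookup_horkage(model, fw_rev) & HORKAGE_NONCQ);
}
```

Under these assumptions, an entry with a firmware pattern (such as the Maxtor one) only disables NCQ for the listed firmware, while an entry with NULL firmware disables it for every revision of that model. That distinction is why firmware-level details from the vendor would allow narrower entries.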
David,

I have been working with the Hitachi team as well as our partners. Apparently support for the nVidia controller found in the SUN systems exhibiting the problem was not released until kernel 2.6.20. The duplicate tag issue has also been corrected by a newer device driver.

As for the Hitachi drives in the blacklist posted above, there were known issues with NCQ in these products that may have been corrected with firmware. I could not get approval to disclose firmware-level details. If there are other Hitachi model numbers that have been blacklisted, I am willing to review them to determine whether there are known issues. On some occasions the issue is the SATA controller on the host, and there may be a driver update available to correct the behavior. In other cases a decision may have been made not to fix the behavior, for example for parts that are no longer being manufactured.

Thank you,
Christine
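To make the duplicate-tag condition discussed above concrete, here is a minimal sketch of a tag allocator that cannot hand out a tag that is still outstanding. It is not the sata_nv or libata implementation: `tag_pool`, `tag_alloc`, and `tag_free` are hypothetical names, and the sketch assumes a single caller (a real driver would need locking or atomic bit operations around the bitmap).

```c
#include <stdint.h>

#define NCQ_MAX_TAGS 32  /* NCQ queue tags are numbered 0..31 */

struct tag_pool {
    uint32_t busy;  /* bit n set = tag n is outstanding on the device */
};

/* Allocate the lowest free tag, or return -1 if every tag is in flight. */
int tag_alloc(struct tag_pool *p)
{
    int tag;

    for (tag = 0; tag < NCQ_MAX_TAGS; tag++) {
        uint32_t bit = (uint32_t)1 << tag;

        if (!(p->busy & bit)) {
            p->busy |= bit;  /* mark busy before the command is issued */
            return tag;
        }
    }
    return -1;
}

/*
 * Release a tag once its command has completed.
 * Returns 0 on success, or -1 if the tag is out of range or was not
 * outstanding (which would indicate a driver bug or a spurious completion).
 */
int tag_free(struct tag_pool *p, int tag)
{
    uint32_t bit;

    if (tag < 0 || tag >= NCQ_MAX_TAGS)
        return -1;
    bit = (uint32_t)1 << tag;
    if (!(p->busy & bit))
        return -1;
    p->busy &= ~bit;
    return 0;
}
```

In this sketch a duplicate tag can only reach the drive if the driver issues a command without going through `tag_alloc`, or frees a tag before the hardware has really finished with it.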
Thanks for that info. For what it's worth, the general policy in Linux is to (a) fix our software, and NOT blacklist the device, if the software is buggy; (b) blacklist, if buggy device firmware is encountered in the field. In our experience, Linux users will not always upgrade ATA device firmware as needed -- therefore in order to ensure 100% data-safe operation, Linux drivers MUST be aware of problematic device firmware, even if it is for a part no longer manufactured. In addition, our software infrastructure regularly delivers kernel updates to our users. Given those two factors, we usually update the kernel ATA device errata list whenever an issue occurs in the field that is worked around by, e.g., turning off the NCQ feature for that device. Our top priority is ensuring that our users' data is stored safely and reliably. Performance is secondary. Without correctness and fault tolerance, we don't have users :)
We now know that some of the blacklist triggers are false: when we saw a problem we checked the tags to see what was busy, but that check is racy at the hardware level. If there are devices which have specific NCQ problems then it would be great to know which ones to match, so that we can update the blacklist table accordingly. If we can't get the info from Hitachi we'll have to guess and extract it from the Windows drivers. That is likely to blacklist more drives than needed.
*** This bug has been marked as a duplicate of bug 430293 ***