Bug 464868 - incorrect ATA7 handling in kernel causing ABRT errors
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
Assigned To: David Milburn
QA Contact: Martin Jenner
 
Reported: 2008-09-30 18:09 EDT by Kevin Fenzi
Modified: 2009-01-20 14:36 EST
CC: 5 users

Doc Type: Bug Fix
Last Closed: 2009-01-20 14:36:08 EST


Attachments: None
Description Kevin Fenzi 2008-09-30 18:09:01 EDT
Greetings. 

We have been running into a problem with Hitachi Deskstar P7K500 500GB 7200 RPM SATA hard drives. Before moving new drives into production, we run them through a cycle of 'badblocks -svw -p 5 /dev/sda', a destructive write-mode test that writes to and reads back every sector of the disk. 

We are seeing the following errors when running badblocks: 

[ 8524.147171] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action
[ 8524.147178] ata1.00: BMDMA stat 0x25
[ 8524.147185] ata1.00: cmd c8/00:80:80:ff:ff/00:00:00:00:00/ef tag 0 dma 65536 in
[ 8524.147187] ata1.00: res 51/04:80:80:ff:ff/00:00:00:00:00/ef Emask 0x1 (device error)
[ 8524.147192] ata1.00: status: { DRDY ERR }
[ 8524.147277] ata1.00: error: { ABRT }

We have tried the drives in a variety of controllers, all with the same results. 

We have contacted Hitachi, and they say that this is a bug in the way the kernel handles ATA7: 

"Further analysis of the screen shot provided shows the drive is aborting
commands.  The host is issuing 28-bit commands over the 48-bit boundary.
The products that are passing did not abort the command.

In summary, the 500g not passing your burn-in test isn't a drive problem;
the drive is conforming to the ATA7 spec.  Below is what I extracted from
ATA7 section 4.2.2:
Addressing constraints and error reporting
Devices shall set IDNF to one
or ABRT to one in the Error register and ERR to one in the Status register
in response to any command where the requested LBA number is greater than
or equal to the content of words (61:60) for a 28-bit addressing command or
greater or equal to the contents of words (103:100) for a 48-bit addressing
command.

Your burn-in software should be modified to comply with the ATA7 specs.
Doing so will allow the drive to function without aborted commands."

It would be nice if the kernel could be patched to handle this case. 
As these drives, and others that conform strictly to the ATA7 spec, become 
more widespread, I think this bug could hit a lot of people. 

Happy to provide more information or run tests on the drives.
Comment 1 Alan Cox 2008-10-06 09:18:39 EDT
It's a corner case in the spec that caught everyone out when one vendor shipped some very pedantic (but correct) firmware. Every other drive is quite happy to accept an LBA28 command for that block, so it never got seen.

The fix is trivial and is in 2.6.27-rc, so it can be backported easily enough, as it's simply an off-by-one change to a comparison.
Comment 3 Kevin Fenzi 2008-10-06 13:53:09 EDT
Excellent news. Let me know if you have a patch or patched kernel to test.
Comment 4 Alan Cox 2008-10-09 10:16:16 EDT
Commit:     97b697a11b07e2ebfa69c488132596cc5eb24119

Gitweb:     http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=97b697a11b07e2ebfa69c488132596cc5eb24119
Comment 5 David Milburn 2008-12-01 18:13:40 EST
Hello Kevin,

Would you be able to do a quick verification on the kernel-2.6.18-124.el5.bz464868.1 test kernel? Thanks.

http://people.redhat.com/dmilburn/
Comment 6 Kevin Fenzi 2008-12-02 16:11:35 EST
Yep. Seems to solve the issue here. ;)
Comment 9 Don Zickus 2008-12-09 16:04:21 EST
in kernel-2.6.18-126.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 12 errata-xmlrpc 2009-01-20 14:36:08 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html
