Bug 464868 - incorrect ATA7 handling in kernel causing ABRT errors
Summary: incorrect ATA7 handling in kernel causing ABRT errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
: ---
Assignee: David Milburn
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-09-30 22:09 UTC by Kevin Fenzi
Modified: 2009-01-20 19:36 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-20 19:36:08 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHSA-2009:0225
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.3 kernel security and bug fix update
Last Updated: 2009-01-20 16:06:24 UTC

Description Kevin Fenzi 2008-09-30 22:09:01 UTC
Greetings. 

We have been running into a problem with Hitachi Deskstar P7K500 500GB 7200 RPM SATA Hard Drives. We run new drives through a cycle of 'badblocks -svw -p 5 /dev/sda' before moving them into production. 

We are seeing the following errors when running badblocks: 

[ 8524.147171] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action
[ 8524.147178] ata1.00: BMDMA stat 0x25
[ 8524.147185] ata1.00: cmd c8/00:80:80:ff:ff/00:00:00:00:00/ef tag 0 dma 65536 in
[ 8524.147187] ata1.00: res 51/04:80:80:ff:ff/00:00:00:00:00/ef Emask 0x1 (device error)
[ 8524.147192] ata1.00: status: { DRDY ERR }
[ 8524.147277] ata1.00: error: { ABRT }
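To see why the drive rejects this particular request, the `cmd` line can be decoded by hand. The Python sketch below assumes libata's usual field order for that line (command/feature:nsect:lbal:lbam:lbah/hob registers/device); `decode_lba28` is a hypothetical helper written for this note, not part of any tool. It shows that `c8` (READ DMA, a 28-bit command) starts at LBA 0x0FFFFF80 and transfers 0x80 = 128 sectors, so the last sector it touches is 0x0FFFFFFF, the largest value a 28-bit LBA can hold.

```python
def decode_lba28(cmd: str) -> tuple[int, int, int, int]:
    """Decode a libata 'cmd' taskfile string for a 28-bit command (hypothetical helper).

    Returns (command, start_lba, sector_count, last_lba).
    Assumed layout: command/feature:nsect:lbal:lbam:lbah/hob registers/device
    """
    command, low, _hob, device = cmd.split("/")
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    dev = int(device, 16)
    # In 28-bit mode, LBA bits 24-27 live in the low nibble of the device register.
    start = ((dev & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    # For 28-bit commands a sector count of 0 means 256 sectors.
    count = nsect if nsect else 256
    return int(command, 16), start, count, start + count - 1


command, start, count, last = decode_lba28("c8/00:80:80:ff:ff/00:00:00:00:00/ef")
print(f"cmd=0x{command:02x} start=0x{start:08X} count={count} "
      f"last=0x{last:08X} bytes={count * 512}")
```

With 512-byte sectors, 128 sectors is 65536 bytes, matching the DMA length in the log.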

We have tried the drives in a variety of controllers, all with the same results. 

We have contacted Hitachi, and they say that this is a bug in the way the kernel handles ATA7: 

"Further analysis of the screen shot provided shows the drive is aborting
commands.  The host is issuing 28 bit commands over the 48 bit boundary.
The products that are passing did not abort the command.

In summary, the 500g not passing your burn-in test isn't a drive problem
but rather conforming to the ATA7 spec.  Below is what I extracted from ATA7
section 4.2.2 (Addressing constraints and error reporting):

Devices shall set IDNF to one or ABRT to one in the Error register and ERR
to one in the Status register in response to any command where the requested
LBA number is greater than or equal to the content of words (61:60) for a
28-bit addressing command or greater or equal to the contents of words
(103:100) for a 48-bit addressing command.

Your burn-in software should be modified to comply with the ATA7 specs.
Doing so will allow the drive to function without aborted commands."
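The rule from section 4.2.2 quoted above boils down to a strict less-than test on the last LBA of a request. This is a minimal sketch, not drive firmware: it assumes the drive reports the capped value 0x0FFFFFFF in IDENTIFY words 61:60 (as drives with more than 2^28 sectors do), and the words 103:100 value of 976,773,168 sectors is an illustrative figure for a 500 GB drive, not one taken from this report. `ata7_accepts` is a hypothetical name.

```python
# Assumed IDENTIFY values for a large drive (illustrative, not read from this report).
LBA28_CAPACITY = 0x0FFFFFFF   # words 61:60, capped for drives over 2^28 sectors
LBA48_CAPACITY = 976_773_168  # words 103:100, typical 500 GB figure


def ata7_accepts(start: int, count: int, lba48: bool) -> bool:
    """Return True if ATA7 section 4.2.2 allows the command (hypothetical helper).

    Every requested LBA must be strictly less than the capacity reported for
    the addressing mode in use; otherwise the device sets IDNF or ABRT and ERR.
    """
    last = start + count - 1
    limit = LBA48_CAPACITY if lba48 else LBA28_CAPACITY
    return last < limit


# The failing READ DMA: 128 sectors from 0x0FFFFF80 ends at 0x0FFFFFFF.
print(ata7_accepts(0x0FFFFF80, 128, lba48=False))  # False: rejected as 28-bit
print(ata7_accepts(0x0FFFFF80, 128, lba48=True))   # True: fine as 48-bit
```

Because the test is strict, a 28-bit command on such a drive may address at most LBA 0x0FFFFFFE, one sector short of what 28 bits can encode.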

It would be nice if the kernel could be patched to handle this case.
As these drives and others conforming to the ATA7 spec become more widespread,
I think this bug could hit a lot of people.

Happy to provide more information or run tests on the drives.

Comment 1 Alan Cox 2008-10-06 13:18:39 UTC
It's a corner case in the spec that caught everyone out when one vendor shipped some very pedantic (but correct) firmware. Everyone else is quite happy with the LBA28 command for that block, so it never got seen.

The fix is trivial and is in 2.6.27-rc, so it can be pulled back easily enough, as it's simply a change by 1 of a comparison.
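Alan's description, a comparison changed by 1, can be modelled as two versions of a range check that decides whether a request may be sent as LBA28. This sketch is loosely patterned on libata's `lba_28_ok()` helper but is not a copy of the upstream patch; `lba28_ok_old`, `lba28_ok_fixed`, and `choose_addressing` are hypothetical names, and the fallback to LBA48 assumes a drive that supports 48-bit addressing.

```python
from typing import Callable

LBA28_LIMIT = 1 << 28  # number of addresses a 28-bit LBA can express


def lba28_ok_old(block: int, n_block: int) -> bool:
    # Old-style check: the last sector (block + n_block - 1) only has to fit in 28 bits.
    return (block + n_block - 1) < LBA28_LIMIT and n_block <= 256


def lba28_ok_fixed(block: int, n_block: int) -> bool:
    # Stricter check: the last sector must also be below 0x0FFFFFFF,
    # the capped value a large drive reports in IDENTIFY words 61:60.
    return (block + n_block) < LBA28_LIMIT and n_block <= 256


def choose_addressing(block: int, n_block: int,
                      ok: Callable[[int, int], bool]) -> str:
    """Pick LBA28 when the given check allows it, else LBA48 (simplified model)."""
    return "LBA28" if ok(block, n_block) else "LBA48"


start, count = 0x0FFFFF80, 128  # the request from the error log
print(choose_addressing(start, count, lba28_ok_old))    # LBA28, which the drive aborts
print(choose_addressing(start, count, lba28_ok_fixed))  # LBA48, which the drive accepts
```

The only change is dropping the `- 1`, which moves the cutoff one sector lower, so exactly the requests that end on 0x0FFFFFFF switch to a 48-bit command.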

Comment 3 Kevin Fenzi 2008-10-06 17:53:09 UTC
Excellent news. Let me know if you have a patch or patched kernel to test.

Comment 4 Alan Cox 2008-10-09 14:16:16 UTC
Commit:     97b697a11b07e2ebfa69c488132596cc5eb24119

Gitweb:     http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=97b697a11b07e2ebfa69c488132596cc5eb24119

Comment 5 David Milburn 2008-12-01 23:13:40 UTC
Hello Kevin,

Would you be able to do a quick verification on the kernel-2.6.18-124.el5.bz464868.1 test kernel? Thanks.

http://people.redhat.com/dmilburn/

Comment 6 Kevin Fenzi 2008-12-02 21:11:35 UTC
Yep. Seems to solve the issue here. ;)

Comment 9 Don Zickus 2008-12-09 21:04:21 UTC
in kernel-2.6.18-126.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 12 errata-xmlrpc 2009-01-20 19:36:08 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html

