464868 – incorrect ATA7 handling in kernel causing ABRT errors


Summary: incorrect ATA7 handling in kernel causing ABRT errors

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	David Milburn
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-09-30 22:09 UTC by Kevin Fenzi
Modified:	2009-01-20 19:36 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-01-20 19:36:08 UTC
Target Upstream Version:
Embargoed:
Dependent Products:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2009:0225	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 5.3 kernel security and bug fix update	2009-01-20 16:06:24 UTC

Description Kevin Fenzi 2008-09-30 22:09:01 UTC

Greetings.

We have been running into a problem with Hitachi Deskstar P7K500 500GB 7200 RPM SATA Hard Drives. We run new drives through a cycle of 'badblocks -svw -p 5 /dev/sda' before moving them into production.

We are seeing the following errors when running badblocks:

[ 8524.147171] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action
[ 8524.147178] ata1.00: BMDMA stat 0x25
[ 8524.147185] ata1.00: cmd c8/00:80:80:ff:ff/00:00:00:00:00/ef tag 0 dma 65536 in
[ 8524.147187] ata1.00: res 51/04:80:80:ff:ff/00:00:00:00:00/ef Emask 0x1 (device error)
[ 8524.147192] ata1.00: status: { DRDY ERR }
[ 8524.147277] ata1.00: error: { ABRT }
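
To make the failing request easier to read, here is a minimal sketch that decodes the "cmd" line above. It assumes the usual libata taskfile layout (command/feature:nsect:lbal:lbam:lbah/HOB registers/device) and that 0xc8 is an LBA28 READ DMA; the function name is hypothetical and this is not kernel code.

```python
def decode_lba28(taskfile):
    """Decode a libata-style taskfile string for an LBA28 command.

    Assumed layout: command/feature:nsect:lbal:lbam:lbah/HOB.../device
    """
    command, regs, _hob, device = taskfile.split("/")
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in regs.split(":"))
    dev = int(device, 16)
    # LBA28: the top 4 address bits live in the low nibble of the device register.
    lba = ((dev & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    # A sector count of 0 means 256 sectors for 28-bit commands.
    count = nsect or 256
    return int(command, 16), lba, count


command, first, count = decode_lba28("c8/00:80:80:ff:ff/00:00:00:00:00/ef")
last = first + count - 1
print(hex(command), hex(first), count, hex(last), count * 512)
```

Under these assumptions the request covers 128 sectors (65536 bytes) starting at LBA 0x0fffff80, so its last sector is 0x0fffffff, the highest address a 28-bit command can express.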

We have tried the drives in a variety of controllers, all with the same results.

We have contacted Hitachi, and they say that this is a bug in the way the kernel handles ATA7:

"Further analysis of the screen shot provided shows the drive is aborting
commands. The host is issuing 28-bit commands over the 48-bit boundary.
The products that are passing did not abort the command.

In summary, the 500g not passing your burn-in test isn't a drive problem
but rather conforming to the ATA7 spec. Below is what I extracted from ATA
7 section 4.2.2:
Addressing constraints and error reporting Devices shall set IDNF to one
or ABRT to one in the Error register and ERR to one in the Status register
in response to any command where the requested LBA number is greater than
or equal to the content of words (61:60) for a 28-bit addressing command or
greater or equal to the contents of words (103:100) for a 48-bit addressing
command.

Your burn-in software should be modified to comply with the ATA7 specs.
Doing so will allow the drive to function without aborted commands."
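
The rule quoted from the spec can be sketched as a simple comparison. This is only an illustration of the quoted text, not drive firmware: the function name is hypothetical, and the value 0x0FFFFFFF for words (61:60) is an assumption (the capped value a drive larger than the 28-bit range would be expected to report).

```python
# Assumption: IDENTIFY DEVICE words (61:60) on a drive larger than the
# LBA28 range hold the capped value 0x0FFFFFFF.
WORDS_61_60 = 0x0FFFFFFF


def strict_drive_rejects_lba28(lba, words_61_60=WORDS_61_60):
    """Quoted rule: a 28-bit command is rejected (IDNF or ABRT) when the
    requested LBA is greater than or equal to words (61:60)."""
    return lba >= words_61_60


print(strict_drive_rejects_lba28(0x0FFFFFFE))  # one sector below the limit
print(strict_drive_rejects_lba28(0x0FFFFFFF))  # the very last 28-bit address
```

With that assumed value, a strictly conforming drive accepts a 28-bit command touching LBA 0x0FFFFFFE but rejects one touching LBA 0x0FFFFFFF, even though that address still fits in 28 bits.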

It would be nice if the kernel could be patched to handle this case.
As these drives and others conforming to the ATA7 spec become more widespread,
I think this bug could hit a lot of people.

Happy to provide more information or run tests on the drives.

Comment 1 Alan Cox 2008-10-06 13:18:39 UTC

It's a corner case in the spec that caught everyone out when one vendor shipped some very pedantic (but correct) firmware. Everyone else is quite happy with the LBA28 command for that block, so it never got seen.

The fix is trivial and is in 2.6.27-rc, so it can be pulled back easily enough, as it's simply a change by 1 of a comparison.
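
As an illustration of what "a change by 1 of a comparison" can look like, here is a hedged sketch of a host-side check that decides whether a request may use a 28-bit command. The function names and exact expressions are assumptions made for this example; they are not copied from the kernel source, so consult the commit referenced in this report for the real change.

```python
LBA28_LIMIT = 1 << 28  # number of sectors addressable with 28 bits


def lba28_ok_before(block, n_block):
    """Permissive check: only the LAST sector must fit in 28 bits."""
    return (block + n_block - 1) < LBA28_LIMIT and n_block <= 256


def lba28_ok_after(block, n_block):
    """Stricter check (shifted by one): requests that reach the final
    28-bit sector are no longer sent as 28-bit commands."""
    return (block + n_block) < LBA28_LIMIT and n_block <= 256


# The request from the log in this report: 128 sectors at LBA 0x0FFFFF80.
print(lba28_ok_before(0x0FFFFF80, 128), lba28_ok_after(0x0FFFFF80, 128))
```

For the request in the log, the permissive check still picks a 28-bit command while the stricter one does not, so the host would fall back to a 48-bit command that a strictly conforming drive accepts.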

Comment 3 Kevin Fenzi 2008-10-06 17:53:09 UTC

Excellent news. Let me know if you have a patch or patched kernel to test.

Comment 4 Alan Cox 2008-10-09 14:16:16 UTC

Commit:     97b697a11b07e2ebfa69c488132596cc5eb24119

Gitweb:     http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=97b697a11b07e2ebfa69c488132596cc5eb24119

Comment 5 David Milburn 2008-12-01 23:13:40 UTC

Hello Kevin,

Would you be able to do a quick verification on the kernel-2.6.18-124.el5.bz464868.1 test kernel? Thanks.

http://people.redhat.com/dmilburn/

Comment 6 Kevin Fenzi 2008-12-02 21:11:35 UTC

Yep. Seems to solve the issue here. ;)

Comment 9 Don Zickus 2008-12-09 21:04:21 UTC

Included in kernel-2.6.18-126.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 12 errata-xmlrpc 2009-01-20 19:36:08 UTC

An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html
