Greetings.

We have been running into a problem with Hitachi Deskstar P7K500 500GB 7200 RPM SATA hard drives. We run new drives through a cycle of 'badblocks -svw -p 5 /dev/sda' before moving them into production. We are seeing the following errors when running badblocks:

[ 8524.147171] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action
[ 8524.147178] ata1.00: BMDMA stat 0x25
[ 8524.147185] ata1.00: cmd c8/00:80:80:ff:ff/00:00:00:00:00/ef tag 0 dma 65536 in
[ 8524.147187] ata1.00: res 51/04:80:80:ff:ff/00:00:00:00:00/ef Emask 0x1 (device error)
[ 8524.147192] ata1.00: status: { DRDY ERR }
[ 8524.147277] ata1.00: error: { ABRT }

We have tried the drives in a variety of controllers, all with the same results.

We have contacted Hitachi, and they say that this is a bug in the way the kernel handles ATA7:

"Further analysis of the screen shot provided shows the drive is aborting commands. The host is issuing a 28 bit commands over the 48 bit boundary. The products that are passing did not abort the command.

In summary, the 500g not passing your burn-in test isn't a drive problem but rather conforming to the ATA7 spec. Below is what I extracted from ATA 7 section 4.2.2:

Addressing constraints and error reporting

Devices shall set IDNF to one or ABRT to one in the Error register and ERR to one in the Status register in response to any command where the requested LBA number is greater than or equal to the content of words (61:60) for a 28-bit addressing command or greater or equal to the contents of words (103:100) for a 48-bit addressing command.

Your burn-in software should be modified to comply with the ATA7 specs. Doing so will allow the drive to function without aborted commands."

It would be nice if the kernel could be patched to handle this case. As these drives and others conforming to the ATA7 spec become more widespread, I think this bug could hit a lot of people.

Happy to provide more information or run tests on the drives.
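For anyone wanting to check the vendor's reading against the log, here is a rough sketch that decodes the failing 'cmd' line and compares it with the quoted ATA7 rule. It makes two assumptions: that the dump lists the registers in the order command/feature:nsect:lbal:lbam:lbah/(HOB registers)/device, and that words (61:60) on a drive this size report 0x0FFFFFFF. The helper name decode_lba28 is made up for illustration.

```python
# Sketch only: field order and the words (61:60) value are assumptions,
# not something confirmed in this report.

def decode_lba28(taskfile):
    """Decode a 28-bit taskfile dump into (command, start_lba, sector_count)."""
    command, regs, _hob, device = taskfile.split("/")
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in regs.split(":"))
    dev = int(device, 16)
    # In LBA28 mode the low nibble of the device register holds LBA bits 27:24.
    lba = ((dev & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    # A sector count of 0 means 256 sectors for 28-bit commands.
    count = nsect if nsect != 0 else 256
    return int(command, 16), lba, count

# Assumed content of IDENTIFY words (61:60) for a drive larger than 28-bit space.
WORDS_61_60 = 0x0FFFFFFF

cmd, lba, count = decode_lba28("c8/00:80:80:ff:ff/00:00:00:00:00/ef")
last_lba = lba + count - 1

print(f"command 0x{cmd:02x}, start LBA 0x{lba:08x}, {count} sectors")
print(f"last LBA touched: 0x{last_lba:08x}")
# Quoted rule: the drive rejects any 28-bit request with LBA >= words (61:60).
print("rejected per quoted rule:", last_lba >= WORDS_61_60)
```

Under those assumptions the request is a 128-sector (65536-byte) READ DMA whose last sector is the very last address a 28-bit command can express, which is the block the vendor says must be reached with a 48-bit command.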
It's a corner case in the spec that caught everyone out when one vendor shipped some very pedantic (but correct) firmware. Everyone else is quite happy with the LBA28 command for that block, so it never got seen.

The fix is trivial and is in 2.6.27-rc, so it can be pulled back easily enough, as it's simply a change by 1 of a comparison.
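To illustrate what a "change by 1 of a comparison" can look like here, the sketch below contrasts two versions of an LBA28 eligibility check. The function names and the exact expressions are assumptions made for illustration, not the verbatim kernel code. The sample request (start LBA 0x0FFFFF80, 128 sectors) is the one from the reported log, assuming the usual taskfile layout.

```python
# Illustrative only: these predicates model an off-by-one at the LBA28/LBA48
# border; they are not copied from the kernel source.

LBA28_SPACE = 1 << 28  # number of addresses a 28-bit LBA can express

def lba28_ok_old(block, n_block):
    """Allow LBA28 if the LAST sector of the request fits in 28 bits."""
    return (block + n_block - 1) < LBA28_SPACE and n_block <= 256

def lba28_ok_fixed(block, n_block):
    """Stricter by one: a request ending on the final 28-bit address is refused."""
    return (block + n_block) < LBA28_SPACE and n_block <= 256

# Request from the report: it ends exactly on sector 0x0FFFFFFF.
border = (0x0FFFFF80, 128)
# A request one transfer earlier, safely inside the 28-bit range.
inside = (0x0FFFFF00, 128)

for name, (block, n) in (("border", border), ("inside", inside)):
    print(name, "old:", lba28_ok_old(block, n), "fixed:", lba28_ok_fixed(block, n))
```

With the stricter check, the border request falls through to a 48-bit command, which the pedantic firmware accepts. Requests away from the border are unaffected.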
Excellent news. Let me know if you have a patch or patched kernel to test.
Commit: 97b697a11b07e2ebfa69c488132596cc5eb24119
Gitweb: http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=97b697a11b07e2ebfa69c488132596cc5eb24119
Hello Kevin,

Would you be able to do a quick verification on the kernel-2.6.18-124.el5.bz464868.1 test kernel? Thanks.

http://people.redhat.com/dmilburn/
Yep. Seems to solve the issue here. ;)
in kernel-2.6.18-126.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html