Bug 699079 - ATA errors on boot and when running fdisk
Summary: ATA errors on boot and when running fdisk
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 15
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-04-22 23:31 UTC by Loïc Yhuel
Modified: 2012-06-04 18:50 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-04 18:50:57 UTC
Type: ---



Description Loïc Yhuel 2011-04-22 23:31:39 UTC
Description of problem:
The dmesg log contains a lot of ATA errors, enough to overflow the dmesg buffer on boot.
The errors happen at boot, and whenever I run "fdisk -l" on the drive.
The sector number is not always the same, but it is always past the end of the last partition.
The partitions can be mounted and used with no problem.

Here is one such error. There are many of them on boot (I can't tell how many, since the buffer is not big enough), plus one each time I run "fdisk -l /dev/sdd":

First part, which repeats 6 times:
===================================
[   18.882285] ata6.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[   18.882287] ata6.00: irq_stat 0x40000001
[   18.882290] ata6.00: failed command: READ FPDMA QUEUED
[   18.882296] ata6.00: cmd 60/08:00:f8:06:74/00:00:07:00:00/40 tag 0 ncq 4096 in
[   18.882297]          res 41/14:00:f8:06:74/00:00:07:00:00/40 Emask 0x481 (invalid argument) <F>
[   18.882300] ata6.00: status: { DRDY ERR }
[   18.882302] ata6.00: error: { IDNF ABRT }
[   18.884419] ata6.00: configured for UDMA/133
[   18.884423] ata6: EH complete
==================================
Second part, only once per error :
==================================
[   18.893907] sd 5:0:0:0: [sdd]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   18.893909] sd 5:0:0:0: [sdd]  Sense Key : Aborted Command [current] [descriptor]
[   18.893912] Descriptor sense data with sense descriptors (in hex):
[   18.893913]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
[   18.893917]         07 74 06 f8 
[   18.893919] sd 5:0:0:0: [sdd]  Add. Sense: No additional sense information
[   18.893921] sd 5:0:0:0: [sdd] CDB: Read(10): 28 00 07 74 06 f8 00 00 08 00
[   18.893926] end_request: I/O error, dev sdd, sector 125044472
[   18.893938] ata6: EH complete
=================================
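
The failing sector can be cross-checked from the Read(10) CDB printed in the log above. This is a minimal Python sketch, assuming the standard 10-byte READ(10) layout (opcode 0x28, big-endian LBA in bytes 2-5, transfer length in bytes 7-8). The helper name `decode_read10` is made up for illustration and is not part of any tool mentioned here.

```python
def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex("".join(cdb_hex.split()))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # starting logical block address
    count = int.from_bytes(cdb[7:9], "big")  # number of sectors requested
    return lba, count

# CDB copied from the "[sdd] CDB: Read(10)" line of the log
lba, count = decode_read10("28 00 07 74 06 f8 00 00 08 00")
print(lba, count, count * 512)   # 125044472 8 4096

# End of /dev/sdd2 according to the fdisk listing in this report
last_partition_sector = 125037167
print(lba > last_partition_sector)  # True: the read is past the last partition
```

The decoded LBA matches the "end_request: I/O error, dev sdd, sector 125044472" line, and the 8-sector length matches the "ncq 4096 in" transfer size with 512-byte sectors.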

Version-Release number of selected component (if applicable):
2.6.38.2-9.fc15.x86_64

Additional info:
Extract from "hdparm -I /dev/sdd" :
	Model Number:       CRUCIAL_CT64M225                        
	Firmware Revision:  1916    

	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  125045424
	LBA48  user addressable sectors:  125045424
	Logical  Sector size:                   512 bytes
	Physical Sector size:                   512 bytes
	device size with M = 1024*1024:       61057 MBytes
	device size with M = 1000*1000:       64023 MBytes (64 GB)


"fdisk -l /dev/sdd" :
Disk /dev/sdd: 64.0 GB, 64023257088 bytes
255 heads, 63 sectors/track, 7783 cylinders, total 125045424 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xdb47dbb8

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *        2048      206847      102400    7  HPFS/NTFS/exFAT
/dev/sdd2          206848   125037167    62415160    7  HPFS/NTFS/exFAT


Gigabyte EX58-UD3R
ICH10R in RAID mode (sdd is a non-member disk)
00:1f.2 RAID bus controller: Intel Corporation 82801 SATA RAID Controller

Comment 1 Chuck Ebbert 2011-04-23 07:11:32 UTC
(In reply to comment #0)

> [   18.882302] ata6.00: error: { IDNF ABRT }

This says that the sector does not really exist (IDNF means "sector ID not found"). It's like the drive is pretending to be a certain LBA size but actually only responds to requests for sectors within allocated partitions.

Comment 2 Loïc Yhuel 2011-04-24 03:21:19 UTC
Testing using Fedora Live images :
 - Same problem with Fedora 14
 - No problem with Fedora 13

On Fedora 13, the LBA size is lower.
kernel 2.6.33.3-85.fc13.x86_64

hdparm -I /dev/sdd :
LBA    user addressable sectors:  125037167

hdparm -N /dev/sdd
 max sectors   = 125037167/125045424, HPA is enabled


On Fedora 15 :
hdparm -N /dev/sdd
 max sectors   = 125045424/125045424, HPA is disabled


So something disables HPA on F14/F15, even if /sys/module/libata/parameters/ignore_hpa is 0
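
The two "hdparm -N" outputs above can be compared numerically. The following is a rough Python sketch, assuming the output line has the "max sectors = visible/native" shape quoted in this comment; `parse_hdparm_n` is a hypothetical helper, and the failing LBA is taken from the error log in the description.

```python
import re

def parse_hdparm_n(line):
    """Return (visible_sectors, native_sectors) from an 'hdparm -N' output line."""
    m = re.search(r"max sectors\s*=\s*(\d+)/(\d+)", line)
    if m is None:
        raise ValueError("unrecognized hdparm -N output")
    return int(m.group(1)), int(m.group(2))

# Line as seen on Fedora 13, where the HPA is still enabled
visible, native = parse_hdparm_n(" max sectors   = 125037167/125045424, HPA is enabled")

hidden = native - visible
print(hidden, hidden * 512)  # 8257 sectors, 4227584 bytes

# With `visible` sectors exposed, valid LBAs are 0..visible-1,
# so the protected area covers LBAs visible..native-1.
failing_lba = 125044472
print(visible <= failing_lba < native)  # True: the failing read targets the HPA
```

Under these assumptions the HPA hides 8257 sectors (about 4 MiB), and the sector from the I/O error falls inside that range.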

Comment 3 Chuck Ebbert 2011-04-25 08:18:36 UTC
(In reply to comment #2)

ignore_hpa=0 is the default anyway. I suspect the first of these two patches is causing the problem:

Subject: libata: unlock HPA if device shrunk
X-Git-Tag: v2.6.34-rc4~63^2
X-Git-Url: http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=445d211b0da4e9a6e6d576edff85085c2aaf53df

Subject: libata: use the enlarged capacity after late HPA unlock
X-Git-Tag: v2.6.35-rc2~42^2~1
X-Git-Url: http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=68939ce5fc17ee9c03ef6e543d4f82bd9f5583d4

You could try booting with kernel option log_buf_len=32M and try to see if any of those new messages are showing in the log.

Comment 4 Loïc Yhuel 2011-04-25 11:52:38 UTC
It's the second patch: kernel 2.6.34.8-68.fc13.x86_64, which has the first patch but not the second, still works fine.

[    5.428285] sdd: p2 size 124830320 extends beyond EOD, enabling native capacity
[    5.428297] ata6: hard resetting link
[    5.887114] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    5.887429] ata6.00: n_sectors mismatch 125037167 != 125045424
[    5.887432] ata6.00: new n_sectors matches native, probably late HPA unlock, n_sectors updated

So the last sector of the partition is in the HPA.
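
The numbers from the log and from the fdisk listing in the description can be checked with a few lines of Python. This is only an arithmetic sketch; the variable names are illustrative, and the capacities are the two values reported in the "n_sectors mismatch" message.

```python
start = 206848             # first sector of sdd2 (fdisk listing)
size = 124830320           # "p2 size" from the kernel message
capacity_hpa = 125037167   # n_sectors while the HPA is enabled
capacity_native = 125045424  # n_sectors after the HPA unlock

end_exclusive = start + size      # first sector after the partition
last_sector = end_exclusive - 1   # last sector belonging to the partition

print(last_sector)                     # 125037167, the "End" column in fdisk
print(end_exclusive - capacity_hpa)    # 1: one sector past the HPA-limited disk
print(end_exclusive <= capacity_native)  # True: fits once the HPA is unlocked
```

With the HPA enabled the valid LBAs are 0 to 125037166, so only the very last sector of the partition lies in the protected area.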

Windows doesn't unlock the HPA (it sees a physical drive size of 125037167 sectors).
So perhaps a Windows bug created a partition that is too big.
Or the HPA was disabled by the Intel driver version in use when I created the partition.

As for the errors, the drive may have a bug in its HPA unlock: the sector count reported by the ATA IDENTIFY command changes, but the drive still refuses to read those sectors.

Comment 5 Loïc Yhuel 2011-04-25 22:11:32 UTC
I fixed this problem by disabling HPA permanently with hdparm on /dev/sdd.

I don't think the kernel can do anything in this case, unless the partially working HPA unlock is caused by the kernel code rather than the disk firmware.

But for other cases it could be good to modify the partition checking code.
See bug 699084.

Comment 6 Chuck Ebbert 2011-04-29 02:22:26 UTC
I think there's an off-by-one error somewhere in the checking code:

In fs/partitions/check.c:

                if (from + size > get_capacity(disk)) {

I think that needs to be "from + size - 1" ?

Comment 7 Loïc Yhuel 2011-04-29 19:15:18 UTC
I think the test is correct.

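"from + size - 1" is the index of the last sector in the partition.
The last sector of the disk is "get_capacity(disk) - 1".
So the partition overflows when "from + size - 1 > get_capacity(disk) - 1", which is the same as "from + size > get_capacity(disk)".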
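
The boundary reasoning can be illustrated on a toy disk. This is a small Python model of the two conditions being discussed, not the kernel code itself; the function names and the 100-sector capacity are made up for the example.

```python
def beyond_eod(start, size, capacity):
    # Models the existing test: from + size > get_capacity(disk)
    return start + size > capacity

def beyond_eod_minus_one(start, size, capacity):
    # Models the variant suggested in comment 6: from + size - 1 > get_capacity(disk)
    return start + size - 1 > capacity

capacity = 100  # toy disk with valid sectors 0..99

# A partition ending exactly on the last sector (99) fits; both tests agree.
print(beyond_eod(0, 100, capacity), beyond_eod_minus_one(0, 100, capacity))  # False False

# A partition whose last sector would be 100 does not fit.
print(beyond_eod(0, 101, capacity), beyond_eod_minus_one(0, 101, capacity))  # True False
```

In this model the "- 1" variant accepts a partition that runs one sector past the end of the disk, which is consistent with the existing test being correct.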

