Description of problem:
Palimpsest says my 100GB disk is failing due to bad sectors. However, the disk is actually ok.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install F11 Live CD x86_64
2. Boot
3. Log in to Gnome Desktop

Actual results:
Palimpsest says a disk is failing due to bad sectors

Expected results:

Additional info:

sudo fdisk -l /dev/sda
"
Disk /dev/sda: 100.0 GB, 100030242816 bytes
240 heads, 63 sectors/track, 12921 cylinders
Units = cylinders of 15120 * 512 = 7741440 bytes
Disk identifier: 0x94e494e4

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1        2298    17370112    7  HPFS/NTFS
/dev/sda2   *        2299       12922    80312904    5  Extended
/dev/sda5            2299        2314      120928+  83  Linux
/dev/sda6            2315        3962    12458848+  83  Linux
/dev/sda7            3963        4141     1353208+  82  Linux swap / Solaris
/dev/sda8            4142       11246    53713768+   b  W95 FAT32
/dev/sda9           11247       12013     5798488+  83  Linux
/dev/sda10          12014       12921     6864448+   7  HPFS/NTFS
"

[root@Xtigyro--fedora log]# devkit-disks --show-info /dev/sda
Showing information for /org/freedesktop/DeviceKit/Disks/devices/sda
  native-path:                 /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda
  device:                      8:0
  device-file:                 /dev/sda
    by-id:                     /dev/disk/by-id/ata-HTS541010G9SA00_MP2ZX0XLGK4BES
    by-id:                     /dev/disk/by-id/scsi-SATA_HTS541010G9SA00_MP2ZX0XLGK4BES
    by-path:                   /dev/disk/by-path/pci-0000:00:1f.2-scsi-0:0:0:0
  detected at:                 Tue 16 Jun 2009 02:35:35 PM EEST
  system internal:             1
  removable:                   0
  has media:                   1 (detected at Tue 16 Jun 2009 02:35:35 PM EEST)
    detects change:            0
    detection by polling:      0
    detection inhibitable:     0
    detection inhibited:       0
  is read only:                0
  is mounted:                  0
  mount paths:
  mounted by uid:              0
  presentation hide:           0
  presentation name:
  presentation icon:
  size:                        100030242816
  block size:                  512
  job underway:                no
  usage:
    type:
    version:
    uuid:
    label:
  partition table:
    scheme:                    mbr
    count:                     8
  drive:
    vendor:                    ATA
    model:                     HTS541010G9SA00
    revision:                  MBZO
    serial:                    MP2ZX0XLGK4BES
    ejectable:                 0
    require eject:             0
    media:
      compat:
    interface:                 ata
    if speed:                  (unknown)
    ATA SMART:                 Updated at Tue 16 Jun 2009 02:35:35 PM EEST
      assessment:              PASSED
      bad sectors:             Yes
      attributes:              One ore more attributes exceed threshold
      temperature:             43° C / 109° F
      powered on:              345 days
      offline data:            successful (645 second(s) to complete)
      self-test status:        success or never (0% remaining)
      ext./short test:         available
      conveyance test:         not available
      start test:              available
      abort test:              available
      short test:              2 minute(s) recommended polling time
      ext. test:               66 minute(s) recommended polling time
      conveyance test:         0 minute(s) recommended polling time
===============================================================================
 Attribute                 Current/Worst/Threshold  Status  Value            Type     Updates
===============================================================================
 raw-read-error-rate       100/ 99/ 62              good    0                Prefail  Online
 throughput-performance    106/100/ 40              good    0                Prefail  Offline
 spin-up-time              247/100/ 33              good    1 msec           Prefail  Online
 start-stop-count           98/ 98/  0              n/a     3224             Old-age  Online
 reallocated-sector-count  100/100/  5              FAIL    1900724 sectors  Prefail  Online
 seek-error-rate           100/100/ 67              good    0                Prefail  Online
 seek-time-performance     128/100/ 40              good    0                Prefail  Offline
 power-on-hours             82/ 82/  0              n/a     345 days         Old-age  Online
 spin-retry-count          100/100/ 60              good    0                Prefail  Online
 power-cycle-count          99/ 99/  0              n/a     2650             Old-age  Online
 g-sense-error-rate        100/ 92/  0              n/a     0                Old-age  Online
 power-off-retract-count   100/100/  0              n/a     2228329          Old-age  Online
 load-cycle-count           89/ 89/  0              n/a     118043           Old-age  Online
 temperature-celsius-2     127/100/  0              n/a     43C / 109F       Old-age  Online
 reallocated-event-count   100/100/  0              n/a     22               Old-age  Online
 current-pending-sector    100/100/  0              n/a     0 sectors        Old-age  Online
 offline-uncorrectable     100/100/  0              n/a     0 sectors        Old-age  Offline
 udma-crc-error-count      200/253/  0              n/a     0                Old-age  Online

[root@Xtigyro--fedora log]# rpm -q DeviceKit-disks gvfs libatasmart
DeviceKit-disks-004-3.fc11.x86_64
gvfs-1.2.3-2.fc11.x86_64
libatasmart-0.12-3.fc11.x86_64

[root@Xtigyro--fedora log]# sudo skdump /dev/sda
Device: /dev/sda
Type: 16 Byte SCSI ATA SAT Passthru
Size: 95396 MiB
Model: [HTS541010G9SA00]
Serial: [MP2ZX0XLGK4BES]
Firmware: [MBZOC60P]
SMART Available: yes
Quirks:
Awake: yes
SMART Disk Health Good: yes
Off-line Data Collection Status: [Off-line data collection activity was completed without error.]
Total Time To Complete Off-Line Data Collection: 645 s
Self-Test Execution Status: [The previous self-test routine completed without error or no self-test has ever been run.]
Percent Self-Test Remaining: 0%
Conveyance Self-Test Available: no
Short/Extended Self-Test Available: yes
Start Self-Test Available: yes
Abort Self-Test Available: yes
Short Self-Test Polling Time: 2 min
Extended Self-Test Polling Time: 66 min
Conveyance Self-Test Polling Time: 0 min
Bad Sectors: 1900724 sectors
Powered On: 11.5 months
Power Cycles: 2650
Average Powered On Per Power Cycle: 3.1 h
Temperature: 43.0 C
Overall Status: BAD_SECTOR
ID# Name                      Value Worst Thres Pretty           Raw            Type    Updates Good
  1 raw-read-error-rate       100    99    62   0                0x000000000000 prefail online  yes
  2 throughput-performance    106   100    40   n/a              0xb71100000000 prefail offline yes
  3 spin-up-time              247   100    33   1 ms             0x010000000d00 prefail online  yes
  4 start-stop-count           98    98     0   3224             0x980c00000000 old-age online  n/a
  5 reallocated-sector-count  100   100     5   1900724 sectors  0xb4001d000000 prefail online  no
  7 seek-error-rate           100   100    67   0                0x000000000000 prefail online  yes
  8 seek-time-performance     128   100    40   n/a              0x240000000000 prefail offline yes
  9 power-on-hours             82    82     0   11.5 months      0x562000000000 old-age online  n/a
 10 spin-retry-count          100   100    60   0                0x000000000000 prefail online  yes
 12 power-cycle-count          99    99     0   2650             0x5a0a00000000 old-age online  n/a
191 g-sense-error-rate        100    92     0   0                0x000000000000 old-age online  n/a
192 power-off-retract-count   100   100     0   2228329          0x690022000000 old-age online  n/a
193 load-cycle-count           89    89     0   118044           0x1ccd01000000 old-age online  n/a
194 temperature-celsius-2     127   100     0   43.0 C           0x2b0003003500 old-age online  n/a
196 reallocated-event-count   100   100     0   22               0x160000000000 old-age online  n/a
197 current-pending-sector    100   100     0   0 sectors        0x000000000000 old-age online  n/a
198 offline-uncorrectable     100   100     0   0 sectors        0x000000000000 old-age offline n/a
199 udma-crc-error-count      200   253     0   0                0x000000000000 old-age online  n/a
It looks like your disk does have bad sectors (1900724 according to the SMART report), which could well be correct. Not to worry though: hard drives have spare sectors for this purpose, and your drive has reallocated the bad sectors to fresh spares. According to the SMART attributes, your drive is well above the failure threshold: the current value for reallocated-sector-count is 100 with a threshold of 5, and for a failure the current value would have to be equal to or below the threshold. See man smartctl and search for "-A". It looks like palimpsest is just trigger-happy. This bug is possibly a dup of bug #498115.
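The comparison described above can be sketched in a few lines. This is only an illustration of the rule as stated in this comment (failure when the current value is equal to or below the threshold), not code from palimpsest or libatasmart. The `SmartAttribute` type and `attribute_failed` function are hypothetical names, and treating a threshold of 0 as "not applicable" is an assumption taken from the "n/a" entries in the dump.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmartAttribute:
    """Hypothetical container for one row of the attribute table."""
    name: str
    current: int
    worst: int
    threshold: int


def attribute_failed(attr: SmartAttribute) -> Optional[bool]:
    """Apply the normalized-value rule quoted in the comment above.

    Returns None when the threshold is 0 (assumed to mean "n/a"),
    True when current <= threshold, False otherwise.
    """
    if attr.threshold == 0:
        return None
    return attr.current <= attr.threshold


# Values copied from the report: Current/Worst/Threshold = 100/100/5.
reallocated = SmartAttribute("reallocated-sector-count", 100, 100, 5)
print(reallocated.name, "failed:", attribute_failed(reallocated))
```

Under this rule alone the reported attribute is not failing, which is the point the comment makes.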
Hmm, 1900724 is a very high value. If it is true, your hard disk is pretty broken and I wouldn't trust it anymore. However, I am assuming that this is probably just a parse failure, so I'll now disable the parsing of that attribute in libatasmart for your disk.
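To illustrate how a parse failure could produce such a number, here is a small sketch that decodes the raw value from the skdump output (0xb4001d000000) in two ways. The helper names are hypothetical and this is not libatasmart's parser. That the low 16 bits hold the real sector count on this drive is an assumption, not something confirmed in this report.

```python
def raw_bytes(raw_hex: str) -> bytes:
    """Turn a raw string such as '0xb4001d000000' into its 6 bytes."""
    digits = raw_hex[2:] if raw_hex.lower().startswith("0x") else raw_hex
    return bytes.fromhex(digits)


def as_le(raw: bytes, start: int, length: int) -> int:
    """Read `length` bytes from offset `start` as a little-endian integer."""
    return int.from_bytes(raw[start:start + length], "little")


raw = raw_bytes("0xb4001d000000")  # reallocated-sector-count, ID 5

# Reading the first four bytes as one number gives the reported value.
print(as_le(raw, 0, 4))  # 1900724

# Reading two 16-bit words separately gives two much smaller numbers.
print(as_le(raw, 0, 2))  # 180
print(as_le(raw, 2, 2))  # 29
```

The same byte order reproduces other values in the dump, for example 0x980c00000000 for start-stop-count reads as 3224 when the first two bytes are taken as little-endian.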
I have now added a quirk upstream for this: http://git.0pointer.de/?p=libatasmart.git;a=commitdiff;h=4fdaf003a3b7277c1f3aec45d52c362f6aa187bc
I don't think this fix is correct - libatasmart should *completely ignore* the real values reported for the purpose of determining whether the drive is in an error state. To decide whether the drive is in failure status, libatasmart should only compare the "Current value" against the "Threshold", according to the logic described in man smartctl. If the drive considers 1900724 reallocated sectors to be a value of 100 against a threshold of 5, then libatasmart should not second-guess that.
I forgot to mention that, as I described in comment #47 of bug #498115, libatasmart reports even 1 reallocated sector as a fail status, which is obviously wrong. Adding a quirk for every such case is not the right way to go.
We decided that it makes more sense to actually second-guess the drive here. Manufacturers tend to set those thresholds artificially high, to make their drives look better. However, you are right, checking against 0 is a bit too much. This will be changed to check against a threshold that is dependent on the actual size of the disk.
I think that second-guessing the manufacturer is not a good idea on the face of it - unless it is proven that, as a rule of thumb, implementors of the SMART standard cannot be relied upon to implement it properly, I would think that the default implementation in Linux should be to work with the standard. Failing that, since it is not likely to happen (and I acknowledge that there may be issues with some drives), I would hope that there would be some switch that lets me ask libatasmart to honor the standard and the device manufacturer's settings on my machine.
(In reply to comment #7)
> I think that second guessing the manufacturer is not a good idea on the face of
> it - unless proven that as a rule of thumb implementors of the SMART standard
> cannot be relied upon to implement it properly, I would think that the default
> implementation in Linux should be to work with the standard.

Nice idea. However, that doesn't work, for the simple reason that there is no "SMART standard". There is simply no official spec of the SMART attributes stuff. There was a draft spec which was pulled back. Most vendors do follow it but have departed from it in many ways, sometimes in a compatible way, sometimes in an incompatible way. A good part of the information libatasmart parses is not documented anywhere; it's simply something that was observed that all (or some) manufacturers seem to agree on or follow, even if it isn't set in stone.

libatasmart tries to make sense of the data available in the SMART information as well as it can. But since there is no official specification, we need to take the data that is available, distill some information from it, verify that it makes sense and then present that to the user.

Also note that libatasmart 0.14 now compares the number of bad sectors against a threshold that depends on the disk size.
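As a rough illustration of what a size-dependent check could look like: the actual rule used by libatasmart is not stated in this thread, so the logarithmic scaling, the `per_doubling` constant and both function names below are hypothetical placeholders chosen only to show the shape of such a check.

```python
import math

SECTOR_SIZE = 512  # block size reported for /dev/sda in this bug


def bad_sector_limit(disk_size_bytes: int, per_doubling: int = 1024) -> int:
    """Hypothetical limit that grows with the logarithm of the sector count."""
    sectors = disk_size_bytes // SECTOR_SIZE
    if sectors < 1:
        raise ValueError("disk must contain at least one sector")
    return int(math.log2(sectors)) * per_doubling


def too_many_bad_sectors(bad_sectors: int, disk_size_bytes: int) -> bool:
    """Flag the disk only when the count exceeds the size-dependent limit."""
    return bad_sectors > bad_sector_limit(disk_size_bytes)


DISK_SIZE = 100030242816  # size of /dev/sda from the report

print(bad_sector_limit(DISK_SIZE))
print(too_many_bad_sectors(1900724, DISK_SIZE))
print(too_many_bad_sectors(1, DISK_SIZE))
```

The design point is that a single reallocated sector no longer trips the warning, while a count as large as the one reported here still would under any reasonable limit.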
Ok, thanks for the response. I still think there is room in palimpsest for showing where libatasmart thinks there is a failure while the manufacturer's "current value" > "threshold value" says it's OK(*), and in such cases allowing the user to override the notification for just that property so that it reports failures according to the manufacturer's values. (*) even if these values are stupid, like setting the current value to always 100.