Red Hat Bugzilla – Bug 495956
Ignore some failing SMART attributes
Last modified: 2013-03-05 22:58:25 EST
On my desktop machine palimpsest reports a failing disk warning because the worst
value of a temperature attribute is one lower than the threshold. The
current value is well above the threshold. Perhaps palimpsest should have a feature to manually ignore a failed attribute, so that the warning icon
shows up only if another attribute fails?
First, please attach the output of 'devkit-disks --show-info /dev/sdX' for the disk in question.
There are two things we can do
1. Make it possible to ignore ATA SMART on a given drive in the notification
daemon altogether. That way you won't get the warning icon and notification.
Maybe even make it possible to just ignore
- one or more attributes
- bad sector warnings
on a per drive basis. I imagine we can have a "Preferences..." item
in the popup menu, e.g.
that gives you a dialog where this can be configured.
(we also need a way to get to this dialog in Palimpsest.. so you can turn
things back on.. we need this since you can't get to "Preferences..."
in the menu when there is no icon.)
2. Tweak libatasmart to be less picky
Ideally we'd do 2. but since there's a lot of different drives out there we probably have to do 1. as well. I've added Lennart (the libatasmart author) so he can give his feedback.
Created attachment 339741
Output of devkit-disks command
To make the interface as simple as possible I'd just add a dialog with the description of the problem and "ignore this problem on this drive" button. This dialog would appear when clicking on the warning icon and it would make the warning icon disappear. In the palimpsest interface the error would still be displayed.
> airflow-temperature-celsius 66/ 44/ 45 FAIL 34C / 93.2F Old-age Online
OK, so this one failed in the past but now it's good.
FWIW, I'm seeing this too with one of my devices
# devkit-disks --show-info /dev/sdb |grep spin-up-time
spin-up-time 203/ 1/ 21 FAIL 4.85 secs Prefail Online
and we're just passing the good value we get from libatasmart.
# skdump /dev/sdb|grep spin-up-time
3 spin-up-time 203 1 21 4.9 s 0xf21200000000 prefail online no
Lennart, perhaps libatasmart shouldn't mark attributes that failed in the past as bad (i.e. !good) if they are good now?
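To make the distinction between "failed in the past" and "failing now" concrete, here is a minimal Python sketch using the two attribute lines quoted above (current/worst/threshold). The class and function names are hypothetical and are not libatasmart's API. It assumes the usual SMART convention that a normalized value at or below the threshold counts as failed.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    # Hypothetical container; field names are illustrative only.
    name: str
    current: int    # normalized current value
    worst: int      # lowest normalized value recorded so far
    threshold: int  # vendor-defined failure threshold
    prefail: bool   # True for "Prefail" attributes, False for "Old-age"

    @property
    def good_now(self) -> bool:
        # Assumption: a value at or below the threshold counts as failed.
        return self.current > self.threshold

    @property
    def good_in_the_past(self) -> bool:
        return self.worst > self.threshold


def describe(attr: SmartAttribute) -> str:
    """Classify an attribute by its current and worst values."""
    if not attr.good_now:
        return "failing now"
    if not attr.good_in_the_past:
        return "failed in the past, good now"
    return "good"


# Values taken from the devkit-disks output quoted in this bug.
airflow = SmartAttribute("airflow-temperature-celsius", 66, 44, 45, prefail=False)
spin_up = SmartAttribute("spin-up-time", 203, 1, 21, prefail=True)

for attr in (airflow, spin_up):
    print(f"{attr.name}: {describe(attr)}")
```

Under these assumptions both attributes are good now and only their worst values crossed the threshold, which is the case being asked about here.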
(In reply to comment #2)
> To make the interface as simple as possible I'd just add a dialog with the
> description of the problem and "ignore this problem on this drive" button. This
> dialog would appear when clicking on the warning icon and it would make the
> warning icon disappear. In the palimpsest interface the error would still be
Yeah, I think we probably want something simple like that.
Hmm, old age attributes should never result in libatasmart thinking the attr is bad.
Tomas, could you get me the raw smart data from the drive? i.e. 'skdump --save=mysmartdata /dev/sda' or suchlike?
Created attachment 339752
libatasmart-0.12-1.fc11 should fix the issue.
Unfortunately I now have libatasmart-0.12-2.fc11 installed and the failing disk icon in the status bar is still there.
The DeviceKit-disks package fixes the problem
since it contains this bugfix
I've tested this against the skdump file and there's no more warning icons.
This will be in F11 once other bits are ready (gvfs is failing to build because of samba issues) and I've mailed the release team etc.
Hmm, David, I think it would be good if you'd still highlight old-age attributes if they went outside the range. That shouldn't be called "failing" or so, but highlighting would be good.
I.e. the check whether a->prefailure is set is too much I think.
This problem was discussed as a blocker in last week's QA meeting. Adding to list for record-keeping purposes.
(In reply to comment #11)
> This problem was discussed as a blocker in last week's QA meeting. Adding to
> list for record-keeping purposes.
Request for F11 inclusion here
(In reply to comment #10)
> Hmm, David, I think it would be good if you'd still highlight old-age
> attributes if they went outside the range. That shouldn't be called "failing"
> or so, but highlighting would be good.
> I.e. the check whether a->prefailure is set is too much I think.
This is just for determining the overall status; it's what libatasmart is doing as well. So this is actually a crucial bugfix since DeviceKit-disks was responsible for crying wolf here (libatasmart is fine).
(The problem is that libatasmart doesn't use a bitfield; e.g. HAS_BAD_SECTORS takes precedence over HAS_PREFAIL_ATTRIBUTES_EXCEEDING_THRESHOLD. I want to export both so I just do the same checks. It's not ideal duplicating this in DeviceKit-disks, I know.)
Of course neither "old-age attr exceeds threshold" nor "old-age failed in the past" will cause the "your disk is failing!" notification to be shown but... if the user is actively looking at the ATA SMART attributes we should do a better job. I've filed bugs for that
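For illustration, the "export both conditions" idea and the prefail-only check for overall status might look roughly like the Python sketch below. All names here are hypothetical; this is not the DeviceKit-disks or libatasmart code. It assumes that only prefail attributes whose current value is at or below the threshold count toward the overall status, and that old-age attributes out of range are merely collected for highlighting.

```python
from enum import IntFlag


class Overall(IntFlag):
    # Hypothetical flags; a bitfield lets both conditions be reported at once.
    GOOD = 0
    BAD_SECTORS = 1
    PREFAIL_EXCEEDS_THRESHOLD = 2


def overall_status(attrs, bad_sectors):
    """attrs: iterable of (name, current, threshold, prefail) tuples.

    Returns (status flags, names of old-age attributes to highlight).
    """
    status = Overall.GOOD
    highlight = []
    if bad_sectors > 0:
        status |= Overall.BAD_SECTORS
    for name, current, threshold, prefail in attrs:
        if current > threshold:
            continue  # attribute is within range right now
        if prefail:
            # Assumption: only prefail attributes affect the overall status.
            status |= Overall.PREFAIL_EXCEEDS_THRESHOLD
        else:
            # Old-age attributes are highlighted but not called "failing".
            highlight.append(name)
    return status, highlight


status, highlight = overall_status(
    [("spin-up-time", 10, 21, True), ("airflow-temperature-celsius", 40, 45, False)],
    bad_sectors=3,
)
print(status, highlight)
```

With a single-valued status, the bad-sector condition would hide the prefail condition in this example; with flags, a caller can test for each one independently.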