Bug 495956 - Ignore some failing SMART attributes
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: gnome-disk-utility
Version: rawhide
Hardware: All Linux
Priority: low
Severity: medium
Assigned To: David Zeuthen
QA Contact: Fedora Extras Quality Assurance
Keywords: FutureFeature
Blocks: F11Blocker/F11FinalBlocker
Reported: 2009-04-15 13:44 EDT by Tomas Mraz
Modified: 2013-03-05 22:58 EST
Doc Type: Enhancement
Last Closed: 2009-05-03 09:55:27 EDT

Attachments
Output of devkit-disks command (3.70 KB, text/plain)
2009-04-15 16:33 EDT, Tomas Mraz
skdump dump (1.54 KB, application/octet-stream)
2009-04-15 17:46 EDT, Tomas Mraz

Description Tomas Mraz 2009-04-15 13:44:15 EDT
On my desktop machine palimpsest reports a failing disk warning because the worst
value of a temperature attribute is one lower than the threshold. The
current value is well above the threshold. Perhaps palimpsest should have a
feature to allow manually ignoring a failed attribute, so that the warning icon
shows up only if another attribute fails?
Comment 1 David Zeuthen 2009-04-15 14:15:54 EDT
First, please attach the output of 'devkit-disks --show-info /dev/sdX' for the disk in question.

There are two things we can do:

 1. Make it possible to ignore ATA SMART on a given drive in the notification
    daemon altogether. That way you won't get the warning icon and notification.
    Maybe even make it possible to just ignore 

    - one or more attributes
    - bad sector warnings

    on a per drive basis. I imagine we can have a "Preferences..." item
    in the popup menu, e.g.

      http://people.freedesktop.org/~david/gdu-ata-smart-warning.png

    that gives you a dialog where this can be configured.

    (we also need a way to get to this dialog in Palimpsest.. so you can turn
     things back on.. we need this since you can't get to "Preferences..." 
     in the menu when there is no icon.)

 2. Tweak libatasmart to be less picky

Ideally we'd do 2., but since there are a lot of different drives out there we probably have to do 1. as well. I've added Lennart (the libatasmart author) so he can give his feedback.
Comment 2 Tomas Mraz 2009-04-15 16:33:05 EDT
Created attachment 339741
Output of devkit-disks command

To make the interface as simple as possible, I'd just add a dialog with a description of the problem and an "ignore this problem on this drive" button. This dialog would appear when clicking on the warning icon, and using the button would make the warning icon disappear. In the palimpsest interface the error would still be displayed.
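
As a minimal sketch of the "ignore this problem on this drive" idea, the snippet below keeps a set of ignored (drive, attribute) pairs and shows the warning icon only when some failing attribute is not in that set. Everything here is hypothetical: the function name, the use of device paths as drive identifiers, and the in-memory set are illustrative assumptions, not how palimpsest or the notification daemon store this.

```python
# Hypothetical sketch: per-drive "ignore" list for failing SMART attributes.
# A problem is identified by a (drive, attribute name) pair; both the pair
# layout and the device-path identifiers are assumptions for illustration.


def should_show_warning_icon(failing, ignored):
    """Return True if at least one failing attribute has not been ignored."""
    return any(problem not in ignored for problem in failing)


failing = {("/dev/sda", "airflow-temperature-celsius")}
ignored = set()

shown_before = should_show_warning_icon(failing, ignored)

# The user clicks "ignore this problem on this drive".
ignored.add(("/dev/sda", "airflow-temperature-celsius"))
shown_after = should_show_warning_icon(failing, ignored)

# A different attribute failing later brings the icon back.
failing.add(("/dev/sda", "spin-up-time"))
shown_again = should_show_warning_icon(failing, ignored)
```

The failing set itself is left untouched by the ignore action, which matches the suggestion that the error should still be displayed in the palimpsest interface.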
Comment 3 David Zeuthen 2009-04-15 16:51:27 EDT
> airflow-temperature-celsius  66/ 44/ 45   FAIL    34C / 93.2F Old-age  Online 

OK, so this one failed in the past but now it's good.

FWIW, I'm seeing this too with one of my devices:

# devkit-disks --show-info /dev/sdb |grep spin-up-time
 spin-up-time                203/  1/ 21   FAIL    4.85 secs   Prefail  Online 

and we're just passing the good value we get from libatasmart.

# skdump /dev/sdb|grep spin-up-time
  3 spin-up-time                203     1    21   4.9 s       0xf21200000000 prefail online  no 

Lennart, perhaps libatasmart shouldn't mark attributes that failed in the past as bad (i.e. !good) if they are good now?
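
The distinction raised in this comment can be sketched as follows, reading the three numbers in the output above as current/worst/threshold. This is only an illustration under stated assumptions: the class, property names, and the rule that a value at or below the threshold counts as failed are a hypothetical model, not libatasmart's actual code or API.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """Hypothetical model of one ATA SMART attribute row."""
    name: str
    current: int    # normalized current value
    worst: int      # lowest normalized value recorded so far
    threshold: int  # vendor-defined threshold
    prefail: bool   # True for Prefail, False for Old-age

    @property
    def good_now(self) -> bool:
        # Assumption: a value at or below the threshold counts as failed.
        return self.current > self.threshold

    @property
    def good_in_the_past(self) -> bool:
        return self.worst > self.threshold


def describe(attr: SmartAttribute) -> str:
    """Separate 'failing now' from 'failed at some point in the past'."""
    if not attr.good_now:
        return "failing now"
    if not attr.good_in_the_past:
        return "failed in the past"
    return "ok"


# Values taken from the two attribute lines quoted in this comment.
airflow = SmartAttribute("airflow-temperature-celsius", 66, 44, 45, prefail=False)
spin_up = SmartAttribute("spin-up-time", 203, 1, 21, prefail=True)

for attr in (airflow, spin_up):
    print(attr.name, "->", describe(attr))
```

Both attributes have a worst value at or below the threshold but a current value well above it, so under this model neither is failing at the moment.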
Comment 4 David Zeuthen 2009-04-15 16:55:14 EDT
(In reply to comment #2)
> To make the interface as simple as possible I'd just add a dialog with the
> description of the problem and "ignore this problem on this drive" button. This
> dialog would appear when clicking on the warning icon and it would make the
> warning icon disappear. In the palimpsest interface the error would still be
> displayed.  

Yeah, I think we probably want something simple like that.
Comment 5 Lennart Poettering 2009-04-15 17:01:26 EDT
Hmm, old age attributes should never result in libatasmart thinking the attr is bad.

Tomas, could you get me the raw smart data from the drive? i.e. 'skdump --save=mysmartdata /dev/sda' or suchlike?
Comment 6 Tomas Mraz 2009-04-15 17:46:47 EDT
Created attachment 339752
skdump dump
Comment 7 Lennart Poettering 2009-04-15 18:26:38 EDT
libatasmart-0.12-1.fc11 should fix the issue.

https://fedorahosted.org/rel-eng/ticket/1471
Comment 8 Tomas Mraz 2009-04-28 04:45:39 EDT
Unfortunately, I now have libatasmart-0.12-2.fc11 and the failing disk icon in the status bar is still there.
Comment 9 David Zeuthen 2009-05-03 09:55:27 EDT
The DeviceKit-disks package fixes the problem

 http://koji.fedoraproject.org/koji/buildinfo?buildID=100516

since it contains this bugfix

 http://cgit.freedesktop.org/DeviceKit/DeviceKit-disks/commit/?id=c7098688b90b9ba0feb38b24ffe93bd78ada21e2

I've tested this against the skdump file and there are no more warning icons.

This will be in F11 once other bits are ready (gvfs is failing to build because of samba issues) and I've mailed the release team etc.
Comment 10 Lennart Poettering 2009-05-05 12:22:00 EDT
Hmm, David, I think it would be good if you'd still highlight old-age attributes if they went outside the range. That shouldn't be called "failing" or so, but highlighting would be good.

I.e. the check whether a->prefailure is set is too much I think.
Comment 11 Will Woods 2009-05-06 11:19:53 EDT
This problem was discussed as a blocker in last week's QA meeting. Adding to list for record-keeping purposes.
Comment 12 David Zeuthen 2009-05-06 11:42:07 EDT
(In reply to comment #11)
> This problem was discussed as a blocker in last week's QA meeting. Adding to
> list for record-keeping purposes.  

Request for F11 inclusion here

https://fedorahosted.org/rel-eng/ticket/1742
Comment 13 David Zeuthen 2009-05-06 11:54:49 EDT
(In reply to comment #10)
> Hmm, David, I think it would be good if you'd still highlight old-age
> attributes if they went outside the range. That shouldn't be called "failing"
> or so, but highlighting would be good.
> 
> I.e. the check whether a->prefailure is set is too much I think.  

This is just for determining the overall status; it's what libatasmart is doing as well. So this is actually a crucial bugfix since DeviceKit-disks was responsible for crying wolf here (libatasmart is fine).

(The problem is that libatasmart doesn't use a bitfield; e.g. HAS_BAD_SECTORS takes precedence over HAS_PREFAIL_ATTRIBUTES_EXCEEDING_THRESHOLD. I want to export both, so I just do the same checks. It's not ideal duplicating this in DeviceKit-disks, I know.)

Of course, neither "old-age attr exceeds threshold" nor "old-age failed in the past" will cause the "your disk is failing!" notification to be shown, but if the user is actively looking at the ATA SMART attributes we should do a better job. I've filed bugs for that:

 https://bugs.freedesktop.org/show_bug.cgi?id=21599
 http://bugzilla.gnome.org/show_bug.cgi?id=581608
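
To illustrate the point about exporting several conditions at once, here is a rough sketch of an overall-status check that uses a bitfield and skips old-age attributes. The flag names, the tuple layout, and the choice to look only at the current value of pre-fail attributes are assumptions made for this example; it is not the DeviceKit-disks or libatasmart implementation.

```python
from enum import Flag, auto
from typing import Iterable, NamedTuple


class Attr(NamedTuple):
    """Hypothetical attribute record: normalized values plus type."""
    current: int
    worst: int
    threshold: int
    prefail: bool  # True for Prefail, False for Old-age


class Overall(Flag):
    """Bitfield, so several conditions can be reported together."""
    GOOD = 0
    BAD_SECTORS = auto()
    PREFAIL_EXCEEDS_THRESHOLD = auto()


def overall_status(attrs: Iterable[Attr], bad_sectors: int) -> Overall:
    status = Overall.GOOD
    if bad_sectors > 0:
        status |= Overall.BAD_SECTORS
    for a in attrs:
        if not a.prefail:
            continue  # old-age attributes do not affect the overall status
        if a.current <= a.threshold:  # assumption: at or below means failed
            status |= Overall.PREFAIL_EXCEEDS_THRESHOLD
    return status


# Old-age attribute whose worst value dipped below the threshold: no warning.
quiet = overall_status([Attr(66, 44, 45, prefail=False)], bad_sectors=0)

# Pre-fail attribute below threshold plus bad sectors: both bits are set.
loud = overall_status([Attr(10, 10, 21, prefail=True)], bad_sectors=3)
```

With a single precedence-ordered value only one of the two conditions in the second case could be reported; the bitfield keeps both visible to the caller.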
