Bug 495956

Summary: Ignore some failing SMART attributes
Product: [Fedora] Fedora
Reporter: Tomas Mraz <tmraz>
Component: gnome-disk-utility
Assignee: David Zeuthen <davidz>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: rawhide
CC: bugzilla, chris, davidz, lpoetter, mclasen, robatino, wwoods, yulrottmann
Keywords: FutureFeature
Hardware: All
OS: Linux
Doc Type: Enhancement
Last Closed: 2009-05-03 13:55:27 UTC
Bug Blocks: 446452    
Attachments:
  Output of devkit-disks command (flags: none)
  skdump dump (flags: none)

Description Tomas Mraz 2009-04-15 17:44:15 UTC
On my desktop machine, palimpsest reports a failing-disk warning because the worst
value of a temperature attribute is one lower than the threshold, even though the
current value is well above the threshold. Perhaps palimpsest should have a feature
for manually ignoring a failed attribute, so that the warning icon shows up only if
another attribute fails?

Comment 1 David Zeuthen 2009-04-15 18:15:54 UTC
First, please attach the output of 'devkit-disks --show-info /dev/sdX' for the disk in question.

There are two things we can do:

 1. Make it possible to ignore ATA SMART on a given drive in the notification
    daemon altogether. That way you won't get the warning icon and notification.
    Maybe even make it possible to just ignore 

    - one or more attributes
    - bad sector warnings

    on a per drive basis. I imagine we can have a "Preferences..." item
    in the popup menu, e.g.

      http://people.freedesktop.org/~david/gdu-ata-smart-warning.png

    that gives you a dialog where this can be configured.

    (we also need a way to get to this dialog in Palimpsest.. so you can turn
     things back on.. we need this since you can't get to "Preferences..." 
     in the menu when there is no icon.)

 2. Tweak libatasmart to be less picky

Ideally we'd do 2, but since there are a lot of different drives out there, we probably have to do 1 as well. I've added Lennart (the libatasmart author) so he can give his feedback.
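
As a rough illustration of option 1, here is a minimal Python sketch of a per-drive ignore list. Nothing here is an existing gnome-disk-utility or DeviceKit-disks API: the drive identifiers, the `ignored` store, and both function names are hypothetical, and a real implementation would persist the preferences rather than keep them in memory.

```python
# Hypothetical in-memory preference store: drive id -> set of ignored attribute names.
ignored = {}


def ignore_attribute(drive_id, attr_name):
    """Record that the user chose to ignore one attribute on one drive."""
    ignored.setdefault(drive_id, set()).add(attr_name)


def should_warn(drive_id, failing_attrs):
    """Warn only if at least one failing attribute is not ignored on this drive."""
    return bool(set(failing_attrs) - ignored.get(drive_id, set()))


# The user dismisses the temperature warning on one (hypothetical) drive.
ignore_attribute("drive-A", "airflow-temperature-celsius")

print(should_warn("drive-A", ["airflow-temperature-celsius"]))
print(should_warn("drive-A", ["airflow-temperature-celsius", "spin-up-time"]))
print(should_warn("drive-B", ["airflow-temperature-celsius"]))
```

With this shape, ignoring an attribute on one drive silences only that attribute there; any other failing attribute, or the same attribute on another drive, still triggers the warning.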

Comment 2 Tomas Mraz 2009-04-15 20:33:05 UTC
Created attachment 339741 [details]
Output of devkit-disks command

To make the interface as simple as possible I'd just add a dialog with the description of the problem and "ignore this problem on this drive" button. This dialog would appear when clicking on the warning icon and it would make the warning icon disappear. In the palimpsest interface the error would still be displayed.

Comment 3 David Zeuthen 2009-04-15 20:51:27 UTC
> airflow-temperature-celsius  66/ 44/ 45   FAIL    34C / 93.2F Old-age  Online 

OK, so this one failed in the past but now it's good.

FWIW, I'm seeing this too with one of my devices

# devkit-disks --show-info /dev/sdb |grep spin-up-time
 spin-up-time                203/  1/ 21   FAIL    4.85 secs   Prefail  Online 

and we're just passing the good value we get from libatasmart.

# skdump /dev/sdb|grep spin-up-time
  3 spin-up-time                203     1    21   4.9 s       0xf21200000000 prefail online  no 

Lennart, perhaps libatasmart shouldn't mark attributes that failed in the past as bad (i.e. !good) if they are good now?
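
To make the "failed in the past but good now" distinction concrete, here is a small Python sketch. It is not libatasmart's code. It assumes the three numbers in the output above are current/worst/threshold and that a value at or below the threshold counts as bad; the `Attr` class and its property names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Attr:
    name: str
    current: int    # normalized current value
    worst: int      # lowest normalized value ever recorded
    threshold: int  # vendor-defined failure threshold

    @property
    def good_now(self) -> bool:
        # Assumption: the attribute is bad when value <= threshold.
        return self.current > self.threshold

    @property
    def good_in_the_past(self) -> bool:
        return self.worst > self.threshold


# Values taken from the two attribute lines quoted in this comment.
airflow = Attr("airflow-temperature-celsius", 66, 44, 45)
spin_up = Attr("spin-up-time", 203, 1, 21)

for a in (airflow, spin_up):
    print(a.name, "good_now:", a.good_now, "good_in_the_past:", a.good_in_the_past)
```

Both attributes come out as good now but not good in the past, which is the case where a single good/bad flag ends up reporting FAIL.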

Comment 4 David Zeuthen 2009-04-15 20:55:14 UTC
(In reply to comment #2)
> To make the interface as simple as possible I'd just add a dialog with the
> description of the problem and "ignore this problem on this drive" button. This
> dialog would appear when clicking on the warning icon and it would make the
> warning icon disappear. In the palimpsest interface the error would still be
> displayed.  

Yeah, I think we probably want something simple like that.

Comment 5 Lennart Poettering 2009-04-15 21:01:26 UTC
Hmm, old age attributes should never result in libatasmart thinking the attr is bad.

Tomas, could you get me the raw smart data from the drive? i.e. 'skdump --save=mysmartdata /dev/sda' or suchlike?

Comment 6 Tomas Mraz 2009-04-15 21:46:47 UTC
Created attachment 339752 [details]
skdump dump

Comment 7 Lennart Poettering 2009-04-15 22:26:38 UTC
libatasmart-0.12-1.fc11 should fix the issue.

https://fedorahosted.org/rel-eng/ticket/1471

Comment 8 Tomas Mraz 2009-04-28 08:45:39 UTC
Unfortunately, I now have libatasmart-0.12-2.fc11 and the failing disk icon in the status bar is still there.

Comment 9 David Zeuthen 2009-05-03 13:55:27 UTC
The DeviceKit-disks package fixes the problem

 http://koji.fedoraproject.org/koji/buildinfo?buildID=100516

since it contains this bugfix

 http://cgit.freedesktop.org/DeviceKit/DeviceKit-disks/commit/?id=c7098688b90b9ba0feb38b24ffe93bd78ada21e2

I've tested this against the skdump file and there are no more warning icons.

This will be in F11 once other bits are ready (gvfs is failing to build because of samba issues) and I've mailed the release team etc.

Comment 10 Lennart Poettering 2009-05-05 16:22:00 UTC
Hmm, David, I think it would be good if you'd still highlight old-age attributes if they went outside the range. That shouldn't be called "failing" or so, but highlighting would be good.

I.e. the check whether a->prefailure is set is too much I think.

Comment 11 Will Woods 2009-05-06 15:19:53 UTC
This problem was discussed as a blocker in last week's QA meeting. Adding to list for record-keeping purposes.

Comment 12 David Zeuthen 2009-05-06 15:42:07 UTC
(In reply to comment #11)
> This problem was discussed as a blocker in last week's QA meeting. Adding to
> list for record-keeping purposes.  

Request for F11 inclusion here

https://fedorahosted.org/rel-eng/ticket/1742

Comment 13 David Zeuthen 2009-05-06 15:54:49 UTC
(In reply to comment #10)
> Hmm, David, I think it would be good if you'd still highlight old-age
> attributes if they went outside the range. That shouldn't be called "failing"
> or so, but highlighting would be good.
> 
> I.e. the check whether a->prefailure is set is too much I think.  

This is just for determining the overall status; it's what libatasmart is doing as well. So this is actually a crucial bugfix since DeviceKit-disks was responsible for crying wolf here (libatasmart is fine).

(The problem is that libatasmart doesn't use a bitfield; e.g. HAS_BAD_SECTORS takes precedence over HAS_PREFAIL_ATTRIBUTES_EXCEEDING_THRESHOLD. I want to export both, so I just do the same checks. It's not ideal duplicating this in DeviceKit-disks, I know.)
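
The difference between a single "highest precedence" status and exporting both conditions can be sketched as follows. This is an illustrative Python sketch only, not the DeviceKit-disks or libatasmart implementation; the `Health` flag names and the `overall_health` function are hypothetical, and attributes are reduced to (prefailure, good_now) pairs.

```python
from enum import Flag


class Health(Flag):
    # Hypothetical flags; each condition gets its own bit so both can be reported.
    BAD_SECTORS = 1
    PREFAIL_EXCEEDS_THRESHOLD = 2


def overall_health(bad_sector_count, attrs):
    """attrs is an iterable of (prefailure, good_now) pairs.

    Only pre-fail attributes that are bad right now count toward the
    overall status; old-age attributes are deliberately left out.
    """
    status = Health(0)
    if bad_sector_count > 0:
        status |= Health.BAD_SECTORS
    if any(prefailure and not good_now for prefailure, good_now in attrs):
        status |= Health.PREFAIL_EXCEEDS_THRESHOLD
    return status


# An old-age attribute that is currently bad does not affect the overall status.
print(overall_health(0, [(False, False)]))
# Bad sectors and a failing pre-fail attribute are both reported.
print(overall_health(3, [(True, False)]))
```

With a bitfield, a caller can test each condition independently instead of seeing only whichever one takes precedence.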

Of course, neither "old-age attr exceeds threshold" nor "old-age failed in the past" will cause the "your disk is failing!" notification to be shown, but if the user is actively looking at the ATA SMART attributes we should do a better job. I've filed bugs for that:

 https://bugs.freedesktop.org/show_bug.cgi?id=21599
 http://bugzilla.gnome.org/show_bug.cgi?id=581608
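
One possible reading of the "highlight, but don't call it failing" idea from comment 10, as a minimal Python sketch. The function name, its parameters, and the three level strings are hypothetical and are not taken from the filed bugs or from any existing code.

```python
def display_level(prefailure, good_now, good_in_the_past):
    """Pick how to present a single attribute in the attribute list."""
    if prefailure and not good_now:
        # A pre-fail attribute at or below its threshold right now.
        return "failing"
    if not good_now or not good_in_the_past:
        # Old-age attribute out of range, or any attribute that failed earlier.
        return "highlight"
    return "ok"


# Old-age attribute that failed in the past but is fine now.
print(display_level(prefailure=False, good_now=True, good_in_the_past=False))
# Pre-fail attribute that is currently out of range.
print(display_level(prefailure=True, good_now=False, good_in_the_past=False))
```

This keeps the overall "disk is failing" decision separate from what is shown to a user who is inspecting individual attributes.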