Red Hat Bugzilla – Bug 277641
Usability of important smartd messages
Last modified: 2007-11-30 17:12:14 EST
I got the following emailed to "root" by "smartd":
From: root <root@diet-anarchy>
Subject: SMART error (CurrentPendingSector) detected on host: diet-anarchy
Date: Mon, 20 Aug 2007 18:33:56 -0400
This email was generated by the smartd daemon running on:
host name: diet-anarchy
DNS domain: [Unknown]
NIS domain: (none)
The following warning/error was logged by the smartd daemon:
Device: /dev/hda, 1 Currently unreadable (pending) sectors
For details see host's SYSLOG (default: /var/log/messages).
You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.
I got the following emailed to "root" by "logwatch":
--------------------- Smartd Begin ------------------------
Currently unreadable (pending) sectors detected:
/dev/hda - 11 Time(s)
1 unreadable sectors detected
Warning via mail to root: successful - 1 Time(s)
Sending warning via mail to root ... - 1 Time(s)
---------------------- Smartd End -------------------------
There are two usability problems here. 1.) The wording of these warning
messages is rather technical. Most people - who are not experienced
command-line Linux users - would not know what to do next. 2.) These messages
are being e-mailed to "root". One would need to either actively check root's
email or configure sendmail to send root's email somewhere else, to even receive
these messages in the first place. Anaconda does not prompt users to do either
of these things, and I seriously doubt that most everyday users would actually
do either on their own.
These are important messages, which after some amount of research I decided
meant that my hard drive was beginning to fail, and that it was time to replace
it with a new one. This is important enough that not receiving or understanding
these messages seems unsatisfactory.
One possible solution is to provide a pop-up warning, similar to those generated
by SELinux or Network Manager, to any users logged into GNOME. Better online
documentation to help novice users diagnose their hardware problem could be
helpful since there are a variety of messages which might be generated. There
might also be completely different solutions which are also satisfactory.
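As a rough illustration of what a less technical message could look like, here is a minimal sketch that turns the log line quoted above into plain language. The function name `explain` and the wording of the friendly text are my own assumptions, not anything smartd or logwatch provides, and it only handles the one message format shown in this report.

```python
import re

# Pattern for the smartd log line quoted in this report, e.g.
# "Device: /dev/hda, 1 Currently unreadable (pending) sectors"
PENDING_RE = re.compile(
    r"Device: (?P<device>\S+), (?P<count>\d+) "
    r"Currently unreadable \(pending\) sectors"
)


def explain(line):
    """Return a plain-language version of a pending-sector warning, or None.

    Hypothetical helper: the friendly wording is an assumption for
    illustration, not text produced by smartd itself.
    """
    match = PENDING_RE.search(line)
    if match is None:
        return None  # not a message this sketch knows about
    device = match.group("device")
    count = int(match.group("count"))
    noun = "area" if count == 1 else "areas"
    return (
        f"The disk {device} reports {count} {noun} it currently cannot read. "
        "This can be an early sign that the disk is failing. "
        "Back up important files now and consider replacing the disk."
    )


if __name__ == "__main__":
    print(explain("Device: /dev/hda, 1 Currently unreadable (pending) sectors"))
```

A pop-up or a documentation page could show text along these lines next to the original message, so experienced users still see the exact smartd wording.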
The action that smartd performs on error is configurable, and the configuration
is described in the documentation, with examples. You can configure smartd
to use notify-send, but since the smartmontools package is more important and useful
on servers (it's going to be turned off by default in F-8), I have to assume
there will be no user logged into GNOME. Then the only action that makes sense
is to send an email to root.
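For anyone who wants to try the notify-send route, here is a minimal sketch of a script that smartd could run through the `-M exec` directive in smartd.conf (used together with `-m`, for example `/dev/hda -a -m root -M exec /usr/local/bin/smartd-notify`, where that path is a made-up example). It reads the `SMARTD_DEVICE`, `SMARTD_FAILTYPE` and `SMARTD_MESSAGE` environment variables that smartd sets for such scripts. The function name `build_command` is hypothetical, and the sketch assumes that notify-send is installed and can reach the desktop session of a logged-in user, which is not normally true for a process started by a root daemon without extra setup.

```python
#!/usr/bin/env python3
import os
import subprocess
import sys


def build_command(env):
    """Build a notify-send argument list from smartd's SMARTD_* variables.

    Hypothetical helper; the fallback strings are assumptions used only
    when a variable is missing from the environment.
    """
    device = env.get("SMARTD_DEVICE", "unknown device")
    failtype = env.get("SMARTD_FAILTYPE", "unknown")
    message = env.get("SMARTD_MESSAGE", "smartd reported a problem.")
    summary = f"Disk warning for {device} ({failtype})"
    # notify-send takes a summary followed by an optional body.
    return ["notify-send", "--urgency=critical", summary, message]


def main():
    # Assumption: the environment already allows notify-send to reach a
    # user's session (display and session bus); this sketch does not
    # try to locate one.
    result = subprocess.run(build_command(os.environ), check=False)
    return result.returncode


if __name__ == "__main__":
    sys.exit(main())
```

Keeping the command construction in its own function makes it easy to check what would be shown without actually raising a notification.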
The manpages are browsable in yelp and I consider them to be really well
written. Much better than I could do with my English...
For users that don't even know enough to check root's email, it's not a viable
solution to expect them to re-configure smartd to take some different action
upon discovering a problem.
I am running Fedora 7 on a laptop, and I was very happy that smartd was enabled
by default. If it hadn't been, I wouldn't have known to replace my hard drive.
I don't see why it's any less useful for desktop users to get notified about
imminent hardware failure; they are even less likely to be making regular
backups of important data. I would definitely recommend enabling smartd by
default in Fedora 8. What would be the downside?
Having smartd turned off makes sense assuming users are clever enough to know
what they want, and since all of them want a fast system, nothing that's not
required should be started by default. It's the user who decides what to
run on his machine, not the packager. And I think this is a good approach.
The point of enabling smartd by default is to do the right thing for users who
*aren't* clever enough to know that smartd exists in the first place, much less
figure out how to turn it on and receive its messages. I'm sure everyone wants
a fast system, but it seems a bit more important to know whether or not they are
about to lose all of their data due to hard drive failure.