Bug 155306

Summary: feature: add SMART HDD check to installation routine
Product: Fedora
Reporter: Andrew <andrewz>
Component: anaconda
Assignee: David Cantrell <dcantrell>
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Version: rawhide
CC: aleksey, goemon, mlists
Keywords: FutureFeature
Hardware: All
OS: Linux
Whiteboard: FC5
Doc Type: Enhancement
Last Closed: 2007-08-23 19:20:13 UTC
Bug Blocks: 150223    

Description Andrew 2005-04-18 22:37:18 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041111 Firefox/1.0.2

Description of problem:
If anaconda used smartctl to check a hard drive for errors before installation proceeds, it would save some people some grief.

(On the other hand, I'm glad Fedora runs smartd by default!)

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Install Fedora on a failing hard drive.
2. Boot.

  

Actual Results:  I was disappointed to have wasted time installing Fedora on a hard drive that was failing.

Expected Results:  Anaconda could have used smartctl to check the drive's status.  It would have discovered several "pre-fail" and "old age" flags.

Additional info:

After I installed and then rebooted, I got an email from smartd about these errors:

Device: /dev/hda, 54 Currently unreadable (pending) sectors
Device: /dev/hda, 34 Offline uncorrectable sectors

smartctl -H /dev/hda reported PASS, although smartctl -a /dev/hda reported several "pre-fail" warnings.
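A minimal sketch of the kind of check this report asks for, in Python (the language anaconda is written in). It assumes the ten-column attribute table printed by `smartctl -A` (ending in `UPDATED WHEN_FAILED RAW_VALUE`) and looks at attributes 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable), which correspond to the two smartd messages above, plus 5 (Reallocated_Sector_Ct). Which attributes to watch, and treating any nonzero raw count as a warning, are assumptions of this sketch; `read_attribute_table` and `sector_problems` are hypothetical helpers, not part of anaconda or smartmontools.

```python
import re
import subprocess
from typing import Dict

# Attribute IDs to watch. 197 and 198 match the two smartd warnings in this
# report; adding 5 is an assumption of this sketch, not a smartmontools rule.
WATCHED_IDS = {5, 197, 198}


def read_attribute_table(device: str) -> str:
    """Return the text printed by `smartctl -A <device>` (hypothetical helper)."""
    # smartctl's exit status is a bitmask, so a nonzero code is not treated
    # as an error here; only stdout is kept.
    result = subprocess.run(["smartctl", "-A", device],
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout


def sector_problems(attr_table: str) -> Dict[str, int]:
    """Map attribute name -> raw count for watched attributes with a nonzero raw value."""
    problems: Dict[str, int] = {}
    for line in attr_table.splitlines():
        fields = line.split()
        # Assumed row layout: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH
        # TYPE UPDATED WHEN_FAILED RAW_VALUE; header and other lines are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if int(fields[0]) not in WATCHED_IDS:
            continue
        # Take the leading digits of RAW_VALUE (some drives append extra text).
        match = re.match(r"\d+", fields[9])
        if match and int(match.group()) > 0:
            problems[fields[1]] = int(match.group())
    return problems
```

Called as `sector_problems(read_attribute_table("/dev/hda"))`, a table whose raw counts matched the smartd email would yield `{"Current_Pending_Sector": 54, "Offline_Uncorrectable": 34}`, even though `smartctl -H` reports PASS.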

Comment 1 Dan Hollis 2006-01-02 20:21:33 UTC
It reported PASS because it did pass. Your drive has uncorrectable errors
which can be remapped, though it's a bit involved [1]. It would only FAIL if the
drive wasn't able to remap them or had some other problem [2].

[1] http://smartmontools.sourceforge.net/BadBlockHowTo.txt
[2] http://smartmontools.sourceforge.net/ see "My ATA drive is failing its
self-tests, but its SMART health status is 'PASS'. What's going on?"
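The PASS-with-bad-sectors situation can be illustrated with a simplified model. It assumes, as smartctl does when filling in its WHEN_FAILED column, that an attribute is failing when its normalized VALUE is at or below a nonzero THRESH, and it further assumes the drive reports FAILED only when a Pre-fail attribute is in that state. Real drives compute their verdict in firmware; the `Attribute` class and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Attribute:
    name: str
    value: int      # normalized value (higher is better)
    thresh: int     # vendor failure threshold for the normalized value
    prefail: bool   # True for TYPE "Pre-fail", False for "Old_age"
    raw: int        # raw counter, e.g. number of pending sectors


def failing_now(attr: Attribute) -> bool:
    # A threshold of 0 is treated as "never fails", as smartctl does.
    return attr.thresh > 0 and attr.value <= attr.thresh


def health_verdict(attrs: Iterable[Attribute]) -> str:
    # Simplified model: only Pre-fail attributes at or below threshold turn
    # the overall verdict into FAILED. Raw counts never enter the decision.
    return "FAILED" if any(a.prefail and failing_now(a) for a in attrs) else "PASSED"
```

For a hypothetical `Attribute("Current_Pending_Sector", 100, 0, False, 54)`, `health_verdict` returns `"PASSED"`: the raw count of 54 has no effect on the verdict.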

Comment 2 David Cantrell 2006-02-10 18:27:04 UTC
I like this suggestion for anaconda.  I plan to work on this post-FC5, since
doing it correctly would take more time than I have before FC5.
smartmontools consists of userspace tools, and to do this correctly I'd prefer
a library with Python bindings.  Still, it's a cool idea, so expect to see some
activity on this in rawhide after FC5.

Comment 3 Dan Hollis 2006-02-10 20:01:24 UTC
Are there any known buggy disks or IDE controllers that would cause problems
for SMART checks?

Comment 4 David Cantrell 2006-02-10 20:08:26 UTC
Absolutely, which is why work for this needs to begin in smartmontools.  I
haven't sat down to look at the smartmontools code closely, but I know that
many controllers cause problems, as do certain manufacturers' implementations
of SMART.  The smartmontools FAQ lists the well-known problem disks and
controllers.

Comment 5 Dan Hollis 2006-02-10 20:25:11 UTC
So installing and running smartmon by default, as FC does right now, is not
wise? An RFE should be opened to request that smartmon NOT be installed and
enabled by default... at least until smartmontools is fixed.

Comment 6 David Cantrell 2006-07-17 14:54:47 UTC
I don't think I explained that correctly.  smartmontools works well and handles
a wide range of disks and controllers.  It knows about a lot of manufacturer
specifics.  What I was trying to point out is that we simply can't rely on a
pass/fail test for anaconda.  We'll probably have to look at a variety of SMART
data fields and decide from there.

So it isn't that smartmontools is bad; it's that interpreting the information
in its report(s) programmatically is more difficult.
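One way to look beyond a single pass/fail result is smartctl's exit status, which the RETURN VALUES section of the smartctl(8) man page documents as a bitmask. The sketch below decodes it; the bit descriptions are paraphrased from that man page, while `decode_exit_status`, `should_warn`, `smartctl_status` and the choice of bits 3 to 7 as warning-worthy are assumptions for illustration, not anaconda behavior. It is still a coarse signal and would need per-attribute logic on top of it.

```python
import subprocess
from typing import List

# Exit-status bits of smartctl, paraphrased from smartctl(8) "RETURN VALUES".
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed or device did not identify itself",
    2: "a SMART or ATA command failed, or a SMART checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes <= threshold",
    5: "status OK, but some attributes were <= threshold in the past",
    6: "device error log contains errors",
    7: "self-test log contains errors",
}

# Hypothetical policy: warn on every bit that describes disk health (3-7),
# not only on the DISK FAILING verdict (bit 3).
WARN_MASK = sum(1 << bit for bit in range(3, 8))


def decode_exit_status(status: int) -> List[str]:
    """List the conditions whose bits are set in a smartctl exit status."""
    return [EXIT_BITS[bit] for bit in sorted(EXIT_BITS) if status & (1 << bit)]


def should_warn(status: int) -> bool:
    return bool(status & WARN_MASK)


def smartctl_status(device: str) -> int:
    """Run `smartctl -a` and return its exit status (output is discarded)."""
    # No check=True: a nonzero status carries information, not just failure.
    return subprocess.run(["smartctl", "-a", device],
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode
```

A caller would use something like `should_warn(smartctl_status("/dev/hda"))`, and show `decode_exit_status(...)` to the user when it is true. Command-line and device-access problems (bits 0 to 2) are deliberately kept out of the warning mask, since they say nothing about the disk itself.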

Comment 7 David Cantrell 2007-08-23 19:20:13 UTC
I don't think this is a feature we can realistically add to the installation
process.  We already have enough mechanisms that point out bad hardware in log
files.