Bug 508241

Summary: "badblocks" manpage should indicate that the tool has limited usefulness in 2009
Product: Red Hat Enterprise Linux 5
Component: e2fsprogs
Version: 5.5
Hardware: All
OS: Linux
Status: CLOSED UPSTREAM
Severity: low
Priority: low
Target Milestone: rc
Reporter: David Tonhofer <bughunt>
Assignee: Eric Sandeen <esandeen>
QA Contact: BaseOS QE <qe-baseos-auto>
CC: mharris, sct
Doc Type: Bug Fix
Last Closed: 2011-01-26 20:50:14 UTC

Description David Tonhofer 2009-06-26 10:19:20 UTC
Description of problem:

At http://www.mail-archive.com/linux-users@it.canterbury.ac.nz/msg22980.html the limited usefulness of "badblocks" in the age of disks with sector-reallocation features is explained.

"badblocks does not have any low-level knowledge of the disk, so it's just doing normal I/O to the device file using fairly plain open/read/write/close calls.  If the disk decides to remap sectors during a badblocks run, there's nothing badblocks can do to detect this, and you're better off having the disk remaps the sectors (and have the bad sectors hidden permanently) than adding the bad sectors to the filesystem's bad blocks list and having to remember to
re-add those blocks to the list next time you format the partition."
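
The "plain open/read/write/close" approach described in the quote can be sketched roughly as follows. This is only an illustration of a read-only pass and not the actual badblocks implementation; the function name scan_read_only and the default block size are assumptions made for the example. It reads through the page cache, and it only sees errors that the kernel reports back, so sectors the drive has silently remapped look healthy to it.

```python
import os

def scan_read_only(path, block_size=4096):
    """Return the indices of blocks that could not be read.

    Hypothetical helper for illustration; it has no low-level knowledge
    of the disk and relies purely on ordinary read() calls failing.
    """
    bad_blocks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or,
        # on Linux, of a block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        total_blocks = (size + block_size - 1) // block_size
        for block in range(total_blocks):
            try:
                os.lseek(fd, block * block_size, os.SEEK_SET)
                os.read(fd, block_size)
            except OSError:
                # The kernel reported an I/O error for this block.
                bad_blocks.append(block)
    finally:
        os.close(fd)
    return bad_blocks
```

A call such as scan_read_only("/dev/sdX") would need root privileges; an empty list means only that no read error was visible to the operating system.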

Of course, running "badblocks" may still make the disk actually reallocate those bad sectors.

An appropriate notice should appear in the "badblocks" man page.

Version-Release number of selected component (if applicable):

e2fsprogs-1.39-20.el5

Comment 1 Mike A. Harris 2009-12-06 12:35:10 UTC
What happens though when the disk has maxed out the number of bad blocks it can remap and new bad blocks actually are operating system visible?  I run the badblocks command in order to discover when a disk has actual bad blocks on it that can _not_ be remapped by the drive because it is maxed out.

Perhaps smartmontools is the better way to handle this in the installer nowadays?  It could be integrated into the installer and could inform you that your disks are bad without having to scan the entire surface of the drive.  The added bonus is that it can detect many other drive failure modes, and do so before you've already installed the OS and possibly have important data on it.

I'll file an RFE for this if others think it is also a good idea.

The way it stands right now, it looks like you have to boot the installer up, switch to another tty, and manually run badblocks on the raw disk prior to letting the installation proceed, or running badblocks on the drive in another computer in order to guarantee the drive hasn't maxed out its bad block remapping.

Comment 2 Eric Sandeen 2009-12-07 17:13:07 UTC
(In reply to comment #1)
> What happens though when the disk has maxed out the number of bad blocks it can
> remap and new bad blocks actually are operating system visible?  

Then I think it's time for a new disk, as more will keep coming....

> I run the
> badblocks command in order to discover when a disk has actual bad blocks on it
> that can _not_ be remapped by the drive because it is maxed out.
> 
> Perhaps smartmontools is the better way to handle this in the installer
> nowadays?  It could be integrated into the installer and could inform you that
> your disks are bad without having to scan the entire surface of the drive.  The
> added bonus is that it can detect many other drive failure modes, and do so
> before you've already installed the OS and possibly have important data on it.
> 
> I'll file an RFE for this if others think it is also a good idea.

I think it could be, though I am no expert on SMART - if it's reliable then yes, this sounds like a nice enhancement for the installer.

-Eric

Comment 3 Mike A. Harris 2009-12-14 08:21:07 UTC
> Then I think it's time for a new disk, as more will keep coming...

Then we agree, which is why I wanted to run badblocks on the disk to find out whether or not it has bad sectors on it which are OS visible, so it can be replaced.  ;o)

> I think it could be, though I am no expert on SMART - if it's reliable then
> yes, this sounds like a nice enhancement for the installer.

I have been experimenting with smartctl in a kickstart %pre installation script since I posted the above.  The good news is that it works, and it works fast enough that you don't even notice it is there.  More good news: it is included on the installation media, so I didn't have to jump through any hoops to be able to use it.  In fact, in non-kickstart installs you can switch over to tty2 and run smartctl manually before proceeding, so even if it isn't integrated into anaconda automatically, it is still available for admins and advanced users, which is good enough for me.

I do think that it would make a good feature to be integrated into anaconda directly though, possibly just automatically running some fast tests just prior to partitioning, and not even mentioning it to the user unless a problem is detected.  Then, if a problem occurs it can pop up a dialog informing the user that their hard disk has reported either an existing failure, or is reporting that the drive is degraded and may fail soon if it is not replaced, offering the user an option of whether or not to proceed, with the default being 'no'.
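
A rough sketch of the check described above, for illustration only. It assumes smartctl is on the PATH and a modern Python 3 is available, which may not hold in the RHEL 5 installer environment. The bit meanings follow the exit-status section of the smartctl(8) man page; the names decode_status, should_proceed and check_disk, and the policy of refusing on bits 3 and 4, are hypothetical choices for this example and not anything anaconda does.

```python
import subprocess

# Exit-status bits as described in the smartctl(8) man page.
SMARTCTL_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART command failed or returned bad data",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold in the past",
    6: "device error log contains errors",
    7: "self-test log contains errors",
}

def decode_status(status):
    """Translate a smartctl exit status into human-readable messages."""
    return [msg for bit, msg in sorted(SMARTCTL_BITS.items())
            if status & (1 << bit)]

def should_proceed(status):
    """Hypothetical policy: default to 'no' if the disk reports that it
    is failing (bit 3) or that prefail attributes are tripped (bit 4)."""
    return not (status & ((1 << 3) | (1 << 4)))

def check_disk(device):
    """Run a quick health and attribute query; return the exit status."""
    result = subprocess.run(
        ["smartctl", "-H", "-A", device],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    return result.returncode
```

In a pre-install hook one could call check_disk on each disk and show the messages from decode_status only when should_proceed returns False, which keeps the check invisible unless a problem is detected.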

One thing I'm not certain of though, is if automatically running these tests is safe to do on all hardware, or if there is any possibility of it crashing certain hardware or some other problems.  If there are potential problems though and it isn't possible to easily detect those scenarios and just silently avoid them, it might be better to disable the scan by default, but offer it as an option to the user just prior to partitioning, right on the partitioning screen "[ ] Perform SMART tests on hard disk(s)".

Seems like a good idea to me, but as long as smartctl is left on the CD/DVD, I'm good either way as kickstart is my friend. ;o)

Comment 4 Eric Sandeen 2009-12-15 16:14:41 UTC
As far as I know, it should be safe to issue a SMART command.

Many BIOSes do this at boot-up ...

Want to ping the anaconda guys or file a feature request?

-Eric

Comment 5 Eric Sandeen 2011-01-26 20:50:14 UTC
This is really just a small documentation enhancement, IMHO, which should first be addressed upstream.  If you have proposals for manpage additions, it would be great if you would send them to the linux-ext4.org list for discussion.

If you feel this is critical for RHEL5 inclusion, please consider escalating through your RHEL support channels.

Thanks,
-Eric