Bug 214502

Summary: disk errors during bootup
Product: [Fedora] Fedora
Reporter: Gene Czarcinski <gczarcinski>
Component: smartmontools
Assignee: Dave Jones <davej>
Status: CLOSED RAWHIDE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 6
CC: calvin.ostrum, pfrields, robatino, tmraz, wtogami
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: smartmontools-5.36-8.fc7
Doc Type: Bug Fix
Last Closed: 2007-02-22 12:19:09 UTC
Attachments:
  bootup/showon of /var/log/messages (flags: none)
  log when smart turned on BEFORE starting smartd (flags: none)

Description Gene Czarcinski 2006-11-07 21:27:53 UTC
Description of problem:

I am at a loss to pinpoint just which package has the problem.

I have two systems:

#1 -- ABIT mobo with Athlon 64 4400+ (dual) processor, 2GB ram, one 200GB PATA
drive and two 300GB SATA drives configured as a single LVM volume group (all
Maxtor), x86_64 systems.  Works fine with both FC5 and FC6 running the latest
kernels in testing -- 2.6.18-1.2224.fc5 for FC5 and 2.6.18-1.2835.fc6 for FC6
... looking at the (src.rpm) patch list indicates these are pretty close.

#2 -- ASUS A8N-E mobo with Athlon 64 4400+ (dual) processor, 2GB ram, two 120GB
PATA drives and two 500GB SATA drives configured as a single LVM volume group
(all Maxtor), x86_64 systems.  FC5 works fine with both the 2.6.18-1.2200.fc5
and the 2.6.18-1.2224 kernels  ... I also tried the 2.6.18-1.2835.fc6 kernel
from FC6 and it works fine also.  However, after installing FC6, I am getting
bootup time disk errors (see attached portion of /var/log/messages which covers
one bootup/shutdown).

OK, it has got to be the kernel ... wrong ... on the FC6 system I installed and
tried the latest testing update 2.6.18-1.2835.fc6 as well as the FC5 testing
update 2.6.18-1.2224.fc5 ... no difference ... I still get the errors.

A portion of the log indicating the errors is:

hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
hda: drive_cmd: error=0x04 { DriveStatusError }
ide: failed opcode was: 0xb0
hdb: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
hdb: drive_cmd: error=0x04 { DriveStatusError }
ide: failed opcode was: 0xb0
...
smartd version 5.36 [x86_64-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen 
...
SCSI device sda: drive cache: write back
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata3.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
ata3: EH complete
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata3.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
ata3: EH complete
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata3.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
ata3: EH complete
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
...
See the attached log for the whole thing.

Comment 1 Gene Czarcinski 2006-11-07 21:27:53 UTC
Created attachment 140603 [details]
bootup/showon of /var/log/messages

Comment 2 Tomas Mraz 2006-11-07 21:52:48 UTC
This is definitely not a smartmontools problem. It must be either a hardware,
configuration or kernel problem.


Comment 3 Gene Czarcinski 2006-11-07 22:06:23 UTC
There may be something wrong in the kernel but it is something unique in FC6
that is triggering this error.  FC5 did not (and still does not) have problems
with this hardware regardless of the kernel whereas FC6 does (regardless of
which kernel I tried).

Comment 4 Dave Jones 2006-11-08 03:10:10 UTC
hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
hda: drive_cmd: error=0x04 { DriveStatusError }

51/04 is the drive saying "I don't know what that command means" after being
told to do something. It's not an error.


ide: failed opcode was: 0xb0

0xb0 happens to be a SMART command. Your hard disk doesn't support SMART.
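
To make the 51/04 decoding concrete, here is a small Python sketch that expands the two register values from the log into the flag names the kernel prints. The status table follows the classic ATA status register layout with the labels used in the log lines; the error table is deliberately partial, and the function name `decode` is made up for illustration.

```python
# ATA status register bits, labelled the way the log lines above print them.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# Partial ATA error register table (only some bits are listed here).
ERROR_BITS = {
    0x40: "UncorrectableError",
    0x10: "SectorIdNotFound",
    0x04: "DriveStatusError",  # ABRT: the drive aborted the command
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}


def decode(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if value & bit]


print(decode(0x51, STATUS_BITS))
print(decode(0x04, ERROR_BITS))
```

The status value decodes to the same three flags shown in the log, and the error value has only the abort bit set, which is what a drive reports when it rejects a command.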


Comment 5 Gene Czarcinski 2006-11-08 23:05:13 UTC
Dave -- you are correct about the error but incorrect about the cause ... The
drive does support smart but (apparently) had NOT been enabled by smartd before
smartd attempted a smart-related operation ... I believe there is a race
condition in smartd which (for some reason) only shows itself with FC6 (both
i386 and x86_64) and this particular system but not on my other (very, very)
similar system ... mobo difference.

I have disabled automatic startup of smartd during bootup.

After bootup, I run "tail -f /var/log/messages | tee xxx.log" to capture
information.

1. If I start smartd, I get the errors as shown in the attached /var/log/messages.

2. However, if I first enable smart with:
   smartctl -s on /dev/hda
   smartctl -d ata -s on /dev/sda
   smartctl -d ata -s on /dev/sdb
and then start smartd, I get no errors (see attached log).

3. After bootup, if I run "smartctl -a /dev/hda" before enabling smart on the
drive, I get the errors you (Dave) indicated.
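
The enable-first sequence from step 2 could be scripted roughly as follows. This is only a sketch: the device list and the "-d ata" options are copied from the commands above, while the helper names `enable_command` and `enable_all` are hypothetical, and treating any non-zero smartctl exit status as a failure is an assumption.

```python
import subprocess

# Devices and driver options taken from the smartctl commands in step 2.
# An empty tuple means smartctl is left to detect the device type itself.
DRIVES = [
    ("/dev/hda", ()),
    ("/dev/sda", ("-d", "ata")),
    ("/dev/sdb", ("-d", "ata")),
]


def enable_command(device, driver_args=()):
    """Build the argv list that turns SMART on for one device."""
    return ["smartctl", *driver_args, "-s", "on", device]


def enable_all(drives, run=subprocess.call):
    """Enable SMART on every drive and return the devices that failed.

    `run` is injectable so the logic can be exercised without smartctl.
    """
    failed = []
    for device, driver_args in drives:
        if run(enable_command(device, driver_args)) != 0:
            failed.append(device)
    return failed


# Show the commands that would be issued, without running them.
for device, driver_args in DRIVES:
    print(" ".join(enable_command(device, driver_args)))
```

Calling `enable_all(DRIVES)` as root before starting smartd would correspond to the manual sequence in step 2.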

Comment 6 Gene Czarcinski 2006-11-08 23:06:41 UTC
Created attachment 140727 [details]
log when smart turned on BEFORE starting smartd

Comment 7 Tomas Mraz 2006-11-09 07:31:36 UTC
Isn't it just possible to enable SMART on disks in BIOS?

Please report the problem upstream. http://smartmontools.sourceforge.net/


Comment 8 Tomas Mraz 2006-11-09 08:55:29 UTC
Also can you try smartmontools 5.33 from original FC-5 install?

Comment 9 Gene Czarcinski 2006-11-09 12:19:09 UTC
The hardware system has two (among other) partitions ... one with Fc6 x86_64
installed (hda6), one with FC6 x86_64 installed (hda5), and one with FC6 i386
install (hda7 just to see if that made a difference).  Same BIOS settings, same
BIOS, same hardware.

I tried FC5 and FC6 kernels on both FC5 and FC6 systems.  I also tried
smartmontools 5.36-fc5.1 from FC5 on the FC6 system ... no differences: no
errors with FC5 and the errors with FC6.  I believe there is something about FC6
(not necessarily smartmontools) in combination with this mobo which causes this
problem.

At this point I have a simple workaround -- create a simple init script which is
started earlier than smartd and enables smart on all the disks ... kludgy but it
should work and, now that I understand the problem, the errors are not anywhere
near as serious.

In the interest of getting things fixed, I intend to:

1. Boot FC5 with autostart of smartd disabled to see if something else is
enabling SMART.

2. Report the problem upstream ... ugh ... I can reproduce the problem on my
hardware (one system) but it may be a problem for others (such as the two FC6
systems I have which do not have the problem).

3.  Take a look at the smartmontools code to see if another set of eyes can see
something.

Comment 10 Calvin Ostrum 2007-02-15 07:50:27 UTC
It is interesting to see the comment in the smartctl man page for the -s option:

"-s VALUE, --smart=VALUE Enables  or disables SMART on device.  The valid
arguments to this option are on and off.  Note that the command ´-s on´ (perhaps
used with with the ´-o on´ and ´-S on´ options) should be placed in a startup
script for your machine,  for  example  in rc.local  or  rc.sysinit.  In
principle the SMART feature settings are preserved over power-cycling, but it
doesn´t hurt to be sure."

It is a little unfortunate that this piece of advice is quite buried in the man
page, because without it, quite a few people are probably getting these somewhat
frightening errors (as I did also).  

Of course, it will not do to put the command into the rc.local file, since the
smart daemon is started up before that. 



Comment 11 Calvin Ostrum 2007-02-16 22:22:26 UTC
Okay, I believe I have found the exact problem, and it is not smartd.  (First I
downloaded the smartd code and looked at it, finding no problem, so I then
looked elsewhere).

The problem is in the smartd init.d file for FC6, at
/etc/rc.d/init.d/smartd.   It calls a configuration program
written in python, at /usr/sbin/smartd-conf.py.

If smartd finds that the disks are not SMART enabled, it will
enable them before trying to use them, but smartd-conf.py
does not do so.  It tries to call smartctl without first 
enabling the drives.

The offending line is:

   status = os.system("/usr/sbin/smartctl -i %s%s 2>&1 >/dev/null" %
		(driver, drive.device))

On my system I simply changed it to:

   status = os.system("/usr/sbin/smartctl -s on -i %s%s 2>&1 >/dev/null" %
		(driver, drive.device))

I also note that the version of smartd-conf.py on FC5 does *not*
issue any such smartctl calls, so that is why the problem popped up
with FC6.
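
For readers who want to see the one-line change in isolation, the following sketch rebuilds the command string with and without "-s on". It is not the actual smartd-conf.py code: `probe_command` and `probe` are hypothetical helpers, and the format of `driver` (either empty or a "-d TYPE " prefix ending in a space) is an assumption inferred from the "%s%s" in the quoted line.

```python
import os


def probe_command(driver, device, enable_first=True):
    """Build the shell command used to probe one drive.

    `driver` is assumed to be "" or a "-d TYPE " prefix that already ends
    in a space, because the quoted format string joins it directly to the
    device with "%s%s".
    """
    enable = "-s on " if enable_first else ""
    return "/usr/sbin/smartctl %s-i %s%s 2>&1 >/dev/null" % (
        enable, driver, device)


def probe(driver, device, system=os.system):
    """Run the probe and return the raw status from the shell call.

    `system` is injectable so the function can be tried without smartctl.
    """
    return system(probe_command(driver, device))


# Print the two variants side by side instead of running them.
print(probe_command("-d ata ", "/dev/sda", enable_first=False))
print(probe_command("-d ata ", "/dev/sda"))
```

With `enable_first=True` the string matches the modified line above, so SMART is switched on in the same smartctl invocation that queries the drive.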

Comment 12 Tomas Mraz 2007-02-22 12:19:09 UTC
Thanks for the investigation. I've added the '-s on' to smartmontools-5.36-8.fc7.



Comment 13 Andre Robatino 2007-05-11 16:45:30 UTC
  This fix is not in the latest update smartmontools-5.37-1.1.fc6.  Is it in F7t4?