Bug 100947 - smartd segfaults
Summary: smartd segfaults
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel-utils
Version: 3.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: FC2Blocker
 
Reported: 2003-07-27 19:45 UTC by Christopher McCrory
Modified: 2007-11-30 22:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-05-07 03:24:23 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Instructions on debugging smartd (1.88 KB, text/plain)
2003-07-30 08:04 UTC, Bruce Allen
Instructions on debugging smartd (1.88 KB, text/plain)
2003-07-30 08:04 UTC, Bruce Allen

Description Christopher McCrory 2003-07-27 19:45:23 UTC
Description of problem:
smartd segfaults on scsi drives

Version-Release number of selected component (if applicable):
[chrismcc@office180 chrismcc]$ rpm -qf /usr/sbin/smart*
kernel-utils-2.4-8.32
kernel-utils-2.4-8.32


How reproducible:
always

Steps to Reproduce:
1. edit /etc/smartd.conf 
2. uncomment DEVICESCAN
3.
    
Actual results:
smartd[1355]: segfault at 000000000000000c rip 0000000000403148 rsp 0000007fbffff6d0 error 6


Expected results:

works


Additional info:

same with:
/dev/sda -d scsi
/dev/sdb -d scsi
/dev/sdc -d scsi
/dev/sdd -d scsi
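
For readers decoding the kernel line under "Actual results": on x86 and x86_64 the "error" value is the page-fault error code, a bit mask. A minimal sketch of how to read its low three bits follows. The names pf_info, decode_pf_error and print_pf_error are hypothetical and are not kernel or smartd code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type for illustration; not part of the kernel or smartd. */
struct pf_info {
    int page_present;  /* bit 0: 1 = protection violation, 0 = page not present */
    int was_write;     /* bit 1: 1 = write access, 0 = read access */
    int user_mode;     /* bit 2: 1 = fault raised while in user mode */
};

/* Decode the low three bits of an x86 page-fault error code. */
struct pf_info decode_pf_error(unsigned long code)
{
    struct pf_info info;
    info.page_present = (code & 0x1UL) != 0;
    info.was_write    = (code & 0x2UL) != 0;
    info.user_mode    = (code & 0x4UL) != 0;
    return info;
}

/* Print a one-line summary of the decoded error code. */
void print_pf_error(unsigned long code)
{
    struct pf_info info = decode_pf_error(code);
    printf("error %lu: %s-mode %s, page %s\n", code,
           info.user_mode ? "user" : "kernel",
           info.was_write ? "write" : "read",
           info.page_present ? "present" : "not present");
}
```

Read this way, "error 6" (binary 110) is a user-mode write to an unmapped page. A faulting address of 0xc would be consistent with a store through a null struct pointer to a member 12 bytes in, though that is an inference and not something the kernel line states.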

Comment 1 Christopher McCrory 2003-07-27 20:11:02 UTC
more info:
copy smartd from taroon-i386:

[chrismcc@office180 tmp]$ sudo ./smartd -d
Password(for chrismcc on office180):
smartd version 5.1-11 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Device: /dev/sdb, opened
Device: /dev/sdb, is SMART capable. Adding to "monitor" list.
Device: /dev/sdc, opened
Device: /dev/sdc, is SMART capable. Adding to "monitor" list.
Device: /dev/sdd, opened
Device: /dev/sdd, is SMART capable. Adding to "monitor" list.
Started monitoring 0 ATA and 4 SCSI devices
Device: /dev/sda, Acceptable asc,ascq: 0,0
Device: /dev/sdb, Acceptable asc,ascq: 0,0
Device: /dev/sdc, Acceptable asc,ascq: 0,0
Device: /dev/sdd, Acceptable asc,ascq: 0,0
Device: /dev/sdd, Temperature changed -71 degrees to 30 degrees since last reading



more from default x86_64 build:
[chrismcc@office180 tmp]$ sudo /usr/sbin/smartd -d
smartd version 5.1-11 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Segmentation fault



Comment 2 Bruce Allen 2003-07-28 16:28:55 UTC
Please attach your /etc/smartd.conf file to the bug report.

I think that this is a segv that was reported and fixed already.  From the
changelog:

  [BA] smartd: if DEVICESCAN Directive used in smartd.conf, and
       -I, -R or -r Directives used in conjunction with this, got
       segv errors.  Fixed by correcting memory allocation calls.

If so you should be able to fix this by downloading 5.1-14 from
http://smartmontools.sourceforge.net/

Comment 3 Bruce Allen 2003-07-28 17:32:16 UTC
An additional question -- are you running on a 64-bit platform?

Also, I think I didn't understand your previous post.  Do you see the
segv in BOTH of the following cases:

case 1:
DEVICESCAN enabled, no disks listed

case 2:
DEVICESCAN commented out
/dev/sda - /dev/sdd explicitly listed?

Cheers,



Comment 4 Bruce Allen 2003-07-28 21:03:23 UTC
Chris,

Sorry for responding three times in a row to your post. I think I now understand
-- please tell me if this is right:

(1) You are running smartd on 64-bit X86
(2) You see segv with DEVICESCAN
(3) You see segv with a list of four scsi devices and DEVICESCAN disabled
(4) If you use the 32-bit binary, then smartd appears to work correctly.

If this is correct, I am unfortunately not surprised.  It's only during the
past few weeks that we've made smartmontools endian-order independent and
have worked on making it 64-bit clean.  In fact there was some slightly odd
SCSI behavior noted on 64-bit hardware.  Our main SCSI developer should be
looking at this soon, and I've added him to the CC list.

If what I have said above is correct, then I suggest that you download the
latest code from the CVS server at the smartmontools home page, and give that
a try.  It's more recent code than any of the releases.

It would also be very helpful if you could post the output of
smartctl -a /dev/sda
and likewise for the other three disks.

Cheers,
   Bruce
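
To illustrate what "endian-order independent" means in practice: multi-byte fields in SCSI responses are big-endian, so assembling them byte by byte gives the same result on any host. The following is a minimal sketch of that idea and not the smartmontools code. The names be16_to_host and be32_to_host are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit big-endian value from p without depending on host byte order. */
uint16_t be16_to_host(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Read a 32-bit big-endian value from p without depending on host byte order. */
uint32_t be32_to_host(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Casting the buffer to a wider integer pointer and dereferencing it would instead depend on the host's byte order and alignment rules, which is the kind of assumption that tends to break when code moves to a new architecture.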


Comment 5 Christopher McCrory 2003-07-28 23:12:53 UTC
> (1) You are running smartd on 64-bit X86

yes, dual Opteron


> (2) You see segv with DEVICESCAN

yes

> (3) You see segv with a list of four scsi devices and DEVICESCAN disabled

yes

> (4) If you use the 32-bit binary, then smartd appears to work correctly.

yes



I'll fire it up and give it a shot




Comment 6 Christopher McCrory 2003-07-30 02:18:19 UTC
[chrismcc@taroon smartmontools-5.1-14]$ ./smartctl -a /dev/sda
smartctl version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
Smartctl open device: /dev/sda failed: Permission denied
[chrismcc@taroon smartmontools-5.1-14]$ sudo ./smartctl -a /dev/sda
smartctl version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
Device: SEAGATE  ST336607LC       Version: 0005
Serial number: 3JA1RGLD00007346LY3W
Device type: disk
Local Time is: Tue Jul 29 19:17:22 2003 PDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature:     27 C
Drive Trip Temperature:        68 C
 
Error counter log:
          Errors Corrected    Total      Total   Correction     Gigabytes    Total
              delay:       [rereads/    errors   algorithm      processed   uncorrected
            minor | major  rewrites]  corrected  invocations   [10^9 bytes]  errors
read:      38588        0         0     38588      38588        223.152           0
write:         0        0         0         0          0        113.923           0
 
Non-medium error count:       73
 
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -     1                   - [-   -    -]
# 2  Background short  Completed                   -     1                   - [-   -    -]
 
Long (extended) Self Test duration: 768 seconds [12.8 minutes]
[chrismcc@taroon smartmontools-5.1-14]$ sudo ./smartctl -a /dev/sdb
smartctl version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
Device: WDIGTL   WD91 ULTRA2      Version: 1.00
Serial number: WT7160066181
Device type: disk
Local Time is: Tue Jul 29 19:17:24 2003 PDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature:     29 C
 
Error counter log:
          Errors Corrected    Total      Total   Correction     Gigabytes    Total
              delay:       [rereads/    errors   algorithm      processed   uncorrected
            minor | major  rewrites]  corrected  invocations   [10^9 bytes]  errors
read:          0        0      7965      7965          0          0.000           0
write:         0        0       470       470          0          0.000           0
 
Non-medium error count:        0
Device does not support Self Test logging
[chrismcc@taroon smartmontools-5.1-14]$ sudo ./smartctl -a /dev/sdc
smartctl version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
Device: WDIGTL   WD91 ULTRA2      Version: 1.00
Serial number: WT7160057106
Device type: disk
Local Time is: Tue Jul 29 19:17:25 2003 PDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature:     29 C
 
Error counter log:
          Errors Corrected    Total      Total   Correction     Gigabytes    Total
              delay:       [rereads/    errors   algorithm      processed   uncorrected
            minor | major  rewrites]  corrected  invocations   [10^9 bytes]  errors
read:          0        0      3337      3337          0          0.000           0
write:         0        0       168       168          0          0.000           0
 
Non-medium error count:        3
Device does not support Self Test logging





[chrismcc@taroon smartmontools-5.1-14]$ sudo ./smartd -d
smartd version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Segmentation fault





Comment 7 Bruce Allen 2003-07-30 08:04:01 UTC
Created attachment 93254 [details]
Instructions on debugging smartd

This contains step-by-step instructions on how to download the latest smartd
code from sourceforge, compile it for debugging, and run it under a debugger.

Comment 8 Bruce Allen 2003-07-30 08:04:58 UTC
Created attachment 93255 [details]
Instructions on debugging smartd

This contains step-by-step instructions on how to download the latest smartd
code from sourceforge, compile it for debugging, and run it under a debugger.

Comment 9 Bruce Allen 2003-07-30 08:05:21 UTC
Christopher,

Thanks for doing as I asked.  I can't see what's wrong, though obviously
something is.  Instead of further guessing, I'd like you to run the code under a
debugger to identify the exact line where something is going wrong.  I'm
attaching a text file with step-by-step instructions. I just went through each
step and it took me < 5 minutes.  You can cut-and-paste from my instructions.

Cheers,
     Bruce

Comment 10 Christopher McCrory 2003-07-30 16:31:17 UTC
:) you might not like this

smartd.conf:
/dev/sda -d scsi
/dev/sdb -d scsi
/dev/sdc -d scsi
/dev/sdd -d scsi


[root@taroon smartmontools-5.1-14]# grep ^CFLAGS Makefile | tail -n 1
CFLAGS   = -fsigned-char -Wall
[root@taroon smartmontools-5.1-14]# make  >/dev/null 2>&1


[root@taroon smartmontools-5.1-14]# ./smartd -d
smartd version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Device: /dev/sdb, opened
Device: /dev/sdb, is SMART capable. Adding to "monitor" list.
Device: /dev/sdc, opened
Device: /dev/sdc, is SMART capable. Adding to "monitor" list.
Device: /dev/sdd, opened
Device: /dev/sdd, is SMART capable. Adding to "monitor" list.
Started monitoring 0 ATA and 4 SCSI devices
Device: /dev/sda, Acceptable asc,ascq: 0,0
Device: /dev/sda, Temperature changed -229 degrees to 26 degrees since last reading
Device: /dev/sdb, Acceptable asc,ascq: 0,0
Device: /dev/sdc, Acceptable asc,ascq: 0,0
Device: /dev/sdc, Temperature changed -227 degrees to 28 degrees since last reading
Device: /dev/sdd, Acceptable asc,ascq: 0,0
Device: /dev/sdd, Temperature changed -57 degrees to 28 degrees since last reading


It worked !!!!

[root@taroon smartmontools-5.1-14]# make clean
rm -f *.o smartctl smartd *~ \#*\# smartmontools*.tar.gz smartmontools*.rpm
temp.* smart*.8.gz smart*.5.gz
[root@taroon smartmontools-5.1-14]# grep ^CFLAGS Makefile | tail -n 1
CFLAGS   = -fsigned-char -Wall -O2 -g
[root@taroon smartmontools-5.1-14]# make  >/dev/null 2>&1
[root@taroon smartmontools-5.1-14]# ./smartd -d
smartd version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Segmentation fault


Ouch!   -O2 made it fail






Comment 11 Christopher McCrory 2003-07-30 16:35:23 UTC
[root@taroon smartmontools-5.1-14]# gdb ./smartd
GNU gdb Red Hat Linux (5.3.90-0.20030710.3rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) run -d
Starting program: /home/chrismcc/redhat/BUILD/smartmontools-5.1-14/smartd -d
smartd version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
 
Program received signal SIGSEGV, Segmentation fault.
scsidevicescan (devices=0x7fbffff440, cfg=0xf) at smartd.c:854
854         cfg->tryata = 0;
(gdb) where
#0  scsidevicescan (devices=0x7fbffff440, cfg=0xf) at smartd.c:854
#1  0x00000000004057af in main (argc=0, argv=0x0) at smartd.c:2187
(gdb) list
2187          if (scsidevicescan(scsidevicesptr+numscsidevices, config+i))
2188            cantregister(config[i].name, "SCSI", config[i].lineno,
scandirective);
2189          else
2190            notregistered=0;
2191        }
2192
2193        // if device is explictly listed and we can't register it, then exit
unless
2194        // the user has specified that the device is removable
2195        if (notregistered && !scanning){
2196          if (config[i].removable)
(gdb) up
#1  0x00000000004057af in main (argc=0, argv=0x0) at smartd.c:2187
2187          if (scsidevicescan(scsidevicesptr+numscsidevices, config+i))
(gdb) list
2182            notregistered=0;
2183        }
2184
2185        // then register SCSI devices
2186        if (config[i].tryscsi){
2187          if (scsidevicescan(scsidevicesptr+numscsidevices, config+i))
2188            cantregister(config[i].name, "SCSI", config[i].lineno,
scandirective);
2189          else
2190            notregistered=0;
2191        }
(gdb) list
2192
2193        // if device is explictly listed and we can't register it, then exit
unless
2194        // the user has specified that the device is removable
2195        if (notregistered && !scanning){
2196          if (config[i].removable)
2197            printout(LOG_INFO, "Device %s not available\n", config[i].name);
2198          else {
2199            printout(LOG_CRIT, "Unable to register device %s - exiting.\n",
config[i].name);
2200            exit(EXIT_BADDEV);
2201          }
(gdb) list
2202        }
2203      } // done registering entries
2204
2205      // If there are no devices to monitor, then exit
2206      if (!numatadevices && !numscsidevices){
2207        printout(LOG_INFO,"Unable to monitor any SMART enabled ATA or SCSI
devices.\n");
2208        exit(EXIT_BADDEV);
2209      }
2210
2211      // Now start an infinite loop that checks all devices
(gdb) up
Initial frame selected; you cannot go up.
(gdb)





Comment 12 Christopher McCrory 2003-07-30 16:37:04 UTC
[root@taroon smartmontools-5.1-14]# make clean
rm -f *.o smartctl smartd *~ \#*\# smartmontools*.tar.gz smartmontools*.rpm
temp.* smart*.8.gz smart*.5.gz
[root@taroon smartmontools-5.1-14]# grep ^CFLAGS Makefile | tail -n 1
CFLAGS   = -fsigned-char -Wall -O2 -g
[root@taroon smartmontools-5.1-14]# make
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c atacmds.c
atacmds.c: In function `ataVersionInfo':
atacmds.c:703: warning: int format, different type arg (arg 2)
atacmds.c:710: warning: int format, different type arg (arg 2)
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c ataprint.c
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c scsicmds.c
scsicmds.c: In function `linux_do_scsi_cmnd_io':
scsicmds.c:199: warning: int format, different type arg (arg 2)
scsicmds.c:245: warning: int format, different type arg (arg 2)
scsicmds.c:261: warning: int format, different type arg (arg 2)
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c utility.c
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c knowndrives.c
gcc -DSMARTMONTOOLS_VERSION=14 -o smartd -fsigned-char -Wall -O2 -g
-DHAVE_GETOPT_H -DHAVE_GETOPT_LONG  smartd.c \
                                      atacmds.o ataprint.o knowndrives.o
scsicmds.o utility.o
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c scsiprint.c
gcc -DSMARTMONTOOLS_VERSION=14 -o smartctl -fsigned-char -Wall -O2 -g
-DHAVE_GETOPT_H -DHAVE_GETOPT_LONG  smartctl.c \
                                      atacmds.o ataprint.o knowndrives.o
scsicmds.o scsiprint.o utility.o
 
 
Smartd can now use a configuration file /etc/smartd.conf. Do:
 
        man ./smartctl.8
        man ./smartd.8
        man ./smartd.conf.5
 
to read the manual pages now.  Unless you do a "make install" the manual pages
won't be installed.
 
[root@taroon smartmontools-5.1-14]#




Comment 13 Bruce Allen 2003-07-30 20:01:10 UTC
Hi Christopher,

Thank you very much for doing this.  It's very interesting.  I think that this
might be because you are using 5.1-14, and not the current version of the code
from CVS.  In particular, I think that the compilation warning messages no
longer appear because of some explicit typecasts in the code.  This was fixed
about three weeks ago.  This might be what breaks -O2.

Could you please repeat your experiment, but this time after downloading the
code from CVS as explained in the attachment file: "Instructions on debugging
smartd"?
[You used a .tar.gz file containing the 5.1-14 release, not the latest CVS code.
 You'll know when you have the right one because it will identify itself as
release 5.1-15]

Cheers,
     Bruce
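
For context, "int format, different type arg" is the warning gcc emits when, for example, a size_t is passed to a %d conversion. On x86_64 a size_t is 64 bits wide while an int is 32. This report does not show which expressions triggered the warnings in atacmds.c and scsicmds.c, so what follows is only a sketch of the general pattern and of an explicit-cast fix. The function name format_size is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format a byte count into buf.
   Returns the number of characters snprintf would have written. */
int format_size(char *buf, size_t buflen, size_t nbytes)
{
    assert(buf != NULL && buflen > 0);
    /* Mismatched (warns on 64-bit): snprintf(buf, buflen, "%d bytes", nbytes); */
    /* Matched: cast explicitly to the type that the format names. */
    return snprintf(buf, buflen, "%lu bytes", (unsigned long)nbytes);
}
```

A mismatched format argument is undefined behavior in C, so it is a plausible candidate when a program misbehaves only on one architecture, but nothing in this thread confirms that it caused the segfault.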


Comment 14 Christopher McCrory 2003-07-30 22:01:35 UTC
Yep, I skipped the cvs part.  I had the older code and could whip it out real
quick before I erased the whole system and reinstalled.

That is now done.



[chrismcc@taroon64 sm5]$ grep ^CFLAGS Makefile | tail -n 1
CFLAGS   = -fsigned-char -Wall     -g
[chrismcc@taroon64 sm5]$ make clean all
rm -f *.o smartctl smartd *~ \#*\# smartmontools*.tar.gz smartmontools*.rpm
temp.* smart*.8.gz smart*.5.gz
gcc -fsigned-char -Wall     -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c atacmdnames.c
gcc -fsigned-char -Wall     -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c atacmds.c
gcc -fsigned-char -Wall     -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c ataprint.c
gcc -fsigned-char -Wall     -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c scsicmds.c
gcc -fsigned-char -Wall     -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c utility.c
gcc -fsigned-char -Wall     -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c knowndrives.c
gcc -DSMARTMONTOOLS_VERSION=15 -o smartd -fsigned-char -Wall     -g
-DHAVE_GETOPT_H -DHAVE_GETOPT_LONG  smartd.c \
                                      atacmdnames.o atacmds.o ataprint.o
knowndrives.o scsicmds.o utility.o
gcc -fsigned-char -Wall     -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c scsiprint.c
gcc -DSMARTMONTOOLS_VERSION=15 -o smartctl -fsigned-char -Wall     -g
-DHAVE_GETOPT_H -DHAVE_GETOPT_LONG  smartctl.c \
                                      atacmdnames.o atacmds.o ataprint.o
knowndrives.o scsicmds.o scsiprint.o utility.o
 
 
Smartd can now use a configuration file /etc/smartd.conf. Do:
 
        man ./smartctl.8
        man ./smartd.8
        man ./smartd.conf.5
 
to read the manual pages now.  Unless you do a "make install" the manual pages
won't be installed.
 
[chrismcc@taroon64 sm5]$ sudo ./smartd -d
smartd version 5.1-15 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Device: /dev/sdb, opened
Device: /dev/sdb, is SMART capable. Adding to "monitor" list.
Device: /dev/sdc, opened
Device: /dev/sdc, is SMART capable. Adding to "monitor" list.
Device: /dev/sdd, opened
Device: /dev/sdd, is SMART capable. Adding to "monitor" list.
Started monitoring 0 ATA and 4 SCSI devices
Device: /dev/sda, Acceptable asc,ascq: 0,0
Device: /dev/sda, Temperature changed -229 degrees to 26 degrees since last reading
Device: /dev/sdb, Acceptable asc,ascq: 0,0
Device: /dev/sdc, Acceptable asc,ascq: 0,0
Device: /dev/sdc, Temperature changed -226 degrees to 29 degrees since last reading
Device: /dev/sdd, Acceptable asc,ascq: 0,0
Device: /dev/sdd, Temperature changed -56 degrees to 29 degrees since last reading



[chrismcc@taroon64 sm5]$ grep ^CFLAGS Makefile | tail -n 1
CFLAGS   = -fsigned-char -Wall -O2 -g
[chrismcc@taroon64 sm5]$ make clean all
rm -f *.o smartctl smartd *~ \#*\# smartmontools*.tar.gz smartmontools*.rpm
temp.* smart*.8.gz smart*.5.gz
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c atacmdnames.c
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c atacmds.c
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c ataprint.c
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c scsicmds.c
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c utility.c
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c knowndrives.c
gcc -DSMARTMONTOOLS_VERSION=15 -o smartd -fsigned-char -Wall -O2 -g
-DHAVE_GETOPT_H -DHAVE_GETOPT_LONG  smartd.c \
                                      atacmdnames.o atacmds.o ataprint.o
knowndrives.o scsicmds.o utility.o
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c scsiprint.c
gcc -DSMARTMONTOOLS_VERSION=15 -o smartctl -fsigned-char -Wall -O2 -g
-DHAVE_GETOPT_H -DHAVE_GETOPT_LONG  smartctl.c \
                                      atacmdnames.o atacmds.o ataprint.o
knowndrives.o scsicmds.o scsiprint.o utility.o
 
 
Smartd can now use a configuration file /etc/smartd.conf. Do:
 
        man ./smartctl.8
        man ./smartd.8
        man ./smartd.conf.5
 
to read the manual pages now.  Unless you do a "make install" the manual pages
won't be installed.



[root@taroon64 sm5]# ./smartd -d
smartd version 5.1-15 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Segmentation fault



[root@taroon64 sm5]# gdb ./smartd
GNU gdb Red Hat Linux (5.3.90-0.20030710.6rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) run -d
Starting program: /home/chrismcc/redhat/BUILD/sm5/smartd -d
smartd version 5.1-15 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
 
Program received signal SIGSEGV, Segmentation fault.
scsidevicescan (devices=0x7fbffff500, cfg=0xf) at smartd.c:861
861         cfg->tryata = 0;
(gdb) where
#0  scsidevicescan (devices=0x7fbffff500, cfg=0xf) at smartd.c:861
#1  0x000000000040582f in main (argc=0, argv=0x0) at smartd.c:2201
(gdb) list
2201          if (scsidevicescan(scsidevicesptr+numscsidevices, config+i))
2202            cantregister(config[i].name, "SCSI", config[i].lineno,
scandirective);
2203          else
2204            notregistered=0;
2205        }
2206
2207        // if device is explictly listed and we can't register it, then exit
unless
2208        // the user has specified that the device is removable
2209        if (notregistered && !scanning){
2210          if (config[i].removable)
(gdb) up
#1  0x000000000040582f in main (argc=0, argv=0x0) at smartd.c:2201
2201          if (scsidevicescan(scsidevicesptr+numscsidevices, config+i))
(gdb) list
2196            notregistered=0;
2197        }
2198
2199        // then register SCSI devices
2200        if (config[i].tryscsi){
2201          if (scsidevicescan(scsidevicesptr+numscsidevices, config+i))
2202            cantregister(config[i].name, "SCSI", config[i].lineno,
scandirective);
2203          else
2204            notregistered=0;
2205        }
(gdb) up
Initial frame selected; you cannot go up.
(gdb)







Comment 15 Christopher McCrory 2003-07-30 22:09:18 UTC
However it DOES WORK

without -O2 in CFLAGS

after adding a known bad drive:

[root@taroon64 sm5]# ./smartd -d
smartd version 5.1-15 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Device: /dev/sdb, opened
Device: /dev/sdb, is SMART capable. Adding to "monitor" list.
Device: /dev/sdc, opened
Device: /dev/sdc, is SMART capable. Adding to "monitor" list.
Device: /dev/sdd, opened
Device: /dev/sdd, is SMART capable. Adding to "monitor" list.
Device: /dev/sde, opened
Device: /dev/sde, is SMART capable. Adding to "monitor" list.
Started monitoring 0 ATA and 5 SCSI devices
Device: /dev/sda, Acceptable asc,ascq: 0,0
Device: /dev/sda, Temperature changed -228 degrees to 27 degrees since last reading
Device: /dev/sdb, Acceptable asc,ascq: 0,0
Device: /dev/sdc, Acceptable asc,ascq: 0,0
Device: /dev/sdc, Temperature changed -226 degrees to 29 degrees since last reading
Device: /dev/sdd, Acceptable asc,ascq: 0,0
Device: /dev/sdd, Temperature changed -56 degrees to 29 degrees since last reading
Device: /dev/sde, SMART Failure: SERVO IMPENDING FAILURE SEEK ERROR RATE TOO HIGH
Device: /dev/sde, Temperature changed -80 degrees to 22 degrees since last reading



YEA!




Comment 16 Bruce Allen 2003-07-31 11:48:11 UTC
Christopher,

Thanks a lot for sticking with this.  I've gotten access to an ia64 system, but
unfortunately I can't reproduce the problem there.  If you have the patience to
stick with this, we'll have to continue "remote debugging".  While this may be a
gcc error, I've found too many (of my own) coding mistakes in porting code to be
ready to write to gcc-bugs quite yet.

Here are the next things to try:

(1) "For the record" please try -O1 -g.  If that SEGVs then continue
    with that executable, else with -O2.

(2) Let's check the address 0xf by hand:
    within gdb, use the "up" and "down" commands until you
    are on line 2201 in main().

    [From this point on, I need a transcript]

    Then do:
    print &config[0]
    print config[0]
    print sizeof(config[0])
    print &config[1]
    print config[0].tryata
    print &(config[0].tryata)

    Then use "down" until you are around line 816 in scsidevicescan().
    Now do:
    print cfg
    print *cfg
    print &(cfg->tryata)
    print cfg->tryata

    That's it!

Cheers,
    Bruce

Comment 17 Christopher McCrory 2003-07-31 19:21:43 UTC
-O1 segfaulted also

Program received signal SIGSEGV, Segmentation fault.
scsidevicescan (devices=0x7fbffff500, cfg=0xf) at smartd.c:861
861         cfg->tryata = 0;
(gdb) down
Bottom (i.e., innermost) frame selected; you cannot go down.
(gdb) up
#1  0x0000000000405843 in main (argc=12, argv=0xfc0) at smartd.c:2201
2201          if (scsidevicescan(scsidevicesptr+numscsidevices, config+i))
(gdb)    print &config[0]
$1 = (cfgfile *) 0x5207a0
(gdb)  print config[0]
$2 = {lineno = 19, scsidevicenum = 0, atadevicenum = 0, tryata = 1 '\001',
  tryscsi = 0 '\0', name = 0x5267f0 "/dev/hda", smartcheck = 1 '\001',
  usagefailed = 1 '\001', prefail = 1 '\001', usage = 1 '\001',
  selftest = 1 '\001', errorlog = 1 '\001', permissive = 0 '\0',
  autosave = 0 '\0', autoofflinetest = 0 '\0', maildata = {{logged = 0,
      lastsent = 0, firstsent = 0}, {logged = 0, lastsent = 0, firstsent = 0},
    {logged = 0, lastsent = 0, firstsent = 0}, {logged = 0, lastsent = 0,
      firstsent = 0}, {logged = 0, lastsent = 0, firstsent = 0}, {logged = 0,
      lastsent = 0, firstsent = 0}, {logged = 0, lastsent = 0, firstsent = 0},
    {logged = 0, lastsent = 0, firstsent = 0}, {logged = 0, lastsent = 0,
      firstsent = 0}, {logged = 0, lastsent = 0, firstsent = 0}},
  emailfreq = 1 '\001', emailtest = 0 '\0', emailcmdline = 0x0,
  address = 0x5269e0 "root", selflogcount = 0 '\0',
  ataerrorcount = 0, monitorattflags = 0x526840 "",
  attributedefs = 0x5268d0 "", fixfirmwarebug = 0 '\0',
  ignorepresets = 0 '\0', showpresets = 0 '\0', removable = 0 '\0'}
(gdb) print sizeof(config[0])
$3 = 336
(gdb)  print &config[1]
$4 = (cfgfile *) 0x5208f0
(gdb) print config[0].tryata
$5 = 1 '\001'
(gdb)  print &(config[0].tryata)
$6 = 0x5207ac "\001"
(gdb) down
#0  scsidevicescan (devices=0x7fbffff500, cfg=0xf) at smartd.c:861
861         cfg->tryata = 0;
(gdb)    print cfg
$7 = (cfgfile *) 0xf
(gdb) print *cfg
Cannot access memory at address 0xf
(gdb)  print &(cfg->tryata)
$8 = 0x1b <Address 0x1b out of bounds>
(gdb) print cfg->tryata
Cannot access memory at address 0x1b


yes?
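
The numbers in this transcript can be checked by hand. &config[1] minus &config[0] is 0x5208f0 - 0x5207a0 = 0x150 = 336, which matches sizeof(config[0]). The tryata member sits at 0x5207ac - 0x5207a0 = 12 bytes into the struct, which matches 0x1b - 0xf. So the array in main() looks sane and only the cfg value seen inside scsidevicescan() is bad. A minimal sketch of the same arithmetic follows. The struct demo_cfg is a hypothetical stand-in that copies only the first few members printed above, and its layout is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for cfgfile: first members only, layout assumed for illustration. */
struct demo_cfg {
    int lineno;
    int scsidevicenum;
    int atadevicenum;
    char tryata;
    char tryscsi;
    char *name;
};

/* Address of element i, computed the way pointer arithmetic does for base + i. */
uintptr_t element_address(uintptr_t base, size_t elem_size, size_t i)
{
    return base + (uintptr_t)(elem_size * i);
}

/* Address of a member, given the struct's address and the member's offset. */
uintptr_t member_address(uintptr_t struct_addr, size_t member_offset)
{
    return struct_addr + (uintptr_t)member_offset;
}
```

With 4-byte ints, offsetof(struct demo_cfg, tryata) is 12, the same offset the transcript implies for the real struct.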



Comment 18 Christopher McCrory 2003-07-31 19:25:12 UTC
> I've gotten access to an ia64 system, but
> unfortunately I can't reproduce the problem there. 

My install is taroon x86_64 with "everything" using all scsi drives

the system is from appro using a tyan S2880 Thunder K8S mainboard
the scsi controller is:


Fusion MPT base driver 2.05.05+
Copyright (c) 1999-2002 LSI Logic Corporation
mptbase: Initiating ioc0 bringup

02:0a.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X
Fusion-MPT Dual Ultra320 SCSI (rev 07)
02:0a.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X
Fusion-MPT Dual Ultra320 SCSI (rev 07)



Comment 19 Bruce Allen 2003-08-01 06:03:36 UTC
Hi Christopher,

OK, looks like the code is tickling a GCC bug.  Let's try and isolate the
problem by making what we want slightly more explicit for the compiler.

(1) in the directory sm5/, save a copy of smartd.c:
    cp smartd.c smartd.c.original

(2) make the following 3 changes:
    static int scsidevicescan(scsidevices_t *devices, cfgfile *cfg)
    ->
    int scsidevicescan(scsidevices_t *devices, cfgfile *cfg)

    if (scsidevicescan(scsidevicesptr+numscsidevices, config+i))
    ->
    if (scsidevicescan(scsidevicesptr+numscsidevices, &(config[i])))


    if (atadevicescan2(atadevicesptr+numatadevices, config+i))
    ->
    if (atadevicescan2(atadevicesptr+numatadevices, &(config[i])))

Then do "make clean", then "make" and try again.

[Note - it looks like you are now using /etc/smartd.conf containing
DEVICESCAN -m email@domain
it would simplify the debugging somewhat if you could simply use the
same thing you originally had, which I think was
/dev/sda
/dev/sdb
/dev/sdc
/dev/sdd
and nothing else.]

Cheers,
   Bruce
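
For reference, config+i and &(config[i]) denote the same pointer in C, so the second and third changes above should not alter behavior with a correct compiler. They only restate the intent. A minimal sketch with a hypothetical element type (struct item and same_element are not from smartd.c):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type for illustration only. */
struct item {
    int lineno;
    char tryata;
};

/* Returns 1 if base+i and &(base[i]) designate the same element, else 0.
   The caller must ensure that i is within the array (or one past its end). */
int same_element(struct item *base, size_t i)
{
    assert(base != NULL);
    return (base + i) == &(base[i]);
}
```

If the two spellings ever produced different results in a build, that would point at the compiler and not at the source.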

Comment 20 Bruce Allen 2003-08-01 07:40:53 UTC
Christopher,

An afterthought -- please try what I said, but I am not sure it will work.  In
fact it probably won't.  But I think I have an idea what's happening. To
confirm, please do this:

BEFORE you type "run -d" at the (gdb) prompt, type
  break scsidevicescan
When you type "run -d" the program will stop just inside scsidevicescan.
Now, alternately, give the commands:
  print cfg
  print cfg->tryata
  n
[Note that n is shorthand for the command "next"]
Continue this process until the code segvs.  You may have to do this
more than a dozen times.  Note that within gdb, the up cursor arrow can
be used to return to previous lines to repeat them, which makes this easier.

I think that the address cfg is getting over-written, probably in the loop
starting just below the comment reading:

    // Flag that certain log pages are supported (information may be
    // available from other sources).

As usual, I'll need a transcript of the gdb session.

Cheers,
   Bruce
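
The hypothesis above, that a loop writes past the end of a buffer and lands on cfg, can be pictured with a small stand-in. This is a sketch of the general failure mode only. The struct layout and the names demo_frame, fill_pages and cfg_intact are hypothetical, and nothing here is taken from smartd.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: a small buffer followed by a pointer, standing in
   for two values that happen to be neighbours in memory. */
struct demo_frame {
    unsigned char pages[8];
    void *cfg;
};

/* Fill count bytes starting at the beginning of the frame, where pages
   lives. Any count above sizeof(frame->pages) spills into what follows.
   The write stays inside *frame, so the demo itself is well defined. */
void fill_pages(struct demo_frame *frame, size_t count, unsigned char value)
{
    assert(frame != NULL && count <= sizeof *frame);
    memset((unsigned char *)frame, value, count);
}

/* Returns 1 if the stored cfg still has the same bytes as expected, else 0. */
int cfg_intact(const struct demo_frame *frame, void *expected)
{
    return memcmp(&frame->cfg, &expected, sizeof expected) == 0;
}
```

Whether a neighbouring value is actually hit depends on how the compiler arranges the stack and registers, which would be consistent with a fault that appears at -O1 and -O2 but not in an unoptimized build. Stepping with "n" while printing cfg, as requested above, is one way to find the statement after which the value changes.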



Comment 21 Bruce Allen 2003-08-15 10:58:36 UTC
Chris,

I still don't know if what you saw was a bug in smartd or a compiler bug. After
looking through the code for possible problems, I had some suspicions that this
could be a double-free() error corrupting the heap -- though I am not sure and
can not reproduce it on any of the systems I have tried.

Nevertheless, because I was worried about this, I did some substantial code
restructuring and am confident that the code that's now in CVS should be free of
any type of memory allocation bug. You (or other users who encounter this
problem and see this report) should check out the development code from CVS and
report if it has problems with -O2.  Instructions on CVS checkout are the first
four steps in the previous attached Instructions -- step 3 can be skipped.

Cheers,
    Bruce
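
For readers unfamiliar with the failure mode mentioned above: calling free() twice on the same pointer is undefined behavior and can corrupt the allocator's bookkeeping, so the crash may surface far from the actual mistake. One common defensive pattern is shown here as a sketch. It is not the restructuring that went into CVS, and FREE_AND_NULL, demo_entry and demo_entry_release are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Free p and reset it to NULL, so that a repeated call becomes free(NULL),
   which the C standard defines as a no-op. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical owner of one heap string, for illustration only. */
struct demo_entry {
    char *name;
};

/* Release the entry's storage; safe to call more than once. */
void demo_entry_release(struct demo_entry *e)
{
    assert(e != NULL);
    FREE_AND_NULL(e->name);
}
```

This only helps when every copy of the pointer goes through the same owner. If two structures hold the same address, resetting one of them does not protect the other.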

Comment 22 Warren Togami 2004-04-25 04:10:29 UTC
We just ran into this segfault with FC2 Test3 on x86-64 running
kernel-utils-2.4-9.1.127, which contains smartmontools-5.21. 
smartmontools-5.30 seems to fix this issue.  Any chance we can upgrade
this for FC2 final?  How much risk would you judge an upgrade from 5.21
to 5.30 to have?

Comment 23 Bruce Allen 2004-04-27 00:19:27 UTC
My advice to the FC maintainers is to upgrade the current
smartmontools (release 5.21) to the current 5.30 stable release.  The
5.30 release has been out there for about six weeks with thousands of
direct downloads and many times that in indirect downloads.  Reviewing
the changelog from release 5.21 to 5.30 there are many bugs fixed.  

Note that there IS one reported/fixed bug in the 5.30 release that can
cause a segv crash in smartd. The FC maintainers might consider
releasing a patched 5.30 release that fixes this.  The fix can be seen
by looking at the diffs between versions 1.302 and 1.303 of smartd.c,
in CVS here:
http://cvs.sourceforge.net/viewcvs.py/smartmontools/sm5/smartd.c?r1=1.302&r2=1.303
It's a one-line patch.

Cheers,
   Bruce Allen

Comment 24 Jeremy Katz 2004-05-07 03:24:23 UTC
This has been updated.

