Description of problem:
smartd segfaults on scsi drives

Version-Release number of selected component (if applicable):
[chrismcc@office180 chrismcc]$ rpm -qf /usr/sbin/smart*
kernel-utils-2.4-8.32
kernel-utils-2.4-8.32

How reproducible:
always

Steps to Reproduce:
1. edit /etc/smartd.conf
2. uncomment DEVICESCAN
3.

Actual results:
smartd[1355]: segfault at 000000000000000c rip 0000000000403148 rsp 0000007fbffff6d0 error 6

Expected results:
works

Additional info:
same with:
/dev/sda -d scsi
/dev/sdb -d scsi
/dev/sdc -d scsi
/dev/sdd -d scsi
More info, with the smartd binary copied from taroon-i386:
[chrismcc@office180 tmp]$ sudo ./smartd -d
Password(for chrismcc on office180):
smartd version 5.1-11 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Device: /dev/sdb, opened
Device: /dev/sdb, is SMART capable. Adding to "monitor" list.
Device: /dev/sdc, opened
Device: /dev/sdc, is SMART capable. Adding to "monitor" list.
Device: /dev/sdd, opened
Device: /dev/sdd, is SMART capable. Adding to "monitor" list.
Started monitoring 0 ATA and 4 SCSI devices
Device: /dev/sda, Acceptable asc,ascq: 0,0
Device: /dev/sdb, Acceptable asc,ascq: 0,0
Device: /dev/sdc, Acceptable asc,ascq: 0,0
Device: /dev/sdd, Acceptable asc,ascq: 0,0
Device: /dev/sdd, Temperature changed -71 degrees to 30 degrees since last reading

More, from the default x86_64 build:
[chrismcc@office180 tmp]$ sudo /usr/sbin/smartd -d
smartd version 5.1-11 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Segmentation fault
Please attach your /etc/smartd.conf file to the bug report. I think that this is a segv that was reported and fixed already. From the changelog:

[BA] smartd: if DEVICESCAN Directive used in smartd.conf, and -I, -R or -r Directives used in conjunction with this, got segv errors. Fixed by correcting memory allocation calls.

If so, you should be able to fix this by downloading 5.1-14 from http://smartmontools.sourceforge.net/
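The changelog entry does not say which allocation calls were wrong. As a generic illustration only (not the actual smartd fix), here is a minimal C sketch of a growable device array that derives every size from `sizeof *ptr`, so the element size cannot drift away from the pointer's type on a 64-bit build. The names `device_entry`, `device_list`, `device_list_append` and `device_list_free` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-device record; not smartd's real cfgfile layout. */
typedef struct {
    int lineno;
    char *name;
} device_entry;

typedef struct {
    device_entry *items;  /* heap array of device_entry */
    size_t count;         /* entries in use */
    size_t capacity;      /* entries allocated */
} device_list;

/* Append one entry, doubling the array when it is full.
 * The byte count uses sizeof *list->items, so it always matches the
 * element type, even if device_entry changes or pointers widen to 64 bits.
 * Returns the new (initialized) entry, or NULL if allocation fails. */
static device_entry *device_list_append(device_list *list)
{
    assert(list != NULL);
    if (list->count == list->capacity) {
        size_t newcap = list->capacity ? 2 * list->capacity : 4;
        device_entry *grown = realloc(list->items, newcap * sizeof *list->items);
        if (grown == NULL)
            return NULL;  /* old array is still valid and still owned by list */
        list->items = grown;
        list->capacity = newcap;
    }
    device_entry *e = &list->items[list->count++];
    e->lineno = 0;
    e->name = NULL;
    return e;
}

/* Release the array and reset the list to its empty state. */
static void device_list_free(device_list *list)
{
    free(list->items);
    list->items = NULL;
    list->count = list->capacity = 0;
}
```

A pointer returned by `device_list_append` is invalidated when a later call reallocates the array, so code that appends entries in a loop should keep indices rather than raw pointers across appends.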
An additional question -- are you running on a 64-bit platform?

Also, I think I didn't understand your previous post. Do you see the segv in BOTH of the following cases:
case 1: DEVICESCAN enabled, no disks listed
case 2: DEVICESCAN commented out, /dev/sda - /dev/sdd explicitly listed?

Cheers,
Chris,

Sorry for responding three times in a row to your post. I think I now understand -- please tell me if this is right:
(1) You are running smartd on 64-bit X86
(2) You see segv with DEVICESCAN
(3) You see segv with a list of four scsi devices and DEVICESCAN disabled
(4) If you use the 32-bit binary, then smartd appears to work correctly.

If this is correct, I am unfortunately not surprised. It's only during the past few weeks that we've made smartmontools endian-order independent and have worked on making it 64-bit clean. In fact there was some slightly odd SCSI behavior noted on 64-bit hardware. Our main SCSI developer should be looking at this soon, and I've added him to the CC list.

If what I have said above is correct, then I suggest that you download the latest code from the CVS server at the smartmontools home page, and give that a try. It's more recent code than any of the releases.

It would also be very helpful if you could post the output of
smartctl -a /dev/sda
and likewise for the other three disks.

Cheers,
Bruce
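SCSI log pages store multi-byte fields in big-endian order, so "endian-order independent" code typically assembles them byte by byte instead of casting the buffer to a wider integer type. A minimal sketch, assuming the caller has already checked that `off` plus the field width fits inside the buffer; `get_be16` and `get_be32` are hypothetical helper names, not smartmontools functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit big-endian field at buf[off]. Building the value from
 * individual bytes gives the same result on little- and big-endian hosts
 * and avoids an unaligned, type-punned load like *(uint16_t *)(buf + off). */
static uint16_t get_be16(const uint8_t *buf, size_t off)
{
    assert(buf != NULL);
    return (uint16_t)((buf[off] << 8) | buf[off + 1]);
}

/* Same idea for a 32-bit big-endian field; each byte is widened to
 * uint32_t before shifting so no bits are lost in int arithmetic. */
static uint32_t get_be32(const uint8_t *buf, size_t off)
{
    assert(buf != NULL);
    return ((uint32_t)buf[off] << 24) | ((uint32_t)buf[off + 1] << 16) |
           ((uint32_t)buf[off + 2] << 8) | (uint32_t)buf[off + 3];
}
```

For example, bytes 2-3 of a log page header hold the page length, so `get_be16(page, 2)` would return it on any host.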
> (1) You are running smartd on 64-bit X86
yes. dual opteron
> (2) You see segv with DEVICESCAN
yes
> (3) You see segv with a list of four scsi devices and DEVICESCAN disabled
yes
> (4) If you use the 32-bit binary, then smartd appears to work correctly.
yes

I'll fire it up and give it a shot
[chrismcc@taroon smartmontools-5.1-14]$ ./smartctl -a /dev/sda
smartctl version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Smartctl open device: /dev/sda failed: Permission denied
[chrismcc@taroon smartmontools-5.1-14]$ sudo ./smartctl -a /dev/sda
smartctl version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device: SEAGATE ST336607LC Version: 0005
Serial number: 3JA1RGLD00007346LY3W
Device type: disk
Local Time is: Tue Jul 29 19:17:22 2003 PDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature: 27 C
Drive Trip Temperature: 68 C
Error counter log:
          Errors Corrected    Total      Total   Correction     Gigabytes    Total
              delay:       [rereads/    errors   algorithm      processed    uncorrected
            minor | major  rewrites]  corrected  invocations   [10^9 bytes]  errors
read:       38588       0          0      38588      38588        223.152           0
write:          0       0          0          0          0        113.923           0
Non-medium error count: 73
SMART Self-test log
Num  Test              Status     segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                  number   (hours)
# 1  Background short  Completed        -        1              - [-   -    -]
# 2  Background short  Completed        -        1              - [-   -    -]
Long (extended) Self Test duration: 768 seconds [12.8 minutes]
[chrismcc@taroon smartmontools-5.1-14]$ sudo ./smartctl -a /dev/sdb
smartctl version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device: WDIGTL WD91 ULTRA2 Version: 1.00
Serial number: WT7160066181
Device type: disk
Local Time is: Tue Jul 29 19:17:24 2003 PDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature: 29 C
Error counter log:
          Errors Corrected    Total      Total   Correction     Gigabytes    Total
              delay:       [rereads/    errors   algorithm      processed    uncorrected
            minor | major  rewrites]  corrected  invocations   [10^9 bytes]  errors
read:           0       0       7965       7965          0          0.000           0
write:          0       0        470        470          0          0.000           0
Non-medium error count: 0
Device does not support Self Test logging
[chrismcc@taroon smartmontools-5.1-14]$ sudo ./smartctl -a /dev/sdc
smartctl version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device: WDIGTL WD91 ULTRA2 Version: 1.00
Serial number: WT7160057106
Device type: disk
Local Time is: Tue Jul 29 19:17:25 2003 PDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature: 29 C
Error counter log:
          Errors Corrected    Total      Total   Correction     Gigabytes    Total
              delay:       [rereads/    errors   algorithm      processed    uncorrected
            minor | major  rewrites]  corrected  invocations   [10^9 bytes]  errors
read:           0       0       3337       3337          0          0.000           0
write:          0       0        168        168          0          0.000           0
Non-medium error count: 3
Device does not support Self Test logging
[chrismcc@taroon smartmontools-5.1-14]$ sudo ./smartd -d
smartd version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Segmentation fault
Created attachment 93254: Instructions on debugging smartd. This contains step-by-step instructions on how to download the latest smartd code from sourceforge, compile it for debugging, and run it under a debugger.
Created attachment 93255: Instructions on debugging smartd. This contains step-by-step instructions on how to download the latest smartd code from sourceforge, compile it for debugging, and run it under a debugger.
Christopher, Thanks for doing as I asked. I can't see what's wrong, though obviously something is. Instead of further guessing, I'd like you to run the code under a debugger to identify the exact line where something is going wrong. I'm attaching a text file with step-by-step instructions. I just went through each step and it took me < 5 minutes. You can cut-and-paste from my instructions. Cheers, Bruce
:) you might not like this smartd.conf:
/dev/sda -d scsi
/dev/sdb -d scsi
/dev/sdc -d scsi
/dev/sdd -d scsi

[root@taroon smartmontools-5.1-14]# grep ^CFLAGS Makefile | tail -n 1
CFLAGS = -fsigned-char -Wall
[root@taroon smartmontools-5.1-14]# make >/dev/null 2>&1
[root@taroon smartmontools-5.1-14]# ./smartd -d
smartd version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Device: /dev/sdb, opened
Device: /dev/sdb, is SMART capable. Adding to "monitor" list.
Device: /dev/sdc, opened
Device: /dev/sdc, is SMART capable. Adding to "monitor" list.
Device: /dev/sdd, opened
Device: /dev/sdd, is SMART capable. Adding to "monitor" list.
Started monitoring 0 ATA and 4 SCSI devices
Device: /dev/sda, Acceptable asc,ascq: 0,0
Device: /dev/sda, Temperature changed -229 degrees to 26 degrees since last reading
Device: /dev/sdb, Acceptable asc,ascq: 0,0
Device: /dev/sdc, Acceptable asc,ascq: 0,0
Device: /dev/sdc, Temperature changed -227 degrees to 28 degrees since last reading
Device: /dev/sdd, Acceptable asc,ascq: 0,0
Device: /dev/sdd, Temperature changed -57 degrees to 28 degrees since last reading

It worked !!!!

[root@taroon smartmontools-5.1-14]# make clean
rm -f *.o smartctl smartd *~ \#*\# smartmontools*.tar.gz smartmontools*.rpm temp.* smart*.8.gz smart*.5.gz
[root@taroon smartmontools-5.1-14]# grep ^CFLAGS Makefile | tail -n 1
CFLAGS = -fsigned-char -Wall -O2 -g
[root@taroon smartmontools-5.1-14]# make >/dev/null 2>&1
[root@taroon smartmontools-5.1-14]# ./smartd -d
smartd version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Segmentation fault

Ouch! -O2 made it fail
[root@taroon smartmontools-5.1-14]# gdb ./smartd
GNU gdb Red Hat Linux (5.3.90-0.20030710.3rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) run -d
Starting program: /home/chrismcc/redhat/BUILD/smartmontools-5.1-14/smartd -d
smartd version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.

Program received signal SIGSEGV, Segmentation fault.
scsidevicescan (devices=0x7fbffff440, cfg=0xf) at smartd.c:854
854         cfg->tryata = 0;
(gdb) where
#0  scsidevicescan (devices=0x7fbffff440, cfg=0xf) at smartd.c:854
#1  0x00000000004057af in main (argc=0, argv=0x0) at smartd.c:2187
(gdb) list
2187        if (scsidevicescan(scsidevicesptr+numscsidevices, config+i))
2188          cantregister(config[i].name, "SCSI", config[i].lineno, scandirective);
2189        else
2190          notregistered=0;
2191      }
2192
2193      // if device is explictly listed and we can't register it, then exit unless
2194      // the user has specified that the device is removable
2195      if (notregistered && !scanning){
2196        if (config[i].removable)
(gdb) up
#1  0x00000000004057af in main (argc=0, argv=0x0) at smartd.c:2187
2187        if (scsidevicescan(scsidevicesptr+numscsidevices, config+i))
(gdb) list
2182          notregistered=0;
2183      }
2184
2185      // then register SCSI devices
2186      if (config[i].tryscsi){
2187        if (scsidevicescan(scsidevicesptr+numscsidevices, config+i))
2188          cantregister(config[i].name, "SCSI", config[i].lineno, scandirective);
2189        else
2190          notregistered=0;
2191      }
(gdb) list
2192
2193      // if device is explictly listed and we can't register it, then exit unless
2194      // the user has specified that the device is removable
2195      if (notregistered && !scanning){
2196        if (config[i].removable)
2197          printout(LOG_INFO, "Device %s not available\n", config[i].name);
2198        else {
2199          printout(LOG_CRIT, "Unable to register device %s - exiting.\n", config[i].name);
2200          exit(EXIT_BADDEV);
2201        }
(gdb) list
2202      }
2203    } // done registering entries
2204
2205    // If there are no devices to monitor, then exit
2206    if (!numatadevices && !numscsidevices){
2207      printout(LOG_INFO,"Unable to monitor any SMART enabled ATA or SCSI devices.\n");
2208      exit(EXIT_BADDEV);
2209    }
2210
2211    // Now start an infinite loop that checks all devices
(gdb) up
Initial frame selected; you cannot go up.
(gdb)
[root@taroon smartmontools-5.1-14]# make clean
rm -f *.o smartctl smartd *~ \#*\# smartmontools*.tar.gz smartmontools*.rpm temp.* smart*.8.gz smart*.5.gz
[root@taroon smartmontools-5.1-14]# grep ^CFLAGS Makefile | tail -n 1
CFLAGS = -fsigned-char -Wall -O2 -g
[root@taroon smartmontools-5.1-14]# make
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c atacmds.c
atacmds.c: In function `ataVersionInfo':
atacmds.c:703: warning: int format, different type arg (arg 2)
atacmds.c:710: warning: int format, different type arg (arg 2)
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c ataprint.c
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c scsicmds.c
scsicmds.c: In function `linux_do_scsi_cmnd_io':
scsicmds.c:199: warning: int format, different type arg (arg 2)
scsicmds.c:245: warning: int format, different type arg (arg 2)
scsicmds.c:261: warning: int format, different type arg (arg 2)
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c utility.c
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c knowndrives.c
gcc -DSMARTMONTOOLS_VERSION=14 -o smartd -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG smartd.c \
    atacmds.o ataprint.o knowndrives.o scsicmds.o utility.o
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c scsiprint.c
gcc -DSMARTMONTOOLS_VERSION=14 -o smartctl -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG smartctl.c \
    atacmds.o ataprint.o knowndrives.o scsicmds.o scsiprint.o utility.o
Smartd can now use a configuration file /etc/smartd.conf. Do:
man ./smartctl.8
man ./smartd.8
man ./smartd.conf.5
to read the manual pages now. Unless you do a "make install" the manual pages won't be installed.
[root@taroon smartmontools-5.1-14]#
Hi Christopher, Thank you very much for doing this. It's very interesting. I think that this might be because you are using 5.1-14, and not the current version of the code from CVS. In particular, I think that the compilation warning messages no longer appear because of some explicit typecasts in the code. This was fixed about three weeks ago. This might be what breaks -O2. Could you please repeat your experiment, but this time after downloading the code from CVS as explained in the attachment file: "Instructions on debugging smartd"? [You used a .tar.gz file containing the 5.1-14 release, not the latest CVS code. You'll know when you have the right one because it will identify itself as release 5.1-15] Cheers, Bruce
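The 5.1-14 warnings ("int format, different type arg") are typical of 64-bit porting problems: on LP64 targets such as x86_64 Linux, `int` stays 32 bits while `long`, `size_t` and pointers are 64 bits, so passing one of them to `%d` is undefined behavior. A minimal sketch of the kind of explicit typecast mentioned above, casting each argument to exactly the type its conversion names; `format_sizes` is a hypothetical helper, not code from smartd.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper. "%lu" expects unsigned long and "%ld" expects long,
 * so nbytes is cast explicitly and count already has the right type.
 * Returns the length snprintf would have written (negative on error). */
static int format_sizes(char *out, size_t outlen, size_t nbytes, long count)
{
    assert(out != NULL && outlen > 0);
    return snprintf(out, outlen, "%lu bytes, %ld items",
                    (unsigned long)nbytes, count);
}
```

With a C99 library the `%zu` conversion is the more direct choice for `size_t`; the cast form shown here also works with older C libraries.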
Yep, I skipped the CVS part. I had the older code and could whip it out real quick before I erased the whole system and reinstalled. That's now done.

[chrismcc@taroon64 sm5]$ grep ^CFLAGS Makefile | tail -n 1
CFLAGS = -fsigned-char -Wall -g
[chrismcc@taroon64 sm5]$ make clean all
rm -f *.o smartctl smartd *~ \#*\# smartmontools*.tar.gz smartmontools*.rpm temp.* smart*.8.gz smart*.5.gz
gcc -fsigned-char -Wall -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c atacmdnames.c
gcc -fsigned-char -Wall -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c atacmds.c
gcc -fsigned-char -Wall -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c ataprint.c
gcc -fsigned-char -Wall -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c scsicmds.c
gcc -fsigned-char -Wall -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c utility.c
gcc -fsigned-char -Wall -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c knowndrives.c
gcc -DSMARTMONTOOLS_VERSION=15 -o smartd -fsigned-char -Wall -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG smartd.c \
    atacmdnames.o atacmds.o ataprint.o knowndrives.o scsicmds.o utility.o
gcc -fsigned-char -Wall -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c scsiprint.c
gcc -DSMARTMONTOOLS_VERSION=15 -o smartctl -fsigned-char -Wall -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG smartctl.c \
    atacmdnames.o atacmds.o ataprint.o knowndrives.o scsicmds.o scsiprint.o utility.o
Smartd can now use a configuration file /etc/smartd.conf. Do:
man ./smartctl.8
man ./smartd.8
man ./smartd.conf.5
to read the manual pages now. Unless you do a "make install" the manual pages won't be installed.
[chrismcc@taroon64 sm5]$ sudo ./smartd -d
smartd version 5.1-15 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Device: /dev/sdb, opened
Device: /dev/sdb, is SMART capable. Adding to "monitor" list.
Device: /dev/sdc, opened
Device: /dev/sdc, is SMART capable. Adding to "monitor" list.
Device: /dev/sdd, opened
Device: /dev/sdd, is SMART capable. Adding to "monitor" list.
Started monitoring 0 ATA and 4 SCSI devices
Device: /dev/sda, Acceptable asc,ascq: 0,0
Device: /dev/sda, Temperature changed -229 degrees to 26 degrees since last reading
Device: /dev/sdb, Acceptable asc,ascq: 0,0
Device: /dev/sdc, Acceptable asc,ascq: 0,0
Device: /dev/sdc, Temperature changed -226 degrees to 29 degrees since last reading
Device: /dev/sdd, Acceptable asc,ascq: 0,0
Device: /dev/sdd, Temperature changed -56 degrees to 29 degrees since last reading
[chrismcc@taroon64 sm5]$ grep ^CFLAGS Makefile | tail -n 1
CFLAGS = -fsigned-char -Wall -O2 -g
[chrismcc@taroon64 sm5]$ make clean all
rm -f *.o smartctl smartd *~ \#*\# smartmontools*.tar.gz smartmontools*.rpm temp.* smart*.8.gz smart*.5.gz
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c atacmdnames.c
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c atacmds.c
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c ataprint.c
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c scsicmds.c
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c utility.c
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c knowndrives.c
gcc -DSMARTMONTOOLS_VERSION=15 -o smartd -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG smartd.c \
    atacmdnames.o atacmds.o ataprint.o knowndrives.o scsicmds.o utility.o
gcc -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG -c scsiprint.c
gcc -DSMARTMONTOOLS_VERSION=15 -o smartctl -fsigned-char -Wall -O2 -g -DHAVE_GETOPT_H -DHAVE_GETOPT_LONG smartctl.c \
    atacmdnames.o atacmds.o ataprint.o knowndrives.o scsicmds.o scsiprint.o utility.o
Smartd can now use a configuration file /etc/smartd.conf. Do:
man ./smartctl.8
man ./smartd.8
man ./smartd.conf.5
to read the manual pages now. Unless you do a "make install" the manual pages won't be installed.
[root@taroon64 sm5]# ./smartd -d
smartd version 5.1-15 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Segmentation fault
[root@taroon64 sm5]# gdb ./smartd
GNU gdb Red Hat Linux (5.3.90-0.20030710.6rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) run -d
Starting program: /home/chrismcc/redhat/BUILD/sm5/smartd -d
smartd version 5.1-15 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.

Program received signal SIGSEGV, Segmentation fault.
scsidevicescan (devices=0x7fbffff500, cfg=0xf) at smartd.c:861
861         cfg->tryata = 0;
(gdb) where
#0  scsidevicescan (devices=0x7fbffff500, cfg=0xf) at smartd.c:861
#1  0x000000000040582f in main (argc=0, argv=0x0) at smartd.c:2201
(gdb) list
2201        if (scsidevicescan(scsidevicesptr+numscsidevices, config+i))
2202          cantregister(config[i].name, "SCSI", config[i].lineno, scandirective);
2203        else
2204          notregistered=0;
2205      }
2206
2207      // if device is explictly listed and we can't register it, then exit unless
2208      // the user has specified that the device is removable
2209      if (notregistered && !scanning){
2210        if (config[i].removable)
(gdb) up
#1  0x000000000040582f in main (argc=0, argv=0x0) at smartd.c:2201
2201        if (scsidevicescan(scsidevicesptr+numscsidevices, config+i))
(gdb) list
2196          notregistered=0;
2197      }
2198
2199      // then register SCSI devices
2200      if (config[i].tryscsi){
2201        if (scsidevicescan(scsidevicesptr+numscsidevices, config+i))
2202          cantregister(config[i].name, "SCSI", config[i].lineno, scandirective);
2203        else
2204          notregistered=0;
2205      }
(gdb) up
Initial frame selected; you cannot go up.
(gdb)
However, it DOES WORK without -O2 in CFLAGS after adding a known bad drive:
[root@taroon64 sm5]# ./smartd -d
smartd version 5.1-15 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Using configuration file /etc/smartd.conf
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Device: /dev/sdb, opened
Device: /dev/sdb, is SMART capable. Adding to "monitor" list.
Device: /dev/sdc, opened
Device: /dev/sdc, is SMART capable. Adding to "monitor" list.
Device: /dev/sdd, opened
Device: /dev/sdd, is SMART capable. Adding to "monitor" list.
Device: /dev/sde, opened
Device: /dev/sde, is SMART capable. Adding to "monitor" list.
Started monitoring 0 ATA and 5 SCSI devices
Device: /dev/sda, Acceptable asc,ascq: 0,0
Device: /dev/sda, Temperature changed -228 degrees to 27 degrees since last reading
Device: /dev/sdb, Acceptable asc,ascq: 0,0
Device: /dev/sdc, Acceptable asc,ascq: 0,0
Device: /dev/sdc, Temperature changed -226 degrees to 29 degrees since last reading
Device: /dev/sdd, Acceptable asc,ascq: 0,0
Device: /dev/sdd, Temperature changed -56 degrees to 29 degrees since last reading
Device: /dev/sde, SMART Failure: SERVO IMPENDING FAILURE SEEK ERROR RATE TOO HIGH
Device: /dev/sde, Temperature changed -80 degrees to 22 degrees since last reading

YEA!
Christopher,

Thanks a lot for sticking with this. I've gotten access to an ia64 system, but unfortunately I can't reproduce the problem there. If you have the patience to stick with this, we'll have to continue "remote debugging". While this may be a gcc error, I've found too many (of my own) coding mistakes in porting code to be ready to write to gcc-bugs quite yet.

Here are the next things to try:

(1) "For the record" please try -O1 -g. If that SEGVs then continue with that executable, else with -O2.

(2) Let's check the address 0xf by hand: within gdb, use the "up" and "down" commands until you are on line 2201 in main(). [From this point on, I need a transcript] Then do:
print &config[0]
print config[0]
print sizeof(config[0])
print &config[1]
print config[0].tryata
print &(config[0].tryata)
Then use "down" until you are around line 816 in scsidevicescan(). Now do:
print cfg
print *cfg
print &(cfg->tryata)
print cfg->tryata

That's it!

Cheers,
Bruce
-O1 segfaulted also

Program received signal SIGSEGV, Segmentation fault.
scsidevicescan (devices=0x7fbffff500, cfg=0xf) at smartd.c:861
861         cfg->tryata = 0;
(gdb) down
Bottom (i.e., innermost) frame selected; you cannot go down.
(gdb) up
#1  0x0000000000405843 in main (argc=12, argv=0xfc0) at smartd.c:2201
2201        if (scsidevicescan(scsidevicesptr+numscsidevices, config+i))
(gdb) print &config[0]
$1 = (cfgfile *) 0x5207a0
(gdb) print config[0]
$2 = {lineno = 19, scsidevicenum = 0, atadevicenum = 0, tryata = 1 '\001', tryscsi = 0 '\0', name = 0x5267f0 "/dev/hda", smartcheck = 1 '\001', usagefailed = 1 '\001', prefail = 1 '\001', usage = 1 '\001', selftest = 1 '\001', errorlog = 1 '\001', permissive = 0 '\0', autosave = 0 '\0', autoofflinetest = 0 '\0', maildata = {{logged = 0, lastsent = 0, firstsent = 0}, {logged = 0, lastsent = 0, firstsent = 0}, {logged = 0, lastsent = 0, firstsent = 0}, {logged = 0, lastsent = 0, firstsent = 0}, {logged = 0, lastsent = 0, firstsent = 0}, {logged = 0, lastsent = 0, firstsent = 0}, {logged = 0, lastsent = 0, firstsent = 0}, {logged = 0, lastsent = 0, firstsent = 0}, {logged = 0, lastsent = 0, firstsent = 0}, {logged = 0, lastsent = 0, firstsent = 0}}, emailfreq = 1 '\001', emailtest = 0 '\0', emailcmdline = 0x0, address = 0x5269e0 "root", selflogcount = 0 '\0', ataerrorcount = 0, monitorattflags = 0x526840 "", attributedefs = 0x5268d0 "", fixfirmwarebug = 0 '\0', ignorepresets = 0 '\0', showpresets = 0 '\0', removable = 0 '\0'}
(gdb) print sizeof(config[0])
$3 = 336
(gdb) print &config[1]
$4 = (cfgfile *) 0x5208f0
(gdb) print config[0].tryata
$5 = 1 '\001'
(gdb) print &(config[0].tryata)
$6 = 0x5207ac "\001"
(gdb) down
#0  scsidevicescan (devices=0x7fbffff500, cfg=0xf) at smartd.c:861
861         cfg->tryata = 0;
(gdb) print cfg
$7 = (cfgfile *) 0xf
(gdb) print *cfg
Cannot access memory at address 0xf
(gdb) print &(cfg->tryata)
$8 = 0x1b <Address 0x1b out of bounds>
(gdb) print cfg->tryata
Cannot access memory at address 0x1b

yes?
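The addresses in this transcript are self-consistent: `&config[1]` minus `&config[0]` is 0x5208f0 - 0x5207a0 = 0x150 = 336 bytes, matching `sizeof(config[0])`, and `tryata` sits 12 bytes into the struct (0x5207ac - 0x5207a0). Adding that same offset to the bad pointer gives 0xf + 12 = 0x1b, exactly the address gdb could not read, so the struct layout looks fine and the suspect is the value of `cfg` itself. (The 0xc in the original kernel segfault message would likewise fit a write to `tryata` through a NULL `cfg`, though that is only an inference.) The sketch below reproduces the arithmetic with `offsetof`; `cfg_prefix` and `tryata_address` are hypothetical, holding only the first fields gdb printed, and the sketch assumes a 4-byte `int` as on x86_64 Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefix of the config struct, in the field order gdb printed
 * for config[0]. The real struct has many more members (336 bytes total). */
typedef struct {
    int lineno;
    int scsidevicenum;
    int atadevicenum;
    char tryata;
    char tryscsi;
} cfg_prefix;

/* Address the CPU would touch for ptr->tryata: base + offsetof(tryata).
 * Done on integers, so it also works for a bogus base such as 0xf. */
static uintptr_t tryata_address(uintptr_t base)
{
    return base + offsetof(cfg_prefix, tryata);
}
```

With three 4-byte `int` fields ahead of it, `tryata` lands at offset 12, which is what both the good and the bad addresses in the gdb session show.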
> I've gotten access to an ia64 system, but
> unfortunately I can't reproduce the problem there.

My install is taroon x86_64 with "everything", using all SCSI drives. The system is from Appro, using a Tyan S2880 Thunder K8S mainboard. The SCSI controller is:
Fusion MPT base driver 2.05.05+
Copyright (c) 1999-2002 LSI Logic Corporation
mptbase: Initiating ioc0 bringup
02:0a.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
02:0a.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
Hi Christopher,

OK, looks like the code is tickling a GCC bug. Let's try to isolate the problem by making what we want slightly more explicit for the compiler.

(1) In the directory sm5/, save a copy of smartd.c:
cp smartd.c smartd.c.original

(2) Make the following 3 changes:

static int scsidevicescan(scsidevices_t *devices, cfgfile *cfg)
->
int scsidevicescan(scsidevices_t *devices, cfgfile *cfg)

if (scsidevicescan(scsidevicesptr+numscsidevices, config+i))
->
if (scsidevicescan(scsidevicesptr+numscsidevices, &(config[i])))

if (atadevicescan2(atadevicesptr+numatadevices, config+i))
->
if (atadevicescan2(atadevicesptr+numatadevices, &(config[i])))

Then do "make clean", then "make" and try again.

[Note - it looks like you are now using /etc/smartd.conf containing
DEVICESCAN -m email@domain
It would simplify the debugging somewhat if you could simply use the same thing you originally had, which I think was
/dev/sda
/dev/sdb
/dev/sdc
/dev/sdd
and nothing else.]

Cheers,
Bruce
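In C, `a[i]` is defined as `*(a + i)`, so `config+i` and `&(config[i])` denote exactly the same pointer; the second and third changes alter only how the expression is spelled, not what it means. The first change (dropping `static`) is the one that could plausibly alter the generated code, for example by changing gcc's inlining decisions. A small sketch confirming the equivalence; `cfg_entry` and `same_element` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one configuration entry. */
typedef struct {
    int lineno;
    char tryata;
} cfg_entry;

/* Returns 1 when config + i and &config[i] compare equal, which the C
 * standard guarantees for any valid index i into the array. */
static int same_element(cfg_entry *config, size_t i)
{
    assert(config != NULL);
    return (config + i) == &config[i];
}
```

Pointer arithmetic also scales by the element size, so `config + 1` is `sizeof(cfg_entry)` bytes past `config`, which is the same 336-byte stride gdb showed for the real struct.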
Christopher,

An afterthought -- please try what I said, but I am not sure it will work. In fact it probably won't. But I think I have an idea what's happening. To confirm, please do this:

BEFORE you type "run -d" at the (gdb) prompt, type
break scsidevicescan
When you type "run -d" the program will stop just inside scsidevicescan. Now, alternately, give the commands:
print cfg
print cfg->tryata
n
[Note that n is shorthand for the command "next"]

Continue this process until the code segvs. You may have to do this more than a dozen times. Note that within gdb, the up-arrow key can be used to recall previous lines and repeat them, which makes this easier.

I think that the address cfg is getting overwritten, probably in the loop starting just below the comment reading:
// Flag that certain log pages are supported (information may be
// available from other sources).

As usual, I'll need a transcript of the gdb session.

Cheers,
Bruce
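The hypothesis is that `cfg` gets overwritten in the log-page loop. One common way such a loop corrupts neighboring stack data, often only under particular optimization levels, is by trusting a device-reported length that was never clamped to the buffer actually allocated. As a hedged illustration (not the smartd loop itself), here is a bounds-checked parser for the SCSI LOG SENSE "supported log pages" response (page 0x00), whose bytes 2-3 hold the big-endian length of the page-code list after a 4-byte header; `log_page_flags` and `flag_supported_pages` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical set of flags for the log pages a monitor might care about. */
typedef struct {
    unsigned char write_err;    /* page 0x02: write error counters */
    unsigned char read_err;     /* page 0x03: read error counters */
    unsigned char nonmedium;    /* page 0x06: non-medium errors */
    unsigned char temperature;  /* page 0x0d: temperature */
    unsigned char selftest;     /* page 0x10: self-test results */
    unsigned char ie;           /* page 0x2f: informational exceptions */
} log_page_flags;

/* Parse a "supported log pages" response of buflen bytes. The list length
 * reported by the device is clamped to the bytes actually received, so a
 * bogus length can never walk the loop past the end of buf. */
static void flag_supported_pages(const uint8_t *buf, size_t buflen,
                                 log_page_flags *flags)
{
    assert(buf != NULL && flags != NULL);
    *flags = (log_page_flags){0};
    if (buflen < 4)
        return;  /* not even a complete header */
    size_t listlen = ((size_t)buf[2] << 8) | buf[3];
    if (listlen > buflen - 4)
        listlen = buflen - 4;
    for (size_t k = 0; k < listlen; k++) {
        switch (buf[4 + k]) {
        case 0x02: flags->write_err = 1; break;
        case 0x03: flags->read_err = 1; break;
        case 0x06: flags->nonmedium = 1; break;
        case 0x0d: flags->temperature = 1; break;
        case 0x10: flags->selftest = 1; break;
        case 0x2f: flags->ie = 1; break;
        default: break;  /* page this sketch does not track */
        }
    }
}
```

Writing results into named flags, rather than into an array indexed by a value taken from the device, also removes one common way such a loop can write out of bounds.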
Chris,

I still don't know if what you saw was a bug in smartd or a compiler bug. After looking through the code for possible problems, I had some suspicions that this could be a double-free() error corrupting the heap -- though I am not sure and cannot reproduce it on any of the systems I have tried. Nevertheless, because I was worried about this, I did some substantial code restructuring and am confident that the code that's now in CVS should be free of any type of memory allocation bug.

You (or other users who encounter this problem and see this report) should check out the development code from CVS and report if it has problems with -O2. Instructions on CVS checkout are the first four steps in the previously attached instructions -- step 3 can be skipped.

Cheers,
Bruce
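If a double free() were the cause, one defensive idiom is to clear each pointer immediately after freeing it, because `free(NULL)` is a no-op. A minimal sketch, not the restructuring that went into CVS; `FREE_AND_NULL`, `cfg_strings`, `dup_string` and `cfg_strings_release` are hypothetical names, loosely modeled on the `name` and `address` string fields visible in the gdb dump of `config[0]`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Free a pointer and clear it, so freeing the same variable again
 * becomes free(NULL), which does nothing. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical owner of two heap-allocated strings. */
typedef struct {
    char *name;
    char *address;
} cfg_strings;

/* Portable strdup replacement; returns NULL if malloc fails. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Safe to call more than once: each field is freed at most once. */
static void cfg_strings_release(cfg_strings *c)
{
    assert(c != NULL);
    FREE_AND_NULL(c->name);
    FREE_AND_NULL(c->address);
}
```

The idiom only protects the variable that was cleared. If two entries held shallow copies of the same pointer (hypothetically, per-device entries cloned from one DEVICESCAN template), the copies would still dangle, so each entry should own its own strings, for example by duplicating them with `dup_string`.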
We just ran into this segfault with FC2 Test3 on x86-64 running kernel-utils-2.4-9.1.127, which contains smartmontools-5.21. smartmontools-5.30 seems to fix this issue. Any chance we can upgrade this for FC2 final? How much risk would you judge an upgrade from 5.21 to 5.30 to carry?
My advice to the FC maintainers is to upgrade the current smartmontools (release 5.21) to the current 5.30 stable release. The 5.30 release has been out there for about six weeks with thousands of direct downloads and many times that in indirect downloads. The changelog from release 5.21 to 5.30 lists many bug fixes.

Note that there IS one reported/fixed bug in the 5.30 release that can cause a segv crash in smartd. The FC maintainers might consider releasing a patched 5.30 release that fixes this. The fix can be seen by looking at the diffs between versions 1.302 and 1.303 of smartd.c, in CVS here:
http://cvs.sourceforge.net/viewcvs.py/smartmontools/sm5/smartd.c?r1=1.302&r2=1.303
It's a one-line patch.

Cheers,
Bruce Allen
This has been updated.