Description of problem:
smartd segfaults on startup when it tries to monitor SCSI hard drives on AMD64.

Version-Release number of selected component (if applicable):
# rpm -q kernel-utils
kernel-utils-2.4-8.37.3

How reproducible:
Every time.

Steps to Reproduce:
1. Set up an AMD64 machine with SCSI hard drives.
2. Add "/dev/sda -a -m root.redhat.com" to /etc/smartd.conf.
3. Launch "smartd -d" and observe the crash.

The crash only occurs when smartd is compiled with -O2 or -O1, not when it is compiled with -O0.

Backtrace with -O1:

scsidevicescan (devices=0x7fbffff3f0, cfg=0x0) at smartd.c:852
852     smartd.c: No such file or directory.
        in smartd.c
(gdb) bt
#0  scsidevicescan (devices=0x7fbffff3f0, cfg=0x0) at smartd.c:852
#1  0x0000000000405772 in main (argc=0, argv=0x0) at smartd.c:2175

It crashes as soon as it tries to access cfg->, because cfg is NULL, yet adding a 'printf ("%p\n", cfg);' shows that cfg is not NULL...

Breakpoint backtrace with -O0 (it doesn't crash with -O0):

Breakpoint 1, scsidevicescan (devices=0x7fbffff400, cfg=0x51cfa0) at smartd.c:758
758     smartd.c: No such file or directory.
        in smartd.c
(gdb) bt
#0  scsidevicescan (devices=0x7fbffff400, cfg=0x51cfa0) at smartd.c:758
#1  0x00000000004067e4 in main (argc=2, argv=0x7fbffff908) at smartd.c:2183

I tried debugging with ElectricFence, with both under- and over-fencing, and there don't seem to be any memory corruption issues. I still don't understand why "cfg" would show up as NULL under gdb when compiled with -O1, and non-NULL when compiled with -O0.

(Bear in mind that these sources were modified a bit while I was debugging the problem.) The crash actually occurs when accessing cfg-> here:

  // record number of device, type of device, increment device count
  cfg->tryata = 0;

I think it could be a compilation/optimisation problem.
There seems to be some memory corruption in _testunitready() in scsicmds.c: commenting out the whole function makes the crash go away. Still debugging.
Created attachment 100584: smartd-amd64-crash.patch

This patch fixes the crash on startup on AMD64.
Bastien,

Thanks very much for your patch. This was fixed in smartmontools on November 19, 2003. If your copy of scsicmds.c is version 1.65 or greater, it incorporates this fix. Please see:

http://cvs.sourceforge.net/viewcvs.py/smartmontools/sm5/scsicmds.c?r1=1.64&r2=1.65

This is fixed in all smartmontools releases >= 5.26. My current suggestion is that RH/Fedora upgrade to version 5.30 of smartmontools, with a one-line patch to fix one other segv.

Bruce
This problem is resolved in the next release of Red Hat Enterprise Linux. Red Hat does not currently plan to provide a resolution for this in a Red Hat Enterprise Linux update for currently deployed systems. With the goal of minimizing risk of change for deployed systems, and in response to customer and partner requirements, Red Hat takes a conservative approach when evaluating changes for inclusion in maintenance updates for currently deployed products. The primary objectives of update releases are to enable new hardware platform support and to resolve critical defects.
*** Bug 143553 has been marked as a duplicate of this bug. ***