Bug 124412 - [PATCH] smartd crash when monitoring SCSI devices on AMD64
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel-utils
Version: 3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Tomas Mraz
Duplicates: 143553
Reported: 2004-05-26 13:18 UTC by Bastien Nocera
Modified: 2007-11-30 22:07 UTC (History)

Doc Type: Bug Fix
Last Closed: 2005-11-23 08:42:39 UTC


Attachments:
smartd-amd64-crash.patch (504 bytes, patch)
2004-05-26 13:55 UTC, Bastien Nocera

Description Bastien Nocera 2004-05-26 13:18:10 UTC
Description of problem:
Segfault on startup when trying to monitor SCSI hard drives on AMD64
with smartd.

Version-Release number of selected component (if applicable):
# rpm -q kernel-utils
kernel-utils-2.4-8.37.3

How reproducible:
Every time.

Steps to Reproduce:
1. Set up an AMD64 machine with SCSI hard drives
2. Add "/dev/sda -a -m root.redhat.com" to
/etc/smartd.conf
3. Launch "smartd -d" and observe the crash.
  
The crash occurs only when smartd is compiled with -O1 or -O2, not
when it is compiled with -O0.

Backtrace with -O1:
scsidevicescan (devices=0x7fbffff3f0, cfg=0x0) at smartd.c:852
852     smartd.c: No such file or directory.
       in smartd.c
(gdb) bt
#0  scsidevicescan (devices=0x7fbffff3f0, cfg=0x0) at smartd.c:852
#1  0x0000000000405772 in main (argc=0, argv=0x0) at smartd.c:2175

It crashes as soon as it tries to access cfg->, because gdb shows cfg
as NULL, but adding a 'printf ("%p\n", cfg);' prints a non-NULL value...

Backtrace at a breakpoint with -O0 (it doesn't crash with -O0):
Breakpoint 1, scsidevicescan (devices=0x7fbffff400, cfg=0x51cfa0)
   at smartd.c:758
758     smartd.c: No such file or directory.
       in smartd.c
(gdb) bt
#0  scsidevicescan (devices=0x7fbffff400, cfg=0x51cfa0) at smartd.c:758
#1  0x00000000004067e4 in main (argc=2, argv=0x7fbffff908) at
smartd.c:2183

I tried debugging with ElectricFence, checking for both underruns and
overruns, and there don't seem to be any memory corruption issues.
I still don't understand why "cfg" would show up under gdb as NULL when
compiled with -O1 and as non-NULL when compiled with -O0.

Bear in mind that those sources were modified a bit while trying to
debug the problem. The crash actually occurs when accessing cfg->:
// record number of device, type of device, increment device count
cfg->tryata = 0;

I think it could be a compilation/optimisation problem.
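
One way symptoms like these could arise, offered as a hedged illustration and not as a diagnosis of the smartd sources: Electric Fence only guards heap allocations made through malloc, so a write past the end of a buffer on the stack goes unnoticed by it. Whether such a write lands on a live value (for example a spilled copy of cfg) or on unused space depends on the frame layout the compiler chooses, which can differ between -O0 and -O1 and can change again when a printf is added. The sketch below models that in well-defined C. The struct model_frame and the functions model_clear and cfg_survives are hypothetical names that stand in for a stack frame; none of them come from smartd.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a stack frame: a small buffer that happens to sit
 * directly below a saved pointer. Real layouts are chosen by the compiler. */
struct model_frame {
    unsigned char sense[16];  /* buffer the callee is meant to clear */
    void *cfg;                /* stands in for a spilled copy of cfg */
};

/* Clears len bytes starting at the sense buffer. The write is addressed
 * through the whole struct, so it stays inside one object and is well
 * defined even when len exceeds sizeof(sense). */
static void model_clear(struct model_frame *f, size_t len)
{
    unsigned char *base =
        (unsigned char *)f + offsetof(struct model_frame, sense);
    size_t room = sizeof(*f) - offsetof(struct model_frame, sense);

    if (len > room)
        len = room;  /* never leave the model frame */
    memset(base, 0, len);
}

/* Returns 1 if the pointer survives a clear of len bytes, 0 if it was
 * zeroed. Assumes the usual all-bits-zero representation of NULL. */
int cfg_survives(size_t len)
{
    int target = 42;
    struct model_frame f;

    memset(&f, 0, sizeof(f));
    f.cfg = &target;
    model_clear(&f, len);
    return f.cfg != NULL;
}
```

With a length equal to the buffer size the pointer is untouched; with a length that is too large the pointer reads back as NULL even though the caller never assigned NULL to it.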

Comment 2 Bastien Nocera 2004-05-26 13:38:33 UTC
There seems to be some corruption in _testunitready() in scsicmds.c.
Commenting out the whole function removes the crash.

Still debugging.
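
A general technique for confirming that a suspect routine writes outside a buffer it was given is to bracket the buffer with known guard bytes and check them after the call. The following is a minimal sketch of that idea only; struct guarded_buf, guarded_init, guarded_ok and suspect_fill are hypothetical names, and suspect_fill is a stand-in that does not reproduce what _testunitready() actually does.

```c
#include <stddef.h>
#include <string.h>

#define GUARD_BYTE 0xA5
#define GUARD_LEN  8

/* A buffer bracketed by guard bytes (hypothetical layout). */
struct guarded_buf {
    unsigned char before[GUARD_LEN];
    unsigned char data[32];
    unsigned char after[GUARD_LEN];
};

/* Fills both guards with the known pattern and zeroes the payload. */
void guarded_init(struct guarded_buf *g)
{
    memset(g->before, GUARD_BYTE, GUARD_LEN);
    memset(g->data, 0, sizeof(g->data));
    memset(g->after, GUARD_BYTE, GUARD_LEN);
}

/* Returns 1 if both guards are intact, 0 if any guard byte changed. */
int guarded_ok(const struct guarded_buf *g)
{
    size_t i;

    for (i = 0; i < GUARD_LEN; i++) {
        if (g->before[i] != GUARD_BYTE || g->after[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Stand-in for a suspect routine: zeroes len bytes starting at data.
 * The write is addressed through the whole struct and clamped to it,
 * so an oversized len damages the trailing guard without leaving the
 * object. */
void suspect_fill(struct guarded_buf *g, size_t len)
{
    unsigned char *p =
        (unsigned char *)g + offsetof(struct guarded_buf, data);
    size_t room = sizeof(*g) - offsetof(struct guarded_buf, data);

    if (len > room)
        len = room;
    memset(p, 0, len);
}
```

Calling guarded_init, then the suspect routine, then guarded_ok narrows the corruption to that one call, which is the same conclusion the commenting-out experiment reaches by elimination.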

Comment 3 Bastien Nocera 2004-05-26 13:55:43 UTC
Created attachment 100584
smartd-amd64-crash.patch

This patch fixes the crashes on startup on AMD64.

Comment 4 Bruce Allen 2004-05-26 15:07:15 UTC
Bastien,

Thanks very much for your patch.  This was fixed in smartmontools
on November 19, 2003.  If your scsicmds.c is revision 1.65 or
later, it already incorporates this fix.

Please see:
http://cvs.sourceforge.net/viewcvs.py/smartmontools/sm5/scsicmds.c?r1=1.64&r2=1.65

This is fixed in all smartmontools releases >= 5.26
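
When checking an installed copy against these thresholds, note that CVS revisions and release numbers are compared component by component, not as decimal fractions, so revision 1.9 is older than 1.65. A small sketch of such a comparison follows; version_cmp is a hypothetical helper written for illustration and is not part of smartmontools.

```c
#include <stdlib.h>

/* Compares two dotted numeric version strings component by component.
 * Returns -1 if a is older than b, 0 if equal, 1 if a is newer.
 * Missing components count as 0, so "1" equals "1.0". */
int version_cmp(const char *a, const char *b)
{
    while (*a != '\0' || *b != '\0') {
        char *end_a;
        char *end_b;
        long na = strtol(a, &end_a, 10);
        long nb = strtol(b, &end_b, 10);

        if (na != nb)
            return (na < nb) ? -1 : 1;
        if (end_a == a && end_b == b)
            break;  /* no digits consumed on either side: stop */

        /* Step past the separator, if any, on each side. */
        a = (*end_a == '.') ? end_a + 1 : end_a;
        b = (*end_b == '.') ? end_b + 1 : end_b;
    }
    return 0;
}
```

For example, version_cmp("1.65", "1.64") is positive, while version_cmp("1.9", "1.65") is negative.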

My current suggestion is that RH/Fedora upgrade to version 5.30
of smartmontools, with a one-line patch to fix one other segv.

Bruce

Comment 5 Tomas Mraz 2005-11-23 08:42:39 UTC
This problem is resolved in the next release of Red Hat Enterprise Linux. Red
Hat does not currently plan to provide a resolution for this in a Red Hat
Enterprise Linux update for currently deployed systems.

With the goal of minimizing risk of change for deployed systems, and in response
to customer and partner requirements, Red Hat takes a conservative approach when
evaluating changes for inclusion in maintenance updates for currently deployed
products. The primary objectives of update releases are to enable new hardware
platform support and to resolve critical defects. 


Comment 6 Tomas Mraz 2005-11-23 08:48:32 UTC
*** Bug 143553 has been marked as a duplicate of this bug. ***

