Bug 848778 - [abrt] mdadm-3.2.5-4.fc17: fprintf: Process /usr/sbin/mdadm was killed by signal 11 (SIGSEGV)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: mdadm
Version: 17
Hardware: i686
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jes Sorensen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: abrt_hash:ab7e782d11ee456c9ea0941b12c...
Depends On:
Blocks:
Reported: 2012-08-16 11:49 UTC by Joseph Fraser
Modified: 2012-11-01 14:51 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-11-01 14:51:04 UTC
Type: ---
Embargoed:


Attachments
File: core_backtrace (358 bytes, text/plain), 2012-08-16 11:49 UTC, Joseph Fraser
File: environ (1.53 KB, text/plain), 2012-08-16 11:49 UTC, Joseph Fraser
File: backtrace (29.62 KB, text/plain), 2012-08-16 11:49 UTC, Joseph Fraser
File: limits (1.29 KB, text/plain), 2012-08-16 11:49 UTC, Joseph Fraser
File: cgroup (158 bytes, text/plain), 2012-08-16 11:49 UTC, Joseph Fraser
File: maps (883 bytes, text/plain), 2012-08-16 11:49 UTC, Joseph Fraser
File: dso_list (214 bytes, text/plain), 2012-08-16 11:49 UTC, Joseph Fraser
File: var_log_messages (1.26 KB, text/plain), 2012-08-16 11:49 UTC, Joseph Fraser
File: open_fds (135 bytes, text/plain), 2012-08-16 11:49 UTC, Joseph Fraser

Description Joseph Fraser 2012-08-16 11:49:31 UTC
Description of problem:
One of my hard drives reported this error:
Aug 16 21:36:48 quadboy kernel: [128464.487589] sd 9:0:1:0: [sdl] Unhandled sense code
Aug 16 21:36:48 quadboy kernel: [128464.487592] sd 9:0:1:0: [sdl]
Aug 16 21:36:48 quadboy kernel: [128464.487595] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 16 21:36:48 quadboy kernel: [128464.487597] sd 9:0:1:0: [sdl]
Aug 16 21:36:48 quadboy kernel: [128464.487598] Sense Key : Hardware Error [current]
Aug 16 21:36:48 quadboy kernel: [128464.487601] sd 9:0:1:0: [sdl]
Aug 16 21:36:48 quadboy kernel: [128464.487604] Add. Sense: Internal target failure
Aug 16 21:36:48 quadboy kernel: [128464.487607] sd 9:0:1:0: [sdl] CDB:
Aug 16 21:36:48 quadboy kernel: [128464.487608] Read(10): 28 00 00 00 00 00 00 00 08 00
Aug 16 21:36:48 quadboy kernel: [128464.487615] end_request: critical target error, dev sdl, sector 0
Aug 16 21:36:48 quadboy kernel: [128464.488073] sd 9:0:1:0: [sdl] Unhandled sense code
Aug 16 21:36:48 quadboy kernel: [128464.488075] sd 9:0:1:0: [sdl]
Aug 16 21:36:48 quadboy kernel: [128464.488077] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
I have no idea whether the drive is faulty, as the errors seem to be random.
I removed the drive from the array with mdadm, pulled it from the PC, reinstalled it, and tried to re-add it to the raid4; it then spewed everywhere.
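
For readers unfamiliar with the "Read(10)" line in the kernel log above, the following is a minimal sketch of how those ten CDB bytes break down. It assumes the standard READ(10) layout (opcode 0x28, a big-endian logical block address in bytes 2 to 5, a big-endian transfer length in bytes 7 to 8). The struct and function names are illustrative only and do not come from the kernel or from mdadm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative container for the two READ(10) fields of interest. */
struct read10 {
    uint32_t lba;     /* starting logical block address */
    uint16_t blocks;  /* number of logical blocks requested */
};

/* Decode a 10-byte READ(10) CDB.
 * Returns 0 on success, -1 if the opcode is not READ(10) (0x28). */
int decode_read10(const uint8_t *cdb, struct read10 *out)
{
    assert(cdb != NULL && out != NULL);

    if (cdb[0] != 0x28)
        return -1;

    /* Bytes 2..5 hold the LBA, most significant byte first. */
    out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
               ((uint32_t)cdb[4] << 8)  |  (uint32_t)cdb[5];

    /* Bytes 7..8 hold the transfer length, most significant byte first. */
    out->blocks = (uint16_t)(((unsigned)cdb[7] << 8) | (unsigned)cdb[8]);
    return 0;
}

/* CDB bytes copied from the log: 28 00 00 00 00 00 00 00 08 00 */
const uint8_t logged_cdb[10] = {
    0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x00
};
```

Decoding `logged_cdb` this way gives LBA 0 and a length of 8 blocks, which matches the "sector 0" in the end_request line. In other words, the failing command was a read of the very start of the disk.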


Version-Release number of selected component:
mdadm-3.2.5-4.fc17

Additional info:
libreport version: 2.0.12
abrt_version:   2.0.11
backtrace_rating: 4
cmdline:        mdadm --manage /dev/md0 --add /dev/sdl
crash_function: fprintf
kernel:         3.5.1-1.fc17.i686.PAE

truncated backtrace:
:Thread no. 1 (3 frames)
: #0 fprintf at /usr/include/bits/stdio2.h
: #1 write_init_super1 at super1.c
: #2 Manage_subdevs at Manage.c

Comment 1 Joseph Fraser 2012-08-16 11:49:35 UTC
Created attachment 604906 [details]
File: core_backtrace

Comment 2 Joseph Fraser 2012-08-16 11:49:37 UTC
Created attachment 604907 [details]
File: environ

Comment 3 Joseph Fraser 2012-08-16 11:49:40 UTC
Created attachment 604908 [details]
File: backtrace

Comment 4 Joseph Fraser 2012-08-16 11:49:42 UTC
Created attachment 604909 [details]
File: limits

Comment 5 Joseph Fraser 2012-08-16 11:49:44 UTC
Created attachment 604910 [details]
File: cgroup

Comment 6 Joseph Fraser 2012-08-16 11:49:47 UTC
Created attachment 604911 [details]
File: maps

Comment 7 Joseph Fraser 2012-08-16 11:49:49 UTC
Created attachment 604912 [details]
File: dso_list

Comment 8 Joseph Fraser 2012-08-16 11:49:52 UTC
Created attachment 604913 [details]
File: var_log_messages

Comment 9 Joseph Fraser 2012-08-16 11:49:54 UTC
Created attachment 604914 [details]
File: open_fds

Comment 10 Jes Sorensen 2012-10-09 12:51:12 UTC
Joseph,

The dmesg output you posted strongly suggests that the drive itself is faulty.
Does this happen only with this drive or also if you use other drives in the
system?

Any chance you can provide the output of /proc/mdstat? I know it's been a
while since you reported this, so you may not have it anymore.

Cheers,
Jes

Comment 11 Joseph Fraser 2012-10-09 20:31:13 UTC
md1 : active raid5 sdd[0] sdh[4] sdg[3] sdf[2] sde[5]
      7813529088 blocks super 1.2 level 5, 128k chunk, algorithm 2 [5/5] [UUUUU]


It doesn't seem to matter which drive was used. At the time I fixed the problem by moving that one drive off the RAID card and onto a spare port on the motherboard. I do suspect that either the drive or the card is faulty, but I'm not 100% sure: with every drive I used, the resync got to 95-99% and then spat the dummy and dropped that one drive with the errors above.

The RAID card I'm using is an Adaptec 2600SA.

Comment 12 Jes Sorensen 2012-10-10 08:41:24 UTC
Hi Joseph,

Interesting, it sounds like mdadm crashed because it was trying to write a
superblock to the defective drive. I need to check whether that has been
fixed, but I am glad you found a solution.

It sounds like you do have a bad drive at hand. If I understand you right,
it is the same drive that always fails? In that case I would recommend
replacing it.

Maybe start by checking that you are not suffering from bad cables, especially
if you used the same cable to connect to the motherboard port.

Cheers,
Jes

Comment 13 Joseph Fraser 2012-10-10 08:53:18 UTC
Hi Jes,

Yes, it has been fixed, but I tried 2 other drives in its place before I put it on the onboard SATA port, and that drive has been going fine since I reported it. SMART status says the drive was OK (not that I rely on it).

I also swapped the SATA and power leads, but the same cable that was on the RAID card is now on the same drive on the onboard port.
I also scanned (read/write) the disk with HDAT2 from end to end with no errors on that drive. I then re-added it; the rebuild got to about 95-99% (or completed) and then dropped the drive from the RAID.

But I think the problem is gone. As I said before, I'm unsure why it was doing what it was doing. I did think the drive was faulty, hence why I did some testing on it; it passed every time, and even a read/write/verify scan passed as well.

Comment 14 Jes Sorensen 2012-11-01 14:51:04 UTC
Had a look at this a bit further. The crash happens in write_init_super1(),
and I believe this was fixed in the following upstream commit:

commit 4687f160276a8f7815675ca758c598d881f04fd7
Author: majianpeng <majianpeng>
Date:   Tue May 29 09:21:51 2012 +1000

    mdadm: Fix Segmentation fault.
    
    In function write_init_super1():
    If "rv = store_super1(st, di->fd)" return error and the di is the last.
    Then the di = NULL && rv > 0, so exec:
    if (rv)
        fprintf(stderr, Name ": Failed to write metadata to%s\n",
                 di->devname);
    will be segmentation fault.
    
    Signed-off-by: majianpeng <majianpeng>
    Signed-off-by: NeilBrown <neilb>
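
The pattern described in the commit message can be sketched in a few lines of C. This is a simplified model and not the actual mdadm source or the upstream patch: `struct devinfo`, `store_one()` and `write_all()` are hypothetical stand-ins, and the `fail` field exists only to simulate a failed write. The point is that the error is reported while `di` still refers to the device that failed, instead of after the loop has already advanced it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a node in the per-device list. */
struct devinfo {
    const char *devname;
    int fail;                 /* nonzero: simulate a failed metadata write */
    struct devinfo *next;
};

/* Hypothetical stand-in for the store step: nonzero means failure. */
int store_one(const struct devinfo *di)
{
    assert(di != NULL);
    return di->fail ? 1 : 0;
}

/*
 * The crashing shape, per the commit message, is roughly:
 *
 *     for (di = list; di && !rv; di = di->next)
 *         rv = store_one(di);
 *     if (rv)
 *         fprintf(stderr, "... %s\n", di->devname);
 *
 * When the last device fails, the loop increment sets di to NULL before
 * the condition is re-checked, so the fprintf dereferences NULL.
 *
 * This version reports inside the loop instead. It returns the name of
 * the first failing device, or NULL if every write succeeded.
 */
const char *write_all(const struct devinfo *list)
{
    const struct devinfo *di;

    for (di = list; di != NULL; di = di->next) {
        if (store_one(di) != 0) {
            fprintf(stderr, "Failed to write metadata to %s\n",
                    di->devname);
            return di->devname;
        }
    }
    return NULL;
}
```

With a two-element list whose last entry has `fail` set, `write_all()` prints that entry's name and returns it without touching a NULL pointer, which is the case that crashed in this report.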

This patch is included in mdadm-3.2.6, which was pushed into updates-testing
recently, so I believe this bug has been fixed.

If you can reproduce this problem with 3.2.6-1 or later, please open a new
Bugzilla about it.

Thanks,
Jes

