Description of problem:
One of my hard drives reported this error:

Aug 16 21:36:48 quadboy kernel: [128464.487589] sd 9:0:1:0: [sdl] Unhandled sense code
Aug 16 21:36:48 quadboy kernel: [128464.487592] sd 9:0:1:0: [sdl]
Aug 16 21:36:48 quadboy kernel: [128464.487595] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 16 21:36:48 quadboy kernel: [128464.487597] sd 9:0:1:0: [sdl]
Aug 16 21:36:48 quadboy kernel: [128464.487598] Sense Key : Hardware Error [current]
Aug 16 21:36:48 quadboy kernel: [128464.487601] sd 9:0:1:0: [sdl]
Aug 16 21:36:48 quadboy kernel: [128464.487604] Add. Sense: Internal target failure
Aug 16 21:36:48 quadboy kernel: [128464.487607] sd 9:0:1:0: [sdl] CDB:
Aug 16 21:36:48 quadboy kernel: [128464.487608] Read(10): 28 00 00 00 00 00 00 00 08 00
Aug 16 21:36:48 quadboy kernel: [128464.487615] end_request: critical target error, dev sdl, sector 0
Aug 16 21:36:48 quadboy kernel: [128464.488073] sd 9:0:1:0: [sdl] Unhandled sense code
Aug 16 21:36:48 quadboy kernel: [128464.488075] sd 9:0:1:0: [sdl]
Aug 16 21:36:48 quadboy kernel: [128464.488077] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE

I have no idea if the drive is faulty, as the errors seem to be random. I removed the drive from the mdadm array, pulled it from the PC, put it back, and tried to re-add it to the RAID; mdadm then crashed with the backtrace below.

Version-Release number of selected component:
mdadm-3.2.5-4.fc17

Additional info:
libreport version: 2.0.12
abrt_version:      2.0.11
backtrace_rating:  4
cmdline:           mdadm --manage /dev/md0 --add /dev/sdl
crash_function:    fprintf
kernel:            3.5.1-1.fc17.i686.PAE

truncated backtrace:
:Thread no. 1 (3 frames)
: #0 fprintf at /usr/include/bits/stdio2.h
: #1 write_init_super1 at super1.c
: #2 Manage_subdevs at Manage.c
Created attachment 604906 [details] File: core_backtrace
Created attachment 604907 [details] File: environ
Created attachment 604908 [details] File: backtrace
Created attachment 604909 [details] File: limits
Created attachment 604910 [details] File: cgroup
Created attachment 604911 [details] File: maps
Created attachment 604912 [details] File: dso_list
Created attachment 604913 [details] File: var_log_messages
Created attachment 604914 [details] File: open_fds
Joseph,

The dmesg output you posted strongly suggests that the drive itself is faulty. Does this happen only with this drive, or also if you use other drives in the system?

Any chance you can provide the output of /proc/mdstat? I know it's been a while since you reported this, so you may not have it anymore.

Cheers,
Jes
md1 : active raid5 sdd[0] sdh[4] sdg[3] sdf[2] sde[5]
      7813529088 blocks super 1.2 level 5, 128k chunk, algorithm 2 [5/5] [UUUUU]

It doesn't seem to matter what drive was used at the time. I did fix the problem by moving that one drive off the RAID card and putting it on a spare port on the motherboard. I do suspect the drive or the card is faulty, though not 100%, as every drive I used would get to 95-99% resynced and then spit the dummy and drop that one drive with the errors above. The RAID card I'm using is an Adaptec 2600SA.
Hi Joseph,

Interesting, it sounds like mdadm crashed because it was trying to write a superblock to the defective drive. I need to check whether that has been fixed, but I am glad you found a solution.

It sounds like you do have a bad drive at hand. If I understand you right, it is the same drive that always fails? In that case I would recommend replacing it. Maybe start by checking that you are not suffering from bad cables, especially if you used the same cable to connect to the motherboard port.

Cheers,
Jes
Hi Jes,

Yes, it has been fixed, but I tried 2 other drives as its replacement before I put it on the onboard SATA port, and that drive has been going fine since I reported it. SMART status says the drive was OK (not that I rely on it). I also swapped the SATA and power leads, but the same cable that was on the RAID card is now on the same drive on the onboard port.

I also scanned (read/write) the disk with HDAT2 from end to end with no errors on that drive. I then re-added it; the rebuild would get to about 95-99% (or complete) and then drop the drive from the RAID. But I think the problem is gone. As I said before, I'm unsure why it was doing what it was doing. I did think the drive was faulty, hence the testing, but it passed every time, even a read/write/verify scan.
Had a look at this a bit further. The crash happens in write_init_super1(), and I believe this was fixed in the following upstream commit:

commit 4687f160276a8f7815675ca758c598d881f04fd7
Author: majianpeng <majianpeng>
Date:   Tue May 29 09:21:51 2012 +1000

    mdadm: Fix Segmentation fault.

    In function write_init_super1(): if "rv = store_super1(st, di->fd)"
    returns an error and di is the last entry, then di == NULL && rv > 0,
    so executing:
        if (rv)
            fprintf(stderr, Name ": Failed to write metadata to%s\n",
                    di->devname);
    will be a segmentation fault.

    Signed-off-by: majianpeng <majianpeng>
    Signed-off-by: NeilBrown <neilb>

This patch is included in mdadm-3.2.6, which was pushed into updates-testing recently, so I believe this bug has been fixed. If you can reproduce this problem with 3.2.6-1 or later, please open a new Bugzilla about it.

Thanks,
Jes