Description of problem:

The SAN initiators use RAID1 mirrors to access SAN targets in two geographically isolated data centers. We have been having issues with SAN targets periodically failing, which presents as I/Os never completing to the RAID1 driver. As has been discussed earlier on linux-raid, this is an error scenario the RAID1 driver cannot address short of a timer-based solution, which arguably involves a layering violation. As processes on the initiators attempt I/O they go into D state, which eventually produces very high load levels. Our response is to restart the hung target. This generates a LIP in the affected zone, which the HBAs of course detect, and which in turn results in the underlying device being failed from the mirror.

Version-Release number of selected component (if applicable):
2.6.18-53.1.13.el5

How reproducible:
Always

Actual results:
The RAID1 driver detects the anomaly, kicks the device, and reports that it is continuing on one device, but then throws an I/O error up through LVM and into the overlying filesystem. This causes the filesystem to go read-only, in some cases not before generating filesystem corruption that has to be corrected by a full ext3 filesystem check.

Expected results:
The 2.6.22.x/2.6.23.x kernels see the same I/O stalls when the targets hang, but they handle the failure scenario correctly: the MD device picks up on the LIP induced by the target reboot, fails out the device, and the machines go forward as they should.

Additional info:
URL of the whole post: http://www.issociate.de/board/post/478736/MD_sysfs/kobject_issues_in_SAN_environment..html
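For reference, a minimal sketch of the checks we run from the initiator side while a target is hung. This assumes a single RAID1 array at /dev/md0; the rport name is illustrative (it matches the logs below), and the ps/awk filter is a generic D-state check rather than anything specific to this setup:

    # Which mirror legs does MD still consider active?
    cat /proc/mdstat
    mdadm --detail /dev/md0

    # Processes stuck in uninterruptible sleep (D state), plus the kernel
    # function they are blocked in; these pile up while the target is hung
    # and drive the load average up.
    ps axo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'

    # Seconds the FC transport waits before failing a blocked remote port.
    cat /sys/class/fc_remote_ports/rport-3:0-1/dev_loss_tmo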
More details from GregKH:

Here is an excerpt from logging at 'info' and 'notice' priority levels for the event:

---------------------------------------------------------------------------
Feb 22 01:36:54 rsg1 kernel: qla2xxx 0000:03:09.0: scsi(3:0:1): Abort command issued -- 1 3818fa 2002.
Feb 22 01:37:20 rsg1 hotplug: Event 871 requested remove for device: scsi_disk
Feb 22 01:37:20 rsg1 hotplug: Event 870 requested remove for device: scsi_device
Feb 22 01:37:20 rsg1 hotplug: Event 872 requested remove for device: block
Feb 22 01:37:20 rsg1 hotplug: Event 873 requested remove for device: block
Feb 22 01:37:20 rsg1 hotplug: Event 874 requested remove for device: scsi
Feb 22 01:37:24 rsg1 kernel: qla2xxx 0000:03:09.0: scsi(3:0:1): Abort command issued -- 1 3818fb 2002.
Feb 22 01:37:24 rsg1 kernel: scsi 3:0:0:1: scsi: Device offlined - not ready after error recovery
Feb 22 01:37:24 rsg1 kernel: scsi 3:0:0:1: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Feb 22 01:39:24 rsg1 kernel: scsi scan: INQUIRY result too short (5), using 36
Feb 22 01:39:24 rsg1 kernel: scsi 3:0:0:1: Direct-Access FCTARGET lvm(58,6) 0.9 PQ: 0 ANSI: 3
Feb 22 01:39:24 rsg1 kernel: sd 3:0:0:1: [sde] 418775040 512-byte hardware sectors (214413 MB)
Feb 22 01:39:24 rsg1 kernel: sd 3:0:0:1: [sde] Write Protect is off
Feb 22 01:39:24 rsg1 kernel: sd 3:0:0:1: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Feb 22 01:39:24 rsg1 hotplug: Event 876 requested add for device: scsi_disk
Feb 22 01:39:24 rsg1 hotplug: Event 875 requested add for device: scsi
Feb 22 01:39:24 rsg1 kernel: sde: sde1
Feb 22 01:39:24 rsg1 hotplug: Event 877 requested add for device: block
Feb 22 01:39:24 rsg1 hotplug: Event 878 requested add for device: block
Feb 22 01:39:24 rsg1 kernel: sd 3:0:0:1: [sde] Attached SCSI disk
---------------------------------------------------------------------------

Which corresponds to the following messages at 'warn' priority:

---------------------------------------------------------------------------
Feb 22 01:37:20 rsg1 kernel: rport-3:0-1: blocked FC remote port time out: removing target and saving binding
Feb 22 01:37:24 rsg1 kernel: end_request: I/O error, dev sdd, sector 418774920
Feb 22 01:37:24 rsg1 kernel: md: super_written gets error=-5, uptodate=0
Feb 22 01:37:24 rsg1 kernel: raid1: Disk failure on sdd1, disabling device.
Feb 22 01:37:24 rsg1 kernel: ^IOperation continuing on 1 devices
Feb 22 01:37:24 rsg1 kernel: scsi 3:0:0:1: rejecting I/O to dead device
Feb 22 01:37:24 rsg1 kernel: md: super_written gets error=-5, uptodate=0
Feb 22 01:37:24 rsg1 kernel: scsi 3:0:0:1: rejecting I/O to dead device
Feb 22 01:37:24 rsg1 kernel: md: super_written gets error=-5, uptodate=0
Feb 22 01:37:24 rsg1 kernel: RAID1 conf printout:
Feb 22 01:37:24 rsg1 kernel:  --- wd:1 rd:2
Feb 22 01:37:24 rsg1 kernel:  disk 0, wo:0, o:1, dev:sdc1
Feb 22 01:37:24 rsg1 kernel:  disk 1, wo:1, o:0, dev:sdd1
Feb 22 01:37:24 rsg1 kernel: RAID1 conf printout:
Feb 22 01:37:24 rsg1 kernel:  --- wd:1 rd:2
Feb 22 01:37:24 rsg1 kernel:  disk 0, wo:0, o:1, dev:sdc1
Feb 22 01:39:26 rsg1 kernel: kobject_add failed for 3:0:0:1 with -EEXIST, don't try to register things with the same name in the same directory.
Feb 22 01:39:26 rsg1 kernel:  [<c01bddde>] kobject_shadow_add+0x101/0x10a
Feb 22 01:39:26 rsg1 kernel:  [<c01edc5d>] device_add+0x7c/0x374
Feb 22 01:39:26 rsg1 kernel:  [<c020cb47>] scsi_sysfs_add_sdev+0x27/0x148
Feb 22 01:39:26 rsg1 kernel:  [<c020b09f>] scsi_add_lun+0x2b7/0x2cc
Feb 22 01:39:26 rsg1 kernel:  [<c020b228>] scsi_probe_and_add_lun+0x174/0x206
Feb 22 01:39:26 rsg1 kernel:  [<c020b37d>] scsi_sequential_lun_scan+0xc3/0xda
Feb 22 01:39:26 rsg1 kernel:  [<c020b98e>] __scsi_scan_target+0xd1/0xe8
Feb 22 01:39:26 rsg1 kernel:  [<c020ba4d>] scsi_scan_target+0xa8/0xc2
Feb 22 01:39:26 rsg1 kernel:  [<c0213529>] fc_scsi_scan_rport+0x0/0x73
Feb 22 01:39:26 rsg1 kernel:  [<c021357e>] fc_scsi_scan_rport+0x55/0x73
Feb 22 01:39:26 rsg1 kernel:  [<c01227f9>] run_workqueue+0x77/0xf3
Feb 22 01:39:26 rsg1 kernel:  [<c0122875>] worker_thread+0x0/0xb1
Feb 22 01:39:26 rsg1 kernel:  [<c012291c>] worker_thread+0xa7/0xb1
Feb 22 01:39:26 rsg1 kernel:  [<c01256e9>] autoremove_wake_function+0x0/0x33
Feb 22 01:39:26 rsg1 kernel:  [<c01256e9>] autoremove_wake_function+0x0/0x33
Feb 22 01:39:26 rsg1 kernel:  [<c0122875>] worker_thread+0x0/0xb1
Feb 22 01:39:26 rsg1 kernel:  [<c0125307>] kthread+0x34/0x55
Feb 22 01:39:26 rsg1 kernel:  [<c01252d3>] kthread+0x0/0x55
Feb 22 01:39:26 rsg1 kernel:  [<c01030c7>] kernel_thread_helper+0x7/0x10
Feb 22 01:39:26 rsg1 kernel: =======================
---------------------------------------------------------------------------

As can be seen from the first set of logs, the /dev/sdd device is what gets kicked out of the RAID1 device. The SAN target coming back on-line triggers an event that results in the same device being re-discovered as /dev/sde. My assumption is that discovery is forced onto the 'next' SCSI device name because the RAID1 MD device is still holding a reference to the failed /dev/sdd device, thus preventing its re-use. That is at least how I interpret the:

Feb 22 01:39:26 rsg1 kernel: kobject_add failed for 3:0:0:1 with -EEXIST, don't try to register things with the same name in the same directory.

and the resultant stack trace.

The interaction of this with udev tends to complicate the situation from a systems-administration point of view. The loss of the /dev/sdd device is picked up by udevd, which removes the device nodes for the block device and its associated partitions. This violates the principle of 'least surprise': when the systems staff log in to recover the situation, they cannot remove the failed device from the RAID1 device and are potentially forced to add a completely different device back into the MD device to recover from the event.
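For what it's worth, a sketch of the recovery an administrator would expect to be able to perform at this point (array and device names match the logs above; 'failed' and 'detached' are mdadm keywords in reasonably recent mdadm releases). On this kernel the remove step is precisely what misbehaves, which is the heart of the second problem:

    # Drop the failed leg: 'detached' matches members whose device nodes
    # have disappeared, 'failed' matches members marked faulty.
    mdadm /dev/md0 --remove detached
    mdadm /dev/md0 --remove failed

    # The target came back as /dev/sde, so add the new name and let the
    # mirror resync.
    mdadm /dev/md0 --add /dev/sde1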
Contacted Alan and Aristeu Rozanski about this one, to ask whether it is the same issue as BZ#239604 ([RHEL5] console: kobject_add failed), which is already tracked in BZ. Aristeu's reaction:

Hello Jan,

These issues aren't related. This message is displayed when any part of the kernel tries to register an object with an existing name, and usually means there's a bug in the caller. In the vcs case, there's a race in the virtual console code when opening a virtual console that has not been fully closed. If there isn't a BZ# about this RAID bug, please create one and contact the RAID maintainer.

-- Aristeu
There are two distinctly different bugs in this issue. The first is kicking the error up to the filesystem and causing the ext3 filesystem to go readonly. That particular bug is a duplicate of bug 430984, so I'm going to entirely ignore that part of this bug report. The second issue is the reregistration of the fc disk under a different sd name due to the kobject still existing when the device is rescanned. I'm going to defer my decision on whether or not that can be fixed in this release cycle until after I've completed 430984 and can see if it impacts this at all.
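A possible workaround sketch for the second issue (not a fix for the kobject lifetime problem itself, and it presupposes that md has actually released the stale device; host and device names match the logs above): tear down the stale sd entry through sysfs so its name can be reused, then rescan the host.

    # Remove the stale sdd entry so its name/kobject can be reused.
    echo 1 > /sys/block/sdd/device/delete

    # Rescan every channel/target/LUN on the affected host.
    echo "- - -" > /sys/class/scsi_host/host3/scan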
This request was evaluated by Red Hat Product Management for inclusion, but this component is not scheduled to be updated in the current Red Hat Enterprise Linux release. If you would like this request to be reviewed for the next minor release, ask your support representative to set the next rhel-x.y flag to "?".
Unfortunately the previous automated notification about the non-inclusion of this request in Red Hat Enterprise Linux 5.3 used the wrong text template. It should have read: this request has been reviewed by Product Management and is not planned for inclusion in the current minor release of Red Hat Enterprise Linux. If you would like this request to be reviewed for the next minor release, ask your support representative to set the next rhel-x.y flag to "?" or raise an exception.
(In reply to comment #3)
> There are two distinctly different bugs in this issue. The first is kicking
> the error up to the filesystem and causing the ext3 filesystem to go readonly.
> That particular bug is a duplicate of bug 430984, so I'm going to entirely
> ignore that part of this bug report. The second issue is the reregistration of
> the fc disk under a different sd name due to the kobject still existing when
> the device is rescanned. I'm going to defer my decision on whether or not that
> can be fixed in this release cycle until after I've completed 430984 and can
> see if it impacts this at all.

Looks like the second issue is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=668905 even though it was reported there much later. There's a reference to an upstream fix, so I will dupe this bug to 668905 and work it there.

Jes

*** This bug has been marked as a duplicate of bug 668905 ***