| Summary: | multipath recovery from scsi devices offlined by scsi err handler | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Mark Goodwin <mgoodwin> |
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> |
| Status: | CLOSED DUPLICATE | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 5.6 | CC: | agk, bmarzins, bmr, dwysocha, heinzm, mbroz, mchristi, prajnoha, prockai, zkabelac |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-03-14 19:49:11 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Mark Goodwin
2011-02-18 04:23:33 UTC
So you want multipath to notice that the scsi device state is offlined in sysfs, and reset it to running? I suppose that multipath could do that, but I would make it controllable by a configuration variable, since I don't think that everyone would want this behaviour.

There was a bug in 5.5/ where, if a device was offlined and the transport came back, the fc class could not set the devices back to running. This was fixed in 5.6. Not sure if that is what you are hitting. This is for cases where the remote port goes from Online->Blocked->something else like Not Present->Online. For other cases where the remote port is not affected (so the port state stays Online the entire time), the fix in 5.6 would not help you.

And you probably want to make this configurable. If the device has gone bad, I think something like an INQUIRY (some path testers use that, right?) could work in some cases on some targets, but READs/WRITEs might fail.

(In reply to comment #1)
> So you want multipath to notice that the scsi device state is offlined in
> sysfs, and reset it to running?

Well, only if the fix for BZ 641193 doesn't help, but it looks like that bug may be the root cause here. This site is running 2.6.18-194.11.1.el5 and so is affected by the regression that Mike mentioned (where only devices in state SDEV_BLOCK would be automatically transitioned back to SDEV_RUNNING). The fix in RHEL5.6 transitions devices from any state (including SDEV_OFFLINE) back to SDEV_RUNNING when the rport returns.

So I think the only reason we'd want this RFE for multipath to force the transition back to SDEV_RUNNING is for the case where the rport remains online despite the devices being offlined. I don't know how often that has been hit in the wild, if ever, so perhaps DUP this to BZ 641193. Thoughts?

Regards and thanks
-- Mark Goodwin

This is the first report I've heard of multipath path devices getting incorrectly marked as offline, so I'm leaning towards DUPing it. Mike, do you know of any other cases where we'd need to worry about this?

Ben, are you going to DUP this one? (did you hear back from Mike?)

Regards
-- Mark

*** This bug has been marked as a duplicate of bug 641193 ***
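
For reference, the recovery requested in this bug (notice an offlined SCSI device in sysfs and set it back to running, but only when a configuration option allows it) could be sketched roughly as below. This is an illustration only and not something multipath is known to implement: the `restore_enabled` option and both function names are hypothetical. The sketch assumes the standard sysfs attributes `/sys/block/<dev>/device/state` (SCSI device state, e.g. `running` or `offline`) and `/sys/class/fc_remote_ports/<rport>/port_state` (e.g. `Online`, `Blocked`, `Not Present`). It covers only the case left open by the 5.6 fix, where the rport stays Online while the device is offline.

```python
from pathlib import Path


def read_state(path: Path) -> str:
    """Return the contents of a sysfs-style state file without the newline."""
    return path.read_text().strip()


def should_restore(sdev_state: str, rport_state: str, restore_enabled: bool) -> bool:
    """Hypothetical policy: only act when the option is enabled, the SCSI
    device is offline, and the FC remote port itself reports Online."""
    return restore_enabled and sdev_state == "offline" and rport_state == "Online"


def restore_if_offline(sdev_state_file: Path, rport_state_file: Path,
                       restore_enabled: bool) -> bool:
    """Write 'running' to the device state file if the policy allows it.

    Returns True if the state was changed, False otherwise.
    """
    sdev_state = read_state(sdev_state_file)
    rport_state = read_state(rport_state_file)
    if not should_restore(sdev_state, rport_state, restore_enabled):
        return False
    # Writing "running" to the state attribute asks the kernel to bring
    # the device back to the running state.
    sdev_state_file.write_text("running\n")
    return True
```

On a real system the two arguments would be paths such as `/sys/block/sdX/device/state` and the matching rport's `port_state` file (placeholders, not taken from this report). As noted in the comments above, a device that has really gone bad may still answer an INQUIRY while READs/WRITEs fail, so forcing the state back to running is not safe as a default.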