Description of problem:
Using the readsector0 path checker, multipath displays incorrect path information the first time it is run after recovery. I believe this is because the ioctl in multipath's readsector0 path checker is the first user space.

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.7-23.el5.src.rpm

How reproducible:
Every time

Steps to Reproduce:
1. Unplug the SAN connection
2. Have the path fail
3. Restore the connection (see connections restore in log)
4. Run multipath -ll

Actual results:
Paths show as failed, even as step 4 is run.

Expected results:
Paths to be shown in their current state.

Additional info:
I've whipped up a patch that seems to work. It may not be the best way, but it works for me.
The driver status bit indicates that there is sense data in the sense buffer. Instead of simply retrying here, we should look at the sense data and do the appropriate thing. If you look at the upstream readsector0 code, it does this. I'm not sure that it will retry for your case, but it should be easy to add the code to do it. I'll try to backport the upstream code, with some extra debugging output, tomorrow. Then you can see if it fixes the issue for you, or if not, what sense data the driver is setting.
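For reference, looking at the sense data could be sketched roughly as below. This is only a minimal illustration, not the upstream readsector0 code: the helper names sense_key() and should_retry() are hypothetical, and the byte offsets assume the standard SCSI fixed (response codes 0x70/0x71) and descriptor (0x72/0x73) sense formats.

```c
#include <stddef.h>
#include <stdint.h>

/* SCSI sense key value for UNIT ATTENTION. */
#define SENSE_KEY_UNIT_ATTENTION 0x6

/*
 * Hypothetical helper (not part of multipath-tools): pull the sense key
 * out of a sense buffer. Returns -1 if the buffer is too short or the
 * response code is not one of the four standard formats.
 */
static int sense_key(const uint8_t *sb, size_t len)
{
	uint8_t code;

	if (sb == NULL || len < 2)
		return -1;

	/* Bit 7 of byte 0 is not part of the response code; mask it off. */
	code = sb[0] & 0x7f;

	/* Descriptor format: sense key is the low nibble of byte 1. */
	if (code == 0x72 || code == 0x73)
		return sb[1] & 0x0f;

	/* Fixed format: sense key is the low nibble of byte 2. */
	if ((code == 0x70 || code == 0x71) && len > 2)
		return sb[2] & 0x0f;

	return -1;
}

/* Hypothetical policy helper: retry only on a UNIT ATTENTION sense key. */
static int should_retry(const uint8_t *sb, size_t len)
{
	return sense_key(sb, len) == SENSE_KEY_UNIT_ATTENTION;
}
```

A checker would call something like should_retry() on the sense buffer only after the driver status says sense data is present, and treat any other key as a real failure.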
Created attachment 344873
Patch to read sense data and retry on Unit Attention

Can you try this patch and see if it fixes the issue? Even if it doesn't, it should spit out the sense data key, which we can then check for and retry on.
Gday Ben, Unfortunately I was unable to reproduce the issue and the customer does not wish to test. (My local issue gives a different sense error.) Will try to confirm that it's sane on the SAN here locally. Will update when I get results.
Gday Ben, Unable to specifically reproduce the issue here (can't seem to simulate a working multipath with scsi_debug module) or on the real hardware. Wade
Patch applied.
I was unsuccessful with reproduction so I did SanityOnly.

# diff multipath-tools-0.4.7.rhel5.17/libcheckers/readsector0.c \
    /root/readsector0.c.0.4.7-10.el5
17d16
< #include "../libmultipath/debug.h"
54d52
< 	int retry_count = 3;
80,81d77
<  retry:
< 	memset(senseBuff, 0, SENSE_BUFF_LEN);
96,113d91
< 		int key = 0;
<
< 		if (io_hdr.sb_len_wr > 3) {
< 			if (senseBuff[0] == 0x72 || senseBuff[0] == 0x73)
< 				key = senseBuff[1] & 0x0f;
< 			else if (io_hdr.sb_len_wr > 13 &&
< 				 ((senseBuff[0] & 0x7f) == 0x70 ||
< 				  (senseBuff[0] & 0x7f) == 0x71))
< 				key = senseBuff[2] & 0x0f;
< 		}
<
< 		/*
< 		 * Retry if UNIT_ATTENTION check condition.
< 		 */
< 		if (key == 0x6) {
< 			if (--retry_count)
< 				goto retry;
< 		}
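For anyone reading the diff above, the retry behaviour it adds can be pictured as a bounded loop. The sketch below is an illustration only and is not the patched readsector0.c: issue_with_ua_retry(), the issue_cmd_fn callback type, and the fake_dev stub are all hypothetical names, and the stub merely simulates a device that reports Unit Attention a fixed number of times. With retry_count = 3 and `if (--retry_count) goto retry;`, the patch allows at most three attempts in total, which corresponds to max_attempts = 3 here.

```c
#include <stddef.h>

/* SCSI sense key value for UNIT ATTENTION. */
#define SENSE_KEY_UNIT_ATTENTION 0x6

/*
 * Hypothetical callback type: issue one command. Returns 0 on success,
 * nonzero on failure, and stores the sense key (or -1 if none) in *key.
 */
typedef int (*issue_cmd_fn)(void *ctx, int *key);

/*
 * Issue a command up to max_attempts times, retrying only while the
 * failure is a UNIT ATTENTION check condition. Returns 0 on success,
 * otherwise the last nonzero result from the callback.
 */
static int issue_with_ua_retry(issue_cmd_fn issue, void *ctx, int max_attempts)
{
	int rc = -1;

	for (int attempt = 0; attempt < max_attempts; attempt++) {
		int key = -1;

		rc = issue(ctx, &key);
		if (rc == 0)
			return 0;	/* command succeeded */
		if (key != SENSE_KEY_UNIT_ATTENTION)
			break;		/* real failure: do not retry */
	}
	return rc;
}

/* Stub device for illustration: reports UNIT ATTENTION ua_pending times. */
struct fake_dev {
	int ua_pending;
	int calls;
};

static int fake_issue(void *ctx, int *key)
{
	struct fake_dev *dev = ctx;

	dev->calls++;
	if (dev->ua_pending > 0) {
		dev->ua_pending--;
		*key = SENSE_KEY_UNIT_ATTENTION;
		return 1;
	}
	*key = -1;
	return 0;
}
```

Bounding the loop matters: a path that keeps returning Unit Attention should still end up reported as failed rather than stall the checker.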
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0255.html