Bug 499080 - Using readsector0 path checker, multipath displays the incorrect path information the first time after recovery.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Ben Marzinski
QA Contact: Cluster QE
Depends On:
Blocks:
Reported: 2009-05-05 02:29 EDT by Wade Mealing
Modified: 2010-10-23 05:24 EDT
CC List: 18 users

Doc Type: Bug Fix
Last Closed: 2010-03-30 04:31:49 EDT
Attachments
Patch to read sense data and retry on Unit Attention (1.42 KB, patch)
2009-05-20 16:05 EDT, Ben Marzinski

Description Wade Mealing 2009-05-05 02:29:22 EDT
Description of problem:

When the readsector0 path checker is used, multipath displays incorrect path information the first time it is run after recovery. I believe this is because the ioctl in multipath's readsector0 path checker is the first user-space access to the device after the path comes back.

Version-Release number of selected component (if applicable):

device-mapper-multipath-0.4.7-23.el5.src.rpm

How reproducible:

Every time

Steps to Reproduce:
1. Unplug the SAN connection.
2. Wait for the path to fail.
3. Restore the connection (the log shows the connections being restored).
4. Run multipath -ll.
  
Actual results:

Paths are shown as failed when step 4 is run, even though the connection has been restored.

Expected results:

Paths are shown in their current state.

Additional info:

I've whipped up a patch that seems to work.  May not be the best way, but works for me.
Comment 2 Ben Marzinski 2009-05-19 19:39:49 EDT
The driver status bit indicates that there is sense data in the sense buffer. Instead of simply retrying here, we should look at the sense data and do the appropriate thing. If you look at the upstream readsector0 code, it does this. I'm not sure that it will retry for your case, but it should be easy to add the code to do it.

I'll try to backport the upstream code, with some extra debugging output, tomorrow. Then you can see if it fixes the issue for you or, if not, what sense data the driver is setting.
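
For readers following along, the check described above could look roughly like this. It is a minimal sketch, not the upstream readsector0 code: `struct sg_result` and `sense_data_available` are hypothetical names standing in for the `status`, `driver_status` and `sb_len_wr` fields that SG_IO fills in, and the driver-sense bit (0x08) and CHECK CONDITION status (0x02) are written out as local constants rather than taken from a header.

```c
#include <stdbool.h>

/* Local stand-ins for the SCSI/sg constants, defined here so the sketch
 * does not depend on any particular header (names are hypothetical). */
#define SKETCH_DRIVER_SENSE    0x08 /* driver_status bit: sense buffer is valid */
#define SKETCH_CHECK_CONDITION 0x02 /* SCSI status byte: CHECK CONDITION */

/* Hypothetical subset of the fields SG_IO fills in after a command. */
struct sg_result {
	unsigned char status;         /* SCSI status byte */
	unsigned short driver_status; /* driver status flags */
	unsigned char sb_len_wr;      /* bytes actually written to the sense buffer */
};

/* Return true when the sense buffer holds data worth examining, i.e. the
 * caller should decode it instead of blindly failing or retrying. */
static bool sense_data_available(const struct sg_result *r)
{
	if (r->sb_len_wr == 0)
		return false; /* nothing was written, nothing to decode */
	return (r->driver_status & SKETCH_DRIVER_SENSE) ||
	       (r->status & 0x7e) == SKETCH_CHECK_CONDITION;
}
```

Only when this returns true does it make sense to look at the sense key and decide whether the failure is a real path error or something transient.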
Comment 3 Ben Marzinski 2009-05-20 16:05:31 EDT
Created attachment 344873 [details]
Patch to read sense data and retry on Unit Attention

Can you try this patch and see if it fixes the issue? Even if it doesn't, it should spit out the sense key, which we can then check for and retry on.
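
To make the printed key easier to interpret, a lookup along these lines may help. This is only an illustrative sketch: `sense_key_name` and `format_sense_key` are hypothetical helpers, not functions from multipath-tools, and the log format is invented. The names follow the sense key table in the SCSI Primary Commands (SPC) standard; the entries for 0xC and 0xF differ between SPC revisions.

```c
#include <stdio.h>

/* Map a 4-bit SCSI sense key to its name. 0xC and 0xF vary between SPC
 * revisions, so treat those two entries as approximate. */
static const char *sense_key_name(int key)
{
	static const char *const names[16] = {
		"NO SENSE", "RECOVERED ERROR", "NOT READY", "MEDIUM ERROR",
		"HARDWARE ERROR", "ILLEGAL REQUEST", "UNIT ATTENTION", "DATA PROTECT",
		"BLANK CHECK", "VENDOR SPECIFIC", "COPY ABORTED", "ABORTED COMMAND",
		"RESERVED", "VOLUME OVERFLOW", "MISCOMPARE", "COMPLETED",
	};

	if (key < 0 || key > 15)
		return "INVALID";
	return names[key];
}

/* Hypothetical debug helper: format the key as a log line might show it.
 * Returns the snprintf result (characters that would have been written). */
static int format_sense_key(char *buf, size_t len, int key)
{
	return snprintf(buf, len, "readsector0: sense key 0x%x (%s)",
			(unsigned)key, sense_key_name(key));
}
```

A key of 0x6 (UNIT ATTENTION) is the one the attached patch retries on.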
Comment 4 Wade Mealing 2009-05-20 21:58:21 EDT
Gday Ben,

Unfortunately I was unable to reproduce the issue and the customer does not wish to test. (My local issue gives a different sense error.) I will try to confirm that it's sane on the SAN here locally.

Will update when I get results.
Comment 7 Wade Mealing 2009-06-23 19:39:38 EDT
Gday Ben,

Unable to specifically reproduce the issue here (can't seem to simulate a working multipath with scsi_debug module) or on the real hardware.

Wade
Comment 11 Ben Marzinski 2009-09-11 10:34:12 EDT
Patch applied.
Comment 13 michal novacek 2010-01-08 11:40:05 EST
I was unsuccessful with reproduction so I did SanityOnly.

# diff multipath-tools-0.4.7.rhel5.17/libcheckers/readsector0.c /root/readsector0.c.0.4.7-10.el5

17d16
< #include "../libmultipath/debug.h"
54d52
< 	int retry_count = 3;
80,81d77
< retry:
< 	memset(senseBuff, 0, SENSE_BUFF_LEN);
96,113d91
< 		int key = 0;
< 
< 		if (io_hdr.sb_len_wr > 3) {
< 			if (senseBuff[0] == 0x72 || senseBuff[0] == 0x73)
< 				key = senseBuff[1] & 0x0f;
< 			else if (io_hdr.sb_len_wr > 13 &&
< 				 ((senseBuff[0] & 0x7f) == 0x70 ||
< 				  (senseBuff[0] & 0x7f) == 0x71))
< 				key = senseBuff[2] & 0x0f;
< 		}
< 
< 		/*
< 		 * Retry if UNIT_ATTENTION check condition.
< 		 */
< 		if (key == 0x6) {
< 			if (--retry_count)
< 				goto retry;
< 		}
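
The logic in the diff above can be read as a standalone sketch. The sense key extraction mirrors the patch (descriptor format 0x72/0x73 keeps the key in byte 1, fixed format 0x70/0x71 in byte 2). Everything else is an assumption made for illustration: `check_with_retry`, the `read_fn` callback type, and `fake_read_sector0` (a simulated device that stands in for the real SG_IO call) do not exist in multipath-tools.

```c
#include <stddef.h>
#include <string.h>

#define UNIT_ATTENTION 0x6

/* Extract the sense key from a sense buffer, as the patch does. */
static int sense_key(const unsigned char *sb, size_t len)
{
	if (len > 3) {
		if (sb[0] == 0x72 || sb[0] == 0x73)
			return sb[1] & 0x0f; /* descriptor format */
		if (len > 13 &&
		    ((sb[0] & 0x7f) == 0x70 || (sb[0] & 0x7f) == 0x71))
			return sb[2] & 0x0f; /* fixed format */
	}
	return 0;
}

/* Hypothetical callback: issues the read, fills sense data on failure,
 * returns 0 on success and non-zero on failure. */
typedef int (*read_fn)(void *ctx, unsigned char *sense, size_t *sense_len);

/* Make at most retry_count attempts in total, retrying only on UNIT ATTENTION. */
static int check_with_retry(read_fn do_read, void *ctx, int retry_count)
{
	unsigned char sense[32];
	size_t sense_len;
	int rc;

	for (;;) {
		memset(sense, 0, sizeof(sense)); /* clear stale sense data */
		sense_len = 0;
		rc = do_read(ctx, sense, &sense_len);
		if (rc == 0)
			return 0;
		if (sense_key(sense, sense_len) != UNIT_ATTENTION)
			return rc; /* a real failure, do not retry */
		if (--retry_count <= 0)
			return rc; /* attempts exhausted */
	}
}

/* Simulated device: reports a fixed-format UNIT ATTENTION for the first
 * *ctx calls, then succeeds. */
static int fake_read_sector0(void *ctx, unsigned char *sense, size_t *sense_len)
{
	int *pending = ctx;

	if (*pending > 0) {
		(*pending)--;
		sense[0] = 0x70;
		sense[2] = UNIT_ATTENTION;
		*sense_len = 18;
		return -1;
	}
	return 0;
}
```

With a `retry_count` of 3, as in the patch, a device that reports Unit Attention once or twice after recovery is still reported as up, while a device that keeps failing is reported as failed after the third attempt.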
Comment 16 errata-xmlrpc 2010-03-30 04:31:49 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0255.html
