Bug 499080 - Using readsector0 path checker, multipath displays the incorrect path information the first time after recovery.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath
Version: 5.3
Hardware: All
OS: Linux
low
medium
Target Milestone: rc
Assignee: Ben Marzinski
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-05-05 06:29 UTC by Wade Mealing
Modified: 2018-10-27 14:26 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 08:31:49 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch to read sense data and retry on Unit Attention (1.42 KB, patch)
2009-05-20 20:05 UTC, Ben Marzinski


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2010:0255 0 normal SHIPPED_LIVE device-mapper-multipath bug fix and enhancement update 2010-03-29 12:47:17 UTC

Description Wade Mealing 2009-05-05 06:29:22 UTC
Description of problem:

When the readsector0 path checker is used, multipath displays incorrect path information the first time it is run after recovery. I believe this is because the ioctl in multipath's readsector0 path checker is the first one issued from user space after the path recovers.
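For readers unfamiliar with the checker, the following is a minimal sketch of the kind of SG_IO ioctl it issues: a READ(10) of one block at LBA 0. It is not the multipath-tools source. The function name read_sector0, the buffer sizes and the timeout are assumptions chosen for illustration; only the sg_io_hdr_t fields and the SG_IO request come from the Linux sg interface.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

#define SENSE_BUFF_LEN 32    /* assumed size for this sketch */
#define READ_BUF_LEN   4096  /* assumed: large enough for one block */

/*
 * Issue a READ(10) for one block at LBA 0 through the SG_IO ioctl.
 * Returns the ioctl() result (-1 on failure). On success the caller
 * inspects hdr->status, hdr->host_status and hdr->driver_status.
 * buf must hold READ_BUF_LEN bytes, sense must hold SENSE_BUFF_LEN.
 */
int read_sector0(int fd, sg_io_hdr_t *hdr,
                 unsigned char *buf, unsigned char *sense)
{
    /* READ(10): opcode 0x28, LBA 0, transfer length 1 block.
     * Static because hdr->cmdp still points at it after return. */
    static unsigned char cdb[10] = { 0x28, 0, 0, 0, 0, 0, 0, 0, 1, 0 };

    memset(hdr, 0, sizeof(*hdr));
    memset(sense, 0, SENSE_BUFF_LEN);

    hdr->interface_id = 'S';
    hdr->cmd_len = sizeof(cdb);
    hdr->mx_sb_len = SENSE_BUFF_LEN;
    hdr->dxfer_direction = SG_DXFER_FROM_DEV;
    hdr->dxfer_len = READ_BUF_LEN;
    hdr->dxferp = buf;
    hdr->cmdp = cdb;
    hdr->sbp = sense;
    hdr->timeout = 60000;    /* milliseconds, assumed value */

    return ioctl(fd, SG_IO, hdr);
}
```

If this is the first command the device sees after a reset or link recovery, the device may answer it with a check condition rather than data, which would be consistent with the stale path state described above.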

Version-Release number of selected component (if applicable):

device-mapper-multipath-0.4.7-23.el5.src.rpm

How reproducible:

Every time

Steps to Reproduce:
1. Unplug the SAN connection.
2. Wait for the path to fail.
3. Restore the connection (the log shows the connections being restored).
4. Run multipath -ll
  
Actual results:

Paths show as failed, even when step 4 is run.

Expected results:

Paths are shown in their current state.

Additional info:

I've whipped up a patch that seems to work.  May not be the best way, but works for me.

Comment 2 Ben Marzinski 2009-05-19 23:39:49 UTC
The driver status bit indicates that there is sense data in the sense buffer. Instead of simply retrying here, we should look at the sense data, and do the appropriate thing. If you look at the upstream readsector0 code, it does this.  I'm not sure that it will retry for your case, but it should be easy to add the code to do it.
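As a rough illustration of the point about the driver status bit, here is a small sketch of a check that decides whether the sense buffer is worth parsing. The helper name has_sense_data is hypothetical, and DRIVER_SENSE is defined locally with the value the Linux SCSI layer used at the time (0x08), since it is not exported by the sg header.

```c
/* Local definition, assumed to match the Linux SCSI layer value. */
#define DRIVER_SENSE 0x08

/*
 * Return 1 when the driver reports that sense data was written to the
 * sense buffer, 0 otherwise. The arguments correspond to the
 * driver_status and sb_len_wr fields of sg_io_hdr_t after SG_IO.
 */
int has_sense_data(unsigned int driver_status, unsigned int sb_len_wr)
{
    return (driver_status & DRIVER_SENSE) != 0 && sb_len_wr > 0;
}
```

When this returns 1, the checker can decode the sense key and decide whether to retry, instead of retrying blindly or failing the path.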

I'll try to backport the upstream code, with some extra debugging output, tomorrow. Then you can see if it fixes the issue for you, or if not, what sense data the driver is setting.

Comment 3 Ben Marzinski 2009-05-20 20:05:31 UTC
Created attachment 344873 [details]
Patch to read sense data and retry on Unit Attention

Can you try this patch and see if it fixes the issue?  Even if it doesn't, it should spit out the sense key, which we can then check for and retry on.
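The retry-on-sense-key idea can be sketched independently of the SCSI plumbing. In the sketch below, probe_fn, probe_with_retry, fake_device and fake_probe are all hypothetical names used only for illustration. The probe stands in for one readsector0 attempt and returns the sense key it observed (0 when there is none). The limit of three attempts mirrors the retry count used in the patch.

```c
#define UNIT_ATTENTION 0x6  /* SCSI sense key */
#define MAX_TRIES      3    /* total attempts, as in the patch */

/* Hypothetical probe: performs one check, returns the sense key (0 = none). */
typedef int (*probe_fn)(void *ctx);

/*
 * Call the probe, repeating while it reports UNIT ATTENTION, for at
 * most MAX_TRIES attempts in total. Returns the last sense key seen.
 */
int probe_with_retry(probe_fn probe, void *ctx)
{
    int tries = MAX_TRIES;
    int key;

    do {
        key = probe(ctx);
    } while (key == UNIT_ATTENTION && --tries);

    return key;
}

/* Fake device for demonstration: reports UNIT ATTENTION a fixed number
 * of times, then answers normally. */
struct fake_device {
    int pending_ua;  /* UNIT ATTENTION conditions still queued */
    int calls;       /* number of probes issued so far */
};

int fake_probe(void *ctx)
{
    struct fake_device *dev = ctx;

    dev->calls++;
    if (dev->pending_ua > 0) {
        dev->pending_ua--;
        return UNIT_ATTENTION;
    }
    return 0;
}
```

With a fake device holding one pending UNIT ATTENTION, probe_with_retry(fake_probe, &dev) issues two probes and returns 0. A device that keeps reporting UNIT ATTENTION is given up on after the third probe.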

Comment 4 Wade Mealing 2009-05-21 01:58:21 UTC
Gday Ben,

Unfortunately I was unable to reproduce the issue and the customer does not wish to test.  (My local issue gives a different sense error.)  Will try to confirm that it's sane on the SAN here locally.

Will update when I get results.

Comment 7 Wade Mealing 2009-06-23 23:39:38 UTC
Gday Ben,

Unable to specifically reproduce the issue here (can't seem to simulate a working multipath with scsi_debug module) or on the real hardware.

Wade

Comment 11 Ben Marzinski 2009-09-11 14:34:12 UTC
Patch applied.

Comment 13 michal novacek 2010-01-08 16:40:05 UTC
I was unsuccessful with reproduction so I did SanityOnly.

# diff multipath-tools-0.4.7.rhel5.17/libcheckers/readsector0.c /root/readsector0.c.0.4.7-10.el5

17d16
< #include "../libmultipath/debug.h"
54d52
< 	int retry_count = 3;
80,81d77
< retry:
< 	memset(senseBuff, 0, SENSE_BUFF_LEN);
96,113d91
< 		int key = 0;
< 
< 		if (io_hdr.sb_len_wr > 3) {
< 			if (senseBuff[0] == 0x72 || senseBuff[0] == 0x73)
< 				key = senseBuff[1] & 0x0f;
< 			else if (io_hdr.sb_len_wr > 13 &&
< 				 ((senseBuff[0] & 0x7f) == 0x70 ||
< 				  (senseBuff[0] & 0x7f) == 0x71))
< 				key = senseBuff[2] & 0x0f;
< 		}
< 
< 		/*
< 		 * Retry if UNIT_ATTENTION check condition.
< 		 */
< 		if (key == 0x6) {
< 			if (--retry_count)
< 				goto retry;
< 		}
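For reference, the sense key decoding shown in the diff above can be pulled out into a standalone function. This is a sketch that follows the same byte tests as the patch. The function name sense_key is hypothetical, and the buffer length is passed in explicitly rather than read from io_hdr.sb_len_wr.

```c
#include <stddef.h>

/*
 * Extract the SCSI sense key from a sense buffer of len valid bytes.
 * Descriptor format (response codes 0x72/0x73) keeps the key in byte 1.
 * Fixed format (response codes 0x70/0x71, bit 7 ignored) keeps it in
 * byte 2. Returns 0 when the buffer is too short or the format is
 * not recognised.
 */
int sense_key(const unsigned char *sb, size_t len)
{
    if (len > 3) {
        if (sb[0] == 0x72 || sb[0] == 0x73)
            return sb[1] & 0x0f;
        if (len > 13 &&
            ((sb[0] & 0x7f) == 0x70 || (sb[0] & 0x7f) == 0x71))
            return sb[2] & 0x0f;
    }
    return 0;
}
```

A return value of 0x6 is UNIT ATTENTION, the case the patch retries on.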

Comment 16 errata-xmlrpc 2010-03-30 08:31:49 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0255.html

