Bug 499080

Summary: Using the readsector0 path checker, multipath displays incorrect path information the first time it is run after recovery.
Product: Red Hat Enterprise Linux 5
Reporter: Wade Mealing <wmealing>
Component: device-mapper-multipath
Assignee: Ben Marzinski <bmarzins>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: low
Version: 5.3
CC: agk, bdonahue, bmarzins, bmr, christophe.varoqui, coughlan, dwysocha, edamato, egoggin, heinzm, junichi.nomura, kueda, lmb, mbroz, mnovacek, prockai, tao, tranlan
Target Milestone: rc
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-03-30 08:31:49 UTC
Attachments:
Patch to read sense data and retry on Unit Attention (flags: none)

Description Wade Mealing 2009-05-05 06:29:22 UTC
Description of problem:

Using the readsector0 path checker, multipath displays incorrect path information the first time it is run after recovery. I believe this is because the ioctl issued by multipath's readsector0 path checker is the first userspace I/O sent to the path after recovery.
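
For context, readsector0 tests a path by reading the first sector of the device through the SG_IO ioctl. The sketch below only builds the 10-byte SCSI command descriptor block for such a read; it assumes the checker uses READ(10) (opcode 0x28) for LBA 0 and one block, and the helper name build_read10 is hypothetical, not taken from multipath-tools.

```c
#include <stdint.h>
#include <string.h>

#define READ10_CDB_LEN 10

/* Hypothetical helper: fill a READ(10) CDB that reads `blocks` logical
 * blocks starting at logical block address `lba`. */
void build_read10(uint8_t cdb[READ10_CDB_LEN], uint32_t lba, uint16_t blocks)
{
    memset(cdb, 0, READ10_CDB_LEN);
    cdb[0] = 0x28;                   /* READ(10) operation code */
    cdb[2] = (uint8_t)(lba >> 24);   /* LBA, big-endian, bytes 2..5 */
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[7] = (uint8_t)(blocks >> 8); /* transfer length, bytes 7..8 */
    cdb[8] = (uint8_t)blocks;
}
```

For example, build_read10(cdb, 0, 1) yields the bytes 28 00 00 00 00 00 00 00 01 00, i.e. "read one block at sector 0".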

Version-Release number of selected component (if applicable):

device-mapper-multipath-0.4.7-23.el5.src.rpm

How reproducible:

Every time

Steps to Reproduce:
1. Unplug the SAN connection.
2. Wait for the path to fail.
3. Restore the connection (the restore shows up in the log).
4. Run multipath -ll.
  
Actual results:

Paths still show as failed when step 4 is run, even though the connection has been restored.

Expected results:

Paths are shown in their current (restored) state.

Additional info:

I've whipped up a patch that seems to work. It may not be the best way, but it works for me.

Comment 2 Ben Marzinski 2009-05-19 23:39:49 UTC
The driver status bit indicates that there is sense data in the sense buffer. Instead of simply retrying here, we should look at the sense data and do the appropriate thing. The upstream readsector0 code does this. I'm not sure that it will retry for your case, but it should be easy to add the code to do so.
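
The decision described above (check whether the SG_IO result carries sense data before choosing to retry) can be sketched as a small classifier. This is a sketch under stated assumptions: the field names mirror the Linux sg_io_hdr_t members status, host_status, driver_status and sb_len_wr; 0x08 is taken to be the DRIVER_SENSE value in the low nibble of driver_status; and the enum and function names are hypothetical.

```c
/* Assumption: DRIVER_SENSE is 0x08 in the low nibble of the SG_IO
 * driver_status field; the high nibble carries "suggestion" bits. */
#define SKETCH_DRIVER_SENSE 0x08

enum sg_result {
    SG_RESULT_OK,          /* GOOD status, no host or driver error */
    SG_RESULT_CHECK_SENSE, /* sense data present: decode it before deciding */
    SG_RESULT_FAILED       /* error without usable sense data */
};

/* Hypothetical classifier for one SG_IO completion. */
enum sg_result classify_sg_result(unsigned int status, unsigned int host_status,
                                  unsigned int driver_status, unsigned int sb_len_wr)
{
    if (status == 0 && host_status == 0 && (driver_status & 0x0f) == 0)
        return SG_RESULT_OK;
    if ((driver_status & SKETCH_DRIVER_SENSE) != 0 && sb_len_wr > 0)
        return SG_RESULT_CHECK_SENSE;
    return SG_RESULT_FAILED;
}
```

A CHECK CONDITION (SCSI status 0x02) with DRIVER_SENSE set and a non-empty sense buffer lands in SG_RESULT_CHECK_SENSE, which is the case where blindly retrying would discard the information needed to decide.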

I'll try to backport the upstream code, with some extra debugging output, tomorrow. Then you can see if it fixes the issue for you or, if not, what sense data the driver is returning.

Comment 3 Ben Marzinski 2009-05-20 20:05:31 UTC
Created attachment 344873 [details]
Patch to read sense data and retry on Unit Attention

Can you try this patch and see if it fixes the issue? Even if it doesn't, it should print the sense key, which we can then check for and retry on.
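
Printing the sense key is only useful if it can be read back as a condition. The sketch below maps the 4-bit sense key to the names in the SCSI Primary Commands (SPC) sense key table; the function names and the log message format are hypothetical, and keys 0xC (obsolete) and 0xF are collapsed to "OTHER" for simplicity.

```c
#include <stdio.h>

/* Sense key names from the SPC sense key table; 0xC and 0xF are
 * collapsed to "OTHER" in this sketch. */
const char *sense_key_name(int key)
{
    static const char *const names[16] = {
        "NO SENSE", "RECOVERED ERROR", "NOT READY", "MEDIUM ERROR",
        "HARDWARE ERROR", "ILLEGAL REQUEST", "UNIT ATTENTION", "DATA PROTECT",
        "BLANK CHECK", "VENDOR SPECIFIC", "COPY ABORTED", "ABORTED COMMAND",
        "OTHER", "VOLUME OVERFLOW", "MISCOMPARE", "OTHER"
    };

    if (key < 0 || key > 0xf)
        return "INVALID";
    return names[key];
}

/* Hypothetical debug hook: print the decoded key so it can later be added
 * to the set of keys the checker retries on. */
void log_sense_key(FILE *out, int key)
{
    fprintf(out, "readsector0: sense key 0x%x (%s)\n", key, sense_key_name(key));
}
```

A logged key of 0x6 (UNIT ATTENTION) is the case that the retry logic in the applied patch checks for.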

Comment 4 Wade Mealing 2009-05-21 01:58:21 UTC
G'day Ben,

Unfortunately I was unable to reproduce the issue, and the customer does not wish to test. (My local setup gives a different sense error.) I will try to confirm that the patch is sane on the local SAN.

Will update when I get results.

Comment 7 Wade Mealing 2009-06-23 23:39:38 UTC
G'day Ben,

I was unable to reproduce the issue either here (I can't seem to simulate a working multipath setup with the scsi_debug module) or on the real hardware.

Wade

Comment 11 Ben Marzinski 2009-09-11 14:34:12 UTC
Patch applied.

Comment 13 michal novacek 2010-01-08 16:40:05 UTC
I was unable to reproduce the issue, so I did SanityOnly testing.

# diff multipath-tools-0.4.7.rhel5.17/libcheckers/readsector0.c \
       /root/readsector0.c.0.4.7-10.el5

17d16
< #include "../libmultipath/debug.h"
54d52
< 	int retry_count = 3;
80,81d77
< retry:
< 	memset(senseBuff, 0, SENSE_BUFF_LEN);
96,113d91
< 		int key = 0;
< 
< 		if (io_hdr.sb_len_wr > 3) {
< 			if (senseBuff[0] == 0x72 || senseBuff[0] == 0x73)
< 				key = senseBuff[1] & 0x0f;
< 			else if (io_hdr.sb_len_wr > 13 &&
< 				 ((senseBuff[0] & 0x7f) == 0x70 ||
< 				  (senseBuff[0] & 0x7f) == 0x71))
< 				key = senseBuff[2] & 0x0f;
< 		}
< 
< 		/*
< 		 * Retry if UNIT_ATTENTION check condition.
< 		 */
< 		if (key == 0x6) {
< 			if (--retry_count)
< 				goto retry;
< 		}
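
The logic added in the diff above can be lifted into a standalone sketch for experimentation. sense_key_from_buffer mirrors the patch's decoding (descriptor-format sense, response codes 0x72/0x73, keeps the key in byte 1; fixed-format sense, 0x70/0x71, keeps it in byte 2). read_with_ua_retry reproduces the `retry_count = 3` / `if (--retry_count) goto retry;` shape, which allows at most three attempts in total. The callback type, the scripted test double and all function names are hypothetical stand-ins for the real SG_IO call.

```c
#include <stddef.h>

#define SENSE_KEY_UNIT_ATTENTION 0x6

/* Decode the sense key as the patched readsector0.c does. Returns 0
 * (NO SENSE) if the buffer is too short or the format is not recognised. */
int sense_key_from_buffer(const unsigned char *sb, size_t sb_len_wr)
{
    int key = 0;

    if (sb_len_wr > 3) {
        if (sb[0] == 0x72 || sb[0] == 0x73)
            key = sb[1] & 0x0f;
        else if (sb_len_wr > 13 &&
                 ((sb[0] & 0x7f) == 0x70 || (sb[0] & 0x7f) == 0x71))
            key = sb[2] & 0x0f;
    }
    return key;
}

/* Hypothetical stand-in for one SG_IO attempt: returns its sense key,
 * or 0 when no sense key was reported. */
typedef int (*attempt_fn)(void *ctx);

/* Same control flow as the patch: retry only on UNIT ATTENTION, and
 * stop once --retry_count reaches zero (3 attempts at most). */
int read_with_ua_retry(attempt_fn attempt, void *ctx, int *attempts_out)
{
    int retry_count = 3;
    int attempts = 0;
    int key;

    for (;;) {
        key = attempt(ctx);
        attempts++;
        if (key != SENSE_KEY_UNIT_ATTENTION)
            break;
        if (--retry_count == 0)
            break;
    }
    if (attempts_out)
        *attempts_out = attempts;
    return key;
}

/* Hypothetical test double: replays a fixed sequence of sense keys. */
struct key_script {
    const int *keys;
    int n;
    int next;
};

int scripted_attempt(void *ctx)
{
    struct key_script *s = ctx;
    int key = s->next < s->n ? s->keys[s->next] : 0;

    s->next++;
    return key;
}
```

With a script of {0x6, 0x0} (one Unit Attention, then success) the loop makes two attempts and returns 0, which matches the reported symptom going away after the first command following recovery. A persistent Unit Attention is still given up on after the third attempt.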

Comment 16 errata-xmlrpc 2010-03-30 08:31:49 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0255.html