
Bug 454093

Summary: Install successfully but fail to failover
Product: Red Hat Enterprise Linux 5
Reporter: I-Chung Ho <tom.ho>
Component: device-mapper-multipath
Assignee: Ben Marzinski <bmarzins>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Docs Contact:
Priority: low
Version: 5.0
CC: agk, Anthony.Laker, bmarzins, bmr, christophe.varoqui, dennis.zhou, dwysocha, egoggin, heinzm, iannis, junichi.nomura, kueda, lmb, mark.rugare, mbroz, prockai, tom.ho, tranlan
Target Milestone: rc
Target Release: ---
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-05-12 17:24:40 UTC
Attachments (description, flags):
System log when we perform hot plug miniSAS cable (flags: none)
system log (flags: none)
System log, one disk's path missing (flags: none)

Description I-Chung Ho 2008-07-04 15:43:41 UTC

Description of problem:
We have a Sun Storage J4200 and an LSI 1068E HBA card on RHEL5.
We are trying to use device-mapper to handle multipathing.
Installing and enabling multipath works without problems.
Creating a software RAID (mdadm) on the merged devices (/dev/dm*) and running IO on the RAID also works.

The problem occurs when we try to fail over.
We hot plug the miniSAS cables in and out on the host.
Before unplugging a miniSAS cable, we check that all paths are ready.
But after several hot-plug cycles, the IO fails when we unplug the miniSAS cable. The other path remains ready and active, and the link LED indicates normal.

We have tried different multipath settings, such as path_grouping_policy=failover or a different path_checker.
The symptom remains the same.

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.7-8.el5.i386

How reproducible:
Always


Steps to Reproduce:
1. Connect one host to the Sun Storage J4200 with 2 paths.
2. Enable multipath.
3. Create a RAID (any level: 1, 5, 6) with mdadm on the merged devices.
4. Start IO.
5. Unplug one miniSAS cable on the host.
6. Make sure that half of the paths are gone.
7. Plug the miniSAS cable back in.
8. Make sure that all the paths are back.
9. Unplug the other miniSAS cable on the host.
10. Make sure that half of the paths are gone.
11. Plug the miniSAS cable back in.
12. Make sure that all paths are back.
13. Repeat steps 5-12.
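
The "make sure" checks in steps 6, 8, 10 and 12 could be scripted roughly as below. This is only a sketch, not something taken from the report: the function names are made up, and the "[active][ready]" marker is an assumption about how this multipath version prints a healthy path, so adjust READY_MARKER to whatever `multipath -ll` shows on your system.

```shell
#!/bin/sh
# Assumption: a healthy path line in the `multipath -ll` output contains this text.
READY_MARKER=${READY_MARKER:-"[active][ready]"}

# Count the lines on stdin that contain the given fixed string.
# grep -c exits non-zero when nothing matches, so that status is ignored.
count_paths() {
    grep -F -c -- "$1" || true
}

# Poll a command until the number of ready paths equals the expected count.
# Usage: wait_for_paths EXPECTED TRIES DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 once the count matches, 1 if it never does within TRIES polls.
wait_for_paths() {
    expected=$1
    tries=$2
    delay=$3
    shift 3
    while [ "$tries" -gt 0 ]; do
        n=$("$@" | count_paths "$READY_MARKER")
        if [ "$n" -eq "$expected" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `wait_for_paths "$TOTAL" 12 5 multipath -ll` would wait up to about a minute for all paths to return before the next cable pull, where TOTAL is the number of paths you expect on your setup (a placeholder here, not a value from this report).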

Actual Results:
After 2~3 cycles of hot plugging the miniSAS cable, the IO hangs.

Expected Results:
When we unplug one miniSAS cable, the other cable should keep the IO going. IO should keep running no matter how many hot-plug cycles of the miniSAS cable we have performed.

Additional info:
HBA driver: 4.00.21.00-1

multipath.conf:
device {
	vendor				"SEAGATE"
	product				"ST314655SSUN146G"
	path_grouping_policy	        multibus 
	getuid_callout 			"/sbin/scsi_id -g -u -s /block/%n"
	prio_callout			"none"
	path_checker			tur
	path_selector			"round-robin 0"
	failback			immediate
	rr_weight			uniform
	hardware_handler		"0"
	no_path_retry			fail
	user_friendly_name		no
}

Comment 1 Ben Marzinski 2008-07-07 19:57:01 UTC
First off, I trust you have multipathd running.  If it's not, please run

# service multipathd start
# chkconfig multipathd on

and see if that fixes your problem.

Could you please send me the output of /var/log/messages while you are pulling
the cables?

You could also try setting no_path_retry to something other than "fail". For
instance, if you have

no_path_retry    10

multipath will not fail the IO until it checks for active paths 10 times in a
row, and doesn't find any (With the default polling_interval of 5 seconds, this
should allow multipath to deal with all paths being down for up to 50 seconds).
 However, if this fixes the problem, please let me know.  If you are making sure
that the paths lost during the last cable pull have come back up before pulling
the next cable, like you mentioned in your steps to reproduce, then I don't see
why this should be necessary.
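
The 50-second figure above is simply no_path_retry multiplied by polling_interval. A small sketch of that arithmetic, with a made-up function name, in case you want to try other values (it is only the rough estimate from this comment, not an exact timing guarantee):

```shell
#!/bin/sh
# Approximate time, in seconds, that multipath keeps retrying with all paths
# down before it fails the IO: one path check per polling_interval, repeated
# no_path_retry times.
# Usage: retry_window NO_PATH_RETRY POLLING_INTERVAL
retry_window() {
    echo $(( $1 * $2 ))
}

retry_window 10 5   # no_path_retry 10, default polling_interval 5: prints 50
```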

Comment 2 I-Chung Ho 2008-07-09 07:20:30 UTC
Created attachment 311349 [details]
System log when we perform hot plug miniSAS cable

Comment 3 Ben Marzinski 2008-08-13 16:29:14 UTC
Looking at your system log, there is no multipathd output at all. Are you sure that multipathd is running?

Can you please send me the output of
# multipath -v3 -ll

from before the start of your test.  Then run
# service multipathd stop
# multipathd -v3

Then start your test, and collect the output to /var/log/messages for the entire length of your test.
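
If it helps, the collection steps above could be wrapped in a small script along these lines. It is a sketch only: the function names, the DRY_RUN switch and the output file name are made up for illustration, and the real commands need to be run as root.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 (hypothetical switch for
# checking the sequence without touching the system).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Save the verbose multipath listing taken before the test, then stop the
# service and start multipathd by hand with -v3, as requested above.
# Usage: collect_before_test [OUTPUT_FILE]
collect_before_test() {
    out=${1:-multipath-before.txt}   # hypothetical default file name
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ multipath -v3 -ll > $out"
    else
        multipath -v3 -ll > "$out" 2>&1
    fi
    run service multipathd stop
    run multipathd -v3
}
```

Running `DRY_RUN=1 collect_before_test` first prints the three commands without executing them. After the real run, start the test and keep /var/log/messages for its whole length.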

Comment 4 I-Chung Ho 2008-08-19 11:15:31 UTC
Created attachment 314532 [details]
system log

System log from hot plugging the miniSAS cable; one path of dm-53 didn't come back.

Comment 5 I-Chung Ho 2008-08-19 11:18:10 UTC
We have tried the command you suggested:
# service multipathd stop
# multipathd -v3

It passed 5 cycles of hot plugging the miniSAS cable.
(What is the difference between:
# service multipathd stop
and
# multipathd -v3)

But one path of dm-53 didn't come back after we plugged the miniSAS cable back in.
It's a SAS disk.

We will try with a SATA disk.

Comment 6 I-Chung Ho 2008-09-05 16:41:31 UTC
Created attachment 315917 [details]
System log, one disk's path missing

We still have an issue with hot plugging the SAS cable.
A disk's path won't come back after several hot-plug cycles.

Comment 8 Ben Marzinski 2010-05-05 19:31:54 UTC
Is this still an issue?  There have been multiple changes that may have solved this issue.  If this is still reproducible, please let me know, otherwise I'm going to close this bug.

Comment 9 Ben Marzinski 2010-05-12 17:24:40 UTC
Numerous SAS fixes have happened since 5.0.  This bug should be solved, and there is not enough information for me to debug in this bugzilla.