Bug 859258

Summary: RHEL 6.2 reported IO error during "Remove one path at a time"
Product: Red Hat Enterprise Linux 6
Reporter: savang <sthun>
Component: device-mapper-multipath
Assignee: Ben Marzinski <bmarzins>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent
Priority: unspecified
Version: 6.2
CC: agk, bmarzins, dwysocha, heinzm, msnitzer, prajnoha, prockai, rbalakri, zkabelac
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-10-14 14:27:38 UTC

Attachments: messages, dmesg and multipath -ll

Description savang 2012-09-20 23:16:48 UTC
Created attachment 615170 [details]
messages, dmesg and multipath -ll

Description of problem:
This is a fabric environment (Mellanox 6025F FDR switch) set up to test a multipathed, switched configuration with DDN's 12K-40 Storage Fusion Architecture over FDR InfiniBand.

RHEL 6.2 reports an IO error immediately after a cable connected to host channel 0 on controller 0 is disconnected, even though active paths remain between RHEL 6.2 and 12K-40 controllers 0 and 1.

Version-Release number of selected component (if applicable):


How reproducible:
Disconnect a cable connected to host channel 0 on controller 0.


Steps to Reproduce:
1. Start IO to the multipath devices using "simple -v 1 -p s" and let it run for about 15 minutes.

2. Pull three cables connected to host channels 6, 4, and 2 on controller 0; IO continues to run without any errors.
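The IO exercise above could be approximated with a small shell sketch. This is a sketch only: the device names, the duration, the log path, and the use of dd in place of DDN's "simple" exerciser are all assumptions, not part of the original report.

```shell
#!/bin/sh
# Hypothetical reproduction helper: issue direct IO against the multipath
# devices in a loop, logging any failure with a timestamp so it can be
# matched against the moment a cable was pulled.
run_io_test() {
    # $1: space-separated device list, $2: duration in seconds, $3: log file
    devices=$1
    duration=${2:-900}                      # ~15 minutes by default
    log=${3:-/tmp/mpath-io-test.log}
    end=$(( $(date +%s) + duration ))
    while [ "$(date +%s)" -lt "$end" ]; do
        for dev in $devices; do
            # O_DIRECT so errors surface immediately instead of hitting cache
            if ! dd if="$dev" of=/dev/null bs=1M count=64 iflag=direct \
                    >/dev/null 2>&1; then
                echo "$(date '+%F %T') IO error on $dev" >>"$log"
            fi
        done
    done
}

# Example invocation (hypothetical device names):
# run_io_test "/dev/mapper/mpatha /dev/mapper/mpathb" 900 /tmp/mpath-io.log
```

Timestamped error lines in the log can then be correlated with the times at which each cable was disconnected.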

Actual results:
The problem is that RHEL 6.2 reports an IO error immediately after a cable connected to host channel 0 on the same controller (0) is disconnected, while active paths still remain between RHEL 6.2 and the 12K-40.

Expected results:
IO should keep running without any errors as long as active paths remain between RHEL 6.2 and the 12K-40 controllers.

Additional info:
[root@yamoto ~]# cat /etc/issue
Red Hat Enterprise Linux Server release 6.2 (Santiago)
Kernel \r on an \m

[root@yamoto ~]# uname -a
Linux yamoto.datadirect.datadirectnet.com 2.6.32-279.1.1.el6.x86_64 #1 SMP Wed Jun 20 11:41:22 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@yamoto ~]# ibstat
CA 'mlx4_0'
	CA type: MT4099
	Number of ports: 2
	Firmware version: 2.10.700
	Hardware version: 0
	Node GUID: 0x0002c9030038cf20
	System image GUID: 0x0002c9030038cf23
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 40 (FDR10)
		Base lid: 1
		LMC: 0
		SM lid: 4
		Capability mask: 0x0251486a
		Port GUID: 0x0002c9030038cf21
		Link layer: InfiniBand
	Port 2:
		State: Down
		Physical state: Disabled
		Rate: 40
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x0251486a
		Port GUID: 0x0002c9030038cf22
		Link layer: InfiniBand
CA 'mlx4_1'
	CA type: MT4099
	Number of ports: 2
	Firmware version: 2.10.700
	Hardware version: 0
	Node GUID: 0x0002c9030038cef0
	System image GUID: 0x0002c9030038cef3
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 40 (FDR10)
		Base lid: 5
		LMC: 0
		SM lid: 4
		Capability mask: 0x0251486a
		Port GUID: 0x0002c9030038cef1
		Link layer: InfiniBand
	Port 2:
		State: Down
		Physical state: Disabled
		Rate: 40
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x0251486a
		Port GUID: 0x0002c9030038cef2
		Link layer: InfiniBand
[root@yamoto ~]# 
[root@yamoto ~]# rpm -qa | grep device-mapper
device-mapper-multipath-libs-0.4.9-56.el6.x86_64
device-mapper-1.02.74-10.el6.x86_64
device-mapper-multipath-0.4.9-56.el6.x86_64
device-mapper-event-libs-1.02.74-10.el6.x86_64
device-mapper-event-1.02.74-10.el6.x86_64
device-mapper-libs-1.02.74-10.el6.x86_64
[root@yamoto ~]# rpm -qa | grep ddn
ddn_mpath_RHEL6-1.3-2.el6.x86_64
[root@yamoto ~]#

Comment 2 Ben Marzinski 2012-09-22 04:57:41 UTC
There are a lot of things in your messages file that could be issues. However, it's hard to tell what's going on and when your problem occurred, since the messages file covers more than 24 hours.

Your multipath -ll output looks even more confusing. According to that, multipath knows it has active paths for all of its devices. Is this output from when things are going wrong? Have the failed paths been removed?

Could I see the output of

# multipath -ll

from when it is working and from when it isn't, along with the time at which you disconnected the cables that caused the failure, and the corresponding messages? I'd really like to know which path devices are getting disconnected and which are supposed to still be connected; it's hard to figure that out without knowing what you were doing while those messages were being logged.

Do you just get a single IO error, or do you continue to get IO errors when you use the multipath device? If you are getting errors but multipath -ll says that you have working paths, you should first make sure that multipathd is running:

# service multipathd status

Then you should check whether multipathd is having problems checking the paths by running:

# multipathd show paths
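As an aside (not part of the original comment), the per-path states in multipath -ll output can be tallied with a small awk sketch. The field positions are assumptions based on typical device-mapper-multipath 0.4.9 output, where each path line ends with "<dm state> <checker state> running".

```shell
# Tally active vs. failed paths from "multipath -ll" output read on stdin.
# Assumes path lines of the form "|- 3:0:0:1 sdb 8:16 active ready running";
# the third-from-last field is the device-mapper path state.
count_paths() {
    awk '/ (active|failed) (ready|faulty|ghost|shaky) / {
            state[$(NF-2)]++
         }
         END {
            printf "active=%d failed=%d\n", state["active"], state["failed"]
         }'
}

# Usage: multipath -ll | count_paths
```

Running this before and after each cable pull would show at a glance whether the kernel still believes it has usable paths when the IO error is reported.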

Comment 3 RHEL Program Management 2012-12-14 08:50:23 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 4 Ben Marzinski 2015-10-14 14:27:38 UTC
This bug has been in needinfo for years. Closing it.

Comment 5 Red Hat Bugzilla 2023-09-14 01:37:35 UTC
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 1000 days.