Bug 859258 - RHEL 6.2 reported IO error during Remove one path at a time [NEEDINFO]
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: device-mapper-multipath
Version: 6.2
Hardware: x86_64 Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Ben Marzinski
QA Contact: Red Hat Kernel QE team
 
Reported: 2012-09-20 19:16 EDT by savang
Modified: 2015-10-14 10:27 EDT
CC List: 9 users

Doc Type: Bug Fix
Last Closed: 2015-10-14 10:27:38 EDT
Type: Bug
bmarzins: needinfo? (sthun)


Attachments
messages, dmesg and multipath -ll (41.88 KB, application/x-gzip)
2012-09-20 19:16 EDT, savang
Description savang 2012-09-20 19:16:48 EDT
Created attachment 615170
messages, dmesg and multipath -ll

Description of problem:
This is a switched-fabric environment (Mellanox 6025F FDR switch) set up to test multipath against DDN's Storage Fusion Architecture 12K-40 over FDR InfiniBand.

RHEL 6.2 reports an IO error immediately after the cable connected to host channel 0 on controller 0 is disconnected, even though active paths remain between the RHEL 6.2 host and 12K-40 controllers 0 and 1.

Version-Release number of selected component (if applicable):


How reproducible:
Disconnect a cable connected to host channel on controller (0)


Steps to Reproduce:
1. Start IOs to the multipath devices using simple -v 1 -p s and let them run for almost 15 minutes.

2. Pull the three cables connected to host channels 6, 4 and 2 on controller 0. IOs keep running without any errors.

3. Disconnect the cable connected to host channel 0 on the same controller (0).

Actual results:
RHEL 6.2 reports an IO error immediately after the cable connected to host channel 0 on the same controller (0) is disconnected, even though active paths remain between the RHEL 6.2 host and the 12K-40.
  

Expected results:
IOs should keep running without any errors as long as some active paths remain between the RHEL 6.2 host and the 12K-40 controllers.

Additional info:
[root@yamoto ~]# cat /etc/issue
Red Hat Enterprise Linux Server release 6.2 (Santiago)
Kernel \r on an \m

[root@yamoto ~]# uname -a
Linux yamoto.datadirect.datadirectnet.com 2.6.32-279.1.1.el6.x86_64 #1 SMP Wed Jun 20 11:41:22 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@yamoto ~]# ibstat
CA 'mlx4_0'
	CA type: MT4099
	Number of ports: 2
	Firmware version: 2.10.700
	Hardware version: 0
	Node GUID: 0x0002c9030038cf20
	System image GUID: 0x0002c9030038cf23
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 40 (FDR10)
		Base lid: 1
		LMC: 0
		SM lid: 4
		Capability mask: 0x0251486a
		Port GUID: 0x0002c9030038cf21
		Link layer: InfiniBand
	Port 2:
		State: Down
		Physical state: Disabled
		Rate: 40
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x0251486a
		Port GUID: 0x0002c9030038cf22
		Link layer: InfiniBand
CA 'mlx4_1'
	CA type: MT4099
	Number of ports: 2
	Firmware version: 2.10.700
	Hardware version: 0
	Node GUID: 0x0002c9030038cef0
	System image GUID: 0x0002c9030038cef3
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 40 (FDR10)
		Base lid: 5
		LMC: 0
		SM lid: 4
		Capability mask: 0x0251486a
		Port GUID: 0x0002c9030038cef1
		Link layer: InfiniBand
	Port 2:
		State: Down
		Physical state: Disabled
		Rate: 40
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x0251486a
		Port GUID: 0x0002c9030038cef2
		Link layer: InfiniBand
[root@yamoto ~]# 
[root@yamoto ~]# rpm -qa | grep device-mapper
device-mapper-multipath-libs-0.4.9-56.el6.x86_64
device-mapper-1.02.74-10.el6.x86_64
device-mapper-multipath-0.4.9-56.el6.x86_64
device-mapper-event-libs-1.02.74-10.el6.x86_64
device-mapper-event-1.02.74-10.el6.x86_64
device-mapper-libs-1.02.74-10.el6.x86_64
[root@yamoto ~]# rpm -qa | grep ddn
ddn_mpath_RHEL6-1.3-2.el6.x86_64
[root@yamoto ~]#
Comment 2 Ben Marzinski 2012-09-22 00:57:41 EDT
There are a lot of things in your messages file that could be issues. However, it's hard to tell what's going on and when your problem occurred, since the messages file covers more than 24 hours.
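
One way to narrow a long messages file down to the period around a cable pull is to filter on the syslog timestamp. The following is a minimal sketch, assuming the traditional syslog line layout (`Mon DD HH:MM:SS host ...`) and a window that falls within a single day. The function name `msgs_window` and the example times are hypothetical and are not taken from the attachment.

```shell
#!/bin/sh
# msgs_window MONTH DAY START END
# Print only the syslog lines read from stdin whose timestamp is on the
# given month/day and between START and END (HH:MM:SS, inclusive).
# Assumes the traditional "Mon DD HH:MM:SS host message" layout.
msgs_window() {
    awk -v mon="$1" -v day="$2" -v start="$3" -v end="$4" '
        # $1 = month name, $2 = day of month, $3 = HH:MM:SS
        # Zero-padded HH:MM:SS strings sort correctly as plain strings.
        $1 == mon && ($2 + 0) == (day + 0) && ($3 "") >= start && ($3 "") <= end
    '
}

# Example (the times are placeholders):
#   msgs_window Sep 20 19:00:00 19:20:00 < /var/log/messages
```

Noting the wall-clock time of each cable pull and then filtering to a few minutes either side keeps the excerpt short enough to match log entries against individual path failures.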

Your multipath -ll output looks even more confusing. According to that, multipath knows it has active paths for all of its devices. Is this output from when things are going wrong? Have the failed paths been removed?

Could I see the output of
# multipath -ll

from when it is working and when it isn't, and what the time was when you disconnected the cables which caused the failure, along with the messages. I'd really like to know which path devices are getting disconnected, and which are supposed to still be connected, and it's hard to figure that out without knowing what you were doing when the messages were getting logged.

Do you just get a single IO error, or do you continue to get IO errors when you use the multipath device? If you are getting errors, but multipath -ll says that you have working paths, you should first make sure that multipathd is running:

# service multipathd status

Then you should check if multipath is having problems checking the paths by running:

# multipathd show paths
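
The before-and-after captures requested above could be collected with a small wrapper that stamps each command's output with the time it was taken. This is a rough sketch rather than a supported tool: the `snapshot` helper and the output file name `mpath-capture.log` are hypothetical, and the example only runs the three commands already mentioned in this comment.

```shell
#!/bin/sh
# snapshot LABEL COMMAND [ARGS...]
# Print a header with the current time, LABEL and the command line, then the
# command's output (stdout and stderr), then its exit status.
snapshot() {
    label=$1
    shift
    printf '=== %s | %s | %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$label" "$*"
    status=0
    "$@" 2>&1 || status=$?
    printf '=== exit status: %s\n' "$status"
}

# Example use (as root; run once before pulling a cable, then again with
# the label "after" once the IO error has been reported):
#   {
#       snapshot before service multipathd status
#       snapshot before multipath -ll
#       snapshot before multipathd show paths
#   } >> mpath-capture.log
```

Because every block carries its own timestamp, the capture file can be lined up against the messages file to see which path devices were failed at the moment the IO error was reported.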
Comment 3 RHEL Product and Program Management 2012-12-14 03:50:23 EST
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 4 Ben Marzinski 2015-10-14 10:27:38 EDT
This bug has been in needinfo for years. Closing it.
