Bug 859258 - RHEL 6.2 reported IO error during Remove one path at a time
Summary: RHEL 6.2 reported IO error during Remove one path at a time
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: device-mapper-multipath
Version: 6.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Ben Marzinski
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-09-20 23:16 UTC by savang
Modified: 2023-09-14 01:37 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-14 14:27:38 UTC
Target Upstream Version:
Embargoed:


Attachments
messages, dmesg and multipath -ll (41.88 KB, application/x-gzip)
2012-09-20 23:16 UTC, savang

Description savang 2012-09-20 23:16:48 UTC
Created attachment 615170 [details]
messages, dmesg and multipath -ll

Description of problem:
This is a fabric environment (Mellanox 6025F FDR switch) set up to test a multipath-switched configuration between the host and DDN's 12K-40 Storage Fusion Architecture over FDR InfiniBand.

RHEL 6.2 reports an IO error immediately after disconnecting the cable connected to host channel 0 on controller 0, even though active paths remain between RHEL 6.2 and 12K-40 controllers 0 and 1.

Version-Release number of selected component (if applicable):


How reproducible:
Disconnect a cable connected to a host channel on controller 0.


Steps to Reproduce:
1. Start multipath IOs to the multipath devices using simple -v 1 -p s and let them run for almost 15 minutes (a generic stand-in for this IO load is sketched after these steps).

2. Pull the three cables connected to host channels 6, 4, and 2 on controller 0; IOs continue to run without any errors.

3. Pull the cable connected to host channel 0 on the same controller.
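
For anyone reproducing this without DDN's "simple" tool, a generic direct-read load against the multipath devices can stand in for step 1. The device glob, block size, and log path below are illustrative assumptions, not the reporter's actual workload:

# loop direct reads over every multipath device until interrupted
while :; do
    for dev in /dev/mapper/mpath*; do
        dd if="$dev" of=/dev/null bs=1M count=256 iflag=direct 2>> /var/tmp/io-errors.log
    done
done

Any read errors recorded in that log while multipath -ll still shows active paths would match the failure described under Actual results.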

Actual results:
The problem is that RHEL 6.2 reports an IO error immediately after the cable connected to host channel 0 on the same controller (0) is disconnected, even though active paths remain between RHEL 6.2 and the 12K-40.
  

Expected results:
IOs should keep running without any errors as long as active paths remain between RHEL 6.2 and the 12K-40 controllers.
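
For reference, how a multipath map behaves when paths fail is controlled by the device stanza in /etc/multipath.conf; in this setup that stanza is presumably shipped by the ddn_mpath_RHEL6 package. A minimal sketch of such a stanza follows; the vendor/product strings and values are placeholders, not DDN's supported settings:

devices {
        device {
                vendor "DDN"                  # placeholder; must match the array's SCSI INQUIRY data
                product "SFA 12K*"            # placeholder
                path_grouping_policy multibus # spread IO over all paths to each LUN
                path_checker tur              # detect dead paths with TEST UNIT READY
                failback immediate
                no_path_retry 12              # keep retrying roughly a minute after ALL paths fail
        }
}

Note that no_path_retry only applies once every path in a map has failed; the errors reported here occur while active paths remain, so the question is whether the pulled path is being detected and failed over cleanly, not the all-paths-down behavior.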

Additional info:
[root@yamoto ~]# cat /etc/issue
Red Hat Enterprise Linux Server release 6.2 (Santiago)
Kernel \r on an \m

[root@yamoto ~]# uname -a
Linux yamoto.datadirect.datadirectnet.com 2.6.32-279.1.1.el6.x86_64 #1 SMP Wed Jun 20 11:41:22 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@yamoto ~]# ibstat
CA 'mlx4_0'
	CA type: MT4099
	Number of ports: 2
	Firmware version: 2.10.700
	Hardware version: 0
	Node GUID: 0x0002c9030038cf20
	System image GUID: 0x0002c9030038cf23
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 40 (FDR10)
		Base lid: 1
		LMC: 0
		SM lid: 4
		Capability mask: 0x0251486a
		Port GUID: 0x0002c9030038cf21
		Link layer: InfiniBand
	Port 2:
		State: Down
		Physical state: Disabled
		Rate: 40
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x0251486a
		Port GUID: 0x0002c9030038cf22
		Link layer: InfiniBand
CA 'mlx4_1'
	CA type: MT4099
	Number of ports: 2
	Firmware version: 2.10.700
	Hardware version: 0
	Node GUID: 0x0002c9030038cef0
	System image GUID: 0x0002c9030038cef3
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 40 (FDR10)
		Base lid: 5
		LMC: 0
		SM lid: 4
		Capability mask: 0x0251486a
		Port GUID: 0x0002c9030038cef1
		Link layer: InfiniBand
	Port 2:
		State: Down
		Physical state: Disabled
		Rate: 40
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x0251486a
		Port GUID: 0x0002c9030038cef2
		Link layer: InfiniBand
[root@yamoto ~]# 
[root@yamoto ~]# rpm -qa | grep device-mapper
device-mapper-multipath-libs-0.4.9-56.el6.x86_64
device-mapper-1.02.74-10.el6.x86_64
device-mapper-multipath-0.4.9-56.el6.x86_64
device-mapper-event-libs-1.02.74-10.el6.x86_64
device-mapper-event-1.02.74-10.el6.x86_64
device-mapper-libs-1.02.74-10.el6.x86_64
[root@yamoto ~]# rpm -qa | grep ddn
ddn_mpath_RHEL6-1.3-2.el6.x86_64
[root@yamoto ~]#

Comment 2 Ben Marzinski 2012-09-22 04:57:41 UTC
There are a lot of things in your messages file that could be issues. However, it's hard to tell what's going on and when your problem occurred, since the messages file covers more than 24 hours.

Your multipath -ll output looks even more confusing. According to that, multipath knows it has active paths for all of its devices. Is this output from when things are going wrong? Have the failed paths been removed?

Could I see the output of
# multipath -ll

from when it is working and from when it isn't, along with the messages and the time at which you disconnected the cables that caused the failure? I'd really like to know which path devices are getting disconnected and which are supposed to still be connected; it's hard to figure that out without knowing what you were doing when the messages were getting logged.

Do you just get a single IO error, or do you continue to get IO errors when you use the multipath device? If you are getting errors but multipath -ll says that you have working paths, you should first make sure that multipathd is running:

# service multipathd status

Then you should check whether multipath is having problems checking the paths by running:

# multipathd show paths
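
One way to capture the before-and-after state requested above is to leave a small logging loop running across the cable pulls (the 10-second interval and the log location are arbitrary choices):

# log timestamped multipath state until interrupted
while :; do
    date
    service multipathd status
    multipathd show paths
    multipath -ll
    sleep 10
done >> /var/tmp/multipath-trace.log 2>&1

Correlating the timestamps in that log with /var/log/messages should make it clear which path devices actually went away at each cable pull.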

Comment 3 RHEL Program Management 2012-12-14 08:50:23 UTC
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

Comment 4 Ben Marzinski 2015-10-14 14:27:38 UTC
This bug has been in needinfo for years. Closing it.

Comment 5 Red Hat Bugzilla 2023-09-14 01:37:35 UTC
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 1000 days.

