Created attachment 615170
messages, dmesg and multipath -ll
Description of problem:
This is a switched fabric environment (Mellanox 6025F FDR switch) set up to test multipath over FDR InfiniBand with DDN's Storage Fusion Architecture 12K-40.
RHEL 6.2 reports an IO error immediately after the cable connected to host channel 0 on controller 0 is disconnected, even though active paths remain between the RHEL 6.2 host and 12K-40 controllers 0 and 1.
Version-Release number of selected component (if applicable):
How reproducible:
Disconnect the cable connected to host channel 0 on controller 0.
Steps to Reproduce:
1. Start IO to the multipath devices using simple -v 1 -p s and let it run for about 15 minutes.
2. Pull the three cables connected to host channels 6, 4 and 2 on controller 0. IO keeps running without any errors.
3. Disconnect the cable connected to host channel 0 on controller 0.
Actual results:
RHEL 6.2 reports an IO error immediately after the cable connected to host channel 0 on the same controller (controller 0) is disconnected, even though active paths remain between the RHEL 6.2 host and the 12K-40.
Expected results:
IO should keep running without any errors as long as some active paths remain between the RHEL 6.2 host and the 12K-40 controllers.
Additional info:
[root@yamoto ~]# cat /etc/issue
Red Hat Enterprise Linux Server release 6.2 (Santiago)
Kernel \r on an \m
[root@yamoto ~]# uname -a
Linux yamoto.datadirect.datadirectnet.com 2.6.32-279.1.1.el6.x86_64 #1 SMP Wed Jun 20 11:41:22 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@yamoto ~]# ibstat
CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 2
    Firmware version: 2.10.700
    Hardware version: 0
    Node GUID: 0x0002c9030038cf20
    System image GUID: 0x0002c9030038cf23
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 40 (FDR10)
        Base lid: 1
        LMC: 0
        SM lid: 4
        Capability mask: 0x0251486a
        Port GUID: 0x0002c9030038cf21
        Link layer: InfiniBand
    Port 2:
        State: Down
        Physical state: Disabled
        Rate: 40
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x0251486a
        Port GUID: 0x0002c9030038cf22
        Link layer: InfiniBand
CA 'mlx4_1'
    CA type: MT4099
    Number of ports: 2
    Firmware version: 2.10.700
    Hardware version: 0
    Node GUID: 0x0002c9030038cef0
    System image GUID: 0x0002c9030038cef3
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 40 (FDR10)
        Base lid: 5
        LMC: 0
        SM lid: 4
        Capability mask: 0x0251486a
        Port GUID: 0x0002c9030038cef1
        Link layer: InfiniBand
    Port 2:
        State: Down
        Physical state: Disabled
        Rate: 40
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x0251486a
        Port GUID: 0x0002c9030038cef2
        Link layer: InfiniBand
[root@yamoto ~]#
[root@yamoto ~]# rpm -qa | grep device-mapper
device-mapper-multipath-libs-0.4.9-56.el6.x86_64
device-mapper-1.02.74-10.el6.x86_64
device-mapper-multipath-0.4.9-56.el6.x86_64
device-mapper-event-libs-1.02.74-10.el6.x86_64
device-mapper-event-1.02.74-10.el6.x86_64
device-mapper-libs-1.02.74-10.el6.x86_64
[root@yamoto ~]# rpm -qa | grep ddn
ddn_mpath_RHEL6-1.3-2.el6.x86_64
[root@yamoto ~]#
There are a lot of things in your messages file that could be issues. However, it's hard to tell what's going on and when your problem occurred, since the messages file covers more than 24 hours.
Your multipath -ll output looks even more confusing. According to that, multipath knows it has active paths for all of its devices. Is this output from when things are going wrong? Have the failed paths been removed?
Could I see the output of
# multipath -ll
from both when it is working and when it isn't, along with the time at which you disconnected the cables that caused the failure and the messages from around then? I'd really like to know which path devices are getting disconnected and which are supposed to still be connected, and that's hard to figure out without knowing what you were doing when the messages were getting logged.
Do you just get a single IO error, or do you continue to get IO errors when you use the multipath device? If you are getting errors, but multipath -ll says that you have working paths, you should first make sure that multipathd is running:
# service multipathd status
Then you should check whether multipath is having problems checking the paths by running:
# multipathd show paths
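One way to collect what is asked for above is a small shell helper, sketched here purely as an illustration. The function name snapshot, the output file naming and the optional output directory are assumptions made for this sketch; the three commands it runs are the ones named above. Each capture starts with a local timestamp in the usual syslog style, so a snapshot taken before and after each cable pull can be lined up against the messages file.

```shell
# Hypothetical helper (not from the original report): snapshot LABEL [OUTDIR]
# Writes a timestamped capture of the multipath state to OUTDIR/mpath-LABEL.txt
# and prints the name of the file it wrote.
snapshot() {
    local label="$1"
    local outdir="${2:-.}"
    local outfile="${outdir}/mpath-${label}.txt"
    {
        # Local time in "Mmm dd HH:MM:SS" form, like traditional syslog lines.
        echo "=== ${label}: $(date '+%b %e %H:%M:%S') ==="
        echo "--- service multipathd status ---"
        service multipathd status 2>&1 || echo "(exit status $?)"
        echo "--- multipath -ll ---"
        multipath -ll 2>&1 || echo "(exit status $?)"
        echo "--- multipathd show paths ---"
        multipathd show paths 2>&1 || echo "(exit status $?)"
    } > "${outfile}"
    echo "${outfile}"
}

# Example session, run as root (the labels are arbitrary):
#   snapshot before-pull
#   ... disconnect the cable on host channel 0 of controller 0 ...
#   snapshot after-ch0-pull
```

Attaching the resulting files together with the matching stretch of the messages file should make it clear which path devices went away and which were still expected to be usable.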
Comment 3 RHEL Program Management
2012-12-14 08:50:23 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.