Bug 859258
Summary: RHEL 6.2 reported IO error during Remove one path at a time
Product: Red Hat Enterprise Linux 6
Component: device-mapper-multipath
Version: 6.2
Hardware: x86_64
OS: Linux
Status: CLOSED INSUFFICIENT_DATA
Severity: urgent
Priority: unspecified
Target Milestone: rc
Reporter: savang <sthun>
Assignee: Ben Marzinski <bmarzins>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: agk, bmarzins, dwysocha, heinzm, msnitzer, prajnoha, prockai, rbalakri, zkabelac
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-10-14 14:27:38 UTC

There are a lot of things in your messages file that could be issues. However, it's hard to tell what's going on and when your problem occurred, since the messages file covers more than 24 hours. Your multipath -ll output looks even more confusing. According to that, multipath knows it has active paths for all of its devices. Is this output from when things are going wrong? Have the failed paths been removed?

Could I see the output of

# multipath -ll

from when it is working and from when it isn't, along with the messages and the time at which you disconnected the cables that caused the failure? I'd really like to know which path devices are getting disconnected and which are supposed to still be connected, and it's hard to figure that out without knowing what you were doing when the messages were getting logged.

Do you just get a single IO error, or do you continue to get IO errors when you use the multipath device?

If you are getting errors, but multipath -ll says that you have working paths, you should first make sure that multipathd is running

# service multipathd status

Then you should check if multipath is having problems checking the paths by running

# multipathd show paths

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

This bug has been in needinfo for years. Closing it.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

Created attachment 615170
messages, dmesg and multipath -ll

Description of problem:
This is a fabric (Mellanox 6025F FDR switch) environment set up to test a multipath switched environment with DDN's Storage Fusion Architecture 12K-40 over FDR InfiniBand. RHEL 6.2 reports an IO error immediately after a cable connected to host channel 0 on controller 0 is disconnected, even though active paths remain between RHEL 6.2 and 12K-40 controllers 0 and 1.

Version-Release number of selected component (if applicable):

How reproducible:
Disconnect a cable connected to a host channel on controller 0.

Steps to Reproduce:
1. Start multipath IOs to the multipath devices using simple -v 1 -p s and let them run for almost 15 minutes.
2. IOs run without any errors when pulling the three cables connected to host channels 6, 4 and 2 on controller 0.

Actual results:
RHEL 6.2 reports an IO error immediately after the cable connected to host channel 0 on the same controller (0) is disconnected, even though active paths remain between RHEL 6.2 and the 12K-40.

Expected results:
IOs should keep running without any errors as long as some active paths remain between RHEL 6.2 and the 12K-40 controllers.

Additional info:

[root@yamoto ~]# cat /etc/issue
Red Hat Enterprise Linux Server release 6.2 (Santiago)
Kernel \r on an \m

[root@yamoto ~]# uname -a
Linux yamoto.datadirect.datadirectnet.com 2.6.32-279.1.1.el6.x86_64 #1 SMP Wed Jun 20 11:41:22 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

[root@yamoto ~]# ibstat
CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 2
    Firmware version: 2.10.700
    Hardware version: 0
    Node GUID: 0x0002c9030038cf20
    System image GUID: 0x0002c9030038cf23
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 40 (FDR10)
        Base lid: 1
        LMC: 0
        SM lid: 4
        Capability mask: 0x0251486a
        Port GUID: 0x0002c9030038cf21
        Link layer: InfiniBand
    Port 2:
        State: Down
        Physical state: Disabled
        Rate: 40
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x0251486a
        Port GUID: 0x0002c9030038cf22
        Link layer: InfiniBand
CA 'mlx4_1'
    CA type: MT4099
    Number of ports: 2
    Firmware version: 2.10.700
    Hardware version: 0
    Node GUID: 0x0002c9030038cef0
    System image GUID: 0x0002c9030038cef3
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 40 (FDR10)
        Base lid: 5
        LMC: 0
        SM lid: 4
        Capability mask: 0x0251486a
        Port GUID: 0x0002c9030038cef1
        Link layer: InfiniBand
    Port 2:
        State: Down
        Physical state: Disabled
        Rate: 40
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x0251486a
        Port GUID: 0x0002c9030038cef2
        Link layer: InfiniBand

[root@yamoto ~]# rpm -qa | grep device-mapper
device-mapper-multipath-libs-0.4.9-56.el6.x86_64
device-mapper-1.02.74-10.el6.x86_64
device-mapper-multipath-0.4.9-56.el6.x86_64
device-mapper-event-libs-1.02.74-10.el6.x86_64
device-mapper-event-1.02.74-10.el6.x86_64
device-mapper-libs-1.02.74-10.el6.x86_64

[root@yamoto ~]# rpm -qa | grep ddn
ddn_mpath_RHEL6-1.3-2.el6.x86_64
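
The comments on this bug ask for multipath -ll output from both the working and the failing state, together with the time of each cable pull. One possible way to collect that is sketched below. This is only an illustration, not something that was run for this bug: the function name snapshot and the default log path are made up here, and the only commands it calls are the three already named in the comments. Each entry is stamped in a syslog-like format so it can be lined up against /var/log/messages.

```shell
#!/bin/sh
# Hypothetical helper, not part of device-mapper-multipath.
# Log file location is an assumption; override by exporting LOG.
LOG="${LOG:-/tmp/mpath-snapshot.log}"

snapshot() {
    # $1 is a free-form label, e.g. "pulled host channel 0"
    {
        # Timestamp in a syslog-like format (e.g. "Sep 20 14:03:11")
        echo "=== $(date '+%b %e %H:%M:%S') $1 ==="
        # The three checks requested in the comments on this bug
        service multipathd status
        multipath -ll
        multipathd show paths
    } >>"$LOG" 2>&1
}
```

Calling snapshot "all paths up" before the test, and then snapshot "pulled host channel 6" (and so on) right after each cable is disconnected, would leave a single log that shows the path state at each step. Errors from the commands themselves are captured in the same file.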