Bug 1112861

Summary: | [Blocked] iSCSI multipath fails to work and only succeeds after adding configuration values for network using sysctl
---|---
Product: | Red Hat Enterprise Virtualization Manager
Reporter: | Amador Pahim <asegundo>
Component: | vdsm
Assignee: | Maor <mlipchuk>
Status: | CLOSED ERRATA
QA Contact: | Elad <ebenahar>
Severity: | high
Docs Contact: |
Priority: | high
Version: | 3.4.0
CC: | acanan, amureini, asegundo, bazulay, bmarzins, daniel.helgenberger, danken, derez, dornelas, ebenahar, ederevea, howey.vernon, iheim, lpeer, mkalinin, mlipchuk, schlegel, scohen, tnisan, yeylon
Target Milestone: | ovirt-3.6.0-rc
Keywords: | ZStream
Target Release: | 3.6.0
Flags: | amureini: Triaged+
Hardware: | All
OS: | Linux
Whiteboard: |
Fixed In Version: |
Doc Type: | Bug Fix
Doc Text: | Previously, a host became non-operational when a network interface was blocked in a multipath environment in which the host had two network devices, both configured for the same subnet and used in an iSCSI bond. This occurred because, once the failing path became active or ready again, multipath no longer considered it until the host changed state to Up again. Now, when the iSCSI bond network interfaces are configured, VDSM configures multipath with the correct interface passed by the engine. As a result, when one of the network interfaces on the same subnet becomes non-responsive, the second path is used to reach the iSCSI target and hosts continue to operate normally.
Story Points: | ---
Clone Of: |
: | 1178838 (view as bug list)
Environment: |
Last Closed: | 2016-03-09 19:21:44 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | Storage
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Embargoed: |
Bug Depends On: | 1184497
Bug Blocks: | 1178838
Attachments: |
Description
Amador Pahim
2014-06-24 20:28:12 UTC
Additional information:

3: eth1:... ... inet 192.168.25.111/24 brd 192.168.25.255 scope global eth1 ...
4: eth2:... ... inet 192.168.25.112/24 brd 192.168.25.255 scope global eth2 ...

Besides the "iface.hwaddress", another change is needed in order to get iSCSI multipath working in this setup:

Add the following values to "/etc/sysctl.conf":

net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.all.rp_filter = 2
net.netfilter.nf_conntrack_tcp_be_liberal = 1

Then execute:

# sysctl -p

So, I think vdsm initial configuration or ovirt-host-deploy should handle having them properly configured.

# cat /var/lib/iscsi/ifaces/eth1
# BEGIN RECORD 6.2.0-873.10.el6
iface.iscsi_ifacename = eth1
iface.transport_name = tcp
iface.vlan_id = 0
iface.vlan_priority = 0
iface.iface_num = 0
iface.mtu = 0
iface.port = 0
# END RECORD

# cat /var/lib/iscsi/ifaces/eth2
# BEGIN RECORD 6.2.0-873.10.el6
iface.iscsi_ifacename = eth2
iface.transport_name = tcp
iface.vlan_id = 0
iface.vlan_priority = 0
iface.iface_num = 0
iface.mtu = 0
iface.port = 0
# END RECORD

Created attachment 913847 [details]
vdsm log
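For reference, the sysctl portion of the workaround above can be kept in a drop-in file instead of editing /etc/sysctl.conf directly. This is only a sketch: the values are the ones from the workaround above, and the /etc/sysctl.d/vdsm.conf path is an assumption (it matches a location suggested later in this bug), not something vdsm ships.

```bash
# Sketch: persist the ARP/rp_filter tweaks needed when two NICs sit on the same subnet.
# The file name is illustrative; the values come from the workaround described above.
cat > /etc/sysctl.d/vdsm.conf <<'EOF'
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.all.rp_filter = 2
net.netfilter.nf_conntrack_tcp_be_liberal = 1
EOF

# Apply the file without a reboot
sysctl -p /etc/sysctl.d/vdsm.conf
```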
Created attachment 913848 [details]
engine log
The engine.log attached starts at Jun 26. Amador, do you have the relevant logs?

(In reply to Elad from comment #7)
> The engine.log attached starts at Jun 26, Amador, do you have the relevant
> logs?

I did all tests again on Jul 1st:

2014-07-01 15:05:00,020 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-4-thread-11) [178db2f7] START, ConnectStorageServerVDSCommand(HostName = node01.pahim.org, HostId = a92a651b-a364-4e90-af90-8b922d21cee3, storagePoolId = 00000002-0002-0002-0002-000000000357, storageType = ISCSI, connectionList = [{ id: d6d7d1d0-28b6-4a0f-a509-6aa4f59db593, connection: 192.168.25.118, iqn: iqn.2012-07.com.lenovoemc:storage.ix2-73.temp1, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null, iface: eth1 };{ id: 77b43196-ca97-4a96-a071-8b601a8f96c1, connection: 192.168.25.118, iqn: iqn.2012-07.com.lenovoemc:storage.ix2-73.temp1, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null, iface: eth2 };]), log id: 47d0e4a0

What exactly are you looking for?

I looked for the reason for the host's Non-operational state. Initially, it sounded like it might be a required-network issue. Apparently, that's not the case.

(In reply to Elad from comment #9)
> Looked for the reason of the host Non-operational state. Initially, it
> sounded like it might be a required network issue. Apparently, it's not the
> case.

Indeed, this is not the case. I'm aware of bz#1093393. When one path goes down, the storage cannot be accessed by any device. But after configuring sysctl as in comment#2 and running the commands below, it works as expected, using the available path to remain connected:

# iscsiadm -m iface -I eth1 --op=update -n iface.hwaddress -v $(ifconfig eth1 | head -n 1 | awk '{print $5}')
# iscsiadm -m iface -I eth2 --op=update -n iface.hwaddress -v $(ifconfig eth2 | head -n 1 | awk '{print $5}')

Daniel, can you take a look please?

Reduced priority due to the availability of a workaround. Having said that, we need to close this issue - Daniel, any news?

I think this is only a configuration issue. Amador, can you please try the workaround using iface.net_ifacename -v eth# instead of iface.hwaddress? I think that what actually solves the problem is what is described at https://bugzilla.redhat.com/show_bug.cgi?id=1112861#c2. I'm not sure that iface.hwaddress is relevant for the solution...

(In reply to Maor from comment #14)
> I think this is only a configuration issue.
> Amador, can you please try the workaround using iface.net_ifacename -v
> eth# instead of iface.hwaddress?

Yes, it has the same effect indeed.

I noticed in this test that the host sometimes goes to "initializing" state before multipath is able to switch from the failed path to the active one. When this happens, RHEV executes a target re-scan, in which the failing path is not used:

Invalid status on Data Center Default. Setting Data Center status to Non Responsive (On host rhevh01, Error: Network error during communication with the Host.).
Host rhevh01 is initializing. Message: Recovering from crash or Initializing

So, after the re-initialization, the host is UP and operational.
But the failing path was just removed:

[root@rhevh01 admin]# multipath -ll
35005907f1cb9a3d9 dm-5 LENOVO,LIFELINE-DISK
size=50G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 8:0:0:0 sdb 8:16 active ready running

The problem here is that when the failing path is active/ready again, it will not be considered by multipath anymore until a new host change to the UP state. Notice this is very different from the expected multipath state with a faulty path:

[root@rhevh01 iscsi]# multipath -ll
35005907f1cb9a3d9 dm-5 LENOVO,LIFELINE-DISK
size=50G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 33:0:0:0 sda 8:0 active ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 34:0:0:0 sdb 8:16 active faulty running

>
> I think that what actually solves the problem is what is described at
> https://bugzilla.redhat.com/show_bug.cgi?id=1112861#c2. I'm not sure that
> iface.hwaddress is relevant for the solution...

Hi Amador, can you please upload the output of:

- iscsiadm -m session -P3
- multipath -ll

Update the summary description to indicate the problem more clearly.

(In reply to Maor from comment #17)
> Update the summary description to indicate the problem more clearly.

Besides the host re-initialization, the tunables from comment#2 are required to make multipath work properly.

Perhaps we should update this bug to focus on the network configuration described in comment#2, and maybe open another bug on the re-initialization scenario described in comment#15 (with all the logs and the output of what Elad requested in comment#16).

Allon?

(In reply to Maor from comment #19)
> Perhaps we should update this bug to focus on the network configuration
> described in comment#2, and maybe open another bug on the re-initialization
> scenario described in comment#15 (with all the logs and the output of what
> Elad requested in comment#16).
>
> Allon?

Sounds good to me - please do that.

(In reply to Allon Mureinik from comment #20)
> (In reply to Maor from comment #19)
> > Perhaps we should update this bug to focus on the network configuration
> > described in comment#2, and maybe open another bug on the re-initialization
> > scenario described in comment#15 (with all the logs and the output of what
> > Elad requested in comment#16).
> >
> > Allon?
> Sounds good to me - please do that.

Changing the summary of the bug to indicate that the issue is in the network configuration of the host. Amador, please feel free to modify the summary as you see fit. Please also open a separate bug on the re-initialization scenario with the full logs and outputs as described in comment#16.

Thanks

Meanwhile removing storage from the whiteboard, and moving it back to bugs, so that it will be re-assigned to the appropriate team (perhaps Network or Host Deploy).

These are the suggested settings. Lior - can someone from your team review this please?

(In reply to Amador Pahim from comment #2)
> Besides the "iface.hwaddress", another change is needed in order to get
> iSCSI multipath working in this setup:
>
> Add the following values to "/etc/sysctl.conf":
>
> net.ipv4.conf.all.arp_ignore = 1
> net.ipv4.conf.all.arp_announce = 2
> net.ipv4.conf.all.rp_filter = 2
> net.netfilter.nf_conntrack_tcp_be_liberal = 1
>
> Then execute:
>
> # sysctl -p
>
> So, I think vdsm initial configuration or ovirt-host-deploy should handle
> having them properly configured.

Created attachment 924928 [details]
Results of "multipath -ll" and "iscsiadm -m session -P3"
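As an aside, the iface.net_ifacename variant suggested in comment 14 (and confirmed by Amador to have the same effect as iface.hwaddress) would look roughly like the sketch below; eth1/eth2 are the example interfaces used earlier in this report.

```bash
# Sketch: bind each iSCSI iface to its network device by name rather than by MAC address.
iscsiadm -m iface -I eth1 --op=update -n iface.net_ifacename -v eth1
iscsiadm -m iface -I eth2 --op=update -n iface.net_ifacename -v eth2
```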
(In reply to Elad from comment #16)
> Hi Amador, can you please upload the output of:
>
> - iscsiadm -m session -P3
> - multipath -ll

See attachment https://bugzilla.redhat.com/attachment.cgi?id=924928

Setting cleaned needinfo (comment#23)

These changes sound like they have the potential to wreak some havoc :)
Let's call in the experts - Dan?

We've been using and testing multipath on top of iSCSI for a very long time, with Linux's ARP defaults. Am I wrong, and has this never worked?

Amador, could you point me to a reference explaining why these configurables are required for multipath? Ben, could you assist us here?

(In reply to Dan Kenigsberg from comment #28)
> We've been using and testing multipath on top of iSCSI for a very long time,
> with Linux's ARP defaults. Am I wrong, and has this never worked?

Here the use case is different: we have multipath using two physical devices on the host, both configured for the same subnet, and two iSCSI connections to the same target IP, each connection using its own host physical device:

eth1 (192.168.25.200) --|
                        |-- iSCSI Target (192.168.25.10)
eth2 (192.168.25.201) --|

> Amador, could you point me to a reference explaining why these configurables
> are required for multipath? Ben, could you assist us here?

http://en.community.dell.com/dell-groups/dtcmedia/m/mediagallery/20371245/download.aspx (Appendix C: Configure Your Linux Operating System).

multipath isn't doing anything special to check if the device is active. Without looking at the configuration, I can't be sure, but I assume that it's just doing a directIO read. If you fail one path, and then try running this on the other path:

# dd if=/dev/<other_path_dev> iflag=direct of=/dev/null bs=1k count=1

does this fail without doing the additional configuration? If so, then multipath is going to fail the path for the simple reason that the path is not accepting IO. If multipath is telling you that the path doesn't work when that dd does, then there's a multipath issue. Perhaps it's using the wrong path checker. But if that dd fails, then this issue is below multipath.

Ben, may I rephrase my question: I'd like to know whether you are aware of the sysctl tweaks suggested in comment 2. Are they indeed suggested for deployments such as the ascii-art of comment 29, and tolerable in every other?

I am not aware of these tweaks, but I must admit that the only multipath testing I do on iscsi is faked, with multiple path devices using the same physical port on the hardware NIC. QA might know about them.

Amador, if both nics are using the same LAN, what is the benefit of using multipath over a simple network bond? The latter would have a single IP address and, as such, would require no arp tweaks.

(In reply to Dan Kenigsberg from comment #34)
> Amador, if both nics are using the same LAN, what is the benefit of using
> multipath over a simple network bond? The latter would have a single IP
> address and as such, would require no arp tweaks.

Please read the original RFE description: https://bugzilla.redhat.com/show_bug.cgi?id=753541#c0

Notice the document mentioned in item 3 is in fact this one: https://support.equallogic.com/WorkArea/DownloadAsset.aspx?id=8727

In my understanding, Dell recommends multipath over bonding to improve the throughput of a single I/O flow.

Okay, so having gone over the three customer tickets:

* 01166225 seems both unrelated (as NICs are on different subnets) and resolved (simply by marking networks as non-required).
* 01124730 seems like a required network issue as well (assuming "the cluster goes down" means hosts become non-operational), so probably unrelated.

Amador, am I right, and can these be removed?

This means that at the moment we only have one customer facing this issue, which makes sense, because I would expect most iSCSI bonds to have slaves on different subnets for increased robustness.

Both Dan and I feel uncomfortable overriding OS defaults for these sysctl.conf values on all deployments, just for the sake of a configuration that doesn't seem to be hugely common, even though the changes don't seem to be destructive.

I would prefer to simply advise the current customer to set these sysctl.conf values themselves, as well as document this somewhere so other users/customers can get along. If we see that more customers are trying to set up their iSCSI bond this way, we can make it a default. Does that sound reasonable?

https://support.equallogic.com/WorkArea/DownloadAsset.aspx?id=8727 mentions placing both interfaces on the same subnet; but does it recommend that? Why?

(In reply to Dan Kenigsberg from comment #37)
> https://support.equallogic.com/WorkArea/DownloadAsset.aspx?id=8727 mentions
> placing both interfaces on the same subnet; but does it recommend that? Why?

In this use case, there is only one target IP. NICs in the same subnet are implicit to the topology, assuming that routing the traffic between a NIC on a different subnet and the target IP is not reasonable.

(In reply to Lior Vernia from comment #36)
> Okay, so having gone over the three customer tickets:
>
> * 01166225 seems both unrelated (as NICs are on different subnets) and
> resolved (simply by marking networks as non-required).

Ok, we can remove this one.

> * 01124730 seems like a required network issue as well (assuming "the
> cluster goes down" means hosts become non-operational), so probably unrelated.

Actually this one faced 2 issues: the "required network" issue and the "nics in same subnet" issue. So, keeping this one.

> Amador, am I right, and can these be removed?
>
> This means that at the moment we only have one customer facing this issue,
> which makes sense, because I would expect most iSCSI bonds to have slaves on
> different subnets for increased robustness.

I'm aware of two more cases trying this same use case. Attaching. Notice that the original request (BZ#753541) has 16 cases attached to it.

> Both Dan and I feel uncomfortable overriding OS defaults for these
> sysctl.conf values on all deployments, just for the sake of a configuration
> that doesn't seem to be hugely common, even though the changes don't seem to
> be destructive.
>
> I would prefer to simply advise the current customer to set these
> sysctl.conf values themselves, as well as document this somewhere so other
> users/customers can get along. If we see that more customers are trying to
> set up their iSCSI bond this way, we can make it a default. Does that sound
> reasonable?

Sounds reasonable to have the sysctl changes documented and pointed to as the solution. But please notice that the sysctl tunables are not the only change needed. When adding the iSCSI connection, we also have to specify the "iface.hwaddress" (or "iface.net_ifacename") that will be used to connect to the target, and making this step manual does not sound reasonable.

As I now understand, not all of the steps required by https://access.redhat.com/solutions/545553 were incorporated in bug 753541 as they should have been. We still miss passing the network-level interface name/hwaddr on the iscsiadm command line.
This is required for any implementation of an iSCSI bond, over a single subnet or multiple ones. For the single-subnet use case, which was the main driver of this feature, we also need the sysctl tweaks. Amador can add them to a /etc/sysctl.d/vdsm.conf. He and the network team would research and test their side effects.

Moving to POST, since the VDSM part was not yet merged.

It seems that all the patches in the tracker were merged; can this move to MODIFIED?

There is the latest one still in review. Adding to the tracker.

I guess it needs to be added to the tracker in BZ 1178838?

(In reply to Tal Nisan from comment #46)
> I guess it needs to be added to the tracker in BZ 1178838?

Done.

Maor, please add steps to reproduce and test so that QA can verify.

Reproduce steps:
1. Have a host with 2 network devices, both configured for the same subnet
2. Add an iSCSI storage with 2 targets in the same storage server
3. Initialize a Data Center with the host, an iSCSI storage and the two network interfaces configured
4. Configure an iSCSI bond in the Data Center using those 2 network interfaces
5. Confirmation: check under /var/lib/iscsi/ that you have two network interfaces connected to the nodes of the iSCSI target
6. Use iptables to block eth2 from the storage server

Result:
Host becomes Non Operational

Expected Result:
Host should still be Active

Amador, can you please go over the reproduce steps - is there anything that is missing or any step that should be rephrased/changed?

(In reply to Maor from comment #49)
> Reproduce steps:
> 1. Have a host with 2 network devices, both configured for the same subnet
> 2. Add an iSCSI storage with 2 targets in the same storage server
> 3. Initialize a Data Center with the host, an iSCSI storage and the two
> network interfaces configured
> 4. Configure an iSCSI bond in the Data Center using those 2 network interfaces
> 5. Confirmation: check under /var/lib/iscsi/ that you have two network
> interfaces connected to the nodes of the iSCSI target
> 6. Use iptables to block eth2 from the storage server

Also, unblock eth2, wait for the path to recover (multipath -ll) and block eth1. If the host is still active, then we are good.

> Result:
> Host becomes Non Operational
>
> Expected Result:
> Host should still be Active

Maor, is there anything to document here?

(In reply to Allon Mureinik from comment #53)
> Maor, is there anything to document here?

I've documented what I think might be necessary; please feel free to change/fix it.

Sorry, I have currently encountered several network bugs side by side with storage bugs, and this flow just doesn't work on oVirt 3.6; I will open or find those bugs and add them as blockers. In order to configure dedicated iSCSI multipathing through a local NIC, the device's network mask needs to be 255.255.255.0. When attempting to configure the network, the operation is either not executed, or executed and then reverts to the previous state (see bug #1184497).
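For step 6 of the reproduce steps above, a hypothetical way to block one path with iptables is sketched below; the target address 192.168.25.118 and port 3260 are taken from the connection details earlier in this report and should be adjusted to the environment under test.

```bash
# Sketch: drop iSCSI traffic leaving via eth2 towards the storage server (values are examples)
iptables -A OUTPUT -o eth2 -d 192.168.25.118 -p tcp --dport 3260 -j DROP

# To test failback, remove the rule, wait for the path to recover, then block eth1 instead
iptables -D OUTPUT -o eth2 -d 192.168.25.118 -p tcp --dport 3260 -j DROP
iptables -A OUTPUT -o eth1 -d 192.168.25.118 -p tcp --dport 3260 -j DROP
```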
##### Paths to all 4 iSCSI targets are operational from the 2 non-VM networks attached to the iSCSI bond #####

/var/log/messages:

Nov 10 13:13:35 puma25 iscsid: Connection17:0 to [target: iqn.1992-04.com.emc:cx.ckm00121000438.b6, portal: 10.35.160.106,3260] through [iface: enp4s0f1] is operational now
Nov 10 13:13:36 puma25 iscsid: Connection18:0 to [target: iqn.1992-04.com.emc:cx.ckm00121000438.a7, portal: 10.35.160.105,3260] through [iface: enp4s0f1] is operational now
Nov 10 13:13:36 puma25 iscsid: Connection19:0 to [target: iqn.1992-04.com.emc:cx.ckm00121000438.a6, portal: 10.35.160.104,3260] through [iface: enp4s0f1] is operational now
Nov 10 13:13:37 puma25 iscsid: Connection20:0 to [target: iqn.1992-04.com.emc:cx.ckm00121000438.b7, portal: 10.35.160.107,3260] through [iface: enp4s0f1] is operational now
Nov 10 13:13:38 puma25 iscsid: Connection21:0 to [target: iqn.1992-04.com.emc:cx.ckm00121000438.b6, portal: 10.35.160.106,3260] through [iface: enp5s0f0] is operational now
Nov 10 13:13:39 puma25 iscsid: Connection22:0 to [target: iqn.1992-04.com.emc:cx.ckm00121000438.a7, portal: 10.35.160.105,3260] through [iface: enp5s0f0] is operational now
Nov 10 13:13:41 puma25 iscsid: Connection23:0 to [target: iqn.1992-04.com.emc:cx.ckm00121000438.a6, portal: 10.35.160.104,3260] through [iface: enp5s0f0] is operational now
Nov 10 13:13:43 puma25 iscsid: Connection24:0 to [target: iqn.1992-04.com.emc:cx.ckm00121000438.b7, portal: 10.35.160.107,3260] through [iface: enp5s0f0] is operational now

[root@puma25 vdsm]# multipath -ll |grep -A 13 360060160f4a0300033a3d219ce83e511
360060160f4a0300033a3d219ce83e511 dm-4 DGC ,VRAID
size=50G features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 23:0:0:5 sdo 8:224 active ready running
| |- 24:0:0:5 sdw 65:96 active ready running
| |- 27:0:0:5 sdau 66:224 active ready running
| `- 28:0:0:5 sdbc 67:96 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 22:0:0:5 sdg 8:96 active ready running
  |- 25:0:0:5 sdae 65:224 active ready running
  |- 26:0:0:5 sdam 66:96 active ready running
  `- 29:0:0:5 sdbk 67:224 active ready running

[root@puma25 vdsm]# iscsiadm -m session -R
Rescanning session [sid: 17, target: iqn.1992-04.com.emc:cx.ckm00121000438.b6, portal: 10.35.160.106,3260]
Rescanning session [sid: 18, target: iqn.1992-04.com.emc:cx.ckm00121000438.a7, portal: 10.35.160.105,3260]
Rescanning session [sid: 19, target: iqn.1992-04.com.emc:cx.ckm00121000438.a6, portal: 10.35.160.104,3260]
Rescanning session [sid: 20, target: iqn.1992-04.com.emc:cx.ckm00121000438.b7, portal: 10.35.160.107,3260]
Rescanning session [sid: 21, target: iqn.1992-04.com.emc:cx.ckm00121000438.b6, portal: 10.35.160.106,3260]
Rescanning session [sid: 22, target: iqn.1992-04.com.emc:cx.ckm00121000438.a7, portal: 10.35.160.105,3260]
Rescanning session [sid: 23, target: iqn.1992-04.com.emc:cx.ckm00121000438.a6, portal: 10.35.160.104,3260]
Rescanning session [sid: 24, target: iqn.1992-04.com.emc:cx.ckm00121000438.b7, portal: 10.35.160.107,3260]

[root@puma25 vdsm]# tree /var/lib/iscsi/
/var/lib/iscsi/
|-- ifaces
|   |-- enp4s0f1
|   `-- enp5s0f0
|-- isns
|-- nodes
|   |-- iqn.1992-04.com.emc:cx.ckm00121000438.a6
|   |   `-- 10.35.160.104,3260,1
|   |       |-- default
|   |       |-- enp4s0f1
|   |       `-- enp5s0f0
|   |-- iqn.1992-04.com.emc:cx.ckm00121000438.a7
|   |   |-- 10.35.160.105,3260,1
|   |   |   `-- default
|   |   `-- 10.35.160.105,3260,2
|   |       |-- default
|   |       |-- enp4s0f1
|   |       `-- enp5s0f0
|   |-- iqn.1992-04.com.emc:cx.ckm00121000438.b6
|   |   |-- 10.35.160.106,3260,1
|   |   |   `-- default
|   |   `-- 10.35.160.106,3260,3
|   |       |-- default
|   |       |-- enp4s0f1
|   |       `-- enp5s0f0
|   `-- iqn.1992-04.com.emc:cx.ckm00121000438.b7
|       |-- 10.35.160.107,3260,1
|       |   `-- default
|       `-- 10.35.160.107,3260,4
|           |-- default
|           |-- enp4s0f1
|           `-- enp5s0f0
|-- send_targets
|-- slp
`-- static
    |-- iqn.1992-04.com.emc:cx.ckm00121000438.a6,10.35.160.104,3260,1,default -> /var/lib/iscsi/nodes/iqn.1992-04.com.emc:cx.ckm00121000438.a6/10.35.160.104,3260,1
    |-- iqn.1992-04.com.emc:cx.ckm00121000438.a6,10.35.160.104,3260,1,enp4s0f1 -> /var/lib/iscsi/nodes/iqn.1992-04.com.emc:cx.ckm00121000438.a6/10.35.160.104,3260,1
    |-- iqn.1992-04.com.emc:cx.ckm00121000438.a6,10.35.160.104,3260,1,enp5s0f0 -> /var/lib/iscsi/nodes/iqn.1992-04.com.emc:cx.ckm00121000438.a6/10.35.160.104,3260,1
    |-- iqn.1992-04.com.emc:cx.ckm00121000438.a7,10.35.160.105,3260,1,default -> /var/lib/iscsi/nodes/iqn.1992-04.com.emc:cx.ckm00121000438.a7/10.35.160.105,3260,1
    |-- iqn.1992-04.com.emc:cx.ckm00121000438.a7,10.35.160.105,3260,2,default -> /var/lib/iscsi/nodes/iqn.1992-04.com.emc:cx.ckm00121000438.a7/10.35.160.105,3260,2
    |-- iqn.1992-04.com.emc:cx.ckm00121000438.a7,10.35.160.105,3260,2,enp4s0f1 -> /var/lib/iscsi/nodes/iqn.1992-04.com.emc:cx.ckm00121000438.a7/10.35.160.105,3260,2
    |-- iqn.1992-04.com.emc:cx.ckm00121000438.a7,10.35.160.105,3260,2,enp5s0f0 -> /var/lib/iscsi/nodes/iqn.1992-04.com.emc:cx.ckm00121000438.a7/10.35.160.105,3260,2
    |-- iqn.1992-04.com.emc:cx.ckm00121000438.b6,10.35.160.106,3260,1,default -> /var/lib/iscsi/nodes/iqn.1992-04.com.emc:cx.ckm00121000438.b6/10.35.160.106,3260,1
    |-- iqn.1992-04.com.emc:cx.ckm00121000438.b6,10.35.160.106,3260,3,default -> /var/lib/iscsi/nodes/iqn.1992-04.com.emc:cx.ckm00121000438.b6/10.35.160.106,3260,3
    |-- iqn.1992-04.com.emc:cx.ckm00121000438.b6,10.35.160.106,3260,3,enp4s0f1 -> /var/lib/iscsi/nodes/iqn.1992-04.com.emc:cx.ckm00121000438.b6/10.35.160.106,3260,3
    |-- iqn.1992-04.com.emc:cx.ckm00121000438.b6,10.35.160.106,3260,3,enp5s0f0 -> /var/lib/iscsi/nodes/iqn.1992-04.com.emc:cx.ckm00121000438.b6/10.35.160.106,3260,3
    |-- iqn.1992-04.com.emc:cx.ckm00121000438.b7,10.35.160.107,3260,1,default -> /var/lib/iscsi/nodes/iqn.1992-04.com.emc:cx.ckm00121000438.b7/10.35.160.107,3260,1
    |-- iqn.1992-04.com.emc:cx.ckm00121000438.b7,10.35.160.107,3260,4,default -> /var/lib/iscsi/nodes/iqn.1992-04.com.emc:cx.ckm00121000438.b7/10.35.160.107,3260,4
    |-- iqn.1992-04.com.emc:cx.ckm00121000438.b7,10.35.160.107,3260,4,enp4s0f1 -> /var/lib/iscsi/nodes/iqn.1992-04.com.emc:cx.ckm00121000438.b7/10.35.160.107,3260,4
    `-- iqn.1992-04.com.emc:cx.ckm00121000438.b7,10.35.160.107,3260,4,enp5s0f0 -> /var/lib/iscsi/nodes/iqn.1992-04.com.emc:cx.ckm00121000438.b7/10.35.160.107,3260,4

##### Failing enp5s0f0 #####

2: enp5s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 44:1e:a1:73:3c:a2 brd ff:ff:ff:ff:ff:ff
5: enp4s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:9c:02:b0:9f:b4 brd ff:ff:ff:ff:ff:ff
    inet 10.35.160.3/24 brd 10.35.160.255 scope global enp4s0f1
       valid_lft forever preferred_lft forever
    inet6 fe80::29c:2ff:feb0:9fb4/64 scope link
       valid_lft forever preferred_lft forever

##### 4 of the 8 paths reported as faulty and then disappear #####

[root@puma25 vdsm]# multipath -ll |grep -A 13 360060160f4a0300033a3d219ce83e511
360060160f4a0300033a3d219ce83e511 dm-4 DGC ,VRAID
size=50G features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 23:0:0:5 sdo 8:224 active ready running
| |- 24:0:0:5 sdw 65:96 active ready running
| |- 27:0:0:5 sdau 66:224 failed faulty running
| `- 28:0:0:5 sdbc 67:96 failed faulty running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 22:0:0:5 sdg 8:96 active ready running
  |- 25:0:0:5 sdae 65:224 active ready running
  |- 26:0:0:5 sdam 66:96 active faulty running
  `- 29:0:0:5 sdbk 67:224 failed faulty running

[root@puma25 vdsm]# multipath -ll |grep -A 13 360060160f4a0300033a3d219ce83e511
360060160f4a0300033a3d219ce83e511 dm-4 DGC ,VRAID
size=50G features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 23:0:0:5 sdo 8:224 active ready running
| `- 24:0:0:5 sdw 65:96 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 22:0:0:5 sdg 8:96 active ready running
  `- 25:0:0:5 sdae 65:224 active ready running

##### Results #####

Host remains up, it doesn't change its state to non-operational.

Verified using:
vdsm-4.17.10.1-0.el7ev.noarch
device-mapper-multipath-0.4.9-85.el7.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html