Bug 1278209

Summary: Multipath ISCSI connections left open after disconnecting volume with libvirt
Product: Red Hat OpenStack
Component: openstack-nova
Version: 7.0 (Kilo)
Target Release: 7.0 (Kilo)
Hardware: x86_64
OS: Linux
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Reporter: openstack-dev
Assignee: Lee Yarwood <lyarwood>
QA Contact: nlevinki <nlevinki>
CC: berrange, chorn, dasmith, dmesser, eglynn, kchamart, lyarwood, openstack-dev, patrick.east, rtweed, rzaleski, sbauza, sferdjao, sgordon, simon, srevivo, vromanso
Keywords: ZStream
Type: Bug
Doc Type: Bug Fix
Last Closed: 2017-01-20 20:46:03 UTC
Bug Depends On: 1275937
Bug Blocks: 1290377

Description openstack-dev 2015-11-04 23:07:45 UTC
Description of problem:

https://bugs.launchpad.net/nova/+bug/1385798

When disconnecting a volume from an instance, the iSCSI multipath connection is not always cleaned up correctly. When running the Tempest tests we see test failures related to this: the connection is not closed, but a disconnect is then requested through the Cinder driver, which ends up breaking the iSCSI connection. The end result is that entries for the old iSCSI connections remain in /dev/disk/by-path, but they are in an error state and cannot be used.

This issue is fixed in the latest os-brick, and a proposed backport of those fixes is up for review https://review.openstack.org/#/c/229152
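
For anyone trying to confirm the symptom on a compute node, the leftover entries can be enumerated with a short script. The following is a minimal sketch, not code from nova or os-brick: the function names are hypothetical, and it assumes the usual udev naming for iSCSI devices (ip-<portal>-iscsi-<iqn>-lun-<lun>). It only lists whole-disk entries and flags dangling symlinks; an entry in the error state described above may still resolve, so treat the flag as a hint only.

```python
import os
import re

# Assumed udev naming for whole-disk iSCSI entries under /dev/disk/by-path.
# Partition entries (ending in "-partN") are deliberately not matched.
ISCSI_BY_PATH = re.compile(
    r"^ip-(?P<portal>.+?)-iscsi-(?P<iqn>.+)-lun-(?P<lun>\d+)$"
)


def parse_iscsi_by_path(name):
    """Return (portal, iqn, lun) for an iSCSI by-path entry, else None."""
    match = ISCSI_BY_PATH.match(name)
    if match is None:
        return None
    return match.group("portal"), match.group("iqn"), int(match.group("lun"))


def list_iscsi_entries(by_path_dir="/dev/disk/by-path"):
    """List iSCSI by-path entries and flag symlinks that no longer resolve."""
    entries = []
    if not os.path.isdir(by_path_dir):
        return entries
    for name in sorted(os.listdir(by_path_dir)):
        parsed = parse_iscsi_by_path(name)
        if parsed is None:
            continue
        path = os.path.join(by_path_dir, name)
        entries.append({
            "path": path,
            "target": parsed,
            # os.path.exists follows symlinks, so a dangling link gives False.
            "dangling": not os.path.exists(path),
        })
    return entries


if __name__ == "__main__":
    for entry in list_iscsi_entries():
        print(entry)
```

Comparing this list before and after detaching a volume shows whether entries for the detached target are still present.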

Comment 3 openstack-dev 2015-11-12 17:18:42 UTC
Hi, we would actually want this fix to be targeted for the 7.0 release. It looks like it has been automatically set to 8.0. Would it be possible for someone to update it, please?

Comment 4 Lee Yarwood 2015-11-20 16:45:33 UTC
(In reply to openstack-dev from comment #0)

Thanks for the patch but we really can't take it as one huge (+307, -98) change. I propose we use this BZ as a tracker and create individual bugs for issues not already addressed in 7.0.z (we already have the Multipath error parsing patches for example).

Making a start on this now but if you have this patch broken out into individual patches with tests etc then please let me know!

(In reply to openstack-dev from comment #3)
> Hi, for this fix we would actually want it to be targeted for 7.0 release.
> Looks like it has been automatically set to 8.0. Would it be possible for
> someone to update it, please?

Moved to 7.0.z as this isn't required for 8.0 with the move to os-brick.

Comment 5 Lee Yarwood 2016-06-27 13:44:53 UTC
(In reply to openstack-dev from comment #0)

Can I ask that you provide examples of these failures against the latest OSP 7 builds that we have made available? Several fixes have landed downstream since this bug was created and I want to ensure that there are still issues to be addressed here.

Comment 6 Patrick East 2016-08-02 16:51:53 UTC
(In reply to Lee Yarwood from comment #5)
> Can I ask that you provide examples of these failures against the latest OSP
> 7 builds that we have made available? Several fixes have landed downstream
> since this bug was created and I want to ensure that there are still issues
> to be addressed here.

We are still seeing customers of ours running into this problem, so I'm assuming the issue still exists. Can you link the fixes that landed that would have addressed this?

Comment 7 Lee Yarwood 2016-08-11 09:20:16 UTC
(In reply to Patrick East from comment #6)
> We are still seeing customers of ours running into this problem, so I'm
> assuming the issue still exists. Can you link the fixes that landed that
> would have addressed this?

Hello Patrick, 

I've included a list of changes made to nova/virt/libvirt/volume.py in OSP 7 below [1]. IMHO the recent removal of additional rescans during disconnect_volume for iSCSI volumes will likely help here. Again if you can provide actual examples I'd be more than happy to review logs in order to confirm that this should now be addressed.

Also can I ask that you set a NEEDINFO against me when providing any such examples just to ensure I am aware of your update.

Thanks in advance,

Lee

[1] # git log --pretty=email 2015.1.0..HEAD nova/virt/libvirt/volume.py | egrep '(Subject|Resolves)'
Subject: [PATCH] Use stashed volume connector in _local_cleanup_bdm_volumes
Resolves: rhbz#1351662

Subject: [PATCH] volume: remove iSCSI rescans during disconnect_volume
Resolves: rhbz #1351169

Subject: [PATCH] Optimize multipath call to identify IQN
Resolves: rhbz#1331256

Subject: [PATCH] libvirt: Reduce iscsiadm use when using multipath
Resolves: rhbz#1334161

Subject: [PATCH] libvirt: Remove devices from the connection_info data dict
Resolves: rhbz #1313624

Subject: [PATCH] volume: Fix the removal of multipath backed iSCSI LUNs
Resolves: rhbz#1298283

Subject: [PATCH] libvirt: Detect and remove multipath devices on disconnect
Resolves rhbz: 1273473

Subject: [PATCH] libvirt: Retry multipathd queries during FC volume connection
Resolves rhbz: 1273473

Subject: [PATCH] libvirt: Parse FCoE sysfs device paths
Resolves rhbz: 1274054

Subject: [PATCH] Multipath commands with error messages in stdout fail to
Resolves: rhbz#1275937

Subject: [PATCH] libvirt: Revert _get_host_devices return value
Resolves: rhbz#1268051

Subject: [PATCH] libvirt: Enhance iSCSI volume multipath support
Resolves: rhbz#1228295

Subject: [PATCH] Handle FC LUN IDs greater 255 correctly on s390x
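
The Resolves lines in the list above use three slightly different spellings ("rhbz#N", "rhbz #N" and "rhbz: N"), which makes the output awkward to cross-reference by hand. As a rough sketch, the output of the git log command could be paired up as follows. The function name and regular expressions are illustrative assumptions, not part of any existing tooling.

```python
import re

SUBJECT = re.compile(r"^Subject: \[PATCH\] (?P<subject>.+)$")
# Accepts "Resolves: rhbz#N", "Resolves: rhbz #N" and "Resolves rhbz: N".
RESOLVES = re.compile(r"^Resolves:?\s*rhbz\s*[:#]\s*(?P<bug>\d+)$")


def pair_subjects_with_bugs(lines):
    """Return (subject, bug_id) pairs; bug_id is None when no Resolves follows."""
    results = []
    subject = None
    for raw in lines:
        line = raw.strip()
        match = SUBJECT.match(line)
        if match:
            # A new Subject before any Resolves means the previous patch
            # carried no bug reference.
            if subject is not None:
                results.append((subject, None))
            subject = match.group("subject")
            continue
        match = RESOLVES.match(line)
        if match and subject is not None:
            results.append((subject, int(match.group("bug"))))
            subject = None
    if subject is not None:
        results.append((subject, None))
    return results
```

Feeding it the lines printed by the command gives one pair per patch, with None for patches such as the s390x one that carry no Resolves line.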

Comment 8 Patrick East 2016-08-11 16:13:59 UTC
(In reply to Lee Yarwood from comment #7)
> (In reply to Patrick East from comment #6)
> > We are still seeing customers of ours running into this problem, so I'm
> > assuming the issue still exists. Can you link the fixes that landed that
> > would have addressed this?
> 
> Hello Patrick, 
> 
> I've included a list of changes made to nova/virt/libvirt/volume.py in OSP 7
> below [1]. IMHO the recent removal of additional rescans during
> disconnect_volume for iSCSI volumes will likely help here. Again if you can
> provide actual examples I'd be more than happy to review logs in order to
> confirm that this should now be addressed.
> 
> Also can I ask that you set a NEEDINFO against me when providing any such
> examples just to ensure I am aware of your update.
> 
> Thanks in advance,
> 
> Lee
> 
> [1] # git log --pretty=email 2015.1.0..HEAD nova/virt/libvirt/volume.py |
> egrep '(Subject|Resolves)'                                                  
> 
> Subject: [PATCH] Use stashed volume connector in _local_cleanup_bdm_volumes
> Resolves: rhbz#1351662
> 
> Subject: [PATCH] volume: remove iSCSI rescans during disconnect_volume
> Resolves: rhbz #1351169
> 
> Subject: [PATCH] Optimize multipath call to identify IQN
> Resolves: rhbz#1331256
> 
> Subject: [PATCH] libvirt: Reduce iscsiadm use when using multipath
> Resolves: rhbz#1334161
> 
> Subject: [PATCH] libvirt: Remove devices from the connection_info data dict
> Resolves: rhbz #1313624
> 
> Subject: [PATCH] volume: Fix the removal of multipath backed iSCSI LUNs
> Resolves: rhbz#1298283
> 
> Subject: [PATCH] libvirt: Detect and remove multipath devices on disconnect
> Resolves rhbz: 1273473
> 
> Subject: [PATCH] libvirt: Retry multipathd queries during FC volume
> connection
> Resolves rhbz: 1273473
> 
> Subject: [PATCH] libvirt: Parse FCoE sysfs device paths
> Resolves rhbz: 1274054
> 
> Subject: [PATCH] Multipath commands with error messages in stdout fail to
> Resolves: rhbz#1275937
> 
> Subject: [PATCH] libvirt: Revert _get_host_devices return value
> Resolves: rhbz#1268051
> 
> Subject: [PATCH] libvirt: Enhance iSCSI volume multipath support
> Resolves: rhbz#1228295
> 
> Subject: [PATCH] Handle FC LUN IDs greater 255 correctly on s390x

Thanks Lee!

Comment 9 Lee Yarwood 2017-01-20 20:46:03 UTC
Closing this out as a duplicate of RHBZ#1410046, where we recently fixed a long-standing issue with removing path devices when using backends that only provide a single set of target_{portal,iqn,lun} details via Cinder.
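
To illustrate the distinction, the sketch below normalises the two shapes of iSCSI connection properties a backend may hand over: the plural target_portals/target_iqns/target_luns lists, or only the single target_portal/target_iqn/target_lun values. This is not the fix from RHBZ#1410046 and not code from nova; the function name is hypothetical and the property dict is assumed to look like the examples in the test values.

```python
def iscsi_targets(connection_properties):
    """Return a list of (portal, iqn, lun) tuples for an iSCSI volume.

    Assumes the plural keys, when present, are equal-length lists.
    """
    props = connection_properties
    plural_keys = ("target_portals", "target_iqns", "target_luns")
    if all(key in props for key in plural_keys):
        # Backend described every path explicitly.
        return list(zip(props["target_portals"],
                        props["target_iqns"],
                        props["target_luns"]))
    # Backend described a single path only.
    return [(props["target_portal"], props["target_iqn"], props["target_lun"])]
```

With the single form, the caller only knows one path's details, so any remaining paths of a multipath device have to be identified some other way during disconnect. How the actual fix does that is not shown here.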

*** This bug has been marked as a duplicate of bug 1410046 ***

Comment 10 awaugama 2017-09-07 19:04:57 UTC
Dup -- QE will decide about automating the original