
Bug 1280409

Summary:	Controller node does not fully detach multipath device and the device can not be removed by manual means
Product:	Red Hat OpenStack	Reporter:	Gorka Eguileor <geguileo>
Component:	openstack-cinder	Assignee:	Gorka Eguileor <geguileo>
Status:	CLOSED WONTFIX	QA Contact:	Avi Avraham <aavraham>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	7.0 (Kilo)	CC:	acanan, cschwede, dmaley, dsulliva, egafford, eharney, fpercoco, geguileo, johfulto, nlevinki, pgrist, scohen, sgotliv, srevivo
Target Milestone:	async	Keywords:	Triaged, ZStream
Target Release:	7.0 (Kilo)
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: When a system is under heavy load, multipath detection may take longer than expected. In addition, the volume was not detached when device validation failed. Consequence: Cinder does not detect the multipath device in time and works in single-path mode, even though the underlying OS eventually detects the multipath. On disconnect, Cinder then treats the device as a single-path device rather than a multipath device, which leaves leftover devices on the system. Separately, when the multipath paths are detected correctly but are all in a failed state, the read performed during validation fails and the device is not disconnected. Fix: When Cinder is configured for multipath, it retries multipath detection several times on connect, and on disconnect it checks whether a device that was initially single path has since become a multipath device. Cinder also attempts to detach the volume if device validation fails after the volume has been attached. Result: No leftover devices remain, and devices are no longer left attached after a validation failure.	Story Points:	---
Clone Of:	1255523	Environment:
Last Closed:	2018-08-17 09:06:27 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1255523
Bug Blocks:	1278590, 1280401

Description Gorka Eguileor 2015-11-11 16:54:28 UTC

+++ This bug was initially created as a clone of Bug #1255523 +++

Description of problem:

Load testing of the following commands eventually finds a situation where the controller node shows a faulting mpio device that also shows as a backed mpio device for a launched instance on the compute node.

lsof | grep dm-12 #for the faulting mpio path

showed blkid and kpartx

Low-level attempts were made to kill these off, first with kill -15 (worked for blkid), but kill -9 was needed for kpartx.

We were still not able to delete the faulty path.
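
A minimal sketch of how the processes holding the device open could be listed with an exact match on the device node, since grep dm-12 would also match names such as dm-120. The function name holders_from_lsof is hypothetical, and the sketch assumes the default lsof column layout (COMMAND first, PID second, NAME last).

```shell
#!/bin/sh
# Hypothetical helper, not part of any tool: reads default-format `lsof`
# output on stdin and prints "PID COMMAND" for each process that has the
# given device-mapper node (for example dm-12) open.
# Assumes COMMAND is field 1, PID is field 2 and NAME is the last field.
holders_from_lsof() {
    awk -v node="/dev/$1" '$NF == node { print $2, $1 }' | sort -u
}
```

For example, `lsof | holders_from_lsof dm-12` would print one line per holder, which could then be signalled with kill -15 before falling back to kill -9.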

This is RHOS 5 on RHEL 6 with an HP 3PAR backend.




Version-Release number of selected component (if applicable):




How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:


Proper (online) cleanup procedure for failed path (multipath -F is intrusive!!!):

======

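MPATHDEV="/dev/dm-58"
# Show the current state of the multipath map
multipath -ll "$MPATHDEV"

# Delete every SCSI path that the map reports as failed
for i in $( multipath -ll "$MPATHDEV" | awk '/ failed / { print $3 }' )
do
  echo "Removing: $i"; echo 1 > "/sys/block/${i}/device/delete"
done

# Confirm the failed paths are gone, then flush only this map
multipath -ll "$MPATHDEV"
multipath -f "$MPATHDEV"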

=====
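
Before deleting anything, it may help to preview what the loop in the procedure would touch. The following is a sketch only: the function names failed_paths and plan_removals are hypothetical, and it assumes the multipath -ll layout in which the device name is the third field of each path line, as the awk expression in the procedure does.

```shell
#!/bin/sh
# Hypothetical helpers for a dry run of the cleanup procedure.

# Reads `multipath -ll` output on stdin and prints the device name (third
# field) of every line containing " failed ", like the procedure's awk filter.
failed_paths() {
    awk '/ failed / { print $3 }'
}

# Reads device names on stdin and prints the sysfs write that would be
# performed for each one, without writing anything.
plan_removals() {
    while read -r dev; do
        [ -n "$dev" ] && printf 'echo 1 > /sys/block/%s/device/delete\n' "$dev"
    done
    return 0
}
```

A dry run would then look like `multipath -ll "$MPATHDEV" | failed_paths | plan_removals`, and the printed lines can be reviewed before running the real loop.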

This seems like a different issue but worth noting here.

multipath errors being parsed as device names.
https://bugzilla.redhat.com/show_bug.cgi?id=1235786
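
One way to guard a cleanup loop against that kind of mis-parse is to accept only tokens that look like SCSI disk names before using them in a sysfs path. This is a sketch under the assumption that path devices are named sd followed by lowercase letters; the function name only_sd_names is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: reads one token per line on stdin and prints only
# those that look like SCSI disk names (sd + lowercase letters, e.g. sdb,
# sdaa). Anything else, such as fragments of an error message, is dropped.
only_sd_names() {
    grep -E '^sd[a-z]+$' || true
}
```

Placed between the awk filter and the loop that writes to /sys/block, such a filter would keep stray words from an error message from being treated as devices.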

Comment 1 Mike McCune 2016-03-28 22:38:41 UTC

This bug was accidentally moved from POST to MODIFIED by an error in automation. Please see mmccune with any questions.