Bug 1389723

Summary: nova volume detach hangs in "detaching" status even after the detach command is run; running on a Ceph 2.1 cluster
Product: Red Hat OpenStack
Reporter: rakesh <rgowdege>
Component: ceph
Assignee: Sébastien Han <shan>
Status: CLOSED NOTABUG
QA Contact: Warren <wusui>
Severity: medium
Priority: unspecified
Version: 9.0 (Mitaka)
CC: berrange, dasmith, dcadzow, eglynn, hnallurv, jdillama, jdurgin, kchamart, lhh, nlevine, rgowdege, sbauza, sferdjao, sgordon, srevivo, vromanso
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2018-01-05 21:29:21 UTC
Type: Bug

Description rakesh 2016-10-28 10:18:10 UTC
Description of problem:

The Nova volume does not get detached even after running the detach command; the volume status stays in "detaching". This is on a Ceph 2.1 cluster.

Version-Release number of selected component (if applicable):

1. ceph version 10.2.3-10.el7cp

2. openstack-cinder-8.1.1-1.el7ost.noarch

3. openstack-nova-api-13.1.1-7.el7ost.noarch


Logs (cinder-volume):
-----------------
2016-10-28 07:30:22.495 1253 WARNING oslo_db.sqlalchemy.engines [req-14335e37-bfb6-4354-95a3-1bd46f45efb4 - - - - -] SQL connection failed. 10 attempts left.
2016-10-28 07:30:32.785 1253 INFO cinder.rpc [req-14335e37-bfb6-4354-95a3-1bd46f45efb4 - - - - -] Automatically selected cinder-scheduler objects version 1.3 as minimum service version.
2016-10-28 07:30:32.799 1253 INFO cinder.rpc [req-14335e37-bfb6-4354-95a3-1bd46f45efb4 - - - - -] Automatically selected cinder-scheduler RPC version 2.0 as minimum service version.
2016-10-28 07:30:33.125 1253 INFO cinder.volume.manager [req-14335e37-bfb6-4354-95a3-1bd46f45efb4 - - - - -] Determined volume DB was not empty at startup.
2016-10-28 07:30:33.169 1253 INFO cinder.volume.manager [req-14335e37-bfb6-4354-95a3-1bd46f45efb4 - - - - -] Image-volume cache disabled for host magna083@rbd.
2016-10-28 07:30:33.172 1253 INFO oslo_service.service [req-14335e37-bfb6-4354-95a3-1bd46f45efb4 - - - - -] Starting 1 workers
2016-10-28 07:30:33.190 10880 INFO cinder.service [-] Starting cinder-volume node (version 8.1.1)
2016-10-28 07:30:33.192 10880 INFO cinder.volume.manager [req-f70b5ba6-6fcc-4e98-b0d2-0989da8e2577 - - - - -] Starting volume driver RBDDriver (1.2.0)
2016-10-28 07:35:33.308 10880 ERROR cinder.volume.drivers.rbd [req-f70b5ba6-6fcc-4e98-b0d2-0989da8e2577 - - - - -] Error connecting to ceph cluster.
2016-10-28 07:35:33.308 10880 ERROR cinder.volume.drivers.rbd Traceback (most recent call last):
2016-10-28 07:35:33.308 10880 ERROR cinder.volume.drivers.rbd   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py", line 338, in _connect_to_rados
2016-10-28 07:35:33.308 10880 ERROR cinder.volume.drivers.rbd     client.connect()
2016-10-28 07:35:33.308 10880 ERROR cinder.volume.drivers.rbd   File "rados.pyx", line 785, in rados.Rados.connect (rados.c:8969)
2016-10-28 07:35:33.308 10880 ERROR cinder.volume.drivers.rbd TimedOut: error connecting to the cluster
2016-10-28 07:35:33.308 10880 ERROR cinder.volume.drivers.rbd
2016-10-28 07:40:43.356 10880 ERROR cinder.volume.drivers.rbd [req-f70b5ba6-6fcc-4e98-b0d2-0989da8e2577 - - - - -] Error connecting to ceph cluster.
2016-10-28 07:40:43.356 10880 ERROR cinder.volume.drivers.rbd Traceback (most recent call last):
2016-10-28 07:40:43.356 10880 ERROR cinder.volume.drivers.rbd   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py", line 338, in _connect_to_rados
2016-10-28 07:40:43.356 10880 ERROR cinder.volume.drivers.rbd     client.connect()
2016-10-28 07:40:43.356 10880 ERROR cinder.volume.drivers.rbd   File "rados.pyx", line 785, in rados.Rados.connect (rados.c:8969)
2016-10-28 07:40:43.356 10880 ERROR cinder.volume.drivers.rbd TimedOut: error connecting to the cluster
2016-10-28 07:40:43.356 10880 ERROR cinder.volume.drivers.rbd
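
A quick way to confirm the root cause hinted at by the trace above is to run the same rados connect call by hand from the cinder-volume host. The sketch below uses the python-rados binding; the ceph.conf path, keyring path, and the 'cinder' user are assumptions based on a typical RBD backend and should be adjusted to match the [rbd] settings in cinder.conf.

import rados

# Build a cluster handle roughly the way the RBD driver does; paths/user are assumed.
client = rados.Rados(
    conffile='/etc/ceph/ceph.conf',
    rados_id='cinder',
    conf={'keyring': '/etc/ceph/ceph.client.cinder.keyring'},
)
try:
    # This is the call that times out in the traceback above.
    client.connect(timeout=5)
    print("connected, cluster fsid: %s" % client.get_fsid())
    client.shutdown()
except rados.TimedOut as exc:
    print("connection timed out: %s" % exc)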

Comment 2 rakesh 2016-11-07 08:42:39 UTC
Workaround:

1. Open the dashboard in a browser.
2. Change the status of the volume from "In-use" to "Available" (this detaches it from the server); a rough API equivalent is sketched below.
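
For reference, a rough API equivalent of the dashboard workaround using python-cinderclient against the v2 API. The credentials, auth URL, and volume name are placeholders; note that this only resets the status recorded in the Cinder database, it does not clean up the attachment on the compute side.

from cinderclient.v2 import client

# Placeholders: fill in real credentials / endpoint for the environment.
cinder = client.Client('admin', 'ADMIN_PASSWORD', 'admin',
                       'http://keystone.example.com:5000/v2.0')

vol = cinder.volumes.find(name='my-volume')   # placeholder volume name
cinder.volumes.reset_state(vol, 'available')  # force the status back to "available"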

Comment 3 Derek 2016-11-16 13:46:34 UTC
Is the detach command issued in Console, Horizon, or Director?  (Trying to determine which document would be impacted.)

Comment 4 Harish NV Rao 2016-11-16 14:37:17 UTC
I checked with Rakesh. He says it's from Console.

Comment 5 Artom Lifshitz 2018-01-05 21:29:21 UTC
Hello,

I'm going to close this as NOTABUG. The trace clearly shows a timeout connecting to the ceph cluster. If by any chance this is still a problem, please reopen the bug and attach sosreports, including from the ceph machine(s).

Cheers!