Bug 1027790 - [RHS-RHOS] Nova volume-detach hangs during add-brick/rebalance
Summary: [RHS-RHOS] Nova volume-detach hangs during add-brick/rebalance
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: Sudhir D
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-11-07 12:11 UTC by shilpa
Modified: 2014-03-27 10:08 UTC (History)
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
virt rhos cinder integration
Last Closed: 2014-03-27 10:08:08 UTC
Embargoed:


Attachments

Description shilpa 2013-11-07 12:11:46 UTC
Description of problem:
While an add-brick/rebalance is in progress, detaching a Cinder volume that is attached to a running instance hangs.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.35.1u2rhs-1.el6rhs.x86_64
RHOS 4.0 openstack-cinder-2013.2-1.el6.noarch on RHEL 6.5 beta

How reproducible:
Tried once

Steps to Reproduce:
1. Create two 6x2 Distributed-Replicate volumes, one for Cinder and one for Glance.
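
   A minimal sketch of the volume creation, assuming the cinder-vol bricks listed in the volume info below are the ones used at create time (glance-vol would be created the same way on its own bricks):

   # gluster volume create cinder-vol replica 2 \
       10.70.37.168:/rhs/brick2/c1 10.70.37.214:/rhs/brick2/c2 \
       10.70.37.181:/rhs/brick2/c3 10.70.37.164:/rhs/brick2/c4 \
       10.70.37.168:/rhs/brick2/c5 10.70.37.214:/rhs/brick2/c6 \
       10.70.37.181:/rhs/brick2/c7 10.70.37.164:/rhs/brick2/c8 \
       10.70.37.181:/rhs/brick2/c11 10.70.37.164:/rhs/brick2/c12 \
       10.70.37.168:/rhs/brick2/c9 10.70.37.214:/rhs/brick2/c10
   # gluster volume start cinder-vol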

2. Tag the volumes with group virt
   (i.e.) gluster volume set cinder-vol group virt
         gluster volume set glance-vol group virt

3. Set the storage.owner-uid and storage.owner-gid of glance-vol to 161
         gluster volume set glance-vol storage.owner-uid 161
         gluster volume set glance-vol storage.owner-gid 161

4. Set the storage.owner-uid and storage.owner-gid of cinder-vol to 165
         gluster volume set cinder-vol storage.owner-uid 165
         gluster volume set cinder-vol storage.owner-gid 165

5. Volume info

Volume Name: cinder-vol
Type: Distributed-Replicate
Volume ID: 8b20ce62-3606-4c52-b36e-567f97ebff7f
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.168:/rhs/brick2/c1
Brick2: 10.70.37.214:/rhs/brick2/c2
Brick3: 10.70.37.181:/rhs/brick2/c3
Brick4: 10.70.37.164:/rhs/brick2/c4
Brick5: 10.70.37.168:/rhs/brick2/c5
Brick6: 10.70.37.214:/rhs/brick2/c6
Brick7: 10.70.37.181:/rhs/brick2/c7
Brick8: 10.70.37.164:/rhs/brick2/c8
Brick9: 10.70.37.181:/rhs/brick2/c11
Brick10: 10.70.37.164:/rhs/brick2/c12
Brick11: 10.70.37.168:/rhs/brick2/c9
Brick12: 10.70.37.214:/rhs/brick2/c10
Options Reconfigured:
server.allow-insecure: on
storage.owner-uid: 165
storage.owner-gid: 165
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

6. Configure Cinder to use the GlusterFS volume:

  a. 
      # openstack-config --set /etc/cinder/cinder.conf DEFAULT volume_driver cinder.volume.drivers.glusterfs.GlusterfsDriver
      # openstack-config --set /etc/cinder/cinder.conf DEFAULT glusterfs_shares_config /etc/cinder/shares.conf
      # openstack-config --set /etc/cinder/cinder.conf DEFAULT glusterfs_mount_point_base /var/lib/cinder/volumes
  
  b. # cat /etc/cinder/shares.conf
     10.70.37.168:cinder-vol

  c. for i in api scheduler volume; do sudo service openstack-cinder-${i} restart; done

7. Fuse mount the RHS glance volume on /mnt/gluster/glance/images
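
   A minimal sketch of the mount, assuming glance-vol is served from the same RHS node used in shares.conf:

   # mkdir -p /mnt/gluster/glance/images
   # mount -t glusterfs 10.70.37.168:/glance-vol /mnt/gluster/glance/images   # server IP assumed from shares.conf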

8. Upload an image and boot a VM instance.
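
   A minimal sketch of the upload and boot; the image name, image file, and flavor are hypothetical placeholders, while the instance name matches the nova list output below:

   # glance image-create --name rhel-guest --disk-format qcow2 \
       --container-format bare --file /tmp/rhel-guest.qcow2   # hypothetical image file
   # nova boot --flavor m1.small --image rhel-guest instance-2   # hypothetical flavor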

# nova list
+--------------------------------------+------------+--------+------------+-------------+---------------------+
| ID                                   | Name       | Status | Task State | Power State | Networks            |
+--------------------------------------+------------+--------+------------+-------------+---------------------+
| f98890ba-f257-49cb-8471-08a523c263d5 | instance-2 | ACTIVE | None       | Running     | public=172.24.4.227 |

9. Create a cinder volume and attach it to the instance:
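
   A minimal sketch of the create/attach; the volume name, size, and IDs are taken from the cinder list and nova list output in this report:

   # cinder create --display-name vol3 5
   # nova volume-attach f98890ba-f257-49cb-8471-08a523c263d5 134de221-4d62-4a6a-816b-c5d022228b87 auto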

# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable |             Attached to              |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| 134de221-4d62-4a6a-816b-c5d022228b87 |   in-use  |     vol3     |  5   |     None    |  false   | f98890ba-f257-49cb-8471-08a523c263d5 |


10. Locate the volume 134de221-4d62-4a6a-816b-c5d022228b87 on the RHS nodes. 
Now do an add-brick followed by a rebalance (sketched after the detach command below). During the migration of 134de221-4d62-4a6a-816b-c5d022228b87, detach the volume from the instance by running:

# nova volume-detach f98890ba-f257-49cb-8471-08a523c263d5 134de221-4d62-4a6a-816b-c5d022228b87
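
A minimal sketch of the locate/add-brick/rebalance sequence in this step. The backing file name follows the Cinder GlusterFS driver's volume-<id> convention, and the new brick paths are hypothetical:

   # ls -l /rhs/brick2/c*/volume-134de221-4d62-4a6a-816b-c5d022228b87   # on each RHS node
   # gluster volume add-brick cinder-vol \
       10.70.37.168:/rhs/brick2/c13 10.70.37.214:/rhs/brick2/c14   # hypothetical new replica pair
   # gluster volume rebalance cinder-vol start
   # gluster volume rebalance cinder-vol status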

11. Run "cinder list" to check whether the volume has been successfully detached (the volume status should be "available").


Actual results:

The volume is stuck in the "detaching" state:

# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable |             Attached to              |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| 134de221-4d62-4a6a-816b-c5d022228b87 | detaching |     vol3     |  5   |     None    |  false   | f98890ba-f257-49cb-8471-08a523c263d5 |


Expected results:

The volume should have been detached and should be in the "available" state.


Additional info:

Found timeout messages in cinder/api.log. Not sure if relevant. Will be attaching sosreports shortly.

2013-11-07 16:24:24.364 8794 TRACE cinder.api.middleware.fault Timeout: Timeout while waiting on RPC response - topic: "cinder-volume:rhs-client8.lab.eng.blr.redhat.com@GLUSTER", RPC method: "terminate_connection" info: "<unknown>"
2013-11-07 16:24:24.364 8794 TRACE cinder.api.middleware.fault 
2013-11-07 16:25:25.464 8794 ERROR cinder.api.middleware.fault [req-629ecf10-a0fb-4f11-96fd-047bca552d1b f87800a82e5e4277a36fa273b4db2fa9 6e5577905d77445ea93e363a898d2551] Caught error: Timeout while waiting on RPC response - topic: "cinder-volume:rhs-client8.lab.eng.blr.redhat.com@GLUSTER", RPC method: "terminate_connection" info: "<unknown>"
2013-11-07 16:25:25.464 8794 TRACE cinder.api.middleware.fault Traceback (most recent call last):
2013-11-07 16:25:25.464 8794 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.6/site-packages/cinder/api/middleware/fault.py", line 77, in __call__
2013-11-07 16:25:25.464 8794 TRACE cinder.api.middleware.fault     return req.get_response(self.application)
2013-11-07 16:25:25.464 8794 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.6/site-packages/webob/request.py", line 1296, in send
2013-11-07 16:25:25.464 8794 TRACE cinder.api.middleware.fault     application, catch_exc_info=False)
2013-11-07 16:25:25.464 8794 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.6/site-packages/webob/request.py", line 1260, in call_application
2013-11-07 16:25:25.464 8794 TRACE cinder.api.middleware.fault     app_iter = application(self.environ, start_response)
2013-11-07 16:25:25.464 8794 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.6/site-packages/webob/dec.py", line 144, in __call__
2013-11-07 16:25:25.464 8794 TRACE cinder.api.middleware.fault     return resp(environ, start_response)
2013-11-07 16:25:25.464 8794 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.6/site-packages/keystoneclient/middleware/auth_token.py", line 571, in __call__
2013-11-07 16:25:25.464 8794 TRACE cinder.api.middleware.fault     return self.app(env, start_response)
2013-11-07 16:25:25.464 8794 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.6/site-packages/webob/dec.py", line 144, in __call__
2013-11-07 16:25:25.464 8794 TRACE cinder.api.middleware.fault     return resp(environ, start_response)
2013-11-07 16:25:25.464 8794 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.6/site-packages/webob/dec.py", line 144, in __call__
2013-11-07 16:25:25.464 8794 TRACE cinder.api.middleware.fault     return resp(environ, start_response)
2013-11-07 16:25:25.464 8794 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.6/site-packages/Routes-1.12.3-py2.6.egg/routes/middleware.py", line 131, in __call__

Comment 2 shilpa 2014-03-27 07:46:46 UTC
Unable to reproduce this on RHOS 4.0 with glusterfs-3.4.0.59rhs-1.el6_4.x86_64.

