Bug 1017716 - cinder-volume service does not start if it cannot mount gluster
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 4.0
Hardware: x86_64 Linux
Priority: unspecified, Severity: medium
Target Milestone: rc
Target Release: 4.0
Assigned To: Eric Harney
Haim
storage
Depends On:
Blocks: 1043547
Reported: 2013-10-10 07:30 EDT by Dafna Ron
Modified: 2016-04-26 09:50 EDT

See Also:
Fixed In Version: openstack-cinder-2013.2-2.el6ost
Doc Type: Bug Fix
Doc Text:
Previously, a failure in the Block Storage volume driver initialization process resulted in 'cinder-volume' service failure at startup. Consequently, the 'cinder-volume' service was inaccessible, and a failure in one volume driver resulted in other volume drivers being unavailable, in a multiple-backend scenario. With this update, Block Storage now marks an uninitialized backend and disables requests to it. Volume driver initialization failures now only affect the driver, and not the entire 'cinder-volume' service.
Story Points: ---
Clone Of:
: 1043547
Environment:
Last Closed: 2013-12-19 19:27:27 EST
Type: Bug




External Trackers
Tracker    ID       Priority  Status  Summary  Last Updated
Launchpad  1237948  None      None    None     Never

Description Dafna Ron 2013-10-10 07:30:05 EDT
Description of problem:

If you configure cinder to work with gluster but the share cannot be accessed (so the mount fails), the cinder-volume service does not start.
I am not sure that this is the correct behaviour: once the storage is back, we would need to restart the service manually instead of letting everything recover on its own once gluster is fixed.

Opening this for discussion on the correct behaviour.
 
Version-Release number of selected component (if applicable):

[root@cougar06 ~(keystone_admin)]# rpm -qa |grep gluster 
glusterfs-api-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-libs-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-3.4.0.33rhs-1.el6rhs.x86_64
[root@cougar06 ~(keystone_admin)]# rpm -qa |grep cinder 
python-cinder-2013.2-0.9.b3.el6ost.noarch
python-cinderclient-1.0.5-1.el6ost.noarch
openstack-cinder-2013.2-0.9.b3.el6ost.noarch


How reproducible:

100%

Steps to Reproduce:
1. Configure cinder to use gluster as the storage backend.
2. Stop the volume on gluster.
3. Restart cinder-volume.

Actual results:

The gluster mount fails and the cinder-volume service cannot start.

Expected results:

The cinder-volume service should be allowed to start and should retry the mount every X amount of time.
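
The expected behaviour could be sketched roughly as below. This is only an illustration of "start anyway and retry the mount at an interval", not cinder code: `mount_with_retry`, `mount_fn`, `interval` and `max_attempts` are all hypothetical names, and `mount_fn` is assumed to be a zero-argument callable that raises when the mount fails.

```python
import time


def mount_with_retry(mount_fn, interval=30.0, max_attempts=None,
                     sleep=time.sleep):
    """Call mount_fn until it succeeds, pausing `interval` seconds between tries.

    mount_fn is a hypothetical callable that raises an exception on failure.
    Returns the number of attempts that were needed. If max_attempts is set
    and exhausted, the last error is re-raised; None means retry forever.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            mount_fn()
            return attempt
        except Exception:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            # Wait before the next attempt instead of giving up.
            sleep(interval)
```

In a real service this loop would presumably run in the background so that startup is not blocked while the share is unavailable.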

Additional info:
Comment 1 Eric Harney 2013-10-10 10:25:23 EDT
I believe the behavior here may have changed in Havana RC1 due to this change:
https://review.openstack.org/#/c/46843/

IIUC, this will cause the service to stay up but not allow volume driver operations when this occurs.

Can you retry this with the RC1 packages and see what result you get?
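
For readers following along, the behaviour described here (service stays up, driver operations are refused) might look roughly like the sketch below. It is not the code from the linked review; `VolumeManagerSketch`, `DriverNotInitialized`, `do_setup` and `create_volume` are hypothetical names used only for illustration.

```python
class DriverNotInitialized(Exception):
    """Raised when a request reaches a backend whose driver failed to start."""


class VolumeManagerSketch(object):
    """Hypothetical manager that survives a driver initialization failure."""

    def __init__(self, driver):
        self.driver = driver
        self.driver_initialized = False
        self.init_error = None

    def init_host(self):
        try:
            self.driver.do_setup()
            self.driver_initialized = True
        except Exception as exc:
            # Remember the failure, but do not let it take the service down.
            self.init_error = exc

    def create_volume(self, name):
        # Refuse driver operations while the backend is not initialized.
        if not self.driver_initialized:
            raise DriverNotInitialized("backend is not initialized")
        return self.driver.create_volume(name)
```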
Comment 5 Dafna Ron 2013-12-12 07:39:08 EST
[root@cougar06 ~(keystone_admin)]# /etc/init.d/openstack-cinder-volume restart
Stopping openstack-cinder-volume:                          [  OK  ]
Starting openstack-cinder-volume:                          [  OK  ]
[root@cougar06 ~(keystone_admin)]# less /var/log/cinder/volume.log 
[root@cougar06 ~(keystone_admin)]# /etc/init.d/openstack-cinder-volume status
openstack-cinder-volume dead but pid file exists
[root@cougar06 ~(keystone_admin)]# 


Not verified; the service still fails to start:

2013-12-12 14:37:39.892 9829 ERROR cinder.service [req-267d5916-21e2-4d89-b226-d56ee214988b None None] Unhandled exception
2013-12-12 14:37:39.892 9829 TRACE cinder.service Traceback (most recent call last):
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/service.py", line 228, in _start_child
2013-12-12 14:37:39.892 9829 TRACE cinder.service     self._child_process(wrap.server)
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/service.py", line 205, in _child_process
2013-12-12 14:37:39.892 9829 TRACE cinder.service     launcher.run_server(server)
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/service.py", line 96, in run_server
2013-12-12 14:37:39.892 9829 TRACE cinder.service     server.start()
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/service.py", line 385, in start
2013-12-12 14:37:39.892 9829 TRACE cinder.service     self.manager.init_host()
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/volume/manager.py", line 209, in init_host
2013-12-12 14:37:39.892 9829 TRACE cinder.service     self.driver.ensure_export(ctxt, volume)
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/glusterfs.py", line 839, in ensure_export
2013-12-12 14:37:39.892 9829 TRACE cinder.service     self._ensure_share_mounted(volume['provider_location'])
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/glusterfs.py", line 1016, in _ensure_share_mounted
2013-12-12 14:37:39.892 9829 TRACE cinder.service     self._mount_glusterfs(glusterfs_share, mount_path, ensure=True)
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/glusterfs.py", line 1099, in _mount_glusterfs
2013-12-12 14:37:39.892 9829 TRACE cinder.service     self._execute('mkdir', '-p', mount_path)
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/utils.py", line 143, in execute
2013-12-12 14:37:39.892 9829 TRACE cinder.service     return processutils.execute(*cmd, **kwargs)
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/openstack/common/processutils.py", line 173, in execute
2013-12-12 14:37:39.892 9829 TRACE cinder.service     cmd=' '.join(cmd))
2013-12-12 14:37:39.892 9829 TRACE cinder.service ProcessExecutionError: Unexpected error while running command.
2013-12-12 14:37:39.892 9829 TRACE cinder.service Command: mkdir -p /var/lib/cinder/mnt/249458a2755cd0a9f302b9d81eb3f35d
2013-12-12 14:37:39.892 9829 TRACE cinder.service Exit code: 1
2013-12-12 14:37:39.892 9829 TRACE cinder.service Stdout: ''
2013-12-12 14:37:39.892 9829 TRACE cinder.service Stderr: "mkdir: cannot create directory `/var/lib/cinder/mnt/249458a2755cd0a9f302b9d81eb3f35d': File exists\n"
2013-12-12 14:37:39.892 9829 TRACE cinder.service 
Comment 6 Eric Harney 2013-12-12 10:05:28 EST
IIRC, the only time mkdir -p can fail like this is if the directory exists but the mount has broken due to a Gluster client / fuse issue.  Is this a scenario where the Gluster server was unavailable or similar?
Comment 7 Dafna Ron 2013-12-12 10:10:59 EST
The server was up and so was the service; I just stopped the volume.

Steps to Reproduce:
1. Configure cinder to use gluster as the storage backend.
2. Stop the volume on gluster.
3. Restart cinder-volume.

[root@vm-161-158 ~]# gluster volume stop Dafna_cougars1
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: Dafna_cougars1: success
[root@vm-161-158 ~]# gluster volume status Dafna_cougars1
Volume Dafna_cougars1 is not started
[root@vm-161-158 ~]# /etc/init.d/glusterd status
glusterd (pid 1767) is running...
[root@vm-161-158 ~]# 

[root@cougar06 ~(keystone_admin)]# /etc/init.d/openstack-cinder-volume restart
Stopping openstack-cinder-volume:                          [  OK  ]
Starting openstack-cinder-volume:                          [  OK  ]
[root@cougar06 ~(keystone_admin)]# /etc/init.d/openstack-cinder-volume status
openstack-cinder-volume dead but pid file exists
[root@cougar06 ~(keystone_admin)]#
Comment 8 Eric Harney 2013-12-12 10:19:52 EST
The failure occurred before it even attempted the mount though.  (At mkdir.)

This means the failure is related to whatever the state was before that run.
Comment 9 Dafna Ron 2013-12-12 10:21:50 EST
What do you mean by "whatever the state was before that run"?
Comment 10 Eric Harney 2013-12-12 10:28:36 EST
If the /var/lib/cinder/mnt/<id> directory is in a "broken" state, i.e. fuse mounted but no longer functional, this failure will occur -- mkdir -p doesn't interpret it as an existing directory (probably because stat fails, or similar), and so tries to create it.  Creation fails because the directory already exists with that name.

If you want to simulate this, kill the glusterfs pid that is running for that mount point.  Restarting the cinder volume service will then do this.

It looks like this on the file system:
# pwd
/var/lib/cinder/mnt
# stat 5ad2a11c8e453f67725211d01aad7692 
stat: cannot stat `5ad2a11c8e453f67725211d01aad7692': Transport endpoint is not connected
# ls 5ad2a11c8e453f67725211d01aad7692 
ls: cannot access 5ad2a11c8e453f67725211d01aad7692: Transport endpoint is not connected
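
A check for this state could be sketched in Python as follows. "Transport endpoint is not connected" is the message for errno ENOTCONN, so the sketch treats a stat() failure with that errno as a broken mount. The function name `is_stale_mount` and the injectable `stat` parameter are assumptions made for illustration, not part of the gluster driver.

```python
import errno
import os


def is_stale_mount(path, stat=os.stat):
    """Return True if stat() on path fails with ENOTCONN.

    ENOTCONN ("Transport endpoint is not connected") is what a FUSE mount
    point reports once its client process has gone away. Any other outcome,
    including a missing path, is not treated as a stale mount here.
    """
    try:
        stat(path)
    except OSError as exc:
        return exc.errno == errno.ENOTCONN
    return False
```

What to do once a stale mount is detected (for example unmounting it before retrying) is a separate decision that this thread does not settle.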



Anyway, I think the bug here is that ProcessExecutionError exceptions aren't being translated into an exception type that the manager catches (VolumeBackend... or similar), which is why the service stops. I think the original bug as described is fixed; this is a different failure scenario.
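
The kind of translation meant here could look like the sketch below. Both exception classes are stand-ins defined locally: `ProcessExecutionError` mimics the one in the traceback, and `VolumeBackendError` is a hypothetical name for whatever type the manager actually catches. `ensure_share_mounted` and its `execute` argument are likewise illustrative.

```python
class ProcessExecutionError(Exception):
    """Stand-in for the processutils exception seen in the traceback."""


class VolumeBackendError(Exception):
    """Hypothetical exception type that the manager is assumed to catch."""


def ensure_share_mounted(execute, mount_path):
    """Prepare mount_path, translating command failures for the manager.

    execute is a hypothetical callable that runs a command and raises
    ProcessExecutionError when the command exits with a non-zero status.
    """
    try:
        execute('mkdir', '-p', mount_path)
    except ProcessExecutionError as exc:
        # Re-raise as a backend-level error so that only this driver is
        # marked as failed instead of the whole service exiting.
        raise VolumeBackendError(
            "could not prepare %s: %s" % (mount_path, exc))
```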
Comment 11 Dafna Ron 2013-12-12 10:42:43 EST
But if you said that the failure happens before it even attempts to mount, then how would you know whether this is fixed or not?

Here is the point:
1. We follow the steps and the service does not come up -> the bug cannot be verified by QE.
2. Even if this is a completely different issue, the service still fails to start, which means that if there is something wrong with a target, the service will fail to start.
3. Even if the original issue was fixed, the current issue blocks us from actually testing it.

I think that if the exact steps are run and the exact result is still there, then the bug is not fixed and cannot be verified by QE.
Comment 18 errata-xmlrpc 2013-12-19 19:27:27 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html
