Bug 1043547 - cinder-volume service does not start if Gluster mount point (client) is hung
Summary: cinder-volume service does not start if Gluster mount point (client) is hung
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 5.0 (RHEL 7)
Assignee: Flavio Percoco
QA Contact: Dafna Ron
URL:
Whiteboard: storage
Depends On: 1017716
Blocks:
 
Reported: 2013-12-16 15:23 UTC by Eric Harney
Modified: 2023-09-14 01:55 UTC (History)

Fixed In Version: openstack-cinder-2014.1-4.el6ost
Doc Type: Bug Fix
Doc Text:
In previous releases, a failure in the Block Storage volume driver initialization process could cause the 'openstack-cinder-volume' service to fail at startup. Whenever this occurred in a multiple back-end environment, the 'openstack-cinder-volume' service would become inaccessible, so a failure in one volume driver could make the other volume drivers unavailable. With this update, the Block Storage service marks uninitialized back ends and disables requests to them. As a result, a volume driver initialization failure now affects only that driver and not the entire 'openstack-cinder-volume' service.
Clone Of: 1017716
Environment:
Last Closed: 2014-07-08 15:30:59 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2014:0852 0 normal SHIPPED_LIVE Red Hat Enterprise Linux OpenStack Platform Enhancement - Block Storage 2014-07-08 19:22:44 UTC

Description Eric Harney 2013-12-16 15:23:28 UTC
+++ This bug was initially created as a clone of Bug #1017716 +++

[root@cougar06 ~(keystone_admin)]# /etc/init.d/openstack-cinder-volume restart
Stopping openstack-cinder-volume:                          [  OK  ]
Starting openstack-cinder-volume:                          [  OK  ]
[root@cougar06 ~(keystone_admin)]# less /var/log/cinder/volume.log 
[root@cougar06 ~(keystone_admin)]# /etc/init.d/openstack-cinder-volume status
openstack-cinder-volume dead but pid file exists
[root@cougar06 ~(keystone_admin)]# 


Not verified; the service still fails to start:

2013-12-12 14:37:39.892 9829 ERROR cinder.service [req-267d5916-21e2-4d89-b226-d56ee214988b None None] Unhandled exception
2013-12-12 14:37:39.892 9829 TRACE cinder.service Traceback (most recent call last):
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/service.py", line 228, in _start_child
2013-12-12 14:37:39.892 9829 TRACE cinder.service     self._child_process(wrap.server)
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/service.py", line 205, in _child_process
2013-12-12 14:37:39.892 9829 TRACE cinder.service     launcher.run_server(server)
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/service.py", line 96, in run_server
2013-12-12 14:37:39.892 9829 TRACE cinder.service     server.start()
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/service.py", line 385, in start
2013-12-12 14:37:39.892 9829 TRACE cinder.service     self.manager.init_host()
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/volume/manager.py", line 209, in init_host
2013-12-12 14:37:39.892 9829 TRACE cinder.service     self.driver.ensure_export(ctxt, volume)
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/glusterfs.py", line 839, in ensure_export
2013-12-12 14:37:39.892 9829 TRACE cinder.service     self._ensure_share_mounted(volume['provider_location'])
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/glusterfs.py", line 1016, in _ensure_share_mounted
2013-12-12 14:37:39.892 9829 TRACE cinder.service     self._mount_glusterfs(glusterfs_share, mount_path, ensure=True)
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/glusterfs.py", line 1099, in _mount_glusterfs
2013-12-12 14:37:39.892 9829 TRACE cinder.service     self._execute('mkdir', '-p', mount_path)
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/utils.py", line 143, in execute
2013-12-12 14:37:39.892 9829 TRACE cinder.service     return processutils.execute(*cmd, **kwargs)
2013-12-12 14:37:39.892 9829 TRACE cinder.service   File "/usr/lib/python2.6/site-packages/cinder/openstack/common/processutils.py", line 173, in execute
2013-12-12 14:37:39.892 9829 TRACE cinder.service     cmd=' '.join(cmd))
2013-12-12 14:37:39.892 9829 TRACE cinder.service ProcessExecutionError: Unexpected error while running command.
2013-12-12 14:37:39.892 9829 TRACE cinder.service Command: mkdir -p /var/lib/cinder/mnt/249458a2755cd0a9f302b9d81eb3f35d
2013-12-12 14:37:39.892 9829 TRACE cinder.service Exit code: 1
2013-12-12 14:37:39.892 9829 TRACE cinder.service Stdout: ''
2013-12-12 14:37:39.892 9829 TRACE cinder.service Stderr: "mkdir: cannot create directory `/var/lib/cinder/mnt/249458a2755cd0a9f302b9d81eb3f35d': File exists\n"
2013-12-12 14:37:39.892 9829 TRACE cinder.service 

--- Additional comment from Eric Harney on 2013-12-12 10:05:28 EST ---

IIRC, the only time mkdir -p can fail like this is if the directory exists but the mount has broken due to a Gluster client / fuse issue.  Is this a scenario where the Gluster server was unavailable or similar?

--- Additional comment from Dafna Ron on 2013-12-12 10:10:59 EST ---

The server was up, and so was the service; I only stopped the volume.

Steps to Reproduce:
1. configure cinder to use gluster as the back end
2. stop the volume on gluster 
3. restart cinder-volume

[root@vm-161-158 ~]# gluster volume stop Dafna_cougars1
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: Dafna_cougars1: success
[root@vm-161-158 ~]# gluster volume status Dafna_cougars1
Volume Dafna_cougars1 is not started
[root@vm-161-158 ~]# /etc/init.d/glusterd status
glusterd (pid 1767) is running...
[root@vm-161-158 ~]# 

[root@cougar06 ~(keystone_admin)]# /etc/init.d/openstack-cinder-volume restart
Stopping openstack-cinder-volume:                          [  OK  ]
Starting openstack-cinder-volume:                          [  OK  ]
[root@cougar06 ~(keystone_admin)]# /etc/init.d/openstack-cinder-volume status
openstack-cinder-volume dead but pid file exists
[root@cougar06 ~(keystone_admin)]#

--- Additional comment from Eric Harney on 2013-12-12 10:19:52 EST ---

The failure occurred before it even attempted the mount though.  (At mkdir.)

This means the failure is related to whatever the state was before that run.

--- Additional comment from Dafna Ron on 2013-12-12 10:21:50 EST ---

What do you mean by "whatever the state was before that run"?

--- Additional comment from Eric Harney on 2013-12-12 10:28:36 EST ---

If the /var/lib/cinder/mnt/<id> directory is in a "broken" state, i.e. fuse mounted but no longer functional, this failure will occur -- mkdir -p doesn't interpret it as an existing directory (probably because stat fails, or similar), and so tries to create it.  Creation fails because the directory already exists with that name.

If you want to simulate this, kill the glusterfs pid that is running for that mount point.  Restarting the cinder volume service will then reproduce this failure.

It looks like this on the file system:
# pwd
/var/lib/cinder/mnt
# stat 5ad2a11c8e453f67725211d01aad7692 
stat: cannot stat `5ad2a11c8e453f67725211d01aad7692': Transport endpoint is not connected
# ls 5ad2a11c8e453f67725211d01aad7692 
ls: cannot access 5ad2a11c8e453f67725211d01aad7692: Transport endpoint is not connected
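
For anyone who wants to check for this state programmatically, here is a minimal sketch. It assumes that the "Transport endpoint is not connected" error shown above surfaces in Python as an OSError with errno ENOTCONN. The helper name mount_point_state and its stat_fn parameter are hypothetical, made up for illustration; this is not the code from the Cinder GlusterFS driver or from the upstream fix.

```python
import errno
import os


def mount_point_state(path, stat_fn=os.stat):
    """Classify a mount directory as 'ok', 'missing', or 'stale'.

    Hypothetical helper for illustration only; not part of Cinder.
    stat_fn is injectable so the stale case can be exercised without
    a real broken FUSE mount.
    """
    try:
        stat_fn(path)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            # Nothing exists at this path; mkdir -p would succeed.
            return "missing"
        if exc.errno == errno.ENOTCONN:
            # "Transport endpoint is not connected": the directory entry
            # exists, but the FUSE client behind it is gone.
            return "stale"
        # Any other error is unexpected here; let the caller see it.
        raise
    return "ok"
```

A caller could run this check before mkdir -p and, on "stale", report the broken mount (or try to unmount it) rather than letting the mkdir failure propagate. Whether unmounting is the right recovery depends on the deployment.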

Comment 1 Flavio Percoco 2014-02-12 08:03:54 UTC
The upstream bug suggests this issue was already fixed by:

https://review.openstack.org/#/c/61088/

Dafna, could you please verify this? Otherwise I think this bug could be closed.

Comment 3 Sergey Gotliv 2014-05-08 14:02:14 UTC
Merged upstream in January.

Comment 5 Yogev Rabl 2014-06-12 12:47:19 UTC
Verified in versions:

python-cinderclient-1.0.8-1.el7ost.noarch
openstack-cinder-2014.1-4.el7ost.noarch
python-cinder-2014.1-4.el7ost.noarch

Comment 9 errata-xmlrpc 2014-07-08 15:30:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0852.html

Comment 10 Red Hat Bugzilla 2023-09-14 01:55:33 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days