Bug 1380482 - The CephFS Native Manila Driver will Flood the Share Log with Errors when it Cannot Connect to Backing CephFS Cluster
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-manila
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Target Release: 10.0 (Newton)
Assignee: Jan Provaznik
QA Contact: Dustin Schoenbrun
Docs Contact: Don Domingo
Depends On:
Reported: 2016-09-29 17:49 UTC by Dustin Schoenbrun
Modified: 2017-02-21 00:19 UTC

Fixed In Version: openstack-manila-3.0.0-5.el7ost
Doc Type: Bug Fix
Doc Text:
Prior to this update, the Manila CephFS driver did not check whether it could connect to the Ceph server. Consequently, if the connection to the Ceph server failed, the `manila-share` service kept crashing or respawning without any timeout. With this update, the driver checks the Ceph connection during initialization; if the check fails, the driver is not initialized and no further steps are performed.
Clone Of:
Last Closed: 2016-12-14 16:06:11 UTC

Attachments
head of /var/log/manila/share.log after native cephfs driver deployed w/o actual cephfs backend (63.23 KB, text/plain)
2016-11-21 23:20 UTC, Tom Barron

System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2016:2948 normal SHIPPED_LIVE Red Hat OpenStack Platform 10 enhancement update 2016-12-14 19:55:27 UTC
OpenStack gerrit 397744 None None None 2016-11-17 19:56:26 UTC
Launchpad 1640169 None None None 2016-11-17 19:54:59 UTC

Description Dustin Schoenbrun 2016-09-29 17:49:59 UTC
Description of problem:
When the CephFS Native Driver cannot connect to the backing CephFS cluster, it reports a connection error to the Manila Share log. It then immediately tries to reconnect, which will most likely fail again. There appears to be no limit on the number of connection retries, so the Manila Share log grows very quickly.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Set up OSP-10 using Packstack, ensuring that Manila is installed.
2. Configure the CephFS Native driver but ensure that the driver cannot connect to the backing CephFS cluster.
3. Observe that the driver cannot connect to the backing CephFS cluster and that the Manila Share log is flooded with error messages. 

Actual results:
The driver appears to attempt to reconnect continuously and will flood the Manila Share log with error messages.

Expected results:
The driver should only retry a certain number of times before giving up or should space out the retries over a longer period of time.
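The expected behavior above (a bounded number of retries, spaced further apart each time) can be sketched as follows. This is only an illustration, not the manila or CephFS driver code: `connect_with_backoff`, `BackendUnavailable`, and all parameter names and default values are hypothetical and chosen for the example.

```python
import time


class BackendUnavailable(Exception):
    """Hypothetical error: the backend stayed unreachable for every attempt."""


def connect_with_backoff(connect, max_attempts=5, base_delay=1.0,
                         max_delay=60.0, sleep=time.sleep):
    """Call connect() up to max_attempts times with exponential backoff.

    `connect` is any zero-argument callable that raises on failure.
    `sleep` is injectable so the delays can be observed without waiting.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return connect()
        except Exception as exc:  # real code would catch a narrower error type
            last_error = exc
            if attempt == max_attempts - 1:
                break  # no point in sleeping after the final attempt
            # Delay doubles on each failure (1s, 2s, 4s, ...) up to max_delay.
            sleep(min(base_delay * (2 ** attempt), max_delay))
    raise BackendUnavailable(
        "gave up after %d attempts" % max_attempts) from last_error
```

With these defaults a permanently unreachable backend produces five failed attempts and four waits, then a single final error, instead of an unbounded stream of log entries.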

Comment 1 Tom Barron 2016-09-30 14:31:47 UTC
We should investigate whether this issue is specific to the CephFS driver or whether any manila backend that fails to connect to external storage behaves the same way. If the latter, is it also a problem in cinder?

Comment 2 Paul Grist 2016-10-14 18:05:29 UTC
Targeting 10z, but if this is very problematic then consider bringing it back.

Comment 3 Jan Provaznik 2016-11-08 13:58:15 UTC
upstream bug: https://bugs.launchpad.net/manila/+bug/1640169

Comment 5 Tom Barron 2016-11-21 23:20:21 UTC
Created attachment 1222498 [details]
head of /var/log/manila/share.log after native cephfs driver deployed w/o actual cephfs backend

Comment 6 Tom Barron 2016-11-21 23:27:33 UTC
I used OSPd to deploy the native cephfs backend for manila via '-e /usr/share/openstack-tripleo-heat-templates/environments/manila-cephfsnative-config.yaml' using the latest rhos10 puddle, core_puddle=2016-11-19.4.

Results are in https://bugzilla.redhat.com/attachment.cgi?id=1222498, where
one can readily see from the manila share log that the manila share service
correctly determines that it cannot interact with the backend.  Rather than
retrying in a quick loop, as reported in this BZ and in
https://bugs.launchpad.net/manila/+bug/1640169, the share service declares:

2016-11-21 22:39:54.682 113290 ERROR oslo_service.periodic_task DriverNotInitialized: Share driver 'CephFSNativeDriver' not initialized.

This message is seen again on periodic task updates that require interaction
with the driver:

2016-11-21 22:40:54.682 113290 ERROR oslo_service.periodic_task DriverNotInitialized: Share driver 'CephFSNativeDriver' not initialized.

In other words, the current log shows behavior consistent with other backends,
and not the tight infinite loop of retries to connect to the CephFS cluster
as reported in this bug.
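
The pattern seen in the log (one connection check at startup, then a "not initialized" error from each periodic task rather than a new connection attempt) can be sketched roughly as below. This is an assumption-laden illustration, not the manila implementation: `SketchCephFSDriver`, `SketchShareManager`, and their methods are hypothetical names, and the local `DriverNotInitialized` class only borrows the exception name shown in the log.

```python
import logging

LOG = logging.getLogger(__name__)


class DriverNotInitialized(Exception):
    """Local stand-in for the exception name that appears in the log."""


class SketchCephFSDriver:
    """Hypothetical driver whose setup step only verifies connectivity."""

    def __init__(self, connect):
        self._connect = connect  # zero-argument callable, raises on failure
        self.initialized = False

    def do_setup(self):
        self._connect()


class SketchShareManager:
    """Hypothetical service wrapper around one driver."""

    def __init__(self, driver):
        self.driver = driver

    def init_host(self):
        # Check the backend once at startup; on failure, leave the driver
        # uninitialized instead of retrying in a loop.
        try:
            self.driver.do_setup()
            self.driver.initialized = True
        except Exception:  # real code would catch a narrower error type
            LOG.exception("Driver setup failed; driver left uninitialized")
            self.driver.initialized = False

    def periodic_update(self):
        # Periodic tasks fail fast and never touch the backend connection.
        if not self.driver.initialized:
            raise DriverNotInitialized(
                "Share driver '%s' not initialized."
                % type(self.driver).__name__)
        return "stats"
```

In this sketch a failed setup costs one log entry at startup plus one per periodic task run, which matches the once-a-minute messages quoted above rather than a tight loop.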

Comment 8 Dustin Schoenbrun 2016-11-22 22:22:58 UTC
Thanks for having a look at this, Tom! Looks good to me. Marking the bug as VERIFIED.

Comment 10 errata-xmlrpc 2016-12-14 16:06:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

