Bug 1797075

Summary: [OSP 16/RHCS 4.0] Can't create or revoke access to Manila shares - ceph-nfs-pacemaker is broken
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tom Barron <tbarron>
Component: Container
Assignee: Dimitri Savineau <dsavinea>
Status: CLOSED ERRATA
QA Contact: Yogev Rabl <yrabl>
Severity: high
Docs Contact: Karen Norteman <knortema>
Priority: high
Version: 4.0
CC: agunn, anharris, bniver, ccopello, ceph-eng-bugs, dhill, dsariel, dsavinea, ealcaniz, gabrioux, gcharot, gfidente, gouthamr, hyu, jbrier, kdreyer, knortema, lkuchlan, mhackett, mmurray, nsatsia, pasik, pgrist, rmandyam, sputhenp, tbarron, tchandra, tserlin, vhariria, vimartin, yrabl
Target Milestone: rc
Keywords: AutomationBlocker, Regression
Target Release: 4.1
Hardware: All
OS: All
Fixed In Version: rhceph:ceph-4.0-rhel-8-containers-candidate-64223-20200206175240
Doc Type: Bug Fix
Doc Text:
.The `nfs-ganesha` daemon starts normally
Previously, a configuration using `nfs-ganesha` with the RADOS backend would not start because the `nfs-ganesha-rados-urls` package, which provides the RADOS URL library, was missing from the Ceph container image. This occurred because the `nfs-ganesha` library for the RADOS backend was moved to a dedicated package. With this update, the `nfs-ganesha-rados-urls` package is added to the Ceph container image, so the `nfs-ganesha` daemon starts successfully.
Clone Of: 1797047
Last Closed: 2020-06-03 16:22:16 UTC
Bug Blocks: 1760354, 1797047, 1798514, 1799098, 1816167

Description Tom Barron 2020-01-31 20:41:07 UTC
+++ This bug was initially created as a clone of Bug #1797047 +++

Description of problem:

The description from the original OSP Manila bug follows, but for clarity I first repeat here the ganesha log from the ceph-nfs container:

<log>
   Jan 31 19:34:06 controller-0 podman[272774]: exec: PID 57: spawning /usr/bin/ganesha.nfsd  -F -L STDOUT
   Jan 31 19:34:06 controller-0 podman[272774]: exec: Waiting 57 to quit
   Jan 31 19:34:06 controller-0 podman[272774]: 31/01/2020 19:34:06 : epoch 5e34812e : controller-0 : ganesha.nfsd-57[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.8.3/src, built at Jan 17 2020 20:56:03 on
   Jan 31 19:34:06 controller-0 podman[272774]: 31/01/2020 19:34:06 : epoch 5e34812e : controller-0 : ganesha.nfsd-57[main] load_rados_config :CONFIG :CRIT :Unknown urls backend
   Jan 31 19:34:06 controller-0 podman[272774]: 31/01/2020 19:34:06 : epoch 5e34812e : controller-0 : ganesha.nfsd-57[main] main :NFS STARTUP :CRIT :Error (token scan) while parsing (/etc/ganesha/ganesha.conf)
   Jan 31 19:34:06 controller-0 podman[272774]: 31/01/2020 19:34:06 : epoch 5e34812e : controller-0 : ganesha.nfsd-57[main] config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:24): new url (rados://manila_data/ganesha-export-index) open error (Success), ignored
   Jan 31 19:34:06 controller-0 podman[272774]: 31/01/2020 19:34:06 : epoch 5e34812e : controller-0 : ganesha.nfsd-57[main] main :NFS STARTUP :FATAL :Fatal errors.  Server exiting...
</log>

When we have seen this error in the past, the ganesha build packaged into the ceph image had been compiled without the cmake flags it needs to understand the RADOS URL in ganesha.conf.
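For reference, the URL ganesha failed to open is `rados://manila_data/ganesha-export-index` (ganesha.conf line 24 in the log above). The short Python sketch below shows how such a URL splits into a RADOS pool and object name. It assumes a `rados://<pool>/[<namespace>/]<object>` layout; the `RadosUrl` type and `parse_rados_url` helper are hypothetical illustrations, not ganesha's own URL handling.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class RadosUrl(NamedTuple):
    pool: str
    namespace: Optional[str]
    obj: str


def parse_rados_url(url: str) -> RadosUrl:
    """Split a rados:// URL into pool, optional namespace and object.

    Assumes the layout rados://<pool>/[<namespace>/]<object>.
    """
    parts = urlsplit(url)
    if parts.scheme != "rados" or not parts.netloc:
        raise ValueError("not a rados:// URL: %r" % url)
    # urlsplit places the first segment after "//" (the pool) in netloc.
    segments = [parts.netloc] + [s for s in parts.path.split("/") if s]
    if len(segments) == 2:
        pool, obj = segments
        return RadosUrl(pool, None, obj)
    if len(segments) == 3:
        pool, namespace, obj = segments
        return RadosUrl(pool, namespace, obj)
    raise ValueError("unexpected rados URL layout: %r" % url)


print(parse_rados_url("rados://manila_data/ganesha-export-index"))
# RadosUrl(pool='manila_data', namespace=None, obj='ganesha-export-index')
```

This only illustrates the URL's structure; whether ganesha can actually open it depends on the RADOS URL support being present in the installed build.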


Original bug description:

I'm unable to allow or deny access to, or mount, shares with an OSP 16 (GA candidate) puddle [1] and RHCS 4.0 [2]. When I query the NFS server status from an OSP controller node, the nfs-ganesha server does not appear to respond:

   [root@controller-0 manila]# rpcinfo -T tcp 172.17.5.126 100003
   rpcinfo: RPC: Program not registered
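The `rpcinfo` check above can be scripted. This is a minimal Python sketch, assuming the `rpcinfo` binary is installed on the node; the helper names are hypothetical, and the "ready and waiting" wording is assumed from rpcinfo's usual success output rather than taken from this report.

```python
import subprocess

NFS_PROGRAM = 100003  # RPC program number queried in the report above


def nfs_program_registered(rpcinfo_output: str) -> bool:
    """Interpret the text printed by `rpcinfo -T tcp <host> 100003`."""
    if "Program not registered" in rpcinfo_output:
        return False
    # Assumed success wording: one "... ready and waiting" line per version.
    return "ready and waiting" in rpcinfo_output


def query_nfs(host: str) -> bool:
    """Run rpcinfo against `host` and report whether NFS answers over TCP."""
    # stdout and stderr are merged so the check works regardless of which
    # stream carries the "not registered" message.
    result = subprocess.run(
        ["rpcinfo", "-T", "tcp", host, str(NFS_PROGRAM)],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return nfs_program_registered(result.stdout)
```

On the affected controller, `query_nfs("172.17.5.126")` would be expected to return `False`, matching the manual output above.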


Version-Release number of selected component (if applicable): 16.0

[1] RHOS_TRUNK-16.0-RHEL-8-20200130.n.0
[2] Ceph Image details:
    "Labels": {
                "CEPH_POINT_RELEASE": "",
                "GIT_BRANCH": "stable-4.0",
                "GIT_CLEAN": "True",
                "GIT_COMMIT": "376b3b9a129c6fe1a1d081711fa662ccfd657452",
                "GIT_REPO": "https://github.com/ceph/ceph-container.git",
                "RELEASE": "stable-4.0",
                "architecture": "x86_64",
                "authoritative-source-url": "registry.access.redhat.com",
                "build-date": "2020-01-20T23:01:19.600649",
                "com.redhat.build-host": "cpt-1007.osbs.prod.upshift.rdu2.redhat.com",
                "com.redhat.component": "rhceph-container",
                "com.redhat.license_terms": "https://www.redhat.com/en/about/red-hat-end-user-license-agreements",
                "description": "Red Hat Ceph Storage 4",
                "distribution-scope": "public",
                "io.k8s.description": "Red Hat Ceph Storage 4",
                "io.k8s.display-name": "Red Hat Ceph Storage 4 on RHEL 8",
                "io.openshift.expose-services": "",
                "io.openshift.tags": "rhceph ceph",
                "maintainer": "Dimitri Savineau <dsavinea>",
                "name": "rhceph",
                "release": "121.20200120.ci.1",
                "summary": "Provides the latest Red Hat Ceph Storage 4 on RHEL 8 in a fully featured and supported base image.",
                "url": "https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/4-121.20200120.ci.1",
                "vcs-ref": "76bcc9029f35fc0bef5e4ab813a23fe95d3ad2e1",
                "vcs-type": "git",
                "vendor": "Red Hat, Inc.",
                "version": "4"
            },

How reproducible: Always


Steps to Reproduce:
1. Deploy RHOSP 16 beta (essentially the GA candidate) with manila using the CephFS via NFS (nfs-ganesha) backend. Use the RHCS 4.0 ceph image (beta versions are available on access.redhat.com/containers).
2. After the deployment finishes, create a manila share and allow access to it.


Actual results:

   Access rule transitions to "error" state

Expected results:

   Access rule transitions to "active" state

Additional info:

See the triage information in the following comments.

--- Additional comment from Goutham Pacha Ravi on 2020-01-31 19:42:21 UTC ---

Triage:

  0) Check the manila-share service logs on the controller that hosts the manila-share pacemaker bundle:

    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server [req-260337f6-29ab-4300-8c53-7be6ef02be0a da6f2e04378d4274bf5a5ca166d0e99a 99bffc13f98a43ffa30f12c3a081853d - - -] Exception during message handling: manila.exception.GaneshaCommandFailure: Ganesha management command failed.
Command: dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport string:/etc/ganesha/export.d/share-5036e505-be7b-42c0-81f7-12844726781e.conf.pbnc57 string:EXPORT(Export_Id=1001)
Exit code: 1
Stdout: ''
Stderr: 'Error org.freedesktop.DBus.Error.ServiceUnknown: The name org.ganesha.nfsd was not provided by any .service files\n'
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/share/drivers/ganesha/manager.py", line 233, in _execute
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     return execute(*args, **kwargs)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/share/drivers/ganesha/utils.py", line 59, in __call__
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     return self.execute(*args, **exkwargs)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/utils.py", line 101, in execute
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     return processutils.execute(*cmd, **kwargs)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py", line 424, in execute
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     cmd=sanitized_cmd)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server Command: dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport string:/etc/ganesha/export.d/share-5036e505-be7b-42c0-81f7-12844726781e.conf.pbnc57 string:EXPORT(Export_Id=1001)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server Exit code: 1
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server Stdout: ''
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server Stderr: 'Error org.freedesktop.DBus.Error.ServiceUnknown: The name org.ganesha.nfsd was not provided by any .service files\n'
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/share/drivers/ganesha/manager.py", line 474, in add_export
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     "string:EXPORT(Export_Id=%d)" % xid)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     message='dbus call %s.%s' % (service, method), **kwargs)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/share/drivers/ganesha/manager.py", line 242, in _execute
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     cmd=e.cmd)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server manila.exception.GaneshaCommandFailure: Ganesha management command failed.
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server Command: dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport string:/etc/ganesha/export.d/share-5036e505-be7b-42c0-81f7-12844726781e.conf.pbnc57 string:EXPORT(Export_Id=1001)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server Exit code: 1
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server Stdout: ''
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server Stderr: 'Error org.freedesktop.DBus.Error.ServiceUnknown: The name org.ganesha.nfsd was not provided by any .service files\n'
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/share/manager.py", line 187, in wrapped
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     return f(self, *args, **kwargs)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/utils.py", line 568, in wrapper
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     return func(self, *args, **kwargs)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/share/manager.py", line 3554, in update_access
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     share_server=share_server)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/share/access.py", line 283, in update_access_rules
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     share_server=share_server)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/share/access.py", line 322, in _update_access_rules
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     share_server)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/share/access.py", line 390, in _update_rules_through_share_driver
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     share_server=share_server
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/share/access.py", line 390, in _update_rules_through_share_driver
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     share_server=share_server
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/share/drivers/cephfs/driver.py", line 289, in update_access
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     share_server=share_server)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/share/drivers/ganesha/__init__.py", line 308, in update_access
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     self.ganesha.add_export(share['name'], confdict)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/share/drivers/ganesha/manager.py", line 491, in add_export
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server     cmd=e.cmd)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server manila.exception.GaneshaCommandFailure: Ganesha management command failed.
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server Command: dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport string:/etc/ganesha/export.d/share-5036e505-be7b-42c0-81f7-12844726781e.conf.pbnc57 string:EXPORT(Export_Id=1001)
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server Exit code: 1
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server Stdout: ''
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server Stderr: 'Error org.freedesktop.DBus.Error.ServiceUnknown: The name org.ganesha.nfsd was not provided by any .service files\n'
    2020-01-31 19:01:16.739 42 ERROR oslo_messaging.rpc.server

  1) Log in to one of the controller nodes and check the Podman containers for ceph. The "ceph-nfs-pacemaker" container is likely missing:

     podman ps | grep ceph
     b279eaad8f5d  undercloud-0.ctlplane.redhat.local:8787/ceph/rhceph-4.0-rhel8:latest                                                   9 days ago  Up 9 days ago  ceph-mds-controller-0
     289e591ba4ec  undercloud-0.ctlplane.redhat.local:8787/ceph/rhceph-4.0-rhel8:latest                                                   9 days ago  Up 9 days ago  ceph-mgr-controller-0
     6400ff546651  undercloud-0.ctlplane.redhat.local:8787/ceph/rhceph-4.0-rhel8:latest                                                   9 days ago  Up 9 days ago  ceph-mon-controller-0

    
  You can watch "podman ps" to see the "ceph-nfs-pacemaker" container being restarted.

  2) Check the journal of the ceph-nfs@pacemaker service: journalctl -u ceph-nfs@pacemaker

   Jan 31 19:34:06 controller-0 podman[272774]: exec: PID 57: spawning /usr/bin/ganesha.nfsd  -F -L STDOUT
   Jan 31 19:34:06 controller-0 podman[272774]: exec: Waiting 57 to quit
   Jan 31 19:34:06 controller-0 podman[272774]: 31/01/2020 19:34:06 : epoch 5e34812e : controller-0 : ganesha.nfsd-57[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.8.3/src, built at Jan 17 2020 20:56:03 on
   Jan 31 19:34:06 controller-0 podman[272774]: 31/01/2020 19:34:06 : epoch 5e34812e : controller-0 : ganesha.nfsd-57[main] load_rados_config :CONFIG :CRIT :Unknown urls backend
   Jan 31 19:34:06 controller-0 podman[272774]: 31/01/2020 19:34:06 : epoch 5e34812e : controller-0 : ganesha.nfsd-57[main] main :NFS STARTUP :CRIT :Error (token scan) while parsing (/etc/ganesha/ganesha.conf)
   Jan 31 19:34:06 controller-0 podman[272774]: 31/01/2020 19:34:06 : epoch 5e34812e : controller-0 : ganesha.nfsd-57[main] config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:24): new url (rados://manila_data/ganesha-export-index) open error (Success), ignored
   Jan 31 19:34:06 controller-0 podman[272774]: 31/01/2020 19:34:06 : epoch 5e34812e : controller-0 : ganesha.nfsd-57[main] main :NFS STARTUP :FATAL :Fatal errors.  Server exiting...
   Jan 31 19:34:06 controller-0 podman[272774]: teardown: managing teardown after SIGCHLD
   Jan 31 19:34:06 controller-0 podman[272774]: teardown: Waiting PID 57 to terminate
   Jan 31 19:34:06 controller-0 podman[272774]: teardown: Process 57 is terminated
   Jan 31 19:34:06 controller-0 podman[272774]: teardown: Bye Bye, container will die    with return code 0
   Jan 31 19:34:06 controller-0 podman[272774]: 2020-01-31 19:34:06.397920413 +0000 UTC m=+5.484116389 container died 6654dd25c3c73da64ea3bcc9fdf52e9900563db4791bda98b74e7d7af22dff8b (image=undercloud-0.ctlplane.redhat.local:8787/ceph/rhceph-4.0-rhel8:latest, name=ceph-nfs-pacemaker)
   Jan 31 19:34:06 controller-0 podman[272774]: 2020-01-31 19:34:06.456853837 +0000 UTC m=+5.543049781 container remove 6654dd25c3c73da64ea3bcc9fdf52e9900563db4791bda98b74e7d7af22dff8b (image=undercloud-0.ctlplane.redhat.local:8787/ceph/rhceph-4.0-rhel8:latest, name=ceph-nfs-pacemaker)
   Jan 31 19:34:06 controller-0 podman[273303]: Error: no container with name or ID ceph-nfs-pacemaker found: no such container
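Steps 0 and 2 above can be partly automated. The Python sketch below is a hedged illustration, not code from manila or ceph-container: `add_export_cmd` rebuilds the dbus-send argument list shown in the manila log, `ganesha_not_running` flags the D-Bus ServiceUnknown error (nothing owns the `org.ganesha.nfsd` bus name, which is consistent with ganesha.nfsd having exited), and `severe_events` extracts CRIT and FATAL records from journal text. All function names are hypothetical, and the log-line pattern is inferred only from the excerpts in this report.

```python
import re
from typing import List, NamedTuple, Sequence

SERVICE_UNKNOWN = "org.freedesktop.DBus.Error.ServiceUnknown"

# Pattern inferred from lines such as
# "ganesha.nfsd-57[main] load_rados_config :CONFIG :CRIT :Unknown urls backend".
GANESHA_LINE = re.compile(
    r"ganesha\.nfsd-(?P<pid>\d+)\[(?P<thread>[^\]]+)\] "
    r"(?P<func>\S+) :(?P<component>[^:]+) :(?P<level>[A-Z_]+) :(?P<msg>.*)$"
)


class GaneshaEvent(NamedTuple):
    func: str
    component: str
    level: str
    msg: str


def add_export_cmd(conf_path: str, export_id: int) -> List[str]:
    """Argument list matching the AddExport call in the manila traceback."""
    return [
        "dbus-send", "--print-reply", "--system",
        "--dest=org.ganesha.nfsd",
        "/org/ganesha/nfsd/ExportMgr",
        "org.ganesha.nfsd.exportmgr.AddExport",
        "string:" + conf_path,
        "string:EXPORT(Export_Id=%d)" % export_id,
    ]


def ganesha_not_running(dbus_stderr: str) -> bool:
    """True if dbus-send failed because no process owns org.ganesha.nfsd."""
    return SERVICE_UNKNOWN in dbus_stderr


def severe_events(journal_text: str,
                  levels: Sequence[str] = ("CRIT", "FATAL")) -> List[GaneshaEvent]:
    """Return ganesha log records whose level is in `levels`, in order."""
    events = []
    for line in journal_text.splitlines():
        m = GANESHA_LINE.search(line)
        if m and m.group("level") in levels:
            events.append(GaneshaEvent(m.group("func"), m.group("component"),
                                       m.group("level"), m.group("msg")))
    return events
```

Fed the journal output from step 2, `severe_events` would surface the `load_rados_config` "Unknown urls backend" record ahead of the FATAL exit, which points at the missing RADOS URL support rather than at manila.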

Comment 1 Dimitri Savineau 2020-01-31 21:54:38 UTC
Since the nfs-ganesha 2.8.3 rebase in RHCS 4, the nfs-ganesha package has been split into several packages, including nfs-ganesha-rados-urls, which contains the libganesha_rados_urls library used to handle RADOS URL configurations.
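One way to confirm whether a given image ships this package is to query its RPM database. A minimal Python sketch, assuming podman is available on the host; the helper names are hypothetical, and the image tag from the `podman ps` output above is used only as an example. Overriding the entrypoint runs `rpm -q` directly instead of the image's default entrypoint.

```python
import subprocess
from typing import List

PACKAGE = "nfs-ganesha-rados-urls"  # package named in the comment above


def rpm_query_cmd(image: str, package: str = PACKAGE) -> List[str]:
    """Build a podman command that runs `rpm -q <package>` in a throwaway container."""
    return ["podman", "run", "--rm", "--entrypoint", "rpm", image, "-q", package]


def image_has_package(image: str, package: str = PACKAGE) -> bool:
    """True if `rpm -q` succeeds, i.e. the package is installed in the image."""
    # rpm -q exits non-zero when the queried package is not installed.
    result = subprocess.run(rpm_query_cmd(image, package),
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return result.returncode == 0
```

For an image built before the fix, `image_has_package(...)` would be expected to return `False`; for an image at or after the version listed under Fixed In Version, `True`.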

Comment 26 David Hill 2020-04-02 15:03:47 UTC
May we have the hotfix for my customer, then?

Comment 43 errata-xmlrpc 2020-06-03 16:22:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2385

Comment 44 Red Hat Bugzilla 2023-09-15 00:21:07 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days