Bug 1696717

Summary: [RFE] deploy manila cephfs-with-NFS with an external ceph cluster
Product: Red Hat OpenStack
Reporter: Tom Barron <tbarron>
Component: openstack-tripleo-heat-templates
Assignee: Giulio Fidente <gfidente>
Status: CLOSED CURRENTRELEASE
QA Contact: Yogev Rabl <yrabl>
Severity: high
Docs Contact: Laura Marsh <lmarsh>
Priority: high
Version: 16.0 (Train)
CC: asimonel, ccopello, fiezzi, gcharot, gfidente, gouthamr, gregraka, jamsmith, jgrosso, jhardee, jmelvin, johfulto, jschluet, lmarsh, mburns, ndeevy, nlevinki, nweinber, nwolf, pasik, pgrist, sclewis, sisadoun, sputhenp, vhariria, yrabl
Target Milestone: z2
Keywords: FutureFeature, TestOnly, Triaged
Target Release: 16.0 (Train on RHEL 8.1)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-11.3.1-0.20191126041653.414d4d9.el8ost
Doc Type: Enhancement
Doc Text:
This feature enables the Red Hat OpenStack Platform director to deploy the Shared File System service (manila) with an external Ceph Storage cluster. In this type of deployment, Ganesha still runs on the Controller nodes, where Pacemaker manages it in an active-passive configuration. This feature is supported with Ceph Storage 4.1 or later.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-04-15 10:38:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1710358, 1801319, 1802066, 1814942, 1819988, 1822328, 1831285, 1831342    
Bug Blocks: 1766484, 1843668    
Attachments:
Description Flags
failed deployment logs none

Description Tom Barron 2019-04-05 13:35:18 UTC
Description of problem: Support for deploying manila with CephFS-with-NFS (via a ganesha gateway) was added in OSP13, but only for deployments where Director installs both the Ceph daemons and the Ganesha daemon. While this meets the needs of some of our customers, others would also like to deploy manila so that it references an externally deployed Ceph cluster.

There are several possible ways to meet this need. For example, there is ongoing work to deploy the ceph daemons and ganesha via rook and kubernetes, but we don't have a concrete timeline for that work and it would not be something we could backport. Alternatively, we may be able to modify the current TripleO heat templates and ceph-ansible playbooks so that, if an external cluster is available, we can reference it instead of installing the daemons ourselves. There are two variations of this last approach: one where only the ceph daemons are external and we still deploy ganesha, and one where ganesha is also external.

An important consideration for all these possibilities is that ganesha is in the data path for the share service and cannot today run active-active. That is why, when we introduced support for CephFS-with-NFS in OSP13, we ran ganesha on the controller nodes as part of the pacemaker cluster there. That need drove the choice to lead with support only for Director-integrated deployment of the ganesha and ceph daemons.

So this work may split into three phases:

  1) see if we can keep pacemaker control of ganesha for service availability
     but allow the ceph daemons themselves to be externally deployed.

  2) for deployments that can manage ganesha availability themselves,
     allow ceph daemons and ganesha to be externally deployed.  This scenario
     would likely always involve a Support Exception so that Red Hat is
     not held accountable for failure in the data path.

  3) longer term, work with Storage BU on the rook-based deployment of
     external ceph daemons and ganesha service, where even if Director
     triggers the deployment of the ceph-ganesha infrastructure at the
     same time that it deploys the overcloud, it is technically external
     to OpenStack itself and where the HA for the ganesha service is no
     longer maintained by the pacemaker cluster in OpenStack.

Comment 30 Yogev Rabl 2020-02-10 15:18:18 UTC
Deployment failed in version 
openstack-tripleo-heat-templates-11.3.2-0.20200131125640.cc909b6.el8ost.noarch
with the error: 

  "fatal: [controller-0]: FAILED! => ",
        "  msg: |-",
        "    The task includes an option with an undefined variable. The error was: 'ansible.parsing.yaml.objects.AnsibleUnicode object' has no attribute 'name'",
        "  ",
        "    The error appears to be in '/usr/share/ceph-ansible/roles/ceph-nfs/tasks/main.yml': line 29, column 3, but may",
        "    be elsewhere in the file depending on the exact syntax problem.",
        "    The offending line appears to be:",
        "    - name: copy rgw keyring when deploying internal ganesha with external ceph cluster",
        "      ^ here",

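An error of this form generally means that a template read a `name` attribute from a plain string where a mapping was expected. The Python sketch below reproduces that mismatch under stated assumptions: the `key_names` helper and the `client.example` entry are hypothetical and are not taken from ceph-ansible or from the failing deployment.

```python
# Illustrative only: the helper and data below are hypothetical, not ceph-ansible code.

def key_names(entries):
    """Return the 'name' of each entry, rejecting entries that are not mappings."""
    names = []
    for entry in entries:
        # A plain string has no 'name' field, which is the shape of the
        # "object has no attribute 'name'" failure reported above.
        if not isinstance(entry, dict) or "name" not in entry:
            raise TypeError(
                f"expected a mapping with a 'name' key, "
                f"got {type(entry).__name__}: {entry!r}"
            )
        names.append(entry["name"])
    return names


well_formed = [{"name": "client.example"}]   # list of mappings: works
malformed = ["client.example"]               # list of plain strings: fails

print(key_names(well_formed))
try:
    key_names(malformed)
except TypeError as exc:
    print(f"rejected: {exc}")
```

Checking the shape of each entry before use turns the opaque attribute error into a message that names the offending value, which can help when comparing variable definitions between internal and external cluster deployments.
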
Comment 31 Yogev Rabl 2020-02-10 15:19:02 UTC
Created attachment 1662167 [details]
failed deployment logs

Comment 33 Lon Hohberger 2020-02-13 11:40:17 UTC
According to our records, this should be resolved by openstack-tripleo-heat-templates-11.3.2-0.20200131125640.cc909b6.el8ost.  This build is available now.

Comment 44 Yogev Rabl 2020-04-15 00:59:09 UTC
Verified: all manila tests passed successfully.