Bug 1415236

Summary: [RFE] Scale up - Standalone MDS
Product: Red Hat OpenStack Reporter: jomurphy
Component: rhosp-director Assignee: Yogev Rabl <yrabl>
Status: CLOSED CURRENTRELEASE QA Contact: Yogev Rabl <yrabl>
Severity: urgent Docs Contact: Derek <dcadzow>
Priority: urgent    
Version: 12.0 (Pike) CC: dbecker, dcadzow, jamsmith, jefbrown, johfulto, mburns, morazi, pgrist, rhel-osp-director-maint, scohen, yrabl
Target Milestone: Upstream M2 Keywords: FutureFeature, TestOnly
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
After scaling out Ceph MDS servers (for example, by increasing the MdsCount parameter from 1 to 3 and re-running openstack overcloud deploy), follow the RHCS documentation to mark at least one of them as a standby MDS: https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/ceph_file_system_guide/#configuring-standby-metadata-server-daemons
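For example, a minimal sketch of that extra step, based on the commands shown in comment 9 (the file system name cephfs and rank 2 come from that environment and may differ elsewhere):
  ceph fs set cephfs max_mds 2      # keep two active MDS ranks instead of three
  ceph mds deactivate cephfs:2      # stop rank 2 so its daemon becomes a standby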
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-06-28 15:17:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1414467    

Description jomurphy 2017-01-20 16:10:17 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 7 Yogev Rabl 2018-04-23 17:25:13 UTC
Failed.
The overcloud update itself completed successfully, but the MDS service in the Ceph cluster is not in the desired state.

The end result of the update from a single dedicated MDS node to 3 nodes is:
# ceph -s
  cluster:
    id:     e20d9670-46fb-11e8-a706-5254004123d2
    health: HEALTH_WARN
            insufficient standby MDS daemons available

  services:
    mon: 3 daemons, quorum controller-1,controller-0,controller-2
    mgr: controller-2(active), standbys: controller-1, controller-0
    mds: cephfs-3/3/3 up  {0=mds-0=up:active,1=mds-2=up:active,2=mds-1=up:active}
    osd: 5 osds: 5 up, 5 in

  data:
    pools:   7 pools, 416 pgs
    objects: 57 objects, 4870 bytes
    usage:   543 MB used, 99241 MB / 99784 MB avail
    pgs:     416 active+clean

Comment 8 John Fulton 2018-04-23 20:03:25 UTC
The scale-up worked, in that the stack update completed without error and we have two additional MDS servers [0]. We just have the warning from comment 7.

The meaning of the warning, "insufficient standby daemons available", as per http://docs.ceph.com/docs/master/cephfs/health-messages, is:

One or more file systems are configured to have a certain number of standby daemons available (including daemons in standby-replay) but the cluster does not have enough standby daemons. The standby daemons not in replay count towards any file system (i.e. they may overlap). This warning can be configured by setting ceph fs set <fs> standby_count_wanted <count>. Use zero for count to disable.

What extra steps should be taken to prevent the warning?

In other words, the steps that were followed to reproduce this were:

1. deploy [1]
2. change MdsCount from 1 to 3 in ~/composable_roles/roles/nodes.yaml [2]
3. re-run deploy

So during step 2, what additional parameter should be changed? This may be as simple as adding documentation that includes the following:

https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/ceph_file_system_guide/#configuring-standby-metadata-server-daemons
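
Alternatively, per the health-message text quoted above, the warning threshold itself could be adjusted; a minimal sketch, assuming the file system name cephfs used in this deployment:
  ceph fs set cephfs standby_count_wanted 1   # expect one standby MDS
  ceph fs set cephfs standby_count_wanted 0   # or disable the standby warning entirely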


[0] 
(undercloud) [stack@undercloud-0 composable_roles]$ openstack stack list
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Project                          | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
| 5fe06325-352e-4ba8-a40f-9ed4bd843e72 | overcloud  | 99a8987f56be4a1c90a07b8cb5344e6b | UPDATE_COMPLETE | 2018-04-23T15:22:24Z | 2018-04-23T16:24:56Z |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
(undercloud) [stack@undercloud-0 composable_roles]$ openstack server list
+--------------------------------------+--------------+--------+------------------------+----------------+-----------+
| ID                                   | Name         | Status | Networks               | Image          | Flavor    |
+--------------------------------------+--------------+--------+------------------------+----------------+-----------+
| 0730e499-079b-44b4-9962-b464b9621d08 | mds-1        | ACTIVE | ctlplane=192.168.24.17 | overcloud-full | mds       |
| 144d0f0a-6601-4a8c-a56b-302528916b14 | mds-2        | ACTIVE | ctlplane=192.168.24.22 | overcloud-full | mds       |
| 0f2a82d7-4490-4c17-863f-1727af56b7a2 | controller-1 | ACTIVE | ctlplane=192.168.24.18 | overcloud-full | baremetal |
| 44652461-3696-4b11-800f-34845a7156f7 | controller-2 | ACTIVE | ctlplane=192.168.24.11 | overcloud-full | baremetal |
| 6c94343a-37b0-4e20-bcea-ec7ac6fdc27f | mds-0        | ACTIVE | ctlplane=192.168.24.10 | overcloud-full | mds       |
| 516755ea-1f73-4ebf-9b91-2ea5f7929711 | controller-0 | ACTIVE | ctlplane=192.168.24.16 | overcloud-full | baremetal |
| 828354a4-feca-421a-8bea-a25311fd17dc | compute-0    | ACTIVE | ctlplane=192.168.24.8  | overcloud-full | compute   |
| 178e15bc-a4a1-4427-9c23-da5e8e95052a | ceph-0       | ACTIVE | ctlplane=192.168.24.12 | overcloud-full | ceph      |
+--------------------------------------+--------------+--------+------------------------+----------------+-----------+
(undercloud) [stack@undercloud-0 composable_roles]$ 

[1] 
(undercloud) [stack@undercloud-0 composable_roles]$ cat ~/overcloud_deploy.sh 
#!/bin/bash

openstack overcloud deploy \
  --timeout 100 \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --libvirt-type kvm \
  --stack overcloud \
  -r /home/stack/composable_roles/roles/roles_data.yaml \
  -e /home/stack/composable_roles/roles/nodes.yaml \
  -e /home/stack/composable_roles/roles/network-config.yaml \
-e /home/stack/composable_roles/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/composable_roles/network/network-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /home/stack/composable_roles/debug.yaml \
--environment-file /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-mds.yaml \
--environment-file /usr/share/openstack-tripleo-heat-templates/environments/manila-cephfsnative-config.yaml \
-e /home/stack/composable_roles/ceph-single-host-mode.yaml \
-e /home/stack/composable_roles/docker-images.yaml \
--log-file overcloud_deployment_69.log
(undercloud) [stack@undercloud-0 composable_roles]$ 


[2]
(undercloud) [stack@undercloud-0 composable_roles]$ cat ~/composable_roles/roles/nodes.yaml 
parameter_defaults:
   ControllerCount: 3
   OvercloudControlFlavor: controller
   MdsCount: 3
   OvercloudMdsFlavor: mds
   ComputeCount: 1
   OvercloudComputeFlavor: compute
   CephStorageCount: 1
   OvercloudCephStorageFlavor: ceph

Comment 9 John Fulton 2018-04-23 20:27:51 UTC
As per our CephFS documentation [1], we should have at least one standby MDS daemon to remain highly available. I followed the documentation and the warning went away. The Ceph installation documentation [2] states: "After installing Metadata Servers, configure them. For details, see the Configuring Metadata Server Daemons chapter in the Ceph File System Guide for Red Hat Ceph Storage 3". Also, there are no additional tasks in ceph-ansible to do this for the MDS role [3], so TripleO couldn't trigger an additional step; by all indications this is an additional admin task like the one I did below.

I think the answer here is to ensure that there is doc text in this bug describing the additional step and linking to the Ceph documentation.
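
For reference, the commands run below reduce to the following (the file system name cephfs and rank 2 are specific to this environment):
  ceph fs set cephfs max_mds 2      # reduce the active MDS ranks from 3 to 2
  ceph mds deactivate cephfs:2      # stop rank 2; mds-1 then drops back to standby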

[root@controller-0 ~]# ceph fs status cephfs
cephfs - 0 clients
======
+------+--------+-------+---------------+-------+-------+
| Rank | State  |  MDS  |    Activity   |  dns  |  inos |
+------+--------+-------+---------------+-------+-------+
|  0   | active | mds-0 | Reqs:    0 /s |   10  |   12  |
|  1   | active | mds-2 | Reqs:    0 /s |   10  |   12  |
|  2   | active | mds-1 | Reqs:    0 /s |   10  |   12  |
+------+--------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| manila_metadata | metadata | 4870  | 92.0G |
|   manila_data   |   data   |    0  | 92.0G |
+-----------------+----------+-------+-------+

+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 12.2.4-6.el7cp (78f60b924802e34d44f7078029a40dbe6c0c922f) luminous (stable)
[root@controller-0 ~]# 

[root@controller-0 ~]# ceph fs set cephfs max_mds 2
[root@controller-0 ~]# 
[root@controller-0 ~]# ceph mds deactivate cephfs:2
telling mds.1:2 172.17.3.10:6800/3387458738 to deactivate
[root@controller-0 ~]# 


[root@controller-0 ~]# ceph fs status cephfs
cephfs - 0 clients
======
+------+----------+-------+---------------+-------+-------+
| Rank |  State   |  MDS  |    Activity   |  dns  |  inos |
+------+----------+-------+---------------+-------+-------+
|  0   |  active  | mds-0 | Reqs:    0 /s |   10  |   12  |
|  1   |  active  | mds-2 | Reqs:    0 /s |   10  |   12  |
|  2   | stopping | mds-1 |               |    0  |    1  |
+------+----------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| manila_metadata | metadata | 4870  | 92.0G |
|   manila_data   |   data   |    0  | 92.0G |
+-----------------+----------+-------+-------+

+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 12.2.4-6.el7cp (78f60b924802e34d44f7078029a40dbe6c0c922f) luminous (stable)
[root@controller-0 ~]# 

[root@controller-0 ~]# ceph fs status cephfs
cephfs - 0 clients
======
+------+--------+-------+---------------+-------+-------+
| Rank | State  |  MDS  |    Activity   |  dns  |  inos |
+------+--------+-------+---------------+-------+-------+
|  0   | active | mds-0 | Reqs:    0 /s |   10  |   12  |
|  1   | active | mds-2 | Reqs:    0 /s |   10  |   12  |
+------+--------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| manila_metadata | metadata | 4870  | 92.0G |
|   manila_data   |   data   |    0  | 92.0G |
+-----------------+----------+-------+-------+

+-------------+
| Standby MDS |
+-------------+
|    mds-1    |
+-------------+
MDS version: ceph version 12.2.4-6.el7cp (78f60b924802e34d44f7078029a40dbe6c0c922f) luminous (stable)
[root@controller-0 ~]# 

[root@controller-0 ~]# ceph -s
  cluster:
    id:     e20d9670-46fb-11e8-a706-5254004123d2
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum controller-1,controller-0,controller-2
    mgr: controller-2(active), standbys: controller-1, controller-0
    mds: cephfs-2/2/2 up  {0=mds-0=up:active,1=mds-2=up:active}, 1 up:standby
    osd: 5 osds: 5 up, 5 in
 
  data:
    pools:   7 pools, 416 pgs
    objects: 57 objects, 4870 bytes
    usage:   543 MB used, 99241 MB / 99784 MB avail
    pgs:     416 active+clean
 
[root@controller-0 ~]# 

 
[1] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/ceph_file_system_guide/#configuring-standby-metadata-server-daemons

[2] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/installation_guide_for_red_hat_enterprise_linux/#installing-metadata-servers

[3] https://github.com/ceph/ceph-ansible/tree/master/roles/ceph-mds/tasks

Comment 13 Yogev Rabl 2018-04-24 17:06:29 UTC
Thank you @fultonj for your input. I agree that we should inform users via documentation about the additional steps they need to take.

Comment 16 Red Hat Bugzilla 2023-09-14 03:52:31 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days