Bug 1415236
Summary: | [RFE] Scale up - Standalone MDS | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | jomurphy |
Component: | rhosp-director | Assignee: | Yogev Rabl <yrabl> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Yogev Rabl <yrabl> |
Severity: | urgent | Docs Contact: | Derek <dcadzow> |
Priority: | urgent | ||
Version: | 12.0 (Pike) | CC: | dbecker, dcadzow, jamsmith, jefbrown, johfulto, mburns, morazi, pgrist, rhel-osp-director-maint, scohen, yrabl |
Target Milestone: | Upstream M2 | Keywords: | FutureFeature, TestOnly |
Target Release: | 13.0 (Queens) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Enhancement
Doc Text: |
After scaling out CephMDS servers (for example, by increasing the MdsCount from 1 to 3 and re-running openstack overcloud deploy), follow the RHCS documentation to mark at least one of them as a standby MDS.
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/ceph_file_system_guide/#configuring-standby-metadata-server-daemons
|
Story Points: | --- |
Clone Of: | | Environment: |
Last Closed: | 2018-06-28 15:17:20 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1414467 |
Description
jomurphy
2017-01-20 16:10:17 UTC
The update of the overcloud ended successfully, but the MDS service in the Ceph cluster is not in the desired state. The end result of the update from a single dedicated MDS node to 3 nodes is:

```
# ceph -s
  cluster:
    id:     e20d9670-46fb-11e8-a706-5254004123d2
    health: HEALTH_WARN
            insufficient standby MDS daemons available

  services:
    mon: 3 daemons, quorum controller-1,controller-0,controller-2
    mgr: controller-2(active), standbys: controller-1, controller-0
    mds: cephfs-3/3/3 up {0=mds-0=up:active,1=mds-2=up:active,2=mds-1=up:active}
    osd: 5 osds: 5 up, 5 in

  data:
    pools:   7 pools, 416 pgs
    objects: 57 objects, 4870 bytes
    usage:   543 MB used, 99241 MB / 99784 MB avail
    pgs:     416 active+clean
```

The scale up worked, in that the stack update completed without error and we have two additional MDS servers [0]. We just have the warning from #7. Per http://docs.ceph.com/docs/master/cephfs/health-messages, the meaning of the warning "insufficient standby daemons available" is:

> One or more file systems are configured to have a certain number of standby daemons available (including daemons in standby-replay), but the cluster does not have enough standby daemons. The standby daemons not in replay count towards any file system (i.e. they may overlap). This warning can be configured by setting `ceph fs set <fs> standby_count_wanted <count>`. Use zero for the count to disable.

What extra steps should be taken to prevent the warning? In other words, the steps that were followed to reproduce this were:

1. deploy [1]
2. change MdsCount from 1 to 3 in ~/composable_roles/roles/nodes.yaml [2]
3. re-run the deploy

So during step 2, what additional parameter should be changed?
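For reference, the two knobs that health message points at can be sketched as below. This is an untested sketch of cluster-admin commands, assuming the filesystem name `cephfs` shown in the output above:

```
# Option A: disable the standby-count check entirely (a count of 0 means no warning)
ceph fs set cephfs standby_count_wanted 0

# Option B: keep the check, but free one of the three daemons to act as a
# standby by capping the number of active MDS ranks below the daemon count
ceph fs set cephfs max_mds 2
```

Option B matches the remediation eventually performed in this bug; Option A merely silences the warning without adding redundancy.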
This may be as simple as adding documentation covering the following: https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/ceph_file_system_guide/#configuring-standby-metadata-server-daemons

[0]
```
(undercloud) [stack@undercloud-0 composable_roles]$ openstack stack list
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Project                          | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
| 5fe06325-352e-4ba8-a40f-9ed4bd843e72 | overcloud  | 99a8987f56be4a1c90a07b8cb5344e6b | UPDATE_COMPLETE | 2018-04-23T15:22:24Z | 2018-04-23T16:24:56Z |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+

(undercloud) [stack@undercloud-0 composable_roles]$ openstack server list
+--------------------------------------+--------------+--------+------------------------+----------------+-----------+
| ID                                   | Name         | Status | Networks               | Image          | Flavor    |
+--------------------------------------+--------------+--------+------------------------+----------------+-----------+
| 0730e499-079b-44b4-9962-b464b9621d08 | mds-1        | ACTIVE | ctlplane=192.168.24.17 | overcloud-full | mds       |
| 144d0f0a-6601-4a8c-a56b-302528916b14 | mds-2        | ACTIVE | ctlplane=192.168.24.22 | overcloud-full | mds       |
| 0f2a82d7-4490-4c17-863f-1727af56b7a2 | controller-1 | ACTIVE | ctlplane=192.168.24.18 | overcloud-full | baremetal |
| 44652461-3696-4b11-800f-34845a7156f7 | controller-2 | ACTIVE | ctlplane=192.168.24.11 | overcloud-full | baremetal |
| 6c94343a-37b0-4e20-bcea-ec7ac6fdc27f | mds-0        | ACTIVE | ctlplane=192.168.24.10 | overcloud-full | mds       |
| 516755ea-1f73-4ebf-9b91-2ea5f7929711 | controller-0 | ACTIVE | ctlplane=192.168.24.16 | overcloud-full | baremetal |
| 828354a4-feca-421a-8bea-a25311fd17dc | compute-0    | ACTIVE | ctlplane=192.168.24.8  | overcloud-full | compute   |
| 178e15bc-a4a1-4427-9c23-da5e8e95052a | ceph-0       | ACTIVE | ctlplane=192.168.24.12 | overcloud-full | ceph      |
+--------------------------------------+--------------+--------+------------------------+----------------+-----------+
```

[1]
```
(undercloud) [stack@undercloud-0 composable_roles]$ cat ~/overcloud_deploy.sh
#!/bin/bash
openstack overcloud deploy \
  --timeout 100 \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --libvirt-type kvm \
  --stack overcloud \
  -r /home/stack/composable_roles/roles/roles_data.yaml \
  -e /home/stack/composable_roles/roles/nodes.yaml \
  -e /home/stack/composable_roles/roles/network-config.yaml \
  -e /home/stack/composable_roles/internal.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/composable_roles/network/network-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e /home/stack/composable_roles/debug.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-mds.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/manila-cephfsnative-config.yaml \
  -e /home/stack/composable_roles/ceph-single-host-mode.yaml \
  -e /home/stack/composable_roles/docker-images.yaml \
  --log-file overcloud_deployment_69.log
```

[2]
```
(undercloud) [stack@undercloud-0 composable_roles]$ cat ~/composable_roles/roles/nodes.yaml
parameter_defaults:
  ControllerCount: 3
  OvercloudControlFlavor: controller
  MdsCount: 3
  OvercloudMdsFlavor: mds
  ComputeCount: 1
  OvercloudComputeFlavor: compute
  CephStorageCount: 1
  OvercloudCephStorageFlavor: ceph
```

As per our CephFS documentation [1], we should have at least one standby MDS daemon
to remain highly available. I followed the documentation and the warning went away. The Ceph installation documentation [2] states: "After installing Metadata Servers, configure them. For details, see the Configuring Metadata Server Daemons chapter in the Ceph File System Guide for Red Hat Ceph Storage 3". Also, there are no additional tasks in ceph-ansible to do this for the MDS role [3], so TripleO couldn't trigger an additional step; by all indications this is an additional admin task like the one I did below. I think the answer here is to ensure that there is doc text in this bug describing the additional step and linking to the Ceph documentation.

```
[root@controller-0 ~]# ceph fs status cephfs
cephfs - 0 clients
======
+------+--------+-------+---------------+-------+-------+
| Rank | State  |  MDS  |    Activity   |  dns  |  inos |
+------+--------+-------+---------------+-------+-------+
|  0   | active | mds-0 |  Reqs:  0 /s  |   10  |   12  |
|  1   | active | mds-2 |  Reqs:  0 /s  |   10  |   12  |
|  2   | active | mds-1 |  Reqs:  0 /s  |   10  |   12  |
+------+--------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| manila_metadata | metadata |  4870 | 92.0G |
|   manila_data   |   data   |     0 | 92.0G |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 12.2.4-6.el7cp (78f60b924802e34d44f7078029a40dbe6c0c922f) luminous (stable)

[root@controller-0 ~]# ceph fs set cephfs max_mds 2
[root@controller-0 ~]# ceph mds deactivate cephfs:2
telling mds.1:2 172.17.3.10:6800/3387458738 to deactivate

[root@controller-0 ~]# ceph fs status cephfs
cephfs - 0 clients
======
+------+----------+-------+---------------+-------+-------+
| Rank |  State   |  MDS  |    Activity   |  dns  |  inos |
+------+----------+-------+---------------+-------+-------+
|  0   |  active  | mds-0 |  Reqs:  0 /s  |   10  |   12  |
|  1   |  active  | mds-2 |  Reqs:  0 /s  |   10  |   12  |
|  2   | stopping | mds-1 |               |    0  |    1  |
+------+----------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| manila_metadata | metadata |  4870 | 92.0G |
|   manila_data   |   data   |     0 | 92.0G |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 12.2.4-6.el7cp (78f60b924802e34d44f7078029a40dbe6c0c922f) luminous (stable)

[root@controller-0 ~]# ceph fs status cephfs
cephfs - 0 clients
======
+------+--------+-------+---------------+-------+-------+
| Rank | State  |  MDS  |    Activity   |  dns  |  inos |
+------+--------+-------+---------------+-------+-------+
|  0   | active | mds-0 |  Reqs:  0 /s  |   10  |   12  |
|  1   | active | mds-2 |  Reqs:  0 /s  |   10  |   12  |
+------+--------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| manila_metadata | metadata |  4870 | 92.0G |
|   manila_data   |   data   |     0 | 92.0G |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|    mds-1    |
+-------------+
MDS version: ceph version 12.2.4-6.el7cp (78f60b924802e34d44f7078029a40dbe6c0c922f) luminous (stable)

[root@controller-0 ~]# ceph -s
  cluster:
    id:     e20d9670-46fb-11e8-a706-5254004123d2
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum controller-1,controller-0,controller-2
    mgr: controller-2(active), standbys: controller-1, controller-0
    mds: cephfs-2/2/2 up {0=mds-0=up:active,1=mds-2=up:active}, 1 up:standby
    osd: 5 osds: 5 up, 5 in

  data:
    pools:   7 pools, 416 pgs
    objects: 57 objects, 4870 bytes
    usage:   543 MB used, 99241 MB / 99784 MB avail
    pgs:     416 active+clean
```

[1]
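Condensed, the manual remediation in the transcript above amounts to the following commands on a cluster node (a sketch in Luminous-era syntax; `ceph mds deactivate` was removed in later Ceph releases, where lowering `max_mds` alone stops the extra ranks):

```
# Cap active MDS ranks at 2 so the third daemon can become a standby
ceph fs set cephfs max_mds 2

# Stop rank 2; its daemon (mds-1 in this cluster) drops back to standby
ceph mds deactivate cephfs:2

# Verify: "ceph fs status" should list one Standby MDS, and "ceph -s" HEALTH_OK
ceph fs status cephfs
ceph -s
```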
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/ceph_file_system_guide/#configuring-standby-metadata-server-daemons
[2] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/installation_guide_for_red_hat_enterprise_linux/#installing-metadata-servers
[3] https://github.com/ceph/ceph-ansible/tree/master/roles/ceph-mds/tasks

Thank you @fultonj for your input. I agree that we should inform users via documentation about the additional steps they need to take.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
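For completeness, the "configuring standby metadata server daemons" chapter linked in [1] works through `ceph.conf` options on the MDS hosts rather than CLI commands. A hedged Luminous-era sketch (the daemon name `mds-1` is taken from this cluster; consult the linked guide for the authoritative option set):

```ini
[mds.mds-1]
# Make this daemon a standby that follows the active MDS holding rank 0
mds_standby_for_rank = 0
# Optionally replay the active daemon's journal for faster failover
mds_standby_replay = true
```

These `mds_standby_*` options were removed in newer Ceph releases, where standby assignment is handled automatically, so this fragment only applies to the Luminous-based RHCS 3 cluster in this bug.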