Bug 1812796

Summary: [DOCS] RHCS 4 Need ceph-ansible [mds] uninstall documentation
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Jerrin Jose <jjose>
Component: Documentation    Assignee: Ranjini M N <rmandyam>
Status: CLOSED CURRENTRELEASE QA Contact: Yogesh Mane <ymane>
Severity: medium Docs Contact: Aron Gunn <agunn>
Priority: unspecified    
Version: 4.0    CC: agunn, asriram, hyelloji, kdreyer, rmandyam, ymane
Target Milestone: rc   
Target Release: 4.3   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-09-16 13:40:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1966534    

Comment 1 John Brier 2021-03-01 21:25:35 UTC
This should at a minimum cover shrink-mds.yml, and perhaps other information related to ensuring the file system stays accessible on the standby MDS if required.
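
A minimal pre-check (just a sketch; the exact wording is up to the docs) would be to confirm a standby MDS exists before removing the active one, for example:

# ceph mds stat
cephfs:1 {0=cluster1-node5=up:active} 1 up:standby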

Comment 2 John Brier 2021-03-03 19:25:52 UTC
I tested this procedure. In my testing it did remove the node from the cluster but not from the Ansible inventory file, so it would be reprovisioned if site.yml/site-containers.yml was run again. I wonder whether any of the shrink playbooks remove the node from the inventory file. If not, we should mention that, or add a step to do it, in the procedures we already have on removing OSDs and MONs.
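
For example (a hypothetical extra step, using the inventory shown in the testing logs below), the retired node would be removed from the [mdss] group in the inventory file so that site.yml does not reprovision it:

[mdss]
# cluster1-node5 was removed from this group after running shrink-mds.yml against it
cluster1-node6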

Cool thing: I asked it to remove the active MDS, and it did that and promoted the standby to the new active.

When I reran it against the remaining active MDS, it removed the FS too. It did not remove the pools, however.

Testing logs:

== Cluster state before remove

[admin@cluster1-node1 infrastructure-playbooks]$ cat ../hosts
[grafana-server]
cluster1-node2

[mons]
cluster1-node2
cluster1-node3
cluster1-node4

[osds]
cluster1-node2
cluster1-node3
cluster1-node4
cluster1-node5
cluster1-node6

[mgrs]
cluster1-node2
cluster1-node3
cluster1-node4

[mdss]
cluster1-node5
cluster1-node6

[clients]



[root@cluster1-node2 ~]# ceph -s
  cluster:
    id:     bb89661e-7d6c-48af-8473-ebfe6c2cdc31
    health: HEALTH_WARN
            2 pool(s) have non-power-of-two pg_num
 
  services:
    mon: 3 daemons, quorum cluster1-node2,cluster1-node3,cluster1-node4 (age 6m)
    mgr: cluster1-node3(active, since 6m), standbys: cluster1-node4, cluster1-node2
    mds: cephfs:1 {0=cluster1-node5=up:active} 1 up:standby
    osd: 5 osds: 5 up (since 6m), 5 in (since 6m)
 
  task status:
    scrub status:
        mds.cluster1-node5: idle
 
  data:
    pools:   10 pools, 401 pgs
    objects: 1.54k objects, 5.6 GiB
    usage:   23 GiB used, 52 GiB / 75 GiB avail
    pgs:     401 active+clean
 
[root@cluster1-node2 ~]# ceph mds stat
cephfs:1 {0=cluster1-node5=up:active} 1 up:standby
[root@cluster1-node2 ~]# ceph fs dump
dumped fsmap epoch 2487
e2487
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 2
 
Filesystem 'cephfs' (2)
fs_name	cephfs
epoch	2487
flags	12
created	2021-02-18 17:32:23.929674
modified	2021-03-01 16:35:44.159538
tableserver	0
root	0
session_timeout	60
session_autoclose	300
max_file_size	1099511627776
min_compat_client	-1 (unspecified)
last_failure	0
last_failure_osd_epoch	1044
compat	compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds	1
in	0
up	{0=454117}
failed	
damaged	
stopped	
data_pools	[16]
metadata_pool	17
inline_data	disabled
balancer	
standby_count_wanted	1
[mds.cluster1-node5{0:454117} state up:active seq 733 addr [v2:192.168.0.35:6800/2671665574,v1:192.168.0.35:6801/2671665574]]
 
 
Standby daemons:
 
[mds.cluster1-node6{-1:454128} state up:standby seq 1 addr [v2:192.168.0.36:6800/3651113900,v1:192.168.0.36:6801/3651113900]]


[root@cluster1-node3 ~]# ceph osd pool ls
block-device-pool
device_health_metrics
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log
default.rgw.otp
trash-test-pool
cephfs_data
cephfs_metadata

[root@client1 ~]# mount -t ceph :/ /mnt/cephfs/ -o name=1
[root@client1 ~]# df -h
Filesystem                                               Size  Used Avail Use% Mounted on
devtmpfs                                                 880M     0  880M   0% /dev
tmpfs                                                    897M   84K  896M   1% /dev/shm
tmpfs                                                    897M   18M  879M   2% /run
tmpfs                                                    897M     0  897M   0% /sys/fs/cgroup
/dev/mapper/rhel-root                                     13G  2.7G  9.9G  22% /
/dev/nvme0n1p1                                          1014M  324M  691M  32% /boot
shm                                                       63M     0   63M   0% /var/lib/containers/storage/overlay-containers/f558efc6f3f8714ee7e2c89547a94238eb2c3e0bda8444fdcaefd730328f43ce/userdata/shm
overlay                                                   13G  2.7G  9.9G  22% /var/lib/containers/storage/overlay/dc694a6f437c1be6e0b8de4d8802ca2126ac6c898ae2f0f13dd16b2ee6f454d4/merged
tmpfs                                                    180M     0  180M   0% /run/user/0
192.168.0.32:6789,192.168.0.33:6789,192.168.0.34:6789:/   16G     0   16G   0% /mnt/cephfs
[root@client1 ~]# ls /mnt/cephfs/
[root@client1 ~]# ls
anaconda-ks.cfg  ceph.client.1.secret.backup  ceph.client.admin.secret.backup


== Remove active MDS

[admin@cluster1-node1 ceph-ansible]$ ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=cluster1-node5 -i hosts

PLAY [gather facts and check the init system] ****************************************************************

TASK [Gathering Facts] ***************************************************************************************
Wednesday 03 March 2021  13:55:04 -0500 (0:00:00.052)       0:00:00.052 ******* 
ok: [cluster1-node4]
ok: [cluster1-node2]
ok: [cluster1-node6]
ok: [cluster1-node5]
ok: [cluster1-node3]

TASK [debug] *************************************************************************************************
Wednesday 03 March 2021  13:55:07 -0500 (0:00:02.584)       0:00:02.636 ******* 
ok: [cluster1-node2] => 
  msg: gather facts on all Ceph hosts for following reference
ok: [cluster1-node3] => 
  msg: gather facts on all Ceph hosts for following reference
ok: [cluster1-node4] => 
  msg: gather facts on all Ceph hosts for following reference
ok: [cluster1-node5] => 
  msg: gather facts on all Ceph hosts for following reference
ok: [cluster1-node6] => 
  msg: gather facts on all Ceph hosts for following reference

TASK [ceph-facts : check if podman binary is present] ********************************************************
Wednesday 03 March 2021  13:55:07 -0500 (0:00:00.096)       0:00:02.733 ******* 
ok: [cluster1-node3]
ok: [cluster1-node5]
ok: [cluster1-node4]
ok: [cluster1-node6]
ok: [cluster1-node2]

TASK [ceph-facts : set_fact container_binary] ****************************************************************
Wednesday 03 March 2021  13:55:08 -0500 (0:00:00.934)       0:00:03.667 ******* 
ok: [cluster1-node2]
ok: [cluster1-node3]
ok: [cluster1-node4]
ok: [cluster1-node5]
ok: [cluster1-node6]
Are you sure you want to shrink the cluster? [no]: yes

PLAY [perform checks, remove mds and print cluster health] ***************************************************

TASK [exit playbook, if no mds was given] ********************************************************************
Wednesday 03 March 2021  13:55:37 -0500 (0:00:29.090)       0:00:32.758 ******* 
skipping: [cluster1-node2]

TASK [exit playbook, if the mds is not part of the inventory] ************************************************
Wednesday 03 March 2021  13:55:37 -0500 (0:00:00.020)       0:00:32.779 ******* 
skipping: [cluster1-node2]

TASK [exit playbook, if user did not mean to shrink cluster] *************************************************
Wednesday 03 March 2021  13:55:37 -0500 (0:00:00.021)       0:00:32.800 ******* 
skipping: [cluster1-node2]

TASK [set_fact container_exec_cmd for mon0] ******************************************************************
Wednesday 03 March 2021  13:55:37 -0500 (0:00:00.032)       0:00:32.832 ******* 
skipping: [cluster1-node2]

TASK [exit playbook, if can not connect to the cluster] ******************************************************
Wednesday 03 March 2021  13:55:37 -0500 (0:00:00.022)       0:00:32.855 ******* 
changed: [cluster1-node2]

TASK [set_fact mds_to_kill_hostname] *************************************************************************
Wednesday 03 March 2021  13:55:38 -0500 (0:00:00.602)       0:00:33.457 ******* 
ok: [cluster1-node2]

TASK [exit mds when containerized deployment] ****************************************************************
Wednesday 03 March 2021  13:55:38 -0500 (0:00:00.028)       0:00:33.485 ******* 
skipping: [cluster1-node2]

TASK [get ceph status] ***************************************************************************************
Wednesday 03 March 2021  13:55:38 -0500 (0:00:00.019)       0:00:33.505 ******* 
changed: [cluster1-node2]

TASK [set_fact current_max_mds] ******************************************************************************
Wednesday 03 March 2021  13:55:38 -0500 (0:00:00.513)       0:00:34.019 ******* 
ok: [cluster1-node2]

TASK [fail if removing that mds node wouldn't satisfy max_mds anymore] ***************************************
Wednesday 03 March 2021  13:55:38 -0500 (0:00:00.021)       0:00:34.041 ******* 
skipping: [cluster1-node2]

TASK [stop mds service] **************************************************************************************
Wednesday 03 March 2021  13:55:38 -0500 (0:00:00.038)       0:00:34.079 ******* 
changed: [cluster1-node2 -> cluster1-node5]

TASK [ensure that the mds is stopped] ************************************************************************
Wednesday 03 March 2021  13:55:42 -0500 (0:00:03.622)       0:00:37.702 ******* 
changed: [cluster1-node2 -> cluster1-node5]

TASK [get new ceph status] ***********************************************************************************
Wednesday 03 March 2021  13:55:42 -0500 (0:00:00.216)       0:00:37.918 ******* 
changed: [cluster1-node2]

TASK [get active mds nodes list] *****************************************************************************
Wednesday 03 March 2021  13:55:43 -0500 (0:00:00.467)       0:00:38.386 ******* 
ok: [cluster1-node2] => (item={'filesystem_id': 2, 'rank': 0, 'name': 'cluster1-node6', 'status': 'up:rejoin', 'gid': 454128})

TASK [get ceph fs dump status] *******************************************************************************
Wednesday 03 March 2021  13:55:43 -0500 (0:00:00.031)       0:00:38.417 ******* 
changed: [cluster1-node2]

TASK [create a list of standby mdss] *************************************************************************
Wednesday 03 March 2021  13:55:43 -0500 (0:00:00.472)       0:00:38.890 ******* 
ok: [cluster1-node2]

TASK [fail if mds just killed is being reported as active or standby] ****************************************
Wednesday 03 March 2021  13:55:43 -0500 (0:00:00.018)       0:00:38.909 ******* 
skipping: [cluster1-node2]

TASK [delete the filesystem when killing last mds] ***********************************************************
Wednesday 03 March 2021  13:55:43 -0500 (0:00:00.019)       0:00:38.928 ******* 
skipping: [cluster1-node2]

TASK [purge mds store] ***************************************************************************************
Wednesday 03 March 2021  13:55:43 -0500 (0:00:00.029)       0:00:38.958 ******* 
changed: [cluster1-node2 -> cluster1-node5]

TASK [show ceph health] **************************************************************************************
Wednesday 03 March 2021  13:55:43 -0500 (0:00:00.345)       0:00:39.303 ******* 
changed: [cluster1-node2]

PLAY RECAP ***************************************************************************************************
cluster1-node2             : ok=16   changed=8    unreachable=0    failed=0    skipped=8    rescued=0    ignored=0   
cluster1-node3             : ok=4    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
cluster1-node4             : ok=4    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
cluster1-node5             : ok=4    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
cluster1-node6             : ok=4    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   


Wednesday 03 March 2021  13:55:44 -0500 (0:00:00.486)       0:00:39.790 ******* 
=============================================================================== 
ceph-facts : set_fact container_binary --------------------------------------------------------------- 29.09s
stop mds service -------------------------------------------------------------------------------------- 3.62s
Gathering Facts --------------------------------------------------------------------------------------- 2.58s
ceph-facts : check if podman binary is present -------------------------------------------------------- 0.93s
exit playbook, if can not connect to the cluster ------------------------------------------------------ 0.60s
get ceph status --------------------------------------------------------------------------------------- 0.51s
show ceph health -------------------------------------------------------------------------------------- 0.49s
get ceph fs dump status ------------------------------------------------------------------------------- 0.47s
get new ceph status ----------------------------------------------------------------------------------- 0.47s
purge mds store --------------------------------------------------------------------------------------- 0.35s
ensure that the mds is stopped ------------------------------------------------------------------------ 0.22s
debug ------------------------------------------------------------------------------------------------- 0.10s
fail if removing that mds node wouldn't satisfy max_mds anymore --------------------------------------- 0.04s
exit playbook, if user did not mean to shrink cluster ------------------------------------------------- 0.03s
get active mds nodes list ----------------------------------------------------------------------------- 0.03s
delete the filesystem when killing last mds ----------------------------------------------------------- 0.03s
set_fact mds_to_kill_hostname ------------------------------------------------------------------------- 0.03s
set_fact container_exec_cmd for mon0 ------------------------------------------------------------------ 0.02s
exit playbook, if the mds is not part of the inventory ------------------------------------------------ 0.02s
set_fact current_max_mds ------------------------------------------------------------------------------ 0.02s


== Cluster state after removal of first MDS

[root@cluster1-node2 ~]# ceph -s
  cluster:
    id:     bb89661e-7d6c-48af-8473-ebfe6c2cdc31
    health: HEALTH_WARN
            insufficient standby MDS daemons available
            2 pool(s) have non-power-of-two pg_num
 
  services:
    mon: 3 daemons, quorum cluster1-node2,cluster1-node3,cluster1-node4 (age 16m)
    mgr: cluster1-node3(active, since 16m), standbys: cluster1-node4, cluster1-node2
    mds: cephfs:1 {0=cluster1-node6=up:active}
    osd: 5 osds: 5 up (since 16m), 5 in (since 16m)
 
  task status:
    scrub status:
        mds.cluster1-node6: idle
 
  data:
    pools:   10 pools, 401 pgs
    objects: 1.54k objects, 5.6 GiB
    usage:   23 GiB used, 52 GiB / 75 GiB avail
    pgs:     401 active+clean
 
[root@cluster1-node2 ~]# ceph fs dump
dumped fsmap epoch 2492
e2492
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 2
 
Filesystem 'cephfs' (2)
fs_name	cephfs
epoch	2492
flags	12
created	2021-02-18 17:32:23.929674
modified	2021-03-03 13:55:43.234398
tableserver	0
root	0
session_timeout	60
session_autoclose	300
max_file_size	1099511627776
min_compat_client	-1 (unspecified)
last_failure	0
last_failure_osd_epoch	1111
compat	compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds	1
in	0
up	{0=454128}
failed	
damaged	
stopped	
data_pools	[16]
metadata_pool	17
inline_data	disabled
balancer	
standby_count_wanted	1
[mds.cluster1-node6{0:454128} state up:active seq 41512 addr [v2:192.168.0.36:6800/3651113900,v1:192.168.0.36:6801/3651113900]]

[root@client1 ~]# mount -t ceph :/ /mnt/cephfs/ -o name=1
[root@client1 ~]# umount /mnt/cephfs 


== Removal of second MDS

[admin@cluster1-node1 ceph-ansible]$ ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=cluster1-node6 -i hosts

PLAY [gather facts and check the init system] ****************************************************************

TASK [debug] *************************************************************************************************
Wednesday 03 March 2021  14:01:26 -0500 (0:00:00.070)       0:00:00.070 ******* 
ok: [cluster1-node2] => 
  msg: gather facts on all Ceph hosts for following reference
ok: [cluster1-node3] => 
  msg: gather facts on all Ceph hosts for following reference
ok: [cluster1-node4] => 
  msg: gather facts on all Ceph hosts for following reference
ok: [cluster1-node5] => 
  msg: gather facts on all Ceph hosts for following reference
ok: [cluster1-node6] => 
  msg: gather facts on all Ceph hosts for following reference

TASK [ceph-facts : check if podman binary is present] ********************************************************
Wednesday 03 March 2021  14:01:26 -0500 (0:00:00.109)       0:00:00.180 ******* 
ok: [cluster1-node4]
ok: [cluster1-node3]
ok: [cluster1-node5]
ok: [cluster1-node6]
ok: [cluster1-node2]

TASK [ceph-facts : set_fact container_binary] ****************************************************************
Wednesday 03 March 2021  14:01:27 -0500 (0:00:00.630)       0:00:00.810 ******* 
ok: [cluster1-node2]
ok: [cluster1-node3]
ok: [cluster1-node4]
ok: [cluster1-node5]
ok: [cluster1-node6]
Are you sure you want to shrink the cluster? [no]: yes

PLAY [perform checks, remove mds and print cluster health] ***************************************************

TASK [exit playbook, if no mds was given] ********************************************************************
Wednesday 03 March 2021  14:01:29 -0500 (0:00:02.253)       0:00:03.064 ******* 
skipping: [cluster1-node2]

TASK [exit playbook, if the mds is not part of the inventory] ************************************************
Wednesday 03 March 2021  14:01:29 -0500 (0:00:00.027)       0:00:03.091 ******* 
skipping: [cluster1-node2]

TASK [exit playbook, if user did not mean to shrink cluster] *************************************************
Wednesday 03 March 2021  14:01:29 -0500 (0:00:00.025)       0:00:03.117 ******* 
skipping: [cluster1-node2]

TASK [set_fact container_exec_cmd for mon0] ******************************************************************
Wednesday 03 March 2021  14:01:29 -0500 (0:00:00.037)       0:00:03.154 ******* 
skipping: [cluster1-node2]

TASK [exit playbook, if can not connect to the cluster] ******************************************************
Wednesday 03 March 2021  14:01:29 -0500 (0:00:00.024)       0:00:03.179 ******* 
changed: [cluster1-node2]

TASK [set_fact mds_to_kill_hostname] *************************************************************************
Wednesday 03 March 2021  14:01:30 -0500 (0:00:00.577)       0:00:03.756 ******* 
ok: [cluster1-node2]

TASK [exit mds when containerized deployment] ****************************************************************
Wednesday 03 March 2021  14:01:30 -0500 (0:00:00.038)       0:00:03.795 ******* 
skipping: [cluster1-node2]

TASK [get ceph status] ***************************************************************************************
Wednesday 03 March 2021  14:01:30 -0500 (0:00:00.024)       0:00:03.819 ******* 
changed: [cluster1-node2]

TASK [set_fact current_max_mds] ******************************************************************************
Wednesday 03 March 2021  14:01:30 -0500 (0:00:00.476)       0:00:04.296 ******* 
ok: [cluster1-node2]

TASK [fail if removing that mds node wouldn't satisfy max_mds anymore] ***************************************
Wednesday 03 March 2021  14:01:30 -0500 (0:00:00.024)       0:00:04.321 ******* 
skipping: [cluster1-node2]

TASK [stop mds service] **************************************************************************************
Wednesday 03 March 2021  14:01:30 -0500 (0:00:00.038)       0:00:04.359 ******* 
changed: [cluster1-node2 -> cluster1-node6]

TASK [ensure that the mds is stopped] ************************************************************************
Wednesday 03 March 2021  14:01:34 -0500 (0:00:03.851)       0:00:08.211 ******* 
changed: [cluster1-node2 -> cluster1-node6]

TASK [get new ceph status] ***********************************************************************************
Wednesday 03 March 2021  14:01:34 -0500 (0:00:00.215)       0:00:08.426 ******* 
changed: [cluster1-node2]

TASK [get active mds nodes list] *****************************************************************************
Wednesday 03 March 2021  14:01:35 -0500 (0:00:00.473)       0:00:08.899 ******* 

TASK [get ceph fs dump status] *******************************************************************************
Wednesday 03 March 2021  14:01:35 -0500 (0:00:00.022)       0:00:08.921 ******* 
changed: [cluster1-node2]

TASK [create a list of standby mdss] *************************************************************************
Wednesday 03 March 2021  14:01:35 -0500 (0:00:00.507)       0:00:09.429 ******* 
ok: [cluster1-node2]

TASK [fail if mds just killed is being reported as active or standby] ****************************************
Wednesday 03 March 2021  14:01:35 -0500 (0:00:00.021)       0:00:09.450 ******* 
skipping: [cluster1-node2]

TASK [delete the filesystem when killing last mds] ***********************************************************
Wednesday 03 March 2021  14:01:35 -0500 (0:00:00.022)       0:00:09.472 ******* 
changed: [cluster1-node2]

TASK [purge mds store] ***************************************************************************************
Wednesday 03 March 2021  14:01:36 -0500 (0:00:01.005)       0:00:10.478 ******* 
changed: [cluster1-node2 -> cluster1-node6]

TASK [show ceph health] **************************************************************************************
Wednesday 03 March 2021  14:01:37 -0500 (0:00:00.331)       0:00:10.809 ******* 
changed: [cluster1-node2]

PLAY RECAP ***************************************************************************************************
cluster1-node2             : ok=15   changed=9    unreachable=0    failed=0    skipped=8    rescued=0    ignored=0   
cluster1-node3             : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
cluster1-node4             : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
cluster1-node5             : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
cluster1-node6             : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   


Wednesday 03 March 2021  14:01:37 -0500 (0:00:00.530)       0:00:11.339 ******* 
=============================================================================== 
stop mds service -------------------------------------------------------------------------------------- 3.85s
ceph-facts : set_fact container_binary ---------------------------------------------------------------- 2.25s
delete the filesystem when killing last mds ----------------------------------------------------------- 1.01s
ceph-facts : check if podman binary is present -------------------------------------------------------- 0.63s
exit playbook, if can not connect to the cluster ------------------------------------------------------ 0.58s
show ceph health -------------------------------------------------------------------------------------- 0.53s
get ceph fs dump status ------------------------------------------------------------------------------- 0.51s
get ceph status --------------------------------------------------------------------------------------- 0.48s
get new ceph status ----------------------------------------------------------------------------------- 0.47s
purge mds store --------------------------------------------------------------------------------------- 0.33s
ensure that the mds is stopped ------------------------------------------------------------------------ 0.22s
debug ------------------------------------------------------------------------------------------------- 0.11s
fail if removing that mds node wouldn't satisfy max_mds anymore --------------------------------------- 0.04s
set_fact mds_to_kill_hostname ------------------------------------------------------------------------- 0.04s
exit playbook, if user did not mean to shrink cluster ------------------------------------------------- 0.04s
exit playbook, if no mds was given -------------------------------------------------------------------- 0.03s
exit playbook, if the mds is not part of the inventory ------------------------------------------------ 0.03s
set_fact current_max_mds ------------------------------------------------------------------------------ 0.02s
set_fact container_exec_cmd for mon0 ------------------------------------------------------------------ 0.02s
exit mds when containerized deployment ---------------------------------------------------------------- 0.02s

== Cluster state after removal of second MDS

[root@cluster1-node2 ~]# ceph -s
  cluster:
    id:     bb89661e-7d6c-48af-8473-ebfe6c2cdc31
    health: HEALTH_WARN
            2 pool(s) have non-power-of-two pg_num
 
  services:
    mon: 3 daemons, quorum cluster1-node2,cluster1-node3,cluster1-node4 (age 41m)
    mgr: cluster1-node3(active, since 40m), standbys: cluster1-node4, cluster1-node2
    osd: 5 osds: 5 up (since 41m), 5 in (since 41m)
 
  data:
    pools:   10 pools, 401 pgs
    objects: 1.54k objects, 5.6 GiB
    usage:   23 GiB used, 52 GiB / 75 GiB avail
    pgs:     401 active+clean
 
[root@cluster1-node2 ~]# 


[root@cluster1-node2 ~]# ceph fs dump
dumped fsmap epoch 2494
e2494
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: -1
 
No filesystems configured

[root@cluster1-node3 ~]# ceph osd pool ls
block-device-pool
device_health_metrics
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log
default.rgw.otp
trash-test-pool
cephfs_data
cephfs_metadata
[root@cluster1-node3 ~]# 

[root@client1 ~]# mount -t ceph :/ /mnt/cephfs/ -o name=1
mount error 110 = Connection timed out
[root@client1 ~]#

Comment 3 John Brier 2021-03-04 18:03:22 UTC
In the previous comment I said "In my testing it did remove the node from the cluster but not from the Ansible inventory file, so it would be reprovisioned if site.yml/site-containers.yml was run again."

I just reran site.yml, and it did try to reprovision the MDS servers, but it failed because the old pools still exist and still contain objects:

fatal: [cluster1-node5 -> cluster1-node2]: FAILED! => changed=false 
  cmd:
  - ceph
  - --cluster
  - ceph
  - fs
  - new
  - cephfs
  - cephfs_metadata
  - cephfs_data
  delta: '0:00:00.312410'
  end: '2021-03-04 11:47:57.381653'
  invocation:
    module_args:
      _raw_params: ceph --cluster ceph fs new cephfs cephfs_metadata cephfs_data
      _uses_shell: false
      argv: null
      chdir: null
      creates: null
      executable: null
      removes: null
      stdin: null
      stdin_add_newline: true
      strip_empty_ends: true
      warn: true
  msg: non-zero return code
  rc: 22
  start: '2021-03-04 11:47:57.069243'
  stderr: 'Error EINVAL: pool ''cephfs_metadata'' already contains some objects. Use an empty pool instead.'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

NO MORE HOSTS LEFT ******************************************************************************************

PLAY RECAP **************************************************************************************************
client1                    : ok=51   changed=3    unreachable=0    failed=0    skipped=141  rescued=0    ignored=0   
cluster1-node2             : ok=328  changed=24   unreachable=0    failed=0    skipped=438  rescued=0    ignored=0   
cluster1-node3             : ok=280  changed=23   unreachable=0    failed=0    skipped=396  rescued=0    ignored=0   
cluster1-node4             : ok=286  changed=23   unreachable=0    failed=0    skipped=396  rescued=0    ignored=0   
cluster1-node5             : ok=183  changed=16   unreachable=0    failed=1    skipped=331  rescued=0    ignored=0   
cluster1-node6             : ok=165  changed=14   unreachable=0    failed=0    skipped=313  rescued=0    ignored=0   


INSTALLER STATUS ********************************************************************************************
Install Ceph Monitor           : Complete (0:01:45)
Install Ceph Manager           : Complete (0:01:50)
Install Ceph OSD               : Complete (0:03:15)
Install Ceph MDS               : In Progress (0:00:26)


I noticed that the procedure for removing a Ceph File System in the File System Guide [1] has an optional step at the end to remove the pools. After removing the pools and running site.yml again, the MDS servers are reprovisioned successfully:

[root@cluster1-node2 ~]# ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it
pool 'cephfs_metadata' removed
[root@cluster1-node2 ~]# ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it
pool 'cephfs_data' removed



INSTALLER STATUS ********************************************************************************************
Install Ceph Monitor           : Complete (0:02:14)
Install Ceph Manager           : Complete (0:02:22)
Install Ceph OSD               : Complete (0:03:19)
Install Ceph MDS               : Complete (0:00:45)
Install Ceph Client            : Complete (0:00:09)
Install Ceph Grafana           : In Progress (0:05:25)
	This phase can be restarted by running: roles/ceph-grafana/tasks/main.yml
Install Ceph Node Exporter     : Complete (0:00:53)

Thursday 04 March 2021  12:58:37 -0500 (0:00:00.004)       0:19:10.927 ******** 
=============================================================================== 


Why doesn't shrink-mds.yml remove the pools? Can the FS be recreated with the old pools? If not, what is the point of keeping them?

1) https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html-single/file_system_guide/index#removing-a-ceph-file-system_fs

Comment 4 John Brier 2021-03-04 21:05:40 UTC
Not only did it reprovision the MDS servers, it created an FS. I don't remember it doing that the first time I set up CephFS. If it didn't, and I had to create the FS manually last time, maybe the old configuration was saved somewhere and ceph-ansible recreated it based on the previous settings?

[root@cluster1-node2 ~]# ceph fs dump
dumped fsmap epoch 2499
e2499
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 3
 
Filesystem 'cephfs' (3)
fs_name	cephfs
epoch	2498
flags	12
created	2021-03-04 12:51:54.792670
modified	2021-03-04 12:52:10.404081
tableserver	0
root	0
session_timeout	60
session_autoclose	300
max_file_size	1099511627776
min_compat_client	-1 (unspecified)
last_failure	0
last_failure_osd_epoch	0
compat	compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds	1
in	0
up	{0=624306}
failed	
damaged	
stopped	
data_pools	[18]
metadata_pool	19
inline_data	disabled
balancer	
standby_count_wanted	1
[mds.cluster1-node6{0:624306} state up:active seq 3 addr [v2:192.168.0.36:6808/825808135,v1:192.168.0.36:6809/825808135]]
 
 
Standby daemons:
 
[mds.cluster1-node5{-1:644348} state up:standby seq 2 addr [v2:192.168.0.35:6808/1480844303,v1:192.168.0.35:6809/1480844303]]


[root@client1 ~]# mount -t ceph :/ /mnt/cephfs/ -o name=1
[root@client1 ~]# df
Filesystem                                              1K-blocks    Used Available Use% Mounted on
devtmpfs                                                   900552       0    900552   0% /dev
tmpfs                                                      917512      84    917428   1% /dev/shm
tmpfs                                                      917512   18048    899464   2% /run
tmpfs                                                      917512       0    917512   0% /sys/fs/cgroup
/dev/mapper/rhel-root                                    13092864 2768652  10324212  22% /
/dev/nvme0n1p1                                            1038336  331580    706756  32% /boot
shm                                                         64000       0     64000   0% /var/lib/containers/storage/overlay-containers/f558efc6f3f8714ee7e2c89547a94238eb2c3e0bda8444fdcaefd730328f43ce/userdata/shm
overlay                                                  13092864 2768652  10324212  22% /var/lib/containers/storage/overlay/dc694a6f437c1be6e0b8de4d8802ca2126ac6c898ae2f0f13dd16b2ee6f454d4/merged
tmpfs                                                      183500       0    183500   0% /run/user/0
192.168.0.32:6789,192.168.0.33:6789,192.168.0.34:6789:/  15757312       0  15757312   0% /mnt/cephfs
[root@client1 ~]#

Comment 5 John Brier 2021-03-22 21:02:27 UTC
> Not only did it reprovision the MDS servers, it created an FS. I don't remember it doing that the first time I set up CephFS.

I just tested reprovisioning MDS nodes on a new cluster and ceph-ansible does automatically create the FS during that process.

Comment 12 Yogesh Mane 2021-06-02 16:40:49 UTC
Hi

In steps 1 and 6 of the procedure in section 4.11, the location of the hosts file should be /usr/share/ceph-ansible/hosts instead of /etc/ansible/hosts.

In step 3, the command line needs the inventory (hosts) parameter added, so it would look something like: ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=MDS_NODE -i hosts

And IMHO we could swap steps 1 and 2. That way we navigate to the /usr/share/ceph-ansible/ directory in step 1 and open the hosts file in step 2.
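
A sketch of what the reordered steps might look like on the admin node used in the testing in comment 2 (the directory and playbook invocation are taken from those logs; only the ordering is the suggestion above):

[admin@cluster1-node1 ~]$ cd /usr/share/ceph-ansible/
[admin@cluster1-node1 ceph-ansible]$ cat hosts    # confirm the MDS node is listed under [mdss]
[admin@cluster1-node1 ceph-ansible]$ ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=cluster1-node5 -i hosts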

Comment 14 Yogesh Mane 2021-09-15 06:24:19 UTC
Hi

Step 4 "Optional: Repeat the process for any additional MDS nodes" should have "-i hosts" parameter

And there is an additional step needed before running the playbook: set max_mds to 1 (if it is not already set), otherwise the playbook will fail.
# ceph fs set fs_name max_mds 1

And in step 7, the first step should be removing the file system, as Ansible will only remove the MDS.
# ceph fs rm fs_name --yes-i-really-mean-it

And step 5 should be removed, as the file system will still exist.
Instead we can add a "ceph fs status" step whose output shows the failed file system with no MDSs:
# ceph fs status
cephfs - 0 clients
==========
+------+--------+-----+----------+-----+------+
| Rank | State  | MDS | Activity | dns | inos |
+------+--------+-----+----------+-----+------+
|  0   | failed |     |          |     |      |
+------+--------+-----+----------+-----+------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata |  627M |  108G |
|    data_pool    |   data   |  576k |  108G |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
+---------+---------+
| version | daemons |
+---------+---------+
+---------+---------+

Comment 16 Yogesh Mane 2021-09-16 05:32:18 UTC
LGTM