Bug 2024154 - MDSMonitor: no active MDS after cluster deployment
Summary: MDSMonitor: no active MDS after cluster deployment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 5.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 5.1
Assignee: Venky Shankar
QA Contact: Amarnath
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-11-17 12:48 UTC by Venky Shankar
Modified: 2022-04-04 10:23 UTC
CC: 3 users

Fixed In Version: ceph-16.2.6-40.el8cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-04 10:22:55 UTC
Embargoed:




Links
Ceph Project Bug Tracker 53231 (last updated 2021-11-17 12:50:58 UTC)
Ceph Project Bug Tracker 53232 (last updated 2021-11-17 12:48:39 UTC)
Red Hat Issue Tracker RHCEPH-2365 (last updated 2021-11-17 12:51:35 UTC)
Red Hat Product Errata RHSA-2022:1174 (last updated 2022-04-04 10:23:22 UTC)

Description Venky Shankar 2021-11-17 12:48:40 UTC
This happens starting with v16.2.6 if the CephFS volume is created and the allow_standby_replay mode is set before the MDS daemons are created.
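
For reference, the problematic ordering can be reproduced with the cephadm shell commands below (a sketch based on the QA log in comment 10; the 'label:mds' placement is only an example and assumes hosts carrying an 'mds' label):

    cephadm shell -- ceph fs volume create cephfs
    cephadm shell -- ceph fs set cephfs allow_standby_replay true
    # MDS daemons are deployed only after the flag is set
    cephadm shell -- ceph orch apply mds cephfs --placement='label:mds'
    # without the fix, the new volume can be left with no active MDS
    cephadm shell -- ceph fs status cephfs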

Comment 9 Amarnath 2022-01-15 18:42:27 UTC
Hi @vshankar

Comment 10 Amarnath 2022-01-15 18:45:15 UTC
I have tried setting allow_standby_replay to true as soon as the file system was created.

I see the standby MDS node going into hot standby mode.

The cluster is in a healthy state:

2022-01-15 23:45:46,290 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph fs volume create cephfs on 10.0.210.201
2022-01-15 23:45:51,017 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:45:51,017 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph fs set cephfs allow_standby_replay true on 10.0.210.201
2022-01-15 23:45:54,805 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:45:54,806 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph orch  --verbose apply mds cephfs --placement='label:mds' on 10.0.210.201
2022-01-15 23:45:58,158 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:03,161 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on 10.0.210.201
2022-01-15 23:46:05,452 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:05,453 - root - INFO - 4/2 mds.cephfs daemon(s) up... retrying
2022-01-15 23:46:10,454 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on 10.0.210.201
2022-01-15 23:46:12,805 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:12,806 - root - INFO - 5/3 mds.cephfs daemon(s) up... retrying
2022-01-15 23:46:17,808 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on 10.0.210.201
2022-01-15 23:46:20,174 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:20,175 - root - INFO - 4/3 mds.cephfs daemon(s) up... retrying
2022-01-15 23:46:25,176 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on 10.0.210.201
2022-01-15 23:46:27,752 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:27,753 - root - INFO - 3/3 mds.cephfs daemon(s) up... retrying
2022-01-15 23:46:27,754 - ceph.ceph - INFO - Running command cephadm shell -- ceph -s on 10.0.210.201
2022-01-15 23:46:30,508 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:30,509 - ceph.ceph - INFO -   cluster:
    id:     61561870-762e-11ec-aa3e-fa163e5d3b8c
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-amk5-1-7ho54u-node1-installer,ceph-amk5-1-7ho54u-node2,ceph-amk5-1-7ho54u-node3 (age 2m)
    mgr: ceph-amk5-1-7ho54u-node1-installer.kjkddn(active, since 5m), standbys: ceph-amk5-1-7ho54u-node2.poqczo
    mds: 1/1 daemons up, 1 standby, 1 hot standby
    osd: 12 osds: 12 up (since 47s), 12 in (since 75s)
 
  data:
    volumes: 1/1 healthy
    pools:   3 pools, 65 pgs
    objects: 22 objects, 2.3 KiB
    usage:   70 MiB used, 180 GiB / 180 GiB avail
    pgs:     65 active+clean
 
  io:
    client:   5.3 KiB/s rd, 0 B/s wr, 5 op/s rd, 0 op/s wr

Is this expected as per the fix?
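
For completeness, the standby-replay state can also be confirmed from the same cephadm shell (a quick sketch; the file system name follows the log above):

    cephadm shell -- ceph fs get cephfs | grep flags    # allow_standby_replay should be listed in the MDSMap flags
    cephadm shell -- ceph fs status cephfs              # expect one active rank plus a standby-replay daemon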

Comment 11 Venky Shankar 2022-01-17 06:06:11 UTC
(In reply to Amarnath from comment #10)
> I have tried setting allow_standby_replay to true as soon as the file
> system was created.
> 
> I see the standby MDS node going into hot standby mode.
> 
> The cluster is in a healthy state:
> 
> 2022-01-15 23:45:46,290 - ceph.ceph - INFO - Running command cephadm -v
> shell -- ceph fs volume create cephfs on 10.0.210.201
> 2022-01-15 23:45:51,017 - ceph.ceph - INFO - Command completed successfully
> 2022-01-15 23:45:51,017 - ceph.ceph - INFO - Running command cephadm -v
> shell -- ceph fs set cephfs allow_standby_replay true on 10.0.210.201
> 2022-01-15 23:45:54,805 - ceph.ceph - INFO - Command completed successfully
> 2022-01-15 23:45:54,806 - ceph.ceph - INFO - Running command cephadm -v
> shell -- ceph orch  --verbose apply mds cephfs --placement='label:mds' on
> 10.0.210.201
> 2022-01-15 23:45:58,158 - ceph.ceph - INFO - Command completed successfully
> 2022-01-15 23:46:03,161 - ceph.ceph - INFO - Running command cephadm -v
> shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on
> 10.0.210.201
> 2022-01-15 23:46:05,452 - ceph.ceph - INFO - Command completed successfully
> 2022-01-15 23:46:05,453 - root - INFO - 4/2 mds.cephfs daemon(s) up...
> retrying
> 2022-01-15 23:46:10,454 - ceph.ceph - INFO - Running command cephadm -v
> shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on
> 10.0.210.201
> 2022-01-15 23:46:12,805 - ceph.ceph - INFO - Command completed successfully
> 2022-01-15 23:46:12,806 - root - INFO - 5/3 mds.cephfs daemon(s) up...
> retrying
> 2022-01-15 23:46:17,808 - ceph.ceph - INFO - Running command cephadm -v
> shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on
> 10.0.210.201
> 2022-01-15 23:46:20,174 - ceph.ceph - INFO - Command completed successfully
> 2022-01-15 23:46:20,175 - root - INFO - 4/3 mds.cephfs daemon(s) up...
> retrying
> 2022-01-15 23:46:25,176 - ceph.ceph - INFO - Running command cephadm -v
> shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on
> 10.0.210.201
> 2022-01-15 23:46:27,752 - ceph.ceph - INFO - Command completed successfully
> 2022-01-15 23:46:27,753 - root - INFO - 3/3 mds.cephfs daemon(s) up...
> retrying
> 2022-01-15 23:46:27,754 - ceph.ceph - INFO - Running command cephadm shell
> -- ceph -s on 10.0.210.201
> 2022-01-15 23:46:30,508 - ceph.ceph - INFO - Command completed successfully
> 2022-01-15 23:46:30,509 - ceph.ceph - INFO -   cluster:
>     id:     61561870-762e-11ec-aa3e-fa163e5d3b8c
>     health: HEALTH_OK
>  
>   services:
>     mon: 3 daemons, quorum
> ceph-amk5-1-7ho54u-node1-installer,ceph-amk5-1-7ho54u-node2,ceph-amk5-1-
> 7ho54u-node3 (age 2m)
>     mgr: ceph-amk5-1-7ho54u-node1-installer.kjkddn(active, since 5m),
> standbys: ceph-amk5-1-7ho54u-node2.poqczo
>     mds: 1/1 daemons up, 1 standby, 1 hot standby
>     osd: 12 osds: 12 up (since 47s), 12 in (since 75s)
>  
>   data:
>     volumes: 1/1 healthy
>     pools:   3 pools, 65 pgs
>     objects: 22 objects, 2.3 KiB
>     usage:   70 MiB used, 180 GiB / 180 GiB avail
>     pgs:     65 active+clean
>  
>   io:
>     client:   5.3 KiB/s rd, 0 B/s wr, 5 op/s rd, 0 op/s wr
> 
> Is this expected as per the fix?

Looks good.

Comment 13 errata-xmlrpc 2022-04-04 10:22:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174

