Bug 2024154 - MDSMonitor: no active MDS after cluster deployment
Summary: MDSMonitor: no active MDS after cluster deployment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 5.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 5.1
Assignee: Venky Shankar
QA Contact: Amarnath
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-11-17 12:48 UTC by Venky Shankar
Modified: 2022-04-04 10:23 UTC
CC: 3 users

Fixed In Version: ceph-16.2.6-40.el8cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-04 10:22:55 UTC
Embargoed:




Links
Ceph Project Bug Tracker 53231 (last updated 2021-11-17 12:50:58 UTC)
Ceph Project Bug Tracker 53232 (last updated 2021-11-17 12:48:39 UTC)
Red Hat Issue Tracker RHCEPH-2365 (last updated 2021-11-17 12:51:35 UTC)
Red Hat Product Errata RHSA-2022:1174 (last updated 2022-04-04 10:23:22 UTC)

Description Venky Shankar 2021-11-17 12:48:40 UTC
This happens starting with v16.2.6 if the CephFS volume is created and the allow_standby_replay mode is set before the MDS daemons are created.
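
For reference, the problematic ordering can be reproduced with the cephadm shell commands below (a sketch based on the QA log in comment 10; the 'label:mds' placement is only an example and assumes hosts carrying an 'mds' label):

    cephadm shell -- ceph fs volume create cephfs
    cephadm shell -- ceph fs set cephfs allow_standby_replay true
    # MDS daemons are deployed only after the flag is set
    cephadm shell -- ceph orch apply mds cephfs --placement='label:mds'
    # without the fix, the new volume can be left with no active MDS
    cephadm shell -- ceph fs status cephfs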

Comment 9 Amarnath 2022-01-15 18:42:27 UTC
Hi @vshankar

Comment 10 Amarnath 2022-01-15 18:45:15 UTC
I have tried setting allow_standby_replay to true as soon as the file system was created.

I see the standby MDS node going into hot standby mode.

The cluster is in a healthy state:

2022-01-15 23:45:46,290 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph fs volume create cephfs on 10.0.210.201
2022-01-15 23:45:51,017 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:45:51,017 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph fs set cephfs allow_standby_replay true on 10.0.210.201
2022-01-15 23:45:54,805 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:45:54,806 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph orch  --verbose apply mds cephfs --placement='label:mds' on 10.0.210.201
2022-01-15 23:45:58,158 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:03,161 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on 10.0.210.201
2022-01-15 23:46:05,452 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:05,453 - root - INFO - 4/2 mds.cephfs daemon(s) up... retrying
2022-01-15 23:46:10,454 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on 10.0.210.201
2022-01-15 23:46:12,805 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:12,806 - root - INFO - 5/3 mds.cephfs daemon(s) up... retrying
2022-01-15 23:46:17,808 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on 10.0.210.201
2022-01-15 23:46:20,174 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:20,175 - root - INFO - 4/3 mds.cephfs daemon(s) up... retrying
2022-01-15 23:46:25,176 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on 10.0.210.201
2022-01-15 23:46:27,752 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:27,753 - root - INFO - 3/3 mds.cephfs daemon(s) up... retrying
2022-01-15 23:46:27,754 - ceph.ceph - INFO - Running command cephadm shell -- ceph -s on 10.0.210.201
2022-01-15 23:46:30,508 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:30,509 - ceph.ceph - INFO -   cluster:
    id:     61561870-762e-11ec-aa3e-fa163e5d3b8c
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-amk5-1-7ho54u-node1-installer,ceph-amk5-1-7ho54u-node2,ceph-amk5-1-7ho54u-node3 (age 2m)
    mgr: ceph-amk5-1-7ho54u-node1-installer.kjkddn(active, since 5m), standbys: ceph-amk5-1-7ho54u-node2.poqczo
    mds: 1/1 daemons up, 1 standby, 1 hot standby
    osd: 12 osds: 12 up (since 47s), 12 in (since 75s)
 
  data:
    volumes: 1/1 healthy
    pools:   3 pools, 65 pgs
    objects: 22 objects, 2.3 KiB
    usage:   70 MiB used, 180 GiB / 180 GiB avail
    pgs:     65 active+clean
 
  io:
    client:   5.3 KiB/s rd, 0 B/s wr, 5 op/s rd, 0 op/s wr

Is this expected as per the fix?
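
For completeness, the standby-replay state can also be confirmed from the same cephadm shell (a quick sketch; the file system name follows the log above):

    cephadm shell -- ceph fs get cephfs | grep flags    # allow_standby_replay should be listed in the MDSMap flags
    cephadm shell -- ceph fs status cephfs              # expect one active rank plus a standby-replay daemon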

Comment 11 Venky Shankar 2022-01-17 06:06:11 UTC
(In reply to Amarnath from comment #10)
> I have tried setting allow_standby_replay to true as soon as the file
> system was created.
> 
> I see the standby MDS node going into hot standby mode.
> 
> The cluster is in a healthy state:
> 
> 2022-01-15 23:45:46,290 - ceph.ceph - INFO - Running command cephadm -v
> shell -- ceph fs volume create cephfs on 10.0.210.201
> 2022-01-15 23:45:51,017 - ceph.ceph - INFO - Command completed successfully
> 2022-01-15 23:45:51,017 - ceph.ceph - INFO - Running command cephadm -v
> shell -- ceph fs set cephfs allow_standby_replay true on 10.0.210.201
> 2022-01-15 23:45:54,805 - ceph.ceph - INFO - Command completed successfully
> 2022-01-15 23:45:54,806 - ceph.ceph - INFO - Running command cephadm -v
> shell -- ceph orch  --verbose apply mds cephfs --placement='label:mds' on
> 10.0.210.201
> 2022-01-15 23:45:58,158 - ceph.ceph - INFO - Command completed successfully
> 2022-01-15 23:46:03,161 - ceph.ceph - INFO - Running command cephadm -v
> shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on
> 10.0.210.201
> 2022-01-15 23:46:05,452 - ceph.ceph - INFO - Command completed successfully
> 2022-01-15 23:46:05,453 - root - INFO - 4/2 mds.cephfs daemon(s) up...
> retrying
> 2022-01-15 23:46:10,454 - ceph.ceph - INFO - Running command cephadm -v
> shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on
> 10.0.210.201
> 2022-01-15 23:46:12,805 - ceph.ceph - INFO - Command completed successfully
> 2022-01-15 23:46:12,806 - root - INFO - 5/3 mds.cephfs daemon(s) up...
> retrying
> 2022-01-15 23:46:17,808 - ceph.ceph - INFO - Running command cephadm -v
> shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on
> 10.0.210.201
> 2022-01-15 23:46:20,174 - ceph.ceph - INFO - Command completed successfully
> 2022-01-15 23:46:20,175 - root - INFO - 4/3 mds.cephfs daemon(s) up...
> retrying
> 2022-01-15 23:46:25,176 - ceph.ceph - INFO - Running command cephadm -v
> shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on
> 10.0.210.201
> 2022-01-15 23:46:27,752 - ceph.ceph - INFO - Command completed successfully
> 2022-01-15 23:46:27,753 - root - INFO - 3/3 mds.cephfs daemon(s) up...
> retrying
> 2022-01-15 23:46:27,754 - ceph.ceph - INFO - Running command cephadm shell
> -- ceph -s on 10.0.210.201
> 2022-01-15 23:46:30,508 - ceph.ceph - INFO - Command completed successfully
> 2022-01-15 23:46:30,509 - ceph.ceph - INFO -   cluster:
>     id:     61561870-762e-11ec-aa3e-fa163e5d3b8c
>     health: HEALTH_OK
>  
>   services:
>     mon: 3 daemons, quorum
> ceph-amk5-1-7ho54u-node1-installer,ceph-amk5-1-7ho54u-node2,ceph-amk5-1-
> 7ho54u-node3 (age 2m)
>     mgr: ceph-amk5-1-7ho54u-node1-installer.kjkddn(active, since 5m),
> standbys: ceph-amk5-1-7ho54u-node2.poqczo
>     mds: 1/1 daemons up, 1 standby, 1 hot standby
>     osd: 12 osds: 12 up (since 47s), 12 in (since 75s)
>  
>   data:
>     volumes: 1/1 healthy
>     pools:   3 pools, 65 pgs
>     objects: 22 objects, 2.3 KiB
>     usage:   70 MiB used, 180 GiB / 180 GiB avail
>     pgs:     65 active+clean
>  
>   io:
>     client:   5.3 KiB/s rd, 0 B/s wr, 5 op/s rd, 0 op/s wr
> 
> Is this expected as per the fix?

Looks good.

Comment 13 errata-xmlrpc 2022-04-04 10:22:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174

