Bug 2024154

Summary: MDSMonitor: no active MDS after cluster deployment
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: CephFS
Reporter: Venky Shankar <vshankar>
Assignee: Venky Shankar <vshankar>
QA Contact: Amarnath <amk>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 5.0
CC: ceph-eng-bugs, tserlin, vereddy
Target Release: 5.1
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ceph-16.2.6-40.el8cp
Last Closed: 2022-04-04 10:22:55 UTC
Type: Bug

Description Venky Shankar 2021-11-17 12:48:40 UTC
Starting with v16.2.6, no MDS becomes active after cluster deployment if the CephFS volume is created and allow_standby_replay is enabled before the MDS daemons are created.
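
A minimal sequence that triggers the problem, reconstructed from the commands in the QE verification log below (the volume name "cephfs" and the placement label "mds" are illustrative):

    # create the volume and enable standby-replay before any MDS daemons exist
    ceph fs volume create cephfs
    ceph fs set cephfs allow_standby_replay true

    # only then deploy the MDS daemons
    ceph orch apply mds cephfs --placement='label:mds'

With the bug present, the file system is left without an active MDS after this ordering; with the fix, one daemon goes active and another enters hot standby, as seen in comment 10.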

Comment 9 Amarnath 2022-01-15 18:42:27 UTC
Hi @vshankar

Comment 10 Amarnath 2022-01-15 18:45:15 UTC
I tried setting allow_standby_replay to true immediately after creating the file system.

I see the standby MDS node going into hot standby mode.

The cluster is in a healthy state.

2022-01-15 23:45:46,290 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph fs volume create cephfs on 10.0.210.201
2022-01-15 23:45:51,017 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:45:51,017 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph fs set cephfs allow_standby_replay true on 10.0.210.201
2022-01-15 23:45:54,805 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:45:54,806 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph orch  --verbose apply mds cephfs --placement='label:mds' on 10.0.210.201
2022-01-15 23:45:58,158 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:03,161 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on 10.0.210.201
2022-01-15 23:46:05,452 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:05,453 - root - INFO - 4/2 mds.cephfs daemon(s) up... retrying
2022-01-15 23:46:10,454 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on 10.0.210.201
2022-01-15 23:46:12,805 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:12,806 - root - INFO - 5/3 mds.cephfs daemon(s) up... retrying
2022-01-15 23:46:17,808 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on 10.0.210.201
2022-01-15 23:46:20,174 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:20,175 - root - INFO - 4/3 mds.cephfs daemon(s) up... retrying
2022-01-15 23:46:25,176 - ceph.ceph - INFO - Running command cephadm -v shell -- ceph orch  --format json ls  --service_name mds.cephfs --refresh on 10.0.210.201
2022-01-15 23:46:27,752 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:27,753 - root - INFO - 3/3 mds.cephfs daemon(s) up... retrying
2022-01-15 23:46:27,754 - ceph.ceph - INFO - Running command cephadm shell -- ceph -s on 10.0.210.201
2022-01-15 23:46:30,508 - ceph.ceph - INFO - Command completed successfully
2022-01-15 23:46:30,509 - ceph.ceph - INFO -   cluster:
    id:     61561870-762e-11ec-aa3e-fa163e5d3b8c
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-amk5-1-7ho54u-node1-installer,ceph-amk5-1-7ho54u-node2,ceph-amk5-1-7ho54u-node3 (age 2m)
    mgr: ceph-amk5-1-7ho54u-node1-installer.kjkddn(active, since 5m), standbys: ceph-amk5-1-7ho54u-node2.poqczo
    mds: 1/1 daemons up, 1 standby, 1 hot standby
    osd: 12 osds: 12 up (since 47s), 12 in (since 75s)
 
  data:
    volumes: 1/1 healthy
    pools:   3 pools, 65 pgs
    objects: 22 objects, 2.3 KiB
    usage:   70 MiB used, 180 GiB / 180 GiB avail
    pgs:     65 active+clean
 
  io:
    client:   5.3 KiB/s rd, 0 B/s wr, 5 op/s rd, 0 op/s wr
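
The mds line above ("1/1 daemons up, 1 standby, 1 hot standby") can also be cross-checked with the file system status command, which lists the active rank and the standby-replay daemon explicitly (a quick check, not part of the original log):

    ceph fs status cephfs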

Is this expected as per the fix?

Comment 11 Venky Shankar 2022-01-17 06:06:11 UTC
(In reply to Amarnath from comment #10)
> I tried setting allow_standby_replay to true immediately after creating
> the file system.
> [...]
> Is this expected as per the fix?

Looks good.

Comment 13 errata-xmlrpc 2022-04-04 10:22:55 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174