Bug 2021311
| Summary: | mds opening connection to up:replay/up:creating daemon causes message drop | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Patrick Donnelly <pdonnell> |
| Component: | CephFS | Assignee: | Patrick Donnelly <pdonnell> |
| Status: | CLOSED ERRATA | QA Contact: | Amarnath <amk> |
| Severity: | high | Docs Contact: | Ranjini M N <rmandyam> |
| Priority: | urgent | | |
| Version: | 5.1 | CC: | agunn, ceph-eng-bugs, rmandyam, tserlin, vereddy, vshankar |
| Target Milestone: | --- | | |
| Target Release: | 5.1 | | |
| Hardware: | All | | |
| OS: | All | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-16.2.7-22.el8cp | Doc Type: | Bug Fix |
| Doc Text: | Inter-MDS connections to a replacement Ceph Metadata Server (MDS) are now delayed until an identity state is established. Previously, an active Ceph Metadata Server (MDS) would initiate a connection with a replacement MDS before an identity state was established; messages from the apparent imposter MDS were then refused, halting the failover. With this release, the connection to the replacement MDS is delayed until the identity state is established, resulting in no message drops or failover issues. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-04-04 10:22:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2031073 | | |
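
The doc text above describes the fix in prose; the QA verification later in this report exercises it with two MDS debug options. A condensed, hedged sketch of that flow, using only commands that appear in the verification log (the option semantics in the comments are inferred from the option names and the doc text, not from confirmed documentation):

```bash
# Sketch of the verification flow, assuming a healthy CephFS cluster with a standby MDS.

# Debug option from the verification log: a very large sleep that slows rank-change
# handling, keeping the replacement MDS in an early state (up:replay/up:creating)
# long enough to hit the race described above (assumption).
ceph config set mds mds_sleep_rank_change 10000000.0

# Debug option from the verification log: presumably controls whether inter-MDS
# connections are attempted while a peer is still bootstrapping (assumption).
ceph config set mds mds_connect_bootstrapping true

# Fail rank 0 and watch its replacement progress through up:replay, up:resolve
# and up:reconnect to up:active instead of stalling with dropped messages.
ceph mds fail 0
ceph fs status
```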
Description
Patrick Donnelly
2021-11-08 19:04:06 UTC
Hi @pdonnell, can you please help in verifying this bug? Are there any specific steps to follow, or a specific teuthology script that needs to be executed?

Just run the failover test in teuthology (--filter failover).

Hi @pdonnell, the teuthology runs are failing for one reason or another.

Command used: ./teuthology-suite -n 10 -c master -s fs --ceph-repo https://github.com/AmarnatReddy/ceph.git --suite-repo https://github.com/AmarnatReddy/ceph.git --suite-branch master /home/amk/rh8x_5.1.yaml -e amk -m clara --distro-version 8.5 --distro rhel -t rh --filter failover

Recent failure: Command failed on clara003 with status 1: 'sudo yum remove cephadm ceph-mon ceph-mgr ceph-osd ceph-mds ceph-radosgw ceph-test ceph-selinux ceph-fuse python-rados python-rbd python-cephfs rbd-mirror bison flex elfutils-libelf-devel openssl-devel NetworkManager iproute util-linux libacl-devel libaio-devel libattr-devel libtool libuuid-devel xfsdump xfsprogs xfsprogs-devel libaio-devel libtool libuuid-devel xfsprogs-devel python3-cephfs cephfs-top cephfs-mirror bison flex elfutils-libelf-devel openssl-devel NetworkManager iproute util-linux libacl-devel libaio-devel libattr-devel libtool libuuid-devel xfsdump xfsprogs xfsprogs-devel libaio-devel libtool libuuid-devel xfsprogs-devel python3-cephfs cephfs-top cephfs-mirror -y'

Could you please let me know if there is any other way I can validate this?

Regards,
Amarnath

Hi Patrick,
I have verified this on the latest build (16.2.7-48.el8cp).
I see the MDS coming back to the active state after initiating `ceph mds fail 0`; it is not getting stuck in the up:resolve state.
I don't see any messages being dropped in the logs.
Commands executed:
[root@ceph-bz-mds-9ozvvy-node7 ~]# ceph fs status
cephfs - 1 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 replay cephfs.ceph-bz-mds-9ozvvy-node6.mllnlu 0 0 0 0
1 active cephfs.ceph-bz-mds-9ozvvy-node5.btiybz Reqs: 4 /s 285 169 58 141
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata
[root@ceph-bz-mds-9ozvvy-node7 ~]# ceph config set mds mds_sleep_rank_change 10000000.0
[root@ceph-bz-mds-9ozvvy-node7 ~]# ceph config set mds mds_connect_bootstrapping True
[root@ceph-bz-mds-9ozvvy-node7 ~]# ceph -s
cluster:
id: 4041e752-888c-11ec-9ac6-fa163e1e31c2
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-bz-mds-9ozvvy-node1-installer,ceph-bz-mds-9ozvvy-node2,ceph-bz-mds-9ozvvy-node3 (age 40m)
mgr: ceph-bz-mds-9ozvvy-node1-installer.fzndpb(active, since 43m), standbys: ceph-bz-mds-9ozvvy-node2.znbodr
mds: 2/2 daemons up, 1 standby
osd: 12 osds: 12 up (since 39m), 12 in (since 39m)
data:
volumes: 1/1 healthy
pools: 3 pools, 65 pgs
objects: 1.45k objects, 1.9 GiB
usage: 6.0 GiB used, 174 GiB / 180 GiB avail
pgs: 65 active+clean
io:
client: 21 MiB/s rd, 63 MiB/s wr, 22 op/s rd, 55 op/s wr
[root@ceph-bz-mds-9ozvvy-node7 ~]# ceph fs status
cephfs - 1 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 active cephfs.ceph-bz-mds-9ozvvy-node4.varcwu Reqs: 11 /s 1330 1331 283 1312
1 active cephfs.ceph-bz-mds-9ozvvy-node5.btiybz Reqs: 11 /s 109 113 61 93
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata 101M 54.9G
cephfs.cephfs.data data 3631M 54.9G
STANDBY MDS
cephfs.ceph-bz-mds-9ozvvy-node6.mllnlu
MDS version: ceph version 16.2.7-48.el8cp (49480538844c9255f03e5b0dccc609ea8fbf2656) pacific (stable)
[root@ceph-bz-mds-9ozvvy-node7 ~]# ceph mds fail 0
failed mds gid 14463
111M 54.9G
cephfs.cephfs.data data 3751M 54.9G
STANDBY MDS
cephfs.ceph-bz-mds-9ozvvy-node4.varcwu
MDS version: ceph version 16.2.7-48.el8cp (49480538844c9255f03e5b0dccc609ea8fbf2656) pacific (stable)
[root@ceph-bz-mds-9ozvvy-node7 ~]# ceph fs status
cephfs - 1 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 replay cephfs.ceph-bz-mds-9ozvvy-node6.mllnlu 0 0 0 0
1 active cephfs.ceph-bz-mds-9ozvvy-node5.btiybz Reqs: 0 /s 267 151 58 123
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata 111M 54.9G
cephfs.cephfs.data data 3273M 54.9G
STANDBY MDS
cephfs.ceph-bz-mds-9ozvvy-node4.varcwu
MDS version: ceph version 16.2.7-48.el8cp (49480538844c9255f03e5b0dccc609ea8fbf2656) pacific (stable)
[root@ceph-bz-mds-9ozvvy-node7 ~]# ceph fs status
cephfs - 1 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 resolve cephfs.ceph-bz-mds-9ozvvy-node6.mllnlu 2943 1356 290 0
1 active cephfs.ceph-bz-mds-9ozvvy-node5.btiybz Reqs: 0 /s 267 151 58 123
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata 111M 55.1G
cephfs.cephfs.data data 3211M 55.1G
STANDBY MDS
cephfs.ceph-bz-mds-9ozvvy-node4.varcwu
MDS version: ceph version 16.2.7-48.el8cp (49480538844c9255f03e5b0dccc609ea8fbf2656) pacific (stable)
[root@ceph-bz-mds-9ozvvy-node7 ~]# ceph fs status
cephfs - 1 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 resolve cephfs.ceph-bz-mds-9ozvvy-node6.mllnlu 2943 1356 290 0
1 active cephfs.ceph-bz-mds-9ozvvy-node5.btiybz Reqs: 0 /s 267 151 58 123
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata 111M 55.1G
cephfs.cephfs.data data 3211M 55.1G
STANDBY MDS
cephfs.ceph-bz-mds-9ozvvy-node4.varcwu
MDS version: ceph version 16.2.7-48.el8cp (49480538844c9255f03e5b0dccc609ea8fbf2656) pacific (stable)
[root@ceph-bz-mds-9ozvvy-node7 ~]# ceph fs status
cephfs - 1 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 reconnect cephfs.ceph-bz-mds-9ozvvy-node6.mllnlu 2943 1356 290 0
1 active cephfs.ceph-bz-mds-9ozvvy-node5.btiybz Reqs: 0 /s 267 151 58 119
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata 111M 55.1G
cephfs.cephfs.data data 3211M 55.1G
STANDBY MDS
cephfs.ceph-bz-mds-9ozvvy-node4.varcwu
MDS version: ceph version 16.2.7-48.el8cp (49480538844c9255f03e5b0dccc609ea8fbf2656) pacific (stable)
[root@ceph-bz-mds-9ozvvy-node7 ~]# ceph fs status
cephfs - 1 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 reconnect cephfs.ceph-bz-mds-9ozvvy-node6.mllnlu 2943 1356 290 0
1 active cephfs.ceph-bz-mds-9ozvvy-node5.btiybz Reqs: 0 /s 267 151 58 119
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata 111M 55.1G
cephfs.cephfs.data data 3211M 55.1G
STANDBY MDS
cephfs.ceph-bz-mds-9ozvvy-node4.varcwu
MDS version: ceph version 16.2.7-48.el8cp (49480538844c9255f03e5b0dccc609ea8fbf2656) pacific (stable)
[root@ceph-bz-mds-9ozvvy-node7 ~]# ceph fs status
cephfs - 1 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 active cephfs.ceph-bz-mds-9ozvvy-node6.mllnlu Reqs: 30 /s 2988 1356 290 21
1 active cephfs.ceph-bz-mds-9ozvvy-node5.btiybz Reqs: 0 /s 270 142 58 120
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata 112M 55.3G
cephfs.cephfs.data data 2941M 55.3G
STANDBY MDS
cephfs.ceph-bz-mds-9ozvvy-node4.varcwu
MDS version: ceph version 16.2.7-48.el8cp (49480538844c9255f03e5b0dccc609ea8fbf2656) pacific (stable)
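
The state progression above was checked by re-running `ceph fs status` by hand; the same check can be scripted. A minimal sketch, assuming `ceph mds stat` reports per-rank states in the `0=<name>=up:<state>` form:

```bash
#!/usr/bin/env bash
# Hedged helper, not part of the original verification: fail rank 0 and poll
# until its replacement reports up:active, mirroring the manual checks above.
set -euo pipefail

ceph mds fail 0

# Keep polling until rank 0 is up:active again; a stall in up:resolve
# (the symptom described in this bug) would keep this loop running.
until ceph mds stat | grep -q '0=[^=]*=up:active'; do
    sleep 5
    ceph fs status
done
echo "rank 0 is active again; failover completed"
```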
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174