The Ceph Monitor properly updates the OSD map on stretch cluster mode transitions
Previously, the Ceph Monitor did not properly update the OSD map on stretch cluster mode transitions, so some clients were unable to resend their OSD requests. The corresponding client I/O requests appeared to hang.
With this release, the Ceph Monitor properly updates the OSD map on stretch cluster mode transitions, and clients resend their OSD requests as expected. A stretch cluster mode transition no longer causes client I/O requests to appear hung.
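As a hedged aside (not part of the original release note), the stretch mode state that the monitors publish in the OSD map can be checked from any cluster node with standard Ceph CLI commands; the grep patterns are assumptions about the exact field names:

[root@ceph-0 ~]# ceph osd dump | grep -i stretch                  # stretch_mode_enabled, degraded/recovering flags
[root@ceph-0 ~]# ceph mon dump | grep -iE 'stretch|tiebreaker'    # stretch mode and tiebreaker monitor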
Reported by Raghavendra Talur, 2022-03-28 20:30:51 UTC
Description of problem:
We created a stretch cluster RHCS setup with 3 nodes in each of the 2 data centers and one arbiter node in the cloud. Each data center runs 2 MONs, and a 5th MON runs on the arbiter node.
A kernel RBD mount was performed on a client machine and data was continuously written to it. When all 3 nodes of the 2nd data center were shut down, the I/O stopped. Both reads and writes failed on the volume.
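The exact write workload is not included in the report; a minimal sketch of a continuous writer against the mounted volume, assuming a hypothetical mount point /mnt/rbd, would be:

sh-4.4# while true; do
>   dd if=/dev/zero of=/mnt/rbd/testfile bs=4M count=100 oflag=direct
>   sync
> done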
Version-Release number of selected component (if applicable):
Server info:
ceph version 16.2.0-152.el8cp (e456e8b705cb2f4a779689a0d80b122bcb0d67c9) pacific (stable)
image tag 5-103
How reproducible:
Always
Steps to Reproduce:
1. Create a stretch cluster
2. Mount an RBD volume
3. Shut down all OSD nodes that belong to one of the data centers/failure domains (see the hedged sketch below).
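A sketch of these steps, assuming hypothetical names (the tiebreaker monitor ceph-6, the pool rbdblockpool, and the CRUSH rule stretch_rule come from this report; the image name testimg and the site2 host list are assumptions):

# Step 1: enable stretch mode with ceph-6 as the tiebreaker monitor
ceph mon set election_strategy connectivity
ceph mon enable_stretch_mode ceph-6 stretch_rule datacenter

# Step 2: create, map, and mount an RBD image
rbd create rbdblockpool/testimg --size 10G
rbd map rbdblockpool/testimg          # e.g. maps to /dev/rbd0
mkfs.ext4 /dev/rbd0
mount /dev/rbd0 /mnt/rbd

# Step 3: power off every OSD node in one data center (hosts assumed)
for host in ceph-3 ceph-4 ceph-5; do ssh "$host" poweroff; done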
Actual results:
I/O hangs.
Expected results:
I/O should continue.
Additional info:
Server side info
=======================
[root@ceph-0 ~]# ceph status
  cluster:
    id:     b0d624f6-89ea-11ec-ad08-005056838602
    health: HEALTH_WARN
            3 hosts fail cephadm check
            We are missing stretch mode buckets, only requiring 1 of 2 buckets to peer
            insufficient standby MDS daemons available
            2/5 mons down, quorum ceph-0,ceph-2,ceph-6
            1 datacenter (6 osds) down
            6 osds down
            3 hosts (6 osds) down
            Degraded data redundancy: 1040/2080 objects degraded (50.000%), 90 pgs degraded, 201 pgs undersized
  services:
    mon: 5 daemons, quorum ceph-0,ceph-2,ceph-6 (age 5h), out of quorum: ceph-3, ceph-5
    mgr: ceph-0.fhbxvx(active, since 3d)
    mds: 1/1 daemons up
    osd: 12 osds: 6 up (since 5h), 12 in (since 6w)
    rgw: 1 daemon active (1 hosts, 1 zones)
  data:
    volumes: 1/1 healthy
    pools:   8 pools, 201 pgs
    objects: 520 objects, 724 MiB
    usage:   12 GiB used, 588 GiB / 600 GiB avail
    pgs:     1040/2080 objects degraded (50.000%)
             111 active+undersized
             90 active+undersized+degraded
[root@ceph-0 ~]# ceph osd pool ls
device_health_metrics
cephfs.cephfs.meta
cephfs.cephfs.data
.rgw.root
default.rgw.log
default.rgw.control
default.rgw.meta
rbdblockpool
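Not part of the original report, but relevant when reading the status above: stretch mode normally runs replicated pools at size 4 (two copies per site) and relaxes min_size while one site is down, which can be verified with standard pool queries against the pool named in this report:

[root@ceph-0 ~]# ceph osd pool get rbdblockpool size
[root@ceph-0 ~]# ceph osd pool get rbdblockpool min_size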
[root@ceph-0 ~]# ceph osd crush rule dump stretch_rule
{
    "rule_id": 1,
    "rule_name": "stretch_rule",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -16,
            "item_name": "site1"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 2,
            "type": "host"
        },
        {
            "op": "emit"
        },
        {
            "op": "take",
            "item": -17,
            "item_name": "site2"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 2,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
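The rule takes the site1 and site2 buckets in turn and picks 2 hosts from each, so every PG stores two replicas per data center. A sketch of how such a rule is typically installed, using the standard decompile/edit/recompile workflow (the rule text mirrors the dump above; file names are arbitrary):

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# append to crush.txt:
#   rule stretch_rule {
#       id 1
#       type replicated
#       min_size 1
#       max_size 10
#       step take site1
#       step chooseleaf firstn 2 type host
#       step emit
#       step take site2
#       step chooseleaf firstn 2 type host
#       step emit
#   }
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new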
RBD Volume Name: csi-vol-37119694-aca7-11ec-b69e-0a580a050c22
Client info
===============
sh-4.4# cat /etc/redhat-release
Red Hat Enterprise Linux CoreOS release 4.10
sh-4.4# uname -a
Linux perf1-qpmpf-ocs-94mfj 4.18.0-305.34.2.el8_4.x86_64 #1 SMP Mon Jan 17 09:42:23 EST 2022 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4# modinfo libceph
filename: /lib/modules/4.18.0-305.34.2.el8_4.x86_64/kernel/net/ceph/libceph.ko.xz
license: GPL
description: Ceph core library
author: Patience Warnick <patience>
author: Yehuda Sadeh <yehuda.net>
author: Sage Weil <sage>
rhelversion: 8.4
srcversion: 56D6E7804420E592C2C8124
depends: libcrc32c,dns_resolver
intree: Y
name: libceph
vermagic: 4.18.0-305.34.2.el8_4.x86_64 SMP mod_unload modversions
sh-4.4# mount | grep pvc-48a06b6b-6e69-48b9-9d3f-6b4d4e4aeff7
/dev/rbd2 on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-48a06b6b-6e69-48b9-9d3f-6b4d4e4aeff7/globalmount/0001-0011-openshift-storage-0000000000000008-37119694-aca7-11ec-b69e-0a580a050c22 type ext4 (rw,relatime,seclabel,stripe=16,_netdev)
/dev/rbd2 on /var/lib/kubelet/pods/fa5c2e5d-ac40-46f9-8969-44afb25f3168/volumes/kubernetes.io~csi/pvc-48a06b6b-6e69-48b9-9d3f-6b4d4e4aeff7/mount type ext4 (rw,relatime,seclabel,stripe=16,_netdev)
We are also attaching the client debug logs as a file.
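As an additional hedged pointer (not in the original report), the kernel client's in-flight OSD requests and its view of the OSD map can be inspected through debugfs to confirm that I/O is blocked; the fsid comes from the ceph status output above, while the client id is hypothetical:

sh-4.4# cat /sys/kernel/debug/ceph/b0d624f6-89ea-11ec-ad08-005056838602.client4567/osdc
sh-4.4# cat /sys/kernel/debug/ceph/b0d624f6-89ea-11ec-ad08-005056838602.client4567/osdmap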
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.1 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:4622