Hello Ilya,

What about these two PRs?

PR #42843 (https://github.com/ceph/ceph/pull/42843) = tracker for issue 52323
PR #42883 (https://github.com/ceph/ceph/pull/42883) = tracker for issue 52341

Should these two fixes be backported as well?

Thanks,
Bertrand
@Ilya, @Deepika,

We used ceph-16.2.6-34.el8cp to verify the issue; below are my observations:

1) After updating the conf file to SSD mode, the cache status is reported as "rwl" instead of "ssd":

[root@plena007 log]# cat /etc/ceph/ceph.conf
# minimal ceph.conf for d6e5c458-0f10-11ec-9663-002590fc25a4
[global]
        fsid = d6e5c458-0f10-11ec-9663-002590fc25a4
        mon_host = [v2:10.8.128.31:3300/0,v1:10.8.128.31:6789/0]
[client]
        rbd_cache = false
        rbd_persistent_cache_mode = ssd
        rbd_plugins = pwl_cache
        rbd_persistent_cache_size = 1073741824
        rbd_persistent_cache_path = /mnt/nvme/

Started I/Os using rbd bench and checked the mounted path for the cache file:

[root@plena007 nvme]# ls
lost+found  rbd-pwl.test1.11da251a5d7044.pool
[root@plena007 nvme]# ls
lost+found
[root@plena007 nvme]# ls
lost+found
[root@plena007 nvme]#

[root@magna031 yum.repos.d]# rbd du --pool test1
NAME     PROVISIONED  USED
image1         1 GiB  1 GiB
image2      1000 GiB  1 GiB
<TOTAL>     1001 GiB  2 GiB
[root@magna031 yum.repos.d]# rbd status test1/image1
Watchers: none
[root@magna031 yum.repos.d]# rbd status test1/image2
Watchers:
        watcher=10.1.172.7:0/225138355 client.1170002 cookie=140197396313344
Image cache state: {"present":"true","empty":"true","clean":"true","cache_type":"rwl","pwl_host":"plena007","pwl_path":"/mnt/nvme//rbd-pwl.test1.11da251a5d7044.pool","pwl_size":1073741824}
[root@magna031 yum.repos.d]# rbd status test1/image2
Watchers: none
[root@magna031 yum.repos.d]#

The status above reports "rwl" instead of "ssd".

NOTE: the behaviour above was inconsistent. The second time I/O was run, we did not see data being written to the mounted path, and the watchers output was empty when the rbd status command was issued.

We also upgraded the cluster to the latest build and saw I/Os fail to start.
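The mode mismatch above can be checked mechanically by parsing the "Image cache state:" JSON out of the `rbd status` text. A minimal sketch follows; the helper name `cache_type_from_status` is hypothetical (not part of any Ceph tooling), and the sample text is taken verbatim from the output above.

```python
import json

def cache_type_from_status(status_text):
    """Return the cache_type field from `rbd status` output, or None
    if no 'Image cache state:' line is present (hypothetical helper)."""
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("Image cache state:"):
            state = json.loads(line.split(":", 1)[1])
            return state.get("cache_type")
    return None

# Sample taken from the `rbd status test1/image2` output above.
sample = '''Watchers:
        watcher=10.1.172.7:0/225138355 client.1170002 cookie=140197396313344
Image cache state: {"present":"true","empty":"true","clean":"true","cache_type":"rwl","pwl_host":"plena007","pwl_path":"/mnt/nvme//rbd-pwl.test1.11da251a5d7044.pool","pwl_size":1073741824}'''

print(cache_type_from_status(sample))  # "rwl" despite rbd_persistent_cache_mode = ssd
```

A verification run could assert that this returns "ssd" when the client is configured for SSD mode.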
(Triggered I/Os from both rbd bench and fio.)

[root@magna031 ubuntu]# ceph version
ceph version 16.2.7-3.el8cp (54410e69e153d229a04fb6acc388f7e4afdd05e7) pacific (stable)

rbd bench output for reference:

[root@plena007 ubuntu]# rbd bench-write image1 --pool=test --io-threads=1
rbd: bench-write is deprecated, use rbd bench --io-type write ...
2021-12-14T07:25:30.666+0000 7fc3327fc700 -1 librbd::exclusive_lock::PostAcquireRequest: 0x7fc32c037000 handle_process_plugin_acquire_lock: failed to process plugins: (2) No such file or directory
rbd: failed to flush: 2021-12-14T07:25:30.669+0000 7fc3327fc700 -1 librbd::exclusive_lock::ImageDispatch: 0x7fc314002b60 handle_acquire_lock: failed to acquire exclusive lock: (2) No such file or directory
2021-12-14T07:25:30.669+0000 7fc3327fc700 -1 librbd::io::AioCompletion: 0x559cca568320 fail: (2) No such file or directory
(2) No such file or directory
bench failed: (2) No such file or directory

fio output:

[root@plena007 ubuntu]# fio --name=test-1 --ioengine=rbd --pool=test1 --rbdname=image2 --numjobs=1 --rw=write --bs=4k --iodepth=1 --fsync=32 --runtime=480 --time_based --group_reporting --ramp_time=120
test-1: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=1
fio-3.19
Starting 1 process
fio: io_u error on file test-1.0.0: No such file or directory: write offset=0, buflen=4096
fio: pid=1197333, err=2/file:io_u.c:1803, func=io_u error, error=No such file or directory
test-1: (groupid=0, jobs=1): err= 2 (file:io_u.c:1803, func=io_u error, error=No such file or directory): pid=1197333: Tue Dec 14 07:26:47 2021
  cpu          : usr=0.00%, sys=0.00%, ctx=2, majf=0, minf=5
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):

Disk stats (read/write):
  sda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
[root@plena007 ubuntu]#
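Both failures above reduce to the same librbd ENOENT during exclusive-lock/plugin acquisition. A small triage helper could grep client logs for exactly those signatures; this is an illustrative sketch (the helper and regex are not part of any Ceph tooling), with the sample line copied from the rbd bench output above.

```python
import re

# Match the two failure signatures seen above when I/O fails to start
# in SSD mode, capturing the errno and its string.
FAILURE_RE = re.compile(
    r"(handle_process_plugin_acquire_lock: failed to process plugins"
    r"|handle_acquire_lock: failed to acquire exclusive lock)"
    r".*\((\d+)\) (.+)$")

def find_pwl_failures(log_text):
    """Return (reason, errno, strerror) tuples for matching log lines."""
    hits = []
    for line in log_text.splitlines():
        m = FAILURE_RE.search(line)
        if m:
            hits.append((m.group(1), int(m.group(2)), m.group(3)))
    return hits

sample = ("2021-12-14T07:25:30.666+0000 7fc3327fc700 -1 "
          "librbd::exclusive_lock::PostAcquireRequest: 0x7fc32c037000 "
          "handle_process_plugin_acquire_lock: failed to process plugins: "
          "(2) No such file or directory")
print(find_pwl_failures(sample))
```

Errno 2 (ENOENT) here is consistent with the cache file missing under rbd_persistent_cache_path when the pwl_cache plugin is loaded.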
Filed a separate BZ and tracker for I/O failing to start in SSD mode:
https://bugzilla.redhat.com/show_bug.cgi?id=2032764
https://tracker.ceph.com/issues/53613
To verify this BZ, we need the steps and workloads. Please share the steps to verify.
Attached is the script used to verify the scenario below. We completed 10k iterations with no issue seen, hence moving this to the verified state.

create an image
for i in {0..10000}:
    start "rbd bench" or "fio" in the background (choose between sequential and random write workload at random; choose I/O size at random)
    sleep for 10-100 seconds at random
    SIGKILL "rbd bench" or "fio"
    assert that the cache is dirty ("rbd status | grep image_cache_state" should produce output)
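The loop above can be sketched as follows. This is a rough illustration, not the attached script: the pool/image names, the particular rbd bench and fio flag combinations, and the sleep bounds are assumptions, and `main()` obviously needs a live cluster with a PWL-enabled client to run.

```python
import random
import signal
import subprocess
import time

POOL, IMAGE = "test1", "image1"  # placeholder pool/image names

def pick_workload(rng):
    """Pick a workload at random: rbd bench or fio, sequential or random
    writes, random I/O size (flag choices are illustrative)."""
    io_size = rng.choice(["4K", "16K", "64K", "256K"])
    pattern = rng.choice(["write", "randwrite"])
    if rng.random() < 0.5:
        return ["rbd", "bench", "--io-type", "write", "--io-size", io_size,
                "--io-pattern", "seq" if pattern == "write" else "rand",
                f"{POOL}/{IMAGE}"]
    return ["fio", "--name=pwl-test", "--ioengine=rbd", f"--pool={POOL}",
            f"--rbdname={IMAGE}", f"--rw={pattern}", f"--bs={io_size.lower()}",
            "--iodepth=1", "--runtime=300", "--time_based"]

def cache_is_dirty():
    """Dirty cache check: `rbd status` should still print an
    image cache state line after the writer is killed."""
    out = subprocess.run(["rbd", "status", f"{POOL}/{IMAGE}"],
                         capture_output=True, text=True).stdout
    return "image_cache_state" in out or "Image cache state" in out

def main(iterations=10000):
    rng = random.Random()
    for _ in range(iterations):
        proc = subprocess.Popen(pick_workload(rng))
        time.sleep(rng.uniform(10, 100))
        proc.send_signal(signal.SIGKILL)  # kill mid-write, leaving the cache dirty
        proc.wait()
        assert cache_is_dirty(), "expected a dirty cache after SIGKILL"

# To run against a cluster: main()
```

SIGKILL (rather than SIGTERM) matters here: the writer gets no chance to flush, so the persistent cache must survive in a dirty state and be replayed on the next open.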
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174