Description of problem:

As part of scale testing, we followed the steps below in a loop:
1. Create a subvolume
2. Get the path of the created subvolume
3. Mount the subvolume using the path from step 2
4. Write 1 GB of data

The above steps passed for 2491 subvolume iterations. On iteration 2492, "get path of subvolume" failed with the error below:

2022-12-01 23:04:59,778 - INFO - cephci.ceph.ceph.py:1513 - Running command ceph fs subvolume getpath cephfs subvol_max_2942 on 10.1.38.141 timeout 600
2022-12-01 23:05:00,211 - ERROR - cephci.ceph.ceph.py:1548 - Error 108 during cmd, timeout 600
2022-12-01 23:05:00,212 - ERROR - cephci.ceph.ceph.py:1549 - Error ESHUTDOWN: error in stat: /volumes/_nogroup

MDS logs: http://magna002.ceph.redhat.com/ceph-qe-logs/amar/mds_log_scale/

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
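A minimal sketch of the loop, assuming the cephci test drives the standard CLI commands; the subvolume name pattern, mount point, admin client, and the dd write are illustrative placeholders, not the actual test code:

# Illustrative reproduction sketch, not the cephci test itself.
# Assumes an existing "cephfs" filesystem and kernel-mount access with the admin keyring.
for i in $(seq 1 3000); do
    ceph fs subvolume create cephfs subvol_max_${i}
    subvol_path=$(ceph fs subvolume getpath cephfs subvol_max_${i})   # failed on iteration 2492
    mkdir -p /mnt/subvol_max_${i}
    mount -t ceph :${subvol_path} /mnt/subvol_max_${i} -o name=admin
    dd if=/dev/zero of=/mnt/subvol_max_${i}/file_1g bs=1M count=1024  # write 1 GB of data
    umount /mnt/subvol_max_${i}
done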
Please specify the severity of this bug. Severity is defined here: https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.
[root@f12-h09-000-1029u ~]# ceph versions
{
    "mon": {
        "ceph version 16.2.10-79.el8cp (04a651bbcd8d087dd0fcc0bc71a5871e77732529) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.10-79.el8cp (04a651bbcd8d087dd0fcc0bc71a5871e77732529) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.10-79.el8cp (04a651bbcd8d087dd0fcc0bc71a5871e77732529) pacific (stable)": 96
    },
    "mds": {
        "ceph version 16.2.10-79.el8cp (04a651bbcd8d087dd0fcc0bc71a5871e77732529) pacific (stable)": 3
    },
    "rbd-mirror": {
        "ceph version 16.2.10-79.el8cp (04a651bbcd8d087dd0fcc0bc71a5871e77732529) pacific (stable)": 1
    },
    "overall": {
        "ceph version 16.2.10-79.el8cp (04a651bbcd8d087dd0fcc0bc71a5871e77732529) pacific (stable)": 105
    }
}
[root@f12-h09-000-1029u ~]#
Please also share the ceph-mgr logs.
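If it helps, one typical way to raise mgr verbosity and pull the log on a cephadm-managed cluster; the daemon name below is a placeholder to be replaced with the active mgr on this cluster:

# Placeholder daemon name - substitute the active mgr (see "ceph -s" or "cephadm ls").
ceph config set mgr debug_mgr 20                       # raise ceph-mgr log verbosity
cephadm ls | grep mgr                                  # find the mgr daemon name on this host
cephadm logs --name mgr.<host>.<id> > ceph-mgr.log     # dump the mgr journal to a file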
Given the low priority of this issue and the fact that we are fixing blockers only in 5.3 z1, I am moving this to 6.1.
Hi Venky,

This was done on bare metal servers and we have since reimaged them. I have to recreate the setup and try this, so please expect some delay in collecting the logs, as we have limited server resources.

Regards,
Amarnath
(In reply to Amarnath from comment #22)
> Hi Venky,
>
> We are not observing the error code in the output even after filling the
> cluster to full.
>
> [root@ceph-amk-61-test-xtknkr-node7 subvol_1]# ceph -s
>   cluster:
>     id:     417b4eba-247b-11ee-bf71-fa163e45e70b
>     health: HEALTH_ERR
>             1 MDSs report slow metadata IOs
>             1 MDSs report slow requests
>             1 full osd(s)
>             Degraded data redundancy: 1983/122460 objects degraded (1.619%), 6 pgs degraded, 6 pgs undersized
>             Full OSDs blocking recovery: 6 pgs recovery_toofull
>             5 pool(s) full
>
>   services:
>     mon: 3 daemons, quorum ceph-amk-61-test-xtknkr-node1-installer,ceph-amk-61-test-xtknkr-node3,ceph-amk-61-test-xtknkr-node2 (age 3h)
>     mgr: ceph-amk-61-test-xtknkr-node1-installer.cmqizy(active, since 11m)
>     mds: 2/2 daemons up, 3 standby
>     osd: 12 osds: 12 up (since 2h), 12 in (since 3h); 6 remapped pgs
>
>   data:
>     volumes: 2/2 healthy
>     pools:   5 pools, 193 pgs
>     objects: 40.82k objects, 40 GiB
>     usage:   121 GiB used, 59 GiB / 180 GiB avail
>     pgs:     1983/122460 objects degraded (1.619%)
>              1340/122460 objects misplaced (1.094%)
>              187 active+clean
>              6   active+recovery_toofull+undersized+degraded+remapped
>
> [root@ceph-amk-61-test-xtknkr-node7 subvol_1]#
> [root@ceph-amk-61-test-xtknkr-node7 subvol_1]# wget -O linux.tar.gz http://download.ceph.com/qa/linux-5.4.tar.gz
> --2023-07-17 08:26:15--  http://download.ceph.com/qa/linux-5.4.tar.gz
> Resolving download.ceph.com (download.ceph.com)... 2607:5300:201:2000::3:58a1, 158.69.68.124
> Connecting to download.ceph.com (download.ceph.com)|2607:5300:201:2000::3:58a1|:80... failed: No route to host.
> Connecting to download.ceph.com (download.ceph.com)|158.69.68.124|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 172616875 (165M) [application/octet-stream]
> Saving to: ‘linux.tar.gz’
>
> linux.tar.gz      0%[                    ]       0  --.-KB/s    in 0s
>
> Cannot write to ‘linux.tar.gz’ (No space left on device).
>
> We also tried stopping the mgr service using systemctl. While the service is
> stopped the command gets stuck, and once it is started again it returns the
> path of the subvolume correctly.
>
> @need info, is anything more needed to be tested on this, Venky?

This should be sufficient.
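For the record, a rough sketch of the mgr stop/start check described above; the fsid and mgr daemon name are placeholders, and the behaviour follows from "ceph fs subvolume getpath" being served by the volumes module in the active ceph-mgr:

# Rough sketch of the verification, assuming a cephadm deployment.
# <fsid> and <mgr-name> are placeholders for the actual cluster values.
ceph fs subvolume getpath cephfs subvol_1            # returns the path normally
systemctl stop ceph-<fsid>@mgr.<mgr-name>.service    # stop the active mgr
ceph fs subvolume getpath cephfs subvol_1            # hangs while no mgr serves the volumes module
systemctl start ceph-<fsid>@mgr.<mgr-name>.service   # start the mgr again
ceph fs subvolume getpath cephfs subvol_1            # returns the path correctly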
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 6.1 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:4473