Description of problem (please be as detailed as possible and provide log snippets):

CephFS clones are stuck in the pending state.

Version of all relevant components (if applicable):

ceph version 16.2.7-126.el8cp

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?

Yes, snapshot backups cannot be restored.

Is there any workaround available to the best of your knowledge?

No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

2

Is this issue reproducible?

Yes

Can this issue be reproduced from the UI?

Yes

If this is a regression, please provide more details to justify this:

-

Steps to Reproduce:
1. Create a CephFS volume or a CephFS PVC.
2. Put some data in it.
3. Create a snapshot of the volume.
4. Create a CephFS clone using the ceph command (see the example sequence below).

Actual results:
The volume clone is stuck in the pending state indefinitely.

Expected results:
The cloned volume should provision.

Additional info:
In next private comment.
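For reference, a minimal command sequence that should reproduce the scenario above; the filesystem name `cephfs` and the subvolume/snapshot/clone names (`subvol1`, `snap1`, `clone1`) are illustrative placeholders, not values from this report:

```
# Create a subvolume and find its path so data can be written into it
ceph fs subvolume create cephfs subvol1
ceph fs subvolume getpath cephfs subvol1

# Snapshot the subvolume after writing some data
ceph fs subvolume snapshot create cephfs subvol1 snap1

# Clone the snapshot into a new subvolume
ceph fs subvolume snapshot clone cephfs subvol1 snap1 clone1

# Check clone progress; in this bug the state remains "pending" indefinitely
ceph fs clone status cephfs clone1
```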
*** Bug 2179080 has been marked as a duplicate of this bug. ***
*** Bug 2179081 has been marked as a duplicate of this bug. ***
*** Bug 2179082 has been marked as a duplicate of this bug. ***
The MDS is unable to find an inode from its peers. Normally, if an inode is not in the MDCache, the MDS tries to locate it by contacting its peer MDSs. However, this MDS is the only MDS in the cluster (max_mds=1), and an MDS excludes itself from the peer search, so there is no peer left to ask; that part is expected.

```
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 7 mds.0.cache traverse: opening base ino 0x10000cfd1cd snap head
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 10 mds.0.server rdlock_path_pin_ref request(client.39931710:23624643 nref=2 cr=0x56073ae99600) #0x10000cfd1cd
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 7 mds.0.cache traverse: opening base ino 0x10000cfd1cd snap head
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 10 mds.0.server FAIL on CEPHFS_ESTALE but attempting recovery
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 5 mds.0.cache find_ino_peers 0x10000cfd1cd hint -1
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 10 mds.0.cache _do_find_ino_peer 14063259190 0x10000cfd1cd active 0 all 0 checked
2023-03-29T12:33:32.101684532Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 10 mds.0.cache _do_find_ino_peer failed on 0x10000cfd1cd
2023-03-29T12:33:32.101684532Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 10 MDSContext::complete: 18C_MDS_TryFindInode
2023-03-29T12:33:32.101684532Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 7 mds.0.server reply_client_request -116 ((116) Stale file handle) client_request(client.39931710:23624643 getattr Fa #0x10000cfd1cd 2023-03-29T12:22:44.081124+0000 RETRY=236 caller_uid=0, caller_gid=0{}) v5
```

The inode is missing from the MDS cache, in which case running `find` or `ls -R` on the mount should load it back into the cache. Another possibility is that the inode is currently being purged by the MDS.

The inode number is 0x10000cfd1cd. Could you please share the output of:

> ceph tell mds.c dump inode 1099525247437

Additionally, could you also run the following from a CephFS mount:

> find <mntpt> -inum 1099525247437
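As a convenience, a minimal sketch of those diagnostics, assuming `mds.c` is the active MDS daemon referenced above and `<mntpt>` is a kernel or FUSE mount of the affected filesystem; 1099525247437 is simply 0x10000cfd1cd converted to decimal:

```
# Convert the hex inode number from the MDS log to decimal
printf '%d\n' 0x10000cfd1cd        # -> 1099525247437

# Dump the inode from the MDS cache (errors out if it is not cached)
ceph tell mds.c dump inode 1099525247437

# Walk the mount to force the inode back into the MDS cache
find <mntpt> -inum 1099525247437
```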
Sonal, ping?