Bug 2179083

Summary: [GSS][Ceph] cephfs clone are stuck in Pending state
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Component: ceph
ceph sub component: CephFS
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Version: 4.10
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Type: Bug
Reporter: Sonal <sarora>
Assignee: Venky Shankar <vshankar>
QA Contact: Elad <ebenahar>
CC: bniver, muagarwa, odf-bz-bot, sheggodu, sostapov, vshankar, xiubli
Last Closed: 2024-02-13 02:22:28 UTC
Bug Blocks: 2190080

Description Sonal 2023-03-16 15:36:57 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
CephFS clones are stuck in the Pending state.

Version of all relevant components (if applicable):
ceph version 16.2.7-126.el8cp

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, snapshot backups cannot be restored.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
-

Steps to Reproduce:
1. Create a CephFS volume or a CephFS PVC.
2. Put some data in it.
3. Create a snapshot of the volume.
4. Create a CephFS clone (using the `ceph` command); see the sketch below.
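
For reference, a minimal sketch of these steps using the `ceph fs subvolume` interface; the filesystem, subvolume, snapshot, and clone names are placeholders, not taken from this case:

```
# Placeholder names ("myfs", "subvol1", "snap1", "clone1"), for illustration only.
ceph fs subvolume create myfs subvol1                        # step 1
# ... write some data into the subvolume ...                 # step 2
ceph fs subvolume snapshot create myfs subvol1 snap1         # step 3
ceph fs subvolume snapshot clone myfs subvol1 snap1 clone1   # step 4
# In this bug, the reported clone state stays "pending" indefinitely:
ceph fs clone status myfs clone1
```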


Actual results:
The volume clone remains stuck in the pending state indefinitely.

Expected results:
The cloned volume should be provisioned successfully.

Additional info:
Provided in the next (private) comment.

Comment 4 Greg Farnum 2023-03-18 01:11:40 UTC
*** Bug 2179080 has been marked as a duplicate of this bug. ***

Comment 5 Greg Farnum 2023-03-18 01:12:23 UTC
*** Bug 2179081 has been marked as a duplicate of this bug. ***

Comment 6 Greg Farnum 2023-03-18 01:12:34 UTC
*** Bug 2179082 has been marked as a duplicate of this bug. ***

Comment 22 Venky Shankar 2023-03-31 05:18:34 UTC
The MDS is unable to find an inode via its peers. Normally, if an inode is not in the MDCache, the MDS tries to locate it by contacting its peer MDSs. However, this MDS is the only MDS in the cluster (max_mds=1), and an MDS excludes itself from such a lookup, so there are no peers left to query; the lookup failure is therefore expected.

```
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700  7 mds.0.cache traverse: opening base ino 0x10000cfd1cd snap head
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 10 mds.0.server rdlock_path_pin_ref request(client.39931710:23624643 nref=2 cr=0x56073ae99600) #0x10000cfd1cd
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700  7 mds.0.cache traverse: opening base ino 0x10000cfd1cd snap head
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 10 mds.0.server FAIL on CEPHFS_ESTALE but attempting recovery
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700  5 mds.0.cache find_ino_peers 0x10000cfd1cd hint -1
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 10 mds.0.cache _do_find_ino_peer 14063259190 0x10000cfd1cd active 0 all 0 checked
2023-03-29T12:33:32.101684532Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 10 mds.0.cache _do_find_ino_peer failed on 0x10000cfd1cd
2023-03-29T12:33:32.101684532Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 10 MDSContext::complete: 18C_MDS_TryFindInode
2023-03-29T12:33:32.101684532Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700  7 mds.0.server reply_client_request -116 ((116) Stale file handle) client_request(client.39931710:23624643 getattr Fa #0x10000cfd1cd 2023-03-29T12:22:44.081124+0000 RETRY=236 caller_uid=0, caller_gid=0{}) v5
```
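
As additional context (the filesystem name below is a placeholder, not taken from this case), the single-active-MDS configuration referenced above can be confirmed with the standard status commands:

```
# Placeholder filesystem name "myfs"; verify that only one MDS rank is active.
ceph fs get myfs | grep max_mds
ceph fs status myfs
```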

The inode is missing from the MDS cache, which means running `find` or `ls -R` on the mount would load it back into the cache. Another possibility is that the inode is being purged by the MDS. The inode number is 0x10000cfd1cd (1099525247437 in decimal). Could you please share the output of:

> ceph tell mds.c dump inode 1099525247437

Additionally, could you run the following from a CephFS mount:

> find <mntpt> -inum 1099525247437
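
For reference, the 1099525247437 used in both commands above is simply the decimal form of the inode number 0x10000cfd1cd seen in the MDS log; a quick shell conversion:

```
# Convert the hex inode number from the MDS log to the decimal form expected
# by `dump inode` and `find -inum`.
printf '%d\n' 0x10000cfd1cd
# 1099525247437
```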

Comment 29 Venky Shankar 2023-04-11 06:20:36 UTC
Sonal, ping?