Bug 2179083 - [GSS][Ceph] cephfs clone are stuck in Pending state
Summary: [GSS][Ceph] cephfs clone are stuck in Pending state
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.10
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Venky Shankar
QA Contact: Elad
URL:
Whiteboard:
Duplicates: 2179080 2179081 2179082
Depends On:
Blocks: 2190080
 
Reported: 2023-03-16 15:36 UTC by Sonal
Modified: 2024-02-13 02:22 UTC
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 2190080
Environment:
Last Closed: 2024-02-13 02:22:28 UTC
Embargoed:




Links:
Ceph Project Bug Tracker 55935 (Last Updated: 2023-04-04 09:43:26 UTC)

Description Sonal 2023-03-16 15:36:57 UTC
Description of problem (please be as detailed as possible and provide log snippets):
Cephfs clones are stuck in pending state.

Version of all relevant components (if applicable):
ceph version 16.2.7-126.el8cp

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, we cannot restore from the snapshot backup.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
-

Steps to Reproduce:
1. Create a CephFS volume or a CephFS PVC
2. Put some data in it
3. Create a snapshot of the volume
4. Create a CephFS clone of the snapshot (using the ceph command); see the command sketch after this list
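
A minimal command sketch of the steps above using the Ceph subvolume interface; the file system and subvolume names (myfs, subvol1, snap1, clone1) are placeholders, not the names from this cluster:

```
# Create a subvolume on the CephFS file system and get its path
ceph fs subvolume create myfs subvol1
ceph fs subvolume getpath myfs subvol1   # mount this path and copy some data in

# Take a snapshot of the subvolume
ceph fs subvolume snapshot create myfs subvol1 snap1

# Clone the snapshot into a new subvolume
ceph fs subvolume snapshot clone myfs subvol1 snap1 clone1

# Check clone progress; in this bug the state never leaves "pending"
ceph fs clone status myfs clone1
```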


Actual results:
The volume clone remains stuck indefinitely.

Expected results:
The cloned volume should be provisioned successfully.

Additional info:
In next private comment.

Comment 4 Greg Farnum 2023-03-18 01:11:40 UTC
*** Bug 2179080 has been marked as a duplicate of this bug. ***

Comment 5 Greg Farnum 2023-03-18 01:12:23 UTC
*** Bug 2179081 has been marked as a duplicate of this bug. ***

Comment 6 Greg Farnum 2023-03-18 01:12:34 UTC
*** Bug 2179082 has been marked as a duplicate of this bug. ***

Comment 22 Venky Shankar 2023-03-31 05:18:34 UTC
The MDS is unable to find the inode from its peers. Normally, if an inode is not in the MDCache, the MDS tries to locate it by contacting its peer MDSs. However, this MDS is the only MDS in the cluster (max_mds=1), and an MDS excludes itself from that search, so there are no peers left to query, which is expected.

```
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700  7 mds.0.cache traverse: opening base ino 0x10000cfd1cd snap head
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 10 mds.0.server rdlock_path_pin_ref request(client.39931710:23624643 nref=2 cr=0x56073ae99600) #0x10000cfd1cd
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700  7 mds.0.cache traverse: opening base ino 0x10000cfd1cd snap head
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 10 mds.0.server FAIL on CEPHFS_ESTALE but attempting recovery
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700  5 mds.0.cache find_ino_peers 0x10000cfd1cd hint -1
2023-03-29T12:33:32.101673839Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 10 mds.0.cache _do_find_ino_peer 14063259190 0x10000cfd1cd active 0 all 0 checked
2023-03-29T12:33:32.101684532Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 10 mds.0.cache _do_find_ino_peer failed on 0x10000cfd1cd
2023-03-29T12:33:32.101684532Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700 10 MDSContext::complete: 18C_MDS_TryFindInode
2023-03-29T12:33:32.101684532Z debug 2023-03-29T12:33:32.100+0000 7fd6c4315700  7 mds.0.server reply_client_request -116 ((116) Stale file handle) client_request(client.39931710:23624643 getattr Fa #0x10000cfd1cd 2023-03-29T12:22:44.081124+0000 RETRY=236 caller_uid=0, caller_gid=0{}) v5
```
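
For context, whether a cluster is running with a single active MDS can be checked with standard commands; the file system name below (ocs-storagecluster-cephfilesystem, the usual ODF default) is an assumption, not taken from this cluster's logs:

```
# Show active MDS ranks, their states, and standby daemons
ceph fs status

# max_mds is reported in the file system map (assumed fs name)
ceph fs get ocs-storagecluster-cephfilesystem | grep max_mds
```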

The inode is missing from the MDS cache, which means running `find` or `ls -R` on the file system would load the inode back into the MDS cache. Another possibility is that the inode is being purged by the MDS. The inode number is 0x10000cfd1cd (1099525247437 in decimal). Could you please share the output of:

> ceph tell mds.c dump inode 1099525247437

Additionally, could you also run the following from a cephfs mount:

> find <mntpt> -inum 1099525247437
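
For reference, the decimal inode number above is just the hex inode from the log converted to base 10; a sketch of the requested commands, where /mnt/cephfs stands in for the <mntpt> placeholder:

```
# 0x10000cfd1cd == 1099525247437
printf '%d\n' 0x10000cfd1cd

# Dump the inode from the MDS cache (mds.c as named in the comment)
ceph tell mds.c dump inode 1099525247437

# Search for the inode number from a CephFS mount (mount point is a placeholder)
find /mnt/cephfs -inum 1099525247437
```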

Comment 29 Venky Shankar 2023-04-11 06:20:36 UTC
Sonal, ping?

