Bug 2272495 - [RDR] CephFS snapshots: VolumeSnapshot->ReadyToUse is false and stays false forever due to timeout error [NEEDINFO]
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.15
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Radoslaw Zarzynski
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-04-01 16:26 UTC by Aman Agrawal
Modified: 2024-09-12 16:40 UTC
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-04-11 12:51:21 UTC
Embargoed:
gfarnum: needinfo? (rzarzyns)
khiremat: needinfo-
bhubbard: needinfo? (rzarzyns)
sheggodu: needinfo? (rzarzyns)



Description Aman Agrawal 2024-04-01 16:26:43 UTC
Description of problem (please be as detailed as possible and provide log snippets):


Version of all relevant components (if applicable):
ACM 2.10 GA'ed
ODF 4.15 GA'ed
ceph version 17.2.6-196.el9cp (cbbf2cfb549196ca18c0c9caff9124d83ed681a4) quincy (stable)
OCP 4.15.0-0.nightly-2024-03-24-023440
VolSync 0.9.0
Submariner 0.17 (GA'ed alongside ACM 2.10)


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?
No, but a CLI workaround is available:

Run ""oc get volumesnapshot -A | grep false"" where the workloads are primary and delete the stale volumesnapshots which are stuck (older ones only). If they are not deleted, remove the finalizers for the VolumeSnapshot and their corresponding VolumeSnapshotContent.

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deployed 6 RBD and 6 CephFS DR-protected workloads on C1 over an RDR setup, of both subscription and appset types (1 each), then failed them over and relocated them (with all clusters up and running) such that they finally run on C1 and each maintains a unique state such as Deployed, FailedOver, or Relocated (check the drpc output below). For example, if busybox-1 is failed over to C2, it is then failed over back to C1, and so on (with all clusters up and running).

We also have 4 workloads (2 RBD and 2 CephFS) on C2, and they remain as-is in the Deployed state.


2. After the 2nd operation, when the workloads are finally running on C1, let IOs continue overnight and ensure data sync is progressing well (see the sketch after these steps).
3.
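Since the CephFS workloads here sync via VolSync, one hedged way to confirm that data sync keeps progressing is to watch the ReplicationSource status (field names are from the VolSync API; this is an illustration, not the exact check used):

```
# lastSyncTime should keep advancing for every protected PVC while IOs run
oc get replicationsource -A \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,LAST_SYNC:.status.lastSyncTime,DURATION:.status.lastSyncDuration
```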


Actual results:
Current drpc looks like this:

amanagrawal@Amans-MacBook-Pro ~ % drpc
NAMESPACE              NAME                                    AGE    PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION           PEER READY
busybox-workloads-10   rbd-sub-busybox10-placement-1-drpc      2d7h   amagrawa-c1-29m    amagrawa-c2-29m   Relocate       Relocated      Completed     2024-03-30T11:46:04Z   3m34.194136561s    True
busybox-workloads-11   rbd-sub-busybox11-placement-1-drpc      2d7h   amagrawa-c1-29m                                     Deployed       Completed     2024-03-30T08:50:53Z   17.08813336s       True
busybox-workloads-12   rbd-sub-busybox12-placement-1-drpc      2d7h   amagrawa-c1-29m                                     Deployed       Completed     2024-03-30T08:53:01Z   15.048723358s      True
busybox-workloads-13   rbd-sub-busybox13-placement-1-drpc      2d7h   amagrawa-c2-29m                                     Deployed       Completed     2024-03-30T08:54:07Z   1.052097882s       True
busybox-workloads-14   cephfs-sub-busybox14-placement-1-drpc   2d7h   amagrawa-c2-29m    amagrawa-c1-29m   Failover       FailedOver     Completed     2024-03-30T11:45:33Z   2m13.87941597s     True
busybox-workloads-15   cephfs-sub-busybox15-placement-1-drpc   2d7h   amagrawa-c1-29m    amagrawa-c2-29m   Relocate       Relocated      Completed     2024-03-30T11:45:41Z   2m25.765604202s    True
busybox-workloads-16   cephfs-sub-busybox16-placement-1-drpc   2d7h   amagrawa-c1-29m                                     Deployed       Completed     2024-03-30T08:58:36Z   45.206438694s      True
busybox-workloads-17   cephfs-sub-busybox17-placement-1-drpc   2d7h   amagrawa-c2-29m                                     Deployed       Completed     2024-03-30T09:00:45Z   45.250150435s      True
busybox-workloads-9    rbd-sub-busybox9-placement-1-drpc       2d7h   amagrawa-c2-29m    amagrawa-c1-29m   Failover       FailedOver     Completed     2024-03-30T11:45:51Z   3m52.912910746s    True
openshift-gitops       cephfs-appset-busybox5-placement-drpc   2d7h   amagrawa-c2-29m    amagrawa-c1-29m   Failover       FailedOver     Completed     2024-03-30T11:45:22Z   1m54.720876193s    True
openshift-gitops       cephfs-appset-busybox6-placement-drpc   2d7h   amagrawa-c1-29m    amagrawa-c2-29m   Relocate       Relocated      Completed     2024-03-30T11:45:27Z   4m59.996397972s    True
openshift-gitops       cephfs-appset-busybox7-placement-drpc   2d7h   amagrawa-c1-29m                                     Deployed       Completed     2024-03-30T08:33:50Z   45.322308361s      True
openshift-gitops       cephfs-appset-busybox8-placement-drpc   2d7h   amagrawa-c2-29m                                     Deployed       Completed     2024-03-30T08:34:58Z   51.174493729s      True
openshift-gitops       rbd-appset-busybox1-placement-drpc      2d7h   amagrawa-c2-29m    amagrawa-c1-29m   Failover       FailedOver     Completed     2024-03-30T11:46:12Z   4m35.129560857s    True
openshift-gitops       rbd-appset-busybox2-placement-drpc      2d7h   amagrawa-c1-29m    amagrawa-c2-29m   Relocate       Relocated      Completed     2024-03-30T11:46:15Z   10m42.195053331s   True
openshift-gitops       rbd-appset-busybox3-placement-drpc      2d7h   amagrawa-c1-29m                                     Deployed       Completed     2024-03-30T08:29:02Z   19.120449331s      True
openshift-gitops       rbd-appset-busybox4-placement-drpc      2d7h   amagrawa-c2-29m                                     Deployed       Completed     2024-03-30T08:30:30Z   1.049141938s       True
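(Note: `drpc` above is a local shell alias, not a built-in command; it presumably expands to something like the following, which is an assumption since the alias itself isn't shown in the bug.)

```
# Hypothetical expansion of the reporter's "drpc" alias
alias drpc='oc get drpc -o wide --all-namespaces'
```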



Data sync wasn't progressing for busybox-workloads-6, 7 and 14


busybox-workloads-6 seems to have recovered, and then busybox-workloads-8 appears to be impacted.


amanagrawal@Amans-MacBook-Pro 01april24 % oc get volumesnapshot -A | grep false
busybox-workloads-14   volsync-busybox-pvc-1-src      false        busybox-pvc-1                                         ocs-storagecluster-cephfsplugin-snapclass   snapcontent-a42538b4-eeef-45c3-94cf-3ef9bdf4feae                  28h
busybox-workloads-7    volsync-busybox-pvc-4-src      false        busybox-pvc-4                                         ocs-storagecluster-cephfsplugin-snapclass   snapcontent-ac363a4b-2b11-443a-b968-2a705381959d                  28h
busybox-workloads-8    busybox-pvc-2-20240401030130   false        busybox-pvc-2                                         ocs-storagecluster-cephfsplugin-snapclass   snapcontent-e9d04116-9cd3-4021-b11d-a2066a46eb0a                  10h
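To see why a given snapshot never becomes ready, the snapshotter's error can be read from the object status; a hedged example against the first entry above (field paths are from the snapshot.storage.k8s.io/v1 API):

```
# Print the readiness flag and any error message recorded on the stuck VolumeSnapshot
oc get volumesnapshot volsync-busybox-pvc-1-src -n busybox-workloads-14 \
  -o jsonpath='{.status.readyToUse}{"\n"}{.status.error.message}{"\n"}'

# The bound, cluster-scoped VolumeSnapshotContent carries the same status fields
oc get volumesnapshotcontent snapcontent-a42538b4-eeef-45c3-94cf-3ef9bdf4feae \
  -o jsonpath='{.status.readyToUse}{"\n"}{.status.error.message}{"\n"}'
```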


Expected results: VolumeSnapshot->ReadyToUse should be True for CephFS workloads and data sync should progress as expected while running IOs.


Additional info:

Comment 34 Venky Shankar 2024-05-08 06:40:41 UTC
This is the only read call in the cephfs client debug log that's reading data off the OSD (data pool):

```
2024-05-06T11:36:57.841121364Z debug 2024-05-06T11:36:57.840+0000 7f9c27504640 10 client.584545 _read_async 0x10000020458.head(faked_ino=0 nref=10 ll_ref=0 cap_refs={2048=1} open={1=1} mode=100640 size=158/0 nlink=1 btime=2024-04-30T20:53:25.974991+0000 mtime=2024-04-30T20:53:25.975615+0000 ctime=2024-04-30T20:53:26.386700+0000 change_attr=7 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[0x10000020458 ts 0/0 objects 0 dirty_or_tx 0] parents=0x100000066cb.head[".meta"] 0x5634ec816a00) 0~4096
2024-05-06T11:36:57.841121364Z debug 2024-05-06T11:36:57.840+0000 7f9c27504640 10 client.584545  min_bytes=131072 max_bytes=16777216 max_periods=4
2024-05-06T11:36:57.841121364Z debug 2024-05-06T11:36:57.840+0000 7f9c27504640 10 client.584545.objecter _op_submit op 0x5634ec41fe00
2024-05-06T11:36:57.841135304Z debug 2024-05-06T11:36:57.840+0000 7f9c27504640 20 client.584545.objecter _calc_target epoch 7696 base 10000020458.00000000 @12 precalc_pgid 0 pgid 0.0 is_read
2024-05-06T11:36:57.841135304Z debug 2024-05-06T11:36:57.840+0000 7f9c27504640 20 client.584545.objecter _calc_target target 10000020458.00000000 @12 -> pgid 12.53bb8e39
2024-05-06T11:36:57.841140932Z debug 2024-05-06T11:36:57.840+0000 7f9c27504640 10 client.584545.objecter _calc_target  raw pgid 12.53bb8e39 -> actual 12.19 acting [0,1,2] primary 0
2024-05-06T11:36:57.841179626Z debug 2024-05-06T11:36:57.840+0000 7f9c27504640 20 client.584545.objecter _get_session s=0x5634ec88e480 osd=0 3
2024-05-06T11:36:57.841179626Z debug 2024-05-06T11:36:57.840+0000 7f9c27504640 10 client.584545.objecter _op_submit oid 10000020458.00000000 '@12' '@12' [read 0~158] tid 1 osd.0
2024-05-06T11:36:57.841183709Z debug 2024-05-06T11:36:57.840+0000 7f9c27504640 20 client.584545.objecter get_session s=0x5634ec88e480 osd=0 3
2024-05-06T11:36:57.841183709Z debug 2024-05-06T11:36:57.840+0000 7f9c27504640 15 client.584545.objecter _session_op_assign 0 1
2024-05-06T11:36:57.841199599Z debug 2024-05-06T11:36:57.840+0000 7f9c27504640 15 client.584545.objecter _send_op 1 to 12.19 on osd.0
```
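As an aside, the object-to-PG-to-OSD mapping in that trace (10000020458.00000000 -> pg 12.53bb8e39 -> actual 12.19, acting [0,1,2], primary osd.0) can be cross-checked from the toolbox with `ceph osd map`; a sketch, where the data pool name is a placeholder (only the pool id 12 is visible in the log):

```
# Resolve the name of pool id 12, then map the object to its PG and acting set
ceph osd pool ls detail | grep "^pool 12 "
ceph osd map <cephfs-data-pool> 10000020458.00000000
# Expected per the log above: pg 12.53bb8e39 -> 12.19, acting [0,1,2], primary osd.0
```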

And later...

```
2024-05-06T11:37:12.813418983Z debug 2024-05-06T11:37:12.812+0000 7f9c18b67640 10 client.584545.objecter tick
2024-05-06T11:37:12.813418983Z debug 2024-05-06T11:37:12.812+0000 7f9c18b67640  2 client.584545.objecter  tid 1 on osd.0 is laggy
2024-05-06T11:37:12.813418983Z debug 2024-05-06T11:37:12.812+0000 7f9c18b67640 10 client.584545.objecter _maybe_request_map subscribing (onetime) to next osd map
2024-05-06T11:37:12.813551139Z debug 2024-05-06T11:37:12.812+0000 7f9cedb7c640 20 --2- 10.131.1.11:0/1469616214 >> [v2:242.1.255.250:3300/0,v1:242.1.255.250:6789/0] conn(0x5634ec834400 0x5634ec6bf180 secure :-1 s=READY pgs=110714 cs=0 l=1 rev1=1 crypto rx=0x5634ec799e00 tx=0x5634ec535f80 comp rx=0 tx=0).write_message sending m=0x5634ec8ba4e0 seq=7 src=client.584545 off=0
2024-05-06T11:37:12.814280714Z debug 2024-05-06T11:37:12.813+0000 7f9c16362640 10 client.584545.objecter ms_dispatch 0x5634e28b8000 osd_map(7697..7697 src has 7082..7697)
2024-05-06T11:37:12.814280714Z debug 2024-05-06T11:37:12.813+0000 7f9c16362640  3 client.584545.objecter handle_osd_map got epochs [7697,7697] > 7696
2024-05-06T11:37:12.814280714Z debug 2024-05-06T11:37:12.813+0000 7f9c16362640  3 client.584545.objecter handle_osd_map decoding incremental epoch 7697
2024-05-06T11:37:12.814468790Z debug 2024-05-06T11:37:12.813+0000 7f9c16362640 10 client.584545.objecter  checking op 1
2024-05-06T11:37:12.814468790Z debug 2024-05-06T11:37:12.813+0000 7f9c16362640 20 client.584545.objecter _calc_target epoch 7697 base 10000020458.00000000 @12 precalc_pgid 0 pgid 0.0 is_read
2024-05-06T11:37:12.814468790Z debug 2024-05-06T11:37:12.813+0000 7f9c16362640 20 client.584545.objecter _calc_target target 10000020458.00000000 @12 -> pgid 12.53bb8e39
2024-05-06T11:37:12.814468790Z debug 2024-05-06T11:37:12.813+0000 7f9c16362640 20 client.584545.objecter dump_active .. 0 homeless
2024-05-06T11:37:12.814468790Z debug 2024-05-06T11:37:12.813+0000 7f9c16362640 20 client.584545.objecter 1      12.53bb8e39     osd.0   10000020458.00000000    [read 0~158]
```

The "tid 1 on osd.0 is laggy" log appears over and over again and I believe the [read 0~158] osd_op is tid 1 to osd.0.

Comment 36 Venky Shankar 2024-05-08 07:13:52 UTC
OK. So, the above update was filtered with grep `client.584545`, just to see what's happening to the call at the client/objecter level, not necessarily at the messenger layer. That's what comes next:

[read 0~158] is being sent to osd.0, so the objecter initiates a connection to the endpoint [v2:242.1.255.254:6800/2685472366,v1:242.1.255.254:6801/2685472366]. Then, soon after that:

```
2024-05-06T11:36:57.841199599Z debug 2024-05-06T11:36:57.840+0000 7f9c27504640  1 -- 10.131.1.11:0/1469616214 --> [v2:242.1.255.254:6800/2685472366,v1:242.1.255.254:6801/2685472366] -- osd_op(unknown.0.0:1 12.19 12:9c71ddca:::10000020458.00000000:head [read 0~158] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e7696) -- 0x5634ec6ef000 con 0x5634ec835000
2024-05-06T11:36:57.841199599Z debug 2024-05-06T11:36:57.840+0000 7f9ced37b640 20 -- 10.131.1.11:0/1469616214 >> [v2:242.1.255.254:6800/2685472366,v1:242.1.255.254:6801/2685472366] conn(0x5634ec835000 msgr2=0x5634ec836b00 unknown :-1 s=STATE_CONNECTING l=0).process
2024-05-06T11:36:57.841204861Z debug 2024-05-06T11:36:57.840+0000 7f9c27504640  5 --2- 10.131.1.11:0/1469616214 >> [v2:242.1.255.254:6800/2685472366,v1:242.1.255.254:6801/2685472366] conn(0x5634ec835000 0x5634ec836b00 unknown :-1 s=START_CONNECT pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0).send_message enqueueing message m=0x5634ec6ef000 type=42 osd_op(client.584545.0:1 12.19 12:9c71ddca:::10000020458.00000000:head [read 0~158] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e7696)
2024-05-06T11:36:57.841204861Z debug 2024-05-06T11:36:57.840+0000 7f9c27504640 15 --2- 10.131.1.11:0/1469616214 >> [v2:242.1.255.254:6800/2685472366,v1:242.1.255.254:6801/2685472366] conn(0x5634ec835000 0x5634ec836b00 unknown :-1 s=START_CONNECT pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0).send_message inline write is denied, reschedule m=0x5634ec6ef000
```

presumably due to the (OSD) connection still being in progress (which is hinted at by the "nonblock connect inprogress" message below)?

```
2024-05-06T11:36:57.841393513Z debug 2024-05-06T11:36:57.840+0000 7f9ced37b640 20 Event(0x5634e35b0988 nevent=5000 time_id=56).create_file_event create event started fd=83 mask=1 original mask is 0
2024-05-06T11:36:57.841393513Z debug 2024-05-06T11:36:57.840+0000 7f9ced37b640 20 EpollDriver.add_event add event fd=83 cur_mask=0 add_mask=1 to 11
2024-05-06T11:36:57.841393513Z debug 2024-05-06T11:36:57.840+0000 7f9ced37b640 20 Event(0x5634e35b0988 nevent=5000 time_id=56).create_file_event create event end fd=83 mask=1 current mask is 1
2024-05-06T11:36:57.841393513Z debug 2024-05-06T11:36:57.840+0000 7f9ced37b640 10 NetHandler reconnect reconnect: 114 Operation already in progress
2024-05-06T11:36:57.841401866Z debug 2024-05-06T11:36:57.840+0000 7f9ced37b640 10 -- 10.131.1.11:0/1469616214 >> [v2:242.1.255.254:6800/2685472366,v1:242.1.255.254:6801/2685472366] conn(0x5634ec835000 msgr2=0x5634ec836b00 unknown :-1 s=STATE_CONNECTING_RE l=0).process nonblock connect inprogress
```

After going through a lot of messenger-related logs, which I could not fully make sense of, at this point log messages such as

```
2024-05-06T11:37:07.288541431Z debug 2024-05-06T11:37:07.287+0000 7f9cee37d640 10 -- 10.131.1.11:0/751099213 >> [v2:242.1.255.254:6800/2685472366,v1:242.1.255.254:6801/2685472366] conn(0x5634ec428000 msgr2=0x5634ec014100 unknown :-1 s=STATE_CONNECTING_RE l=1).process nonblock connect inprogress
```

are still showing up.
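"nonblock connect inprogress" just means the non-blocking TCP connect to the OSD has not completed yet, so a hedged sanity check is whether the client can reach osd.0's messenger ports at all (addresses and ports taken from the endpoint in the logs above):

```
# From the client pod/node: probe osd.0's msgr v2 and v1 ports
nc -zv -w 5 242.1.255.254 6800   # v2
nc -zv -w 5 242.1.255.254 6801   # v1
```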

Comment 37 Venky Shankar 2024-05-08 07:54:29 UTC
At "2024-05-06T11:43:10.691169634" the "connect improgress" log messages are still seen:

```
2024-05-06T11:43:10.691169634Z debug 2024-05-06T11:43:10.690+0000 7f9cedb7c640 10 -- 10.131.1.11:0/1651991364 >> [v2:242.1.255.254:6800/2685472366,v1:242.1.255.254:6801/2685472366] conn(0x5634ec8ea400 msgr2=0x5634ec8f8580 unknown :-1 s=STATE_CONNECTING_RE l=1).process nonblock connect inprogress
```

What's happening at the messenger layer? cc Radek.

Comment 45 Venky Shankar 2024-05-14 04:51:04 UTC
Radek/Brad - https://tracker.ceph.com/issues/65986

Comment 50 Aman Agrawal 2024-05-14 13:47:31 UTC
Now I see that the sync is progressing well on this setup, probably due to a pod restart/node reboot.

