Bug 2168869 - Multi-site sync doesn't work for some buckets in v5.3 [NEEDINFO]
Summary: Multi-site sync doesn't work for some buckets in v5.3
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW-Multisite
Version: 5.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 6.1z2
Assignee: shilpa
QA Contact: Madhavi Kasturi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-02-10 11:06 UTC by Yalcin
Modified: 2023-08-04 15:44 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
lsantann: needinfo? (smanjara)
rsachere: needinfo? (smanjara)
rsachere: needinfo? (smanjara)
rsachere: needinfo? (smanjara)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-6126 0 None None None 2023-02-10 11:07:21 UTC

Description Yalcin 2023-02-10 11:06:12 UTC
Description of problem:
Some buckets are not syncing, and the RGW debug logs show continuous "io blocked" errors:
"
2023-02-09T10:28:11.239+0100 7f46f8902700 20 rgw rados thread: cr:s=0x55793ad3ba40:op=0x55793adaa400:18RGWDataSyncShardCR: operate()
2023-02-09T10:28:11.239+0100 7f46f8902700 20 rgw rados thread: cr:s=0x55793ad3ba40:op=0x55793adaa400:18RGWDataSyncShardCR: operate() returned r=-16
2023-02-09T10:28:11.239+0100 7f46f8902700 20 rgw rados thread: cr:s=0x55793b4e77c0:op=0x55793acd6000:20RGWSimpleRadosLockCR: operate()
2023-02-09T10:28:11.239+0100 7f46f8902700 20 rgw rados thread: cr:s=0x55793ad3ba40:op=0x55793ad52300:25RGWDataSyncShardControlCR: operate()
2023-02-09T10:28:11.239+0100 7f46f8902700 20 run: stack=0x55793ad3ba40 is io blocked
2023-02-09T10:28:11.239+0100 7f46f8902700 20 rgw rados thread: cr:s=0x55793b4e77c0:op=0x55793acd6000:20RGWSimpleRadosLockCR: operate()
2023-02-09T10:28:11.239+0100 7f46f8902700 20 enqueued request req=0x55793ae7f2c0
2023-02-09T10:28:11.239+0100 7f46f8902700 20 RGWWQ:
2023-02-09T10:28:11.239+0100 7f46f8902700 20 req: 0x55793ae7f2c0
2023-02-09T10:28:11.239+0100 7f46f8902700 20 run: stack=0x55793b4e77c0 is io blocked
2023-02-09T10:28:11.239+0100 7f470ed48700 20 dequeued request req=0x55793ae7f2c0
2023-02-09T10:28:11.239+0100 7f470ed48700 20 RGWWQ: empty
2023-02-09T10:28:11.239+0100 7f46f8902700 20 rgw rados thread: cr:s=0x55793ad02140:op=0x55793acfa700:25RGWDataSyncShardControlCR: operate()
2023-02-09T10:28:11.239+0100 7f46f8902700 20 rgw rados thread: cr:s=0x55793ad02140:op=0x55793adaa400:18RGWDataSyncShardCR: operate()
2023-02-09T10:28:11.239+0100 7f46f8902700 10 RGW-SYNC:data:sync:shard[32]: start incremental sync
2023-02-09T10:28:11.239+0100 7f46f8902700 20 rgw rados thread: cr:s=0x55793a597400:op=0x55793aec0a00:20RGWContinuousLeaseCR: operate()
2023-02-09T10:28:11.239+0100 7f46f8902700 20 rgw rados thread: cr:s=0x55793ad02140:op=0x55793adaa400:18RGWDataSyncShardCR: operate()
2023-02-09T10:28:11.239+0100 7f46f8902700 20 run: stack=0x55793ad02140 is_blocked_by_stack()=0 is_sleeping=1 waiting_for_child()=0
2023-02-09T10:28:11.239+0100 7f46f8902700 20 rgw rados thread: cr:s=0x55793a597400:op=0x55793f13bc00:20RGWSimpleRadosLockCR: operate()
2023-02-09T10:28:11.239+0100 7f46f8902700 20 rgw rados thread: cr:s=0x55793a597400:op=0x55793f13bc00:20RGWSimpleRadosLockCR: operate()
2023-02-09T10:28:11.239+0100 7f46f8902700 20 enqueued request req=0x55793ba7c8c0
2023-02-09T10:28:11.239+0100 7f46f8902700 20 RGWWQ:
2023-02-09T10:28:11.239+0100 7f46f8902700 20 req: 0x55793ba7c8c0
2023-02-09T10:28:11.239+0100 7f46f8902700 20 run: stack=0x55793a597400 is io blocked
2023-02-09T10:28:11.239+0100 7f470152d700 20 dequeued request req=0x55793ba7c8c0
2023-02-09T10:28:11.239+0100 7f470152d700 20 RGWWQ: empty
"
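To get an overview from a longer log, lines like those in the excerpt above could be tallied with a small script. The following is a minimal sketch and not part of the original report: the function name `summarize` and both regular expressions are assumptions based only on the line shapes visible in the excerpt. On Linux, r=-16 is the negated errno EBUSY, which the sketch looks up through Python's `errno` module.

```python
import errno
import re
from collections import Counter

# Matches coroutine lines such as
# "... cr:s=0x...:op=0x...:18RGWDataSyncShardCR: operate() returned r=-16"
RETURNED = re.compile(r":\d+(?P<cr>\w+): operate\(\) returned r=(?P<rc>-?\d+)")
# Matches scheduler lines such as "... run: stack=0x55793ad3ba40 is io blocked"
IO_BLOCKED = re.compile(r"run: stack=(?P<stack>0x[0-9a-f]+) is io blocked")


def summarize(lines):
    """Count non-zero operate() return codes and io-blocked stacks."""
    returns = Counter()
    blocked = Counter()
    for line in lines:
        m = RETURNED.search(line)
        if m and int(m.group("rc")) != 0:
            rc = int(m.group("rc"))
            # Translate the negated errno into its symbolic name, if known.
            name = errno.errorcode.get(-rc, "unknown")
            returns[(m.group("cr"), rc, name)] += 1
        m = IO_BLOCKED.search(line)
        if m:
            blocked[m.group("stack")] += 1
    return returns, blocked


# Two lines copied from the excerpt above.
sample = [
    "2023-02-09T10:28:11.239+0100 7f46f8902700 20 rgw rados thread: "
    "cr:s=0x55793ad3ba40:op=0x55793adaa400:18RGWDataSyncShardCR: operate() returned r=-16",
    "2023-02-09T10:28:11.239+0100 7f46f8902700 20 run: stack=0x55793ad3ba40 is io blocked",
]
returns, blocked = summarize(sample)
print(returns)
print(blocked)
```

Run over a full log file (for example `summarize(open(path))`, where `path` is a placeholder), the counters show which coroutine types return errors most often and which stacks are repeatedly reported as io blocked.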
There were several multi-site sync errors in v5.1, so the clusters were updated to v5.3. Most of the errors went away after the update, but some buckets are still not synced.
Running bucket sync disable/enable and bucket sync init/run did not help.

Version-Release number of selected component (if applicable):
Ceph v5.3

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
The contents of some buckets differ between the two sites.

Expected results:
All buckets are fully synced between the two sites.


Additional info:

Comment 43 Raimund Sacherer 2023-03-13 08:22:36 UTC
Hello, 

Please see the comment below for the client's answer.

best regards
Raimund

