1362639 – Object sync requests skipped in some scenarios

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	RGW
Sub Component:
Version:	2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	3.0
Assignee:	Matt Benjamin (redhat)
QA Contact:	ceph-qe-bugs
Docs Contact:	Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks:	1322504 1383917 1412948

Reported:	2016-08-02 18:02 UTC by shilpa
Modified:	2022-02-21 18:06 UTC
CC List:	12 users
Fixed In Version:
Doc Type:	Known Issue
Doc Text:	.Object sync requests are sometimes skipped

In multi-site configurations of the Ceph Object Gateway, a non-master zone can be promoted to the master zone. In most cases, the master zone's gateway or gateways are still running when this happens. However, if the gateways are down, it can take up to 30 seconds after their restart until the gateways notice that another zone was promoted. During this time, the gateways can miss changes to buckets that occur on other zones. Consequently, object sync requests are skipped.

To work around this issue, pull the new master's period to the old master zone before restarting the old master zone:

----
$ radosgw-admin period pull --remote=<new-master-zone-id>
----

For details on pulling the period, see the https://access.redhat.com/documentation/en/red-hat-ceph-storage/2/single/object-gateway-guide-for-red-hat-enterprise-linux[Ceph Object Gateway Guide for Red Hat Enterprise Linux] or the https://access.redhat.com/documentation/en/red-hat-ceph-storage/2/single/object-gateway-guide-for-ubuntu[Ceph Object Gateway Guide for Ubuntu].
Clone Of:
Environment:
Last Closed:	2017-07-06 14:37:45 UTC
Embargoed:
Dependent Products:

Description shilpa 2016-08-02 18:02:31 UTC

Version-Release number of selected component (if applicable):
ceph-radosgw-10.2.2-32.el7cp.x86_64

Steps to Reproduce:
1. Stop the rgw process on the master zone.
2. Switch the non-master zone to master.
3. Create new buckets and objects on the current master.
4. Bring up the rgw process on the old master zone.
5. Create some more buckets and objects on the current master.
6. Check for sync.
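
The gateway-side steps above might be scripted roughly as follows. This is only a sketch under assumptions, not the exact commands used for this report: the zone name `us-west` and the systemd unit `ceph-radosgw@rgw.gateway-node` are hypothetical placeholders, the `run` helper and the function names are invented for illustration, and each function is meant to be run on the host named in its comment. By default the script only prints the commands (`DRY_RUN=1`). Creating buckets and objects is left to whatever S3 client is in use.

```shell
#!/usr/bin/env bash
# Sketch of the gateway-side reproduction steps.
# Zone and unit names are hypothetical placeholders; adjust to the cluster.
set -eu

DRY_RUN="${DRY_RUN:-1}"                                # 1 = print commands only
NEW_MASTER_ZONE="${NEW_MASTER_ZONE:-us-west}"          # hypothetical zone name
RGW_UNIT="${RGW_UNIT:-ceph-radosgw@rgw.gateway-node}"  # assumed systemd unit name

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On the old master host: stop or start the gateway.
stop_old_master()  { run systemctl stop "$1"; }
start_old_master() { run systemctl start "$1"; }

# On the non-master host: promote its zone and commit the new period.
promote_zone() {
    run radosgw-admin zone modify --rgw-zone="$1" --master --default
    run radosgw-admin period update --commit
}

# On either host: report multisite sync state.
check_sync() { run radosgw-admin sync status; }

stop_old_master "$RGW_UNIT"
promote_zone "$NEW_MASTER_ZONE"
# ... create buckets and objects on the new master with an S3 client ...
start_old_master "$RGW_UNIT"
# ... create more buckets and objects on the new master ...
check_sync
```

Running it with `DRY_RUN=0` would execute the commands instead of printing them.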

Actual results:
All the buckets were synced. Objects created before the rgw process was restarted failed to sync, but objects created after the restart synced successfully.

I don't see a GET request sent to the master for the buckets that are missing objects.

Comment 10 Casey Bodley 2016-08-11 14:01:57 UTC

Thanks Bara,

The only part that I'd clarify is "During this process, the master zone is down and object sync requests can be skipped under certain circumstances."

I'd suggest replacing that sentence with:
Generally, the master zone's gateway(s) will still be running when this happens. But in the case where its gateways are all down, it can take up to 30 seconds after restarting for them to notice that another zone was promoted. During this window, they can miss some changes to buckets that occur on other zones.

Comment 12 Casey Bodley 2016-08-11 14:25:52 UTC

Looks good!

Comment 13 shilpa 2016-08-17 09:27:38 UTC

Hi Bara,

Is this yet to be documented in the release notes?

Comment 17 Matt Benjamin (redhat) 2016-10-03 17:57:03 UTC

This behavior is expected in current multisite sync. In future releases, multisite sync may support more complex failover and recovery scenarios, but there is no upstream tracker issue yet.
