Bug 1362639

Summary:	Object sync requests skipped in some scenarios
Product:	[Red Hat Storage] Red Hat Ceph Storage	Reporter:	shilpa <smanjara>
Component:	RGW	Assignee:	Matt Benjamin (redhat) <mbenjamin>
Status:	CLOSED WONTFIX	QA Contact:	ceph-qe-bugs <ceph-qe-bugs>
Severity:	medium	Docs Contact:	Bara Ancincova <bancinco>
Priority:	unspecified
Version:	2.0	CC:	anharris, bancinco, cbodley, ceph-eng-bugs, hnallurv, kbader, kdreyer, mbenjamin, owasserm, smanjara, sweil, yehuda
Target Milestone:	rc
Target Release:	3.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Known Issue
Doc Text:	.Object sync requests are sometimes skipped
In multi-site configurations of the Ceph Object Gateway, a non-master zone can be promoted to the master zone. In most cases, the master zone's gateway or gateways are still running when this happens. However, if the gateways are down, it can take up to 30 seconds after their restart until the gateways notice that another zone was promoted. During this time, the gateways can miss changes to buckets that occur on other zones. Consequently, object sync requests are skipped. To work around this issue, pull the new master's period to the old master zone before restarting the old master zone:
----
$ radosgw-admin period pull --remote=<new-master-zone-id>
----
For details on pulling the period, see the https://access.redhat.com/documentation/en/red-hat-ceph-storage/2/single/object-gateway-guide-for-red-hat-enterprise-linux[Ceph Object Gateway Guide for Red Hat Enterprise Linux] or the https://access.redhat.com/documentation/en/red-hat-ceph-storage/2/single/object-gateway-guide-for-ubuntu[Ceph Object Gateway Guide for Ubuntu].
Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-07-06 14:37:45 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1322504, 1383917, 1412948

Description shilpa 2016-08-02 18:02:31 UTC

Version-Release number of selected component (if applicable):
ceph-radosgw-10.2.2-32.el7cp.x86_64

Steps to Reproduce:
1. Stop the rgw process on the master zone.
2. Switch a non-master zone to master.
3. Create new buckets and objects on the current master.
4. Bring up the rgw process on the old master zone.
5. Upload some more buckets and objects on the current master.
6. Check for sync.

Actual results:
All the buckets were synced. The objects created before the rgw process was restarted failed to sync, but the objects created after the restart synced successfully.

I don't see a GET request sent to master on the buckets that are missing objects.
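
For anyone retrying this, the reproduction steps above can be sketched roughly as the shell helpers below. This is only a sketch under assumptions: the zone name `us-west`, the systemd unit name, and the `run` dry-run wrapper are placeholders that do not come from the report, and the bucket and object creation steps are left to whichever S3 client is in use. By default the helpers only print the commands.

```shell
# Sketch of the reproduction steps. DRY_RUN=1 (the default) only prints the
# commands; set DRY_RUN=0 on a disposable test cluster to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Assumed placeholders, not taken from the report.
NEW_MASTER_ZONE="${NEW_MASTER_ZONE:-us-west}"
RGW_UNIT="${RGW_UNIT:-ceph-radosgw@rgw.gateway-node}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop / start the gateway; run on the old master's gateway host.
stop_old_master()  { run systemctl stop "$RGW_UNIT"; }
start_old_master() { run systemctl start "$RGW_UNIT"; }

# Promote the non-master zone; run on a host in that zone.
promote_zone() {
    run radosgw-admin zone modify --rgw-zone="$NEW_MASTER_ZONE" --master --default
    run radosgw-admin period update --commit
}

# Check sync progress; run on the old master once its gateway is back up.
check_sync() { run radosgw-admin sync status; }
```

Called in order (`stop_old_master`, `promote_zone`, create data, `start_old_master`, create more data, `check_sync`), this mirrors the sequence in the description.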

Comment 10 Casey Bodley 2016-08-11 14:01:57 UTC

Thanks Bara,

The only part that I'd clarify is "During this process, the master zone is down and object sync requests can be skipped under certain circumstances."

I'd suggest replacing that sentence with:
Generally, the master zone's gateway(s) will still be running when this happens. But in the case where its gateways are all down, it can take up to 30 seconds after restarting for them to notice that another zone was promoted. During this window, they can miss some changes to buckets that occur on other zones.
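
To illustrate how the documented workaround avoids this window, here is a minimal sketch that pulls the new master's period before the old master's gateway is started. The `radosgw-admin period pull --remote=...` command is the one from the Doc Text; the function name, the dry-run wrapper, and the systemd unit argument are assumptions added for illustration.

```shell
# Sketch only: pull the period first, then restart the old master's gateway.
# DRY_RUN=1 (the default) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: recover_old_master <new-master-zone-id> <rgw-systemd-unit>
# Both arguments are site-specific; the unit name is an assumption.
recover_old_master() {
    if [ "$#" -ne 2 ]; then
        echo "usage: recover_old_master <new-master-zone-id> <rgw-unit>" >&2
        return 2
    fi
    new_master_id="$1"
    rgw_unit="$2"

    # Fetch the current period so the gateway knows about the promotion
    # as soon as it starts; do not start the gateway if the pull fails.
    run radosgw-admin period pull --remote="$new_master_id" || return 1
    run systemctl start "$rgw_unit"
}
```

The point of the ordering is that the gateway should come up already holding the new period, rather than discovering the promotion up to 30 seconds later.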

Comment 12 Casey Bodley 2016-08-11 14:25:52 UTC

Looks good!

Comment 13 shilpa 2016-08-17 09:27:38 UTC

Hi Bara,

Is this yet to be documented in the release notes?

Comment 17 Matt Benjamin (redhat) 2016-10-03 17:57:03 UTC

The behavior is expected in current multisite sync. In future releases, multisite sync may support more complex failover and recovery scenarios; there is no upstream tracker issue yet.