Bug 1357641 - In multisite environment, sync and upload operations time out
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RGW
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 2.0
Assigned To: Casey Bodley
QA Contact: shilpa
 
Reported: 2016-07-18 14:16 EDT by shilpa
Modified: 2017-07-31 16:59 EDT
CC List: 11 users

Fixed In Version: RHEL: ceph-10.2.2-26.el7cp Ubuntu: ceph_10.2.2-20redhat1
Last Closed: 2016-08-23 15:44:25 EDT
Type: Bug


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker | 16759 | None | None | None | 2016-07-20 17:10 EDT
Red Hat Product Errata | RHBA-2016:1755 | normal | SHIPPED_LIVE | Red Hat Ceph Storage 2.0 bug fix and enhancement update | 2016-08-23 19:23:52 EDT

Description shilpa 2016-07-18 14:16:08 EDT
Description of problem:
In a multisite environment where the master and non-master zones were swapped, a few objects were uploaded from each zone. The master zone received and synced all of the uploaded files, but the non-master zone stopped syncing objects. A later attempt to create a bucket on the non-master zone also failed.


Version-Release number of selected component (if applicable):
ceph-radosgw-10.2.2-18.el7cp.x86_64


Steps to Reproduce:
1. Create a multisite configuration with rgw1 as the master zone and rgw2 as the non-master zone, then upload objects and let them sync.
2. Switch rgw2 to be the master zone, run a period update commit, and restart the gateways.
3. Upload objects and check the sync status.
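Step 2 can be sketched with `radosgw-admin` roughly as follows. This is a minimal sketch, not the reporter's exact commands (Comment 3 asks for those): the zone name `us-2` for rgw2, the `ceph-radosgw@rgw.<short hostname>` systemd unit name, the `DRY_RUN` switch, and the helper function names are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch of step 2: promote a secondary zone to master and commit a new period.
# Assumptions (not from the bug report): zone name, systemd unit name, DRY_RUN.
set -euo pipefail

# Print commands instead of running them when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# promote_zone ZONE: make ZONE the master (and default) zone of its zonegroup,
# then commit the change as a new period so the other gateways learn about it.
promote_zone() {
  local zone="$1"
  run radosgw-admin zone modify --rgw-zone="$zone" --master --default
  run radosgw-admin period update --commit
}

# restart_local_gateway: restart the radosgw instance on this host.
# The unit name follows the usual RHCS 2 (jewel) convention; it is an assumption.
restart_local_gateway() {
  run systemctl restart "ceph-radosgw@rgw.${HOSTNAME%%.*}"
}
```

On the rgw2 host, `DRY_RUN=1 promote_zone us-2` prints the commands without running them; for the real change, call `promote_zone us-2` and then `restart_local_gateway` on every gateway host.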

Actual results:
All the objects were synced to the new master zone, rgw2. The first few objects synced to rgw1, but all subsequent object uploads and syncs failed with:

('Connection aborted.', error(110, 'Connection timed out')
Comment 3 Casey Bodley 2016-07-19 12:22:52 EDT
Hi Shilpa,

It looks like magna115 was configured as the initial master zone us-1, correct? I'm not seeing any evidence in the logs that it was ever restarted:

$ grep -n --binary-files=text 'ceph version' ceph-rgw-magna115.log-20160716
2:2016-07-15 07:14:33.982866 7f78170c59c0  0 ceph version 10.2.2-18.el7cp (408019449adec8263014b356737cf326544ea7c6), process radosgw, pid 26186

I also searched for the 'period commit', and found two instances of the 'post_period' request:

$ grep --binary-files=text 'post_period:http' ceph-rgw-magna115.log-20160716
2016-07-15 07:14:40.563546 7f762d7da700  2 req 97:1.024208::POST /admin/realm/period:post_period:http status=200
2016-07-15 09:48:01.171323 7f7634fe9700  2 req 87458:1.006765::POST /admin/realm/period:post_period:http status=200

Both of those included the message "period epoch 1 is not newer than current epoch 1, discarding update", so the period configuration doesn't appear to have changed since the gateway started at 07:14:33.

Are the logs for zone us-2 available anywhere? Is there any way that your 'zone modify' and 'period update --commit' commands are still in scrollback, so you could copy/paste their output?
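The two log checks in this comment can be folded into one helper. This is a minimal sketch: `summarize_rgw_log` is a hypothetical name, and the third pattern simply matches the discard message quoted above. A single startup banner, together with a discard for every `post_period` request, points to the same conclusion reached here: the gateway is still running with the period it loaded at startup.

```shell
#!/usr/bin/env bash
# Sketch: summarize one radosgw log using the checks from Comment 3.
# summarize_rgw_log is a hypothetical helper, not part of Ceph.
set -euo pipefail

summarize_rgw_log() {
  local log="$1" starts posts discarded
  # -a treats the log as text (same as --binary-files=text); -c counts lines.
  # grep -c prints 0 but exits 1 when nothing matches, so "|| true" keeps set -e happy.
  starts=$(grep -ac 'ceph version' "$log" || true)
  posts=$(grep -ac 'post_period:http' "$log" || true)
  discarded=$(grep -ac 'is not newer than current epoch' "$log" || true)
  echo "startups=$starts post_period=$posts discarded=$discarded"
}
```

For example, `summarize_rgw_log ceph-rgw-magna115.log-20160716`. For that log, the comment above found one startup and two `post_period` requests, both of which were discarded.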
Comment 8 Casey Bodley 2016-07-20 17:10:47 EDT
Due to a bug in how we update the sync status markers, we were skipping past bucket entries that hadn't completed.

Yehuda's PR at https://github.com/ceph/ceph/pull/10355 should fix this.
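The failure mode described above can be illustrated with a toy model. This is a hypothetical sketch, not the code from the PR (RGW itself is C++). It assumes that entries are processed concurrently and may finish out of order. It contrasts a marker that jumps to the newest completed entry with one that only advances past the longest run of completed entries.

```shell
#!/usr/bin/env bash
# Hypothetical illustration, not RGW's actual code.
# Arguments: completion flags for entries 1..N in log order (1 = done, 0 = pending).
set -euo pipefail

# Safe marker: advance only past the longest prefix of completed entries.
safe_marker() {
  local marker=0 i=1 flag
  for flag in "$@"; do
    [[ "$flag" == 1 ]] || break
    marker=$i
    i=$((i + 1))
  done
  echo "$marker"
}

# Buggy marker: jump to the highest completed position, even if an
# earlier entry is still pending.
naive_marker() {
  local marker=0 i=1 flag
  for flag in "$@"; do
    [[ "$flag" == 1 ]] && marker=$i
    i=$((i + 1))
  done
  echo "$marker"
}
```

With entry 2 still pending, `naive_marker 1 0 1 1` prints 4, so sync would resume after entry 4 and never retry entry 2. `safe_marker 1 0 1 1` prints 1, so entry 2 is retried.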
Comment 14 shilpa 2016-07-28 02:16:06 EDT
Tested on ceph-10.2.2-26.el7cp, along with rgw_thread_pool_size=200. I no longer see the issue.
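The thread pool setting used in this test would normally be placed in `ceph.conf` on the gateway host. This is a minimal sketch of a config fragment. The section name `[client.rgw.<hostname>]` follows the usual RHCS 2 naming and is an assumption. The gateway reads this value at startup, so it must be restarted afterwards.

```ini
# Hypothetical gateway section; replace <hostname> with the gateway's short host name.
[client.rgw.<hostname>]
rgw_thread_pool_size = 200
```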
Comment 16 errata-xmlrpc 2016-08-23 15:44:25 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html
