Bug 1357641
| Summary: | In multisite environment, sync and upload operations time out | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | shilpa <smanjara> |
| Component: | RGW | Assignee: | Casey Bodley <cbodley> |
| Status: | CLOSED ERRATA | QA Contact: | shilpa <smanjara> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 2.0 | CC: | cbodley, ceph-eng-bugs, hnallurv, kbader, kdreyer, mbenjamin, owasserm, smanjara, sweil, tserlin, yehuda |
| Target Milestone: | rc | ||
| Target Release: | 2.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | RHEL: ceph-10.2.2-26.el7cp Ubuntu: ceph_10.2.2-20redhat1 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-08-23 19:44:25 UTC | Type: | Bug |
Hi Shilpa,

It looks like magna115 was configured as the initial master zone us-1, correct? I'm not seeing any evidence in the logs that it was ever restarted:

    $ grep -n --binary-files=text 'ceph version' ceph-rgw-magna115.log-20160716
    2:2016-07-15 07:14:33.982866 7f78170c59c0 0 ceph version 10.2.2-18.el7cp (408019449adec8263014b356737cf326544ea7c6), process radosgw, pid 26186

I also searched for the 'period commit', and found two instances of the 'post_period' request:

    $ grep --binary-files=text 'post_period:http' ceph-rgw-magna115.log-20160716
    2016-07-15 07:14:40.563546 7f762d7da700 2 req 97:1.024208::POST /admin/realm/period:post_period:http status=200
    2016-07-15 09:48:01.171323 7f7634fe9700 2 req 87458:1.006765::POST /admin/realm/period:post_period:http status=200

Both of those included the message "period epoch 1 is not newer than current epoch 1, discarding update", so the period configuration doesn't appear to have changed since it was started at time 07:14:33.

Are the logs for zone us-2 available anywhere? Is there any way that your 'zone modify' and 'period update --commit' commands are still in scrollback, so you could copy/paste their output?

Due to a bug in how we update the sync status markers, we were skipping past bucket entries that hadn't completed. Yehuda's PR at https://github.com/ceph/ceph/pull/10355 should fix this.

Tested on ceph-10.2.2-26.el7cp along with rgw_thread_pool_size=200. I don't see the issue anymore.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html
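For readers unfamiliar with the failure mode named in the comments above (a sync status marker moving past bucket entries that hadn't completed), here is a minimal Python sketch of the general idea. It is not the Ceph code and not the change in the linked PR. The class name `MarkerTracker` and its methods are hypothetical, chosen only for illustration. The sketch assumes entries are started in log order and may finish out of order, and it only advances the stored marker across a contiguous run of completed entries.

```python
class MarkerTracker:
    """Hypothetical illustration: a resume marker that never skips unfinished entries."""

    def __init__(self):
        # position -> completed flag; dicts preserve insertion (log) order
        self.pending = {}
        # highest position that is safe to resume after; None until something is safe
        self.marker = None

    def start(self, pos):
        """Record that processing of the entry at `pos` has begun."""
        self.pending[pos] = False

    def finish(self, pos):
        """Mark `pos` complete and advance the marker over leading completed entries."""
        self.pending[pos] = True
        for p in list(self.pending):
            if not self.pending[p]:
                break  # an earlier entry is still in flight: stop here
            self.marker = p
            del self.pending[p]
        return self.marker


tracker = MarkerTracker()
for pos in (1, 2, 3):
    tracker.start(pos)

tracker.finish(3)  # entry 3 is done, but 1 and 2 are not: marker stays None
tracker.finish(1)  # marker moves to 1 only
tracker.finish(2)  # marker can now move through 2 and 3
```

A naive tracker that stored the highest finished position would have written 3 after the first `finish` call. If processing restarted from that marker, entries 1 and 2 would never be retried, which is the kind of skipped-entry behaviour the comment describes.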
Description of problem:
In a multisite environment where the master zone and non-master zone were swapped, after uploading a few objects from both zones, the master zone had all the files uploaded and synced. The non-master zone had stopped syncing objects. A later attempt to create a bucket on the non-master zone also failed.

Version-Release number of selected component (if applicable):
ceph-radosgw-10.2.2-18.el7cp.x86_64

Steps to Reproduce:
1. Create a multisite configuration and upload and sync objects with rgw1 as master and rgw2 as non-master.
2. Switch rgw2 to master zone, do a period update commit, and restart the gateways.
3. Upload objects and check sync status.

Actual results:
All the objects synced to the new master zone, rgw2. The first few objects synced to rgw1, but all other object uploads and syncs failed with:
('Connection aborted.', error(110, 'Connection timed out')