Bug 1359712

Summary: A master zone switch requires radosgw to be restarted
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: shilpa <smanjara>
Component: RGW
Assignee: Casey Bodley <cbodley>
Status: CLOSED ERRATA
QA Contact: Rachana Patel <racpatel>
Severity: medium
Docs Contact: Bara Ancincova <bancinco>
Priority: high
Version: 2.0
CC: cbodley, ceph-eng-bugs, hnallurv, icolle, kbader, kdreyer, mbenjamin, owasserm, smanjara, sweil, uboppana
Target Milestone: rc   
Target Release: 2.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: RHEL: ceph-10.2.3-2.el7cp Ubuntu: ceph_10.2.3-3redhat1xenial
Doc Type: Bug Fix
Doc Text:
.A restart of the radosgw process is no longer required after switching the zone from master to non-master
When a non-master zone was promoted to the master zone, all I/O requests became unresponsive until the `radosgw` process was restarted on both zones. Consequently, the I/O requests timed out. The underlying source code has been modified, and restarting `radosgw` is no longer required in the described situation.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-22 19:28:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1322504, 1383917    

Description shilpa 2016-07-25 10:39:27 UTC
Description of problem:
When the zone is switched from master to non-master, all I/O requests hang until the rgw process is restarted on both zones.

Version-Release number of selected component (if applicable):
ceph-radosgw-10.2.2-27.el7cp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Modify the non-master zone with the '--master' flag:
radosgw-admin zone modify --rgw-zonegroup=us --rgw-zone=us-2 --access_key=secret --secret=secret --endpoints=http://magna059:80 --default --master
2. Update and commit the period.
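The two steps can be scripted roughly as follows. This is only a sketch: the zone modify arguments are copied from step 1, while `radosgw-admin period update --commit` for step 2 is an assumption, since the report does not show the exact command that was run. The helper names `build_promote_commands` and `run` are hypothetical, and `run` only prints the commands unless `dry_run=False` is passed.

```python
import shlex
import subprocess


def build_promote_commands(zonegroup, zone, endpoint, access_key, secret):
    """Return the radosgw-admin invocations for steps 1 and 2 as argv lists."""
    # Step 1: flags copied from the command in this report.
    modify = [
        "radosgw-admin", "zone", "modify",
        f"--rgw-zonegroup={zonegroup}",
        f"--rgw-zone={zone}",
        f"--access_key={access_key}",
        f"--secret={secret}",
        f"--endpoints={endpoint}",
        "--default",
        "--master",
    ]
    # Step 2: assumed command; the report only says "update and commit the period".
    commit = ["radosgw-admin", "period", "update", "--commit"]
    return [modify, commit]


def run(commands, dry_run=True):
    """Print each command and execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(build_promote_commands("us", "us-2", "http://magna059:80", "secret", "secret"))
```

Building the commands as argv lists rather than one shell string avoids quoting problems if the keys or endpoints contain special characters.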


Actual results:
A period configuration change should not require a radosgw restart. However, all I/O requests hang until the radosgw process is restarted.

Additional info:

Traceback (most recent call last):
  File "s3del.py", line 21, in <module>
    conn.delete_bucket(buck.name)
  File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 641, in delete_bucket
    response = self.make_request('DELETE', bucket, headers=headers)
  File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 668, in make_request
    retry_handler=retry_handler
  File "/usr/lib/python2.7/site-packages/boto/connection.py", line 1071, in make_request
    retry_handler=retry_handler)
  File "/usr/lib/python2.7/site-packages/boto/connection.py", line 1028, in _mexe
    raise BotoServerError(response.status, response.reason, body)
boto.exception.BotoServerError: BotoServerError: 504 Gateway Time-out
<html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
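
To check for the hang without waiting on a full S3 client retry cycle, a small probe with an explicit timeout can be used. The sketch below is not the reporter's s3del.py. It uses only the Python standard library, the function names `probe_gateway` and `is_hung` are hypothetical, and it assumes that any HTTP answer other than 504 means the gateway is still serving requests.

```python
import http.client
import socket
from urllib.parse import urlsplit


def probe_gateway(url, timeout=10.0):
    """Return the HTTP status code, or None if no response arrived within timeout.

    Other connection errors (for example, connection refused) are not caught
    and propagate to the caller.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/")
        return conn.getresponse().status
    except (socket.timeout, TimeoutError):
        return None
    finally:
        conn.close()


def is_hung(status):
    """Treat a missing response or a 504 as the hang described in this report."""
    return status is None or status == 504


if __name__ == "__main__":
    # Endpoint taken from the reproduction steps; adjust for the zone under test.
    status = probe_gateway("http://magna059:80/", timeout=10.0)
    print("hung" if is_hung(status) else f"responding (HTTP {status})")
```

Running the probe against each zone's endpoint before and after the period commit shows whether requests stop being answered.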

Comment 7 Casey Bodley 2016-09-08 18:01:57 UTC
Hi Shilpa,

We have upstream testing that makes multiple changes to the master zone without restarting gateways, and we haven't seen it hit this issue. Can you try to reproduce the issue with the latest build?

Comment 8 shilpa 2016-09-09 08:10:27 UTC
(In reply to Casey Bodley from comment #7)
> Hi Shilpa,
> 
> We have upstream testing that makes multiple changes to the master zone
> without restarting gateways, and we haven't seen it hit this issue. Can you
> try to reproduce the issue with the latest build?

Hi Casey, 

Sure, I will try it. On the 2.0 Async build?

Comment 11 shilpa 2016-11-04 10:29:57 UTC
Verified on 10.2.3-12. No gateway restart is required to switch master.

Comment 14 shilpa 2016-11-22 13:44:51 UTC
Hi Bara,

It looks good to me.

Comment 16 errata-xmlrpc 2016-11-22 19:28:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2815.html