Bug 1498218 - ceph-ansible RGW role restarts all RGWs simultaneously
Summary: ceph-ansible RGW role restarts all RGWs simultaneously
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.0
Hardware: All
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 3.0
Assignee: Sébastien Han
QA Contact: Vidushi Mishra
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-10-03 18:36 UTC by Tupper Cole
Modified: 2017-12-05 23:46 UTC
CC List: 14 users

Fixed In Version: RHEL: ceph-ansible-3.0.3-1.el7cp Ubuntu: ceph-ansible_3.0.3-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-05 23:46:42 UTC
Embargoed:


Attachments:


Links:
- Github ceph/ceph-ansible pull 2068 (last updated 2017-10-18 06:47:52 UTC)
- Red Hat Product Errata RHBA-2017:3387, SHIPPED_LIVE: Red Hat Ceph Storage 3.0 bug fix and enhancement update (last updated 2017-12-06 03:03:45 UTC)

Description Tupper Cole 2017-10-03 18:36:48 UTC
Description of problem: trying to restart RGW results in all RGWs being restarted at the same time. They should be done serially. 

Even running a serial upgrade results in all restarting at the same time. 

https://github.com/ceph/ceph-ansible/commit/ce7ad225d8aeef9f7b2de2617cc00dea983fa25f

This is a problem when using ceph-ansible to upgrade RGWs: *all* RGWs will restart simultaneously.
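
For context, below is a minimal sketch (not necessarily the exact upstream fix) of how an Ansible handler can restart the gateways one node at a time, by running once and delegating to each RGW host in turn. The 'rgws' group name and the ceph-radosgw@rgw.<hostname> unit name are assumptions here:

# Sketch only: serialize RGW restarts from a single handler run.
# The 'rgws' group name and the unit instance name are assumptions.
- name: restart ceph rgw daemon(s) one host at a time
  systemd:
    name: "ceph-radosgw@rgw.{{ hostvars[item]['ansible_hostname'] }}"
    state: restarted
  with_items: "{{ groups['rgws'] }}"
  delegate_to: "{{ item }}"
  run_once: true

With run_once plus delegate_to, a single task walks the host list in order, so the gateways restart serially instead of every host restarting its own daemon at the same time.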

Comment 5 Sébastien Han 2017-10-18 06:47:53 UTC
There is a difference between upgrades and restarts. The handler will restart all the gateways; we know that, and I have a fix pending upstream. This has nothing to do with the rolling update playbook, even though there is a dependency.

Thanks.

Comment 6 Sébastien Han 2017-10-18 07:18:20 UTC
Will be in 3.0.3, release upstream is here: https://github.com/ceph/ceph-ansible/releases/tag/v3.0.3

Ken, can you build a package? Thanks.

Comment 10 Sébastien Han 2017-10-25 12:46:32 UTC
Deploy the RGWs, change something in ceph.conf via ceph_conf_overrides, run ansible again, and look at the restart sequence. Then look at the process age to make sure they all restarted serially.
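
For example, a minimal group_vars change that touches ceph.conf and should trigger the RGW restart handler on the next run could look like the sketch below (the section and key are illustrative, not taken from this BZ):

# group_vars/all.yml -- illustrative override; any ceph.conf change will do
ceph_conf_overrides:
  global:
    debug rgw: 5

Process age can then be checked on each gateway node, e.g. with ps -C radosgw -o etimes,cmd or the systemctl status of the radosgw unit, to confirm the daemons came up one after the other rather than at the same instant.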

Comment 12 Sébastien Han 2017-10-25 13:53:28 UTC
Looks like magna079 wasn't restarted, was it? 1h12min ago versus 16min?

Comment 14 Sébastien Han 2017-10-25 15:05:30 UTC
Password for the logs?

Comment 15 Sébastien Han 2017-10-25 15:07:05 UTC
Ok, I didn't notice the line was truncated; looking now.

Comment 17 Guillaume Abrioux 2017-10-25 15:23:26 UTC
Hi Vidushi,

can you paste the group_vars/* as used for the deployment mentioned in c13, please?

Comment 19 Sébastien Han 2017-10-27 07:48:02 UTC
Please try again. If you look at this log, you will see that the restart works: https://2.jenkins.ceph.com/view/ceph-ansible-luminous-nightly/job/ceph-ansible-nightly-luminous-ansible2.3-centos7_cluster/28/consoleFull


Search for "restart ceph rgw daemon(s) - container" and you will see that ceph-rgw0 gets restarted.

Comment 20 Vidushi Mishra 2017-10-27 12:23:17 UTC
Hi Seb,

I re-ran the ansible-playbook and observed in the playbook logs that the RGW daemons were restarted, as shown below:

RUNNING HANDLER [ceph-defaults : copy rgw restart script] ************************************************************************************************************************************************************************************
ok: [magna090]

RUNNING HANDLER [ceph-defaults : restart ceph rgw daemon(s) - non container] *****************************************************************************************************************************************************************
changed: [magna090 -> magna100] => (item=magna100)
changed: [magna090 -> magna090] => (item=magna090)

From the systemctl status output, it looks like the two RGW daemons restarted with a difference of 10-12 seconds. Output shown below:

--------------------------- console  o/p -----------------------

[root@magna100 ubuntu]#  systemctl status ceph-radosgw.service
● ceph-radosgw.service - Ceph rados gateway
   Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2017-10-27 12:06:19 UTC; 3min 4s ago
 Main PID: 29180 (radosgw)
   CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw.service
           └─29180 /usr/bin/radosgw -f --cluster ceph --name client.rgw.magna100 --setuser ceph --setgroup ceph

Oct 27 12:06:19 magna100 systemd[1]: Started Ceph rados gateway.
Oct 27 12:06:19 magna100 systemd[1]: Starting Ceph rados gateway...
Oct 27 12:06:19 magna100 radosgw[29180]: warning: line 29: 'host' in section 'client.rgw.magna100' redefined
Oct 27 12:06:19 magna100 radosgw[29180]: warning: line 30: 'keyring' in section 'client.rgw.magna100' redefined
Oct 27 12:06:19 magna100 radosgw[29180]: warning: line 31: 'log_file' in section 'client.rgw.magna100' redefined
[root@magna100 ubuntu]# 


[root@magna090 ceph-ansible]# systemctl status ceph-radosgw.service
● ceph-radosgw.service - Ceph rados gateway
   Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2017-10-27 12:06:30 UTC; 2min 57s ago
 Main PID: 17471 (radosgw)
   CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw.service
           └─17471 /usr/bin/radosgw -f --cluster ceph --name client.rgw.magna090 --setuser ceph --setgroup ceph

Oct 27 12:06:30 magna090 systemd[1]: Started Ceph rados gateway.
Oct 27 12:06:30 magna090 systemd[1]: Starting Ceph rados gateway...
[root@magna090 ceph-ansible]# 

-------------------------------------------------------------------
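
For a more precise comparison of the restart times than eyeballing systemctl status (a suggested extra check, not part of the run above; the unit instance name is an assumption), the unit start timestamps can be queried on each RGW node with a small play like:

# Sketch: print the systemd start timestamp of each RGW daemon
- hosts: rgws
  tasks:
    - name: query RGW unit start time
      command: systemctl show -p ActiveEnterTimestamp ceph-radosgw@rgw.{{ ansible_hostname }}
      register: rgw_start
      changed_when: false
    - name: print start time
      debug:
        var: rgw_start.stdout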

Is this sufficient to verify this BZ? Do let me know.

Thanks,
Vidushi

Comment 21 Vidushi Mishra 2017-10-27 12:27:50 UTC
Also, please let us know: what is the expected time difference between the restarts of the multiple RGW daemons?

Comment 22 Sébastien Han 2017-10-27 13:00:26 UTC
That's the expected behavior and results are good.
Thanks, please move this to VERIFIED.

Comment 26 errata-xmlrpc 2017-12-05 23:46:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387

