Description of problem: trying to restart RGWs results in all RGWs being restarted at the same time; they should be restarted serially. Even running a serial upgrade results in all of them restarting at the same time. https://github.com/ceph/ceph-ansible/commit/ce7ad225d8aeef9f7b2de2617cc00dea983fa25f This is a problem when using ceph-ansible to upgrade RGWs: *all* RGWs will restart simultaneously.
There is a difference between upgrades and restarts. The handler will restart all the gateways; we know that, and I have a fix pending upstream. This has nothing to do with the rolling update playbook, even though there is a dependency. Thanks.
This will be in 3.0.3; the upstream release is here: https://github.com/ceph/ceph-ansible/releases/tag/v3.0.3 Ken, can you build a package? Thanks.
Deploy RGWs, change something in ceph.conf with ceph_conf_overrides, run Ansible again, and watch the restart sequence. Then look at the process ages to make sure they all restarted serially.
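One way to check the process ages is a quick sketch like the one below, run on each RGW host (this is just an illustrative one-liner, not part of ceph-ansible; `etime` is the elapsed time since each process started, so serially restarted gateways should show different values across hosts):

```shell
# List the radosgw process with its elapsed run time.
# Keeps the ps header plus any radosgw processes; prints only
# the header if no gateway is running on this host.
ps -eo pid,etime,comm | awk 'NR==1 || $3=="radosgw"'
```

If all gateways show nearly identical elapsed times across hosts, they restarted simultaneously rather than serially.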
Looks like magna079 wasn't restarted, was it? 1h12min ago versus 16min?
Password for the logs?
Ok, I didn't notice the line was truncated; looking now.
Hi Vidushi, can you paste the group_vars/* as used for the deployment mentioned in c13, please?
Please try again. If you look at this log, you will see that the restart works: https://2.jenkins.ceph.com/view/ceph-ansible-luminous-nightly/job/ceph-ansible-nightly-luminous-ansible2.3-centos7_cluster/28/consoleFull Search for "restart ceph rgw daemon(s) - container" and you will see that ceph-rgw0 gets restarted.
Hi Seb, I re-ran the ansible-playbook. I observed that the rgw roles restarted in the playbook logs, as shown below:

RUNNING HANDLER [ceph-defaults : copy rgw restart script] *************************************
ok: [magna090]

RUNNING HANDLER [ceph-defaults : restart ceph rgw daemon(s) - non container] ******************
changed: [magna090 -> magna100] => (item=magna100)
changed: [magna090 -> magna090] => (item=magna090)

Via systemctl status, it looks like the two rgw roles restarted with a difference of 10-12 seconds. Output shown below:

--------------------------- console o/p -----------------------
[root@magna100 ubuntu]# systemctl status ceph-radosgw.service
● ceph-radosgw.service - Ceph rados gateway
   Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2017-10-27 12:06:19 UTC; 3min 4s ago
 Main PID: 29180 (radosgw)
   CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw.service
           └─29180 /usr/bin/radosgw -f --cluster ceph --name client.rgw.magna100 --setuser ceph --setgroup ceph

Oct 27 12:06:19 magna100 systemd[1]: Started Ceph rados gateway.
Oct 27 12:06:19 magna100 systemd[1]: Starting Ceph rados gateway...
Oct 27 12:06:19 magna100 radosgw[29180]: warning: line 29: 'host' in section 'client.rgw.magna100' redefined
Oct 27 12:06:19 magna100 radosgw[29180]: warning: line 30: 'keyring' in section 'client.rgw.magna100' redefined
Oct 27 12:06:19 magna100 radosgw[29180]: warning: line 31: 'log_file' in section 'client.rgw.magna100' redefined
[root@magna100 ubuntu]#

[root@magna090 ceph-ansible]# systemctl status ceph-radosgw.service
● ceph-radosgw.service - Ceph rados gateway
   Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2017-10-27 12:06:30 UTC; 2min 57s ago
 Main PID: 17471 (radosgw)
   CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw.service
           └─17471 /usr/bin/radosgw -f --cluster ceph --name client.rgw.magna090 --setuser ceph --setgroup ceph

Oct 27 12:06:30 magna090 systemd[1]: Started Ceph rados gateway.
Oct 27 12:06:30 magna090 systemd[1]: Starting Ceph rados gateway...
[root@magna090 ceph-ansible]#
-------------------------------------------------------------------

Is this sufficient to verify this BZ? Do let me know.

Thanks, Vidushi
Also, please let us know what the expected time difference is between restarts of the multiple rgw roles.
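For reference, the gap between the two restarts can be computed directly from the ActiveEnterTimestamp values in the systemctl output above (a quick sketch assuming GNU date; the two timestamps are the ones reported for magna100 and magna090):

```shell
# Convert both "Active: ... since" timestamps to epoch seconds
# and subtract to get the restart gap.
t1=$(date -u -d "2017-10-27 12:06:19 UTC" +%s)
t2=$(date -u -d "2017-10-27 12:06:30 UTC" +%s)
echo "restart gap: $((t2 - t1)) seconds"   # prints "restart gap: 11 seconds"
```

An 11-second gap is consistent with the gateways being restarted one after the other rather than in parallel.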
That's the expected behavior and results are good. Thanks, please move this to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3387