
Bug 1498218

Summary: ceph-ansible RGW role restarts all RGWs simultaneously
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Tupper Cole <tcole>
Component: Ceph-Ansible Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA QA Contact: Vidushi Mishra <vimishra>
Severity: high Docs Contact:
Priority: high    
Version: 3.0 CC: adeza, anharris, aschoen, ceph-eng-bugs, gabrioux, gmeno, hnallurv, icolle, kdreyer, nthomas, sankarshan, shan, tcole, vimishra
Target Milestone: rc   
Target Release: 3.0   
Hardware: All   
OS: Unspecified   
Whiteboard:
Fixed In Version: RHEL: ceph-ansible-3.0.3-1.el7cp Ubuntu: ceph-ansible_3.0.3-2redhat1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-12-05 23:46:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Tupper Cole 2017-10-03 18:36:48 UTC
Description of problem: Trying to restart RGW results in all RGWs being restarted at the same time. They should be restarted serially.

Even running a serial upgrade results in all restarting at the same time. 

https://github.com/ceph/ceph-ansible/commit/ce7ad225d8aeef9f7b2de2617cc00dea983fa25f

This is a problem when using ceph-ansible to upgrade RGWs: *all* RGWs restart simultaneously.

Comment 5 Sébastien Han 2017-10-18 06:47:53 UTC
There is a difference between upgrades and restarts. The handler restarts all the gateways; we know that, and I have a fix pending upstream. This has nothing to do with the rolling update playbook, even if there is a dependency.

Thanks.
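
For illustration only, the difference between restarting every gateway at once and restarting them serially can be sketched in Python. This is not the ceph-ansible handler or the upstream fix; `restart_serially`, `restart`, and `is_healthy` are hypothetical names, and the restart and health-check actions are passed in as callables so that nothing is assumed about how a gateway is actually restarted.

```python
from typing import Callable, Iterable, List


def restart_serially(
    hosts: Iterable[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Restart hosts one at a time; stop at the first host that stays unhealthy.

    Hypothetical helper: `restart` and `is_healthy` are supplied by the caller.
    """
    done: List[str] = []
    for host in hosts:
        restart(host)  # only one gateway is down at any moment
        if not is_healthy(host):
            # Leave the remaining gateways untouched so they keep serving.
            raise RuntimeError(f"{host} did not come back; aborting")
        done.append(host)
    return done


# Usage with stand-in callables: record the order instead of restarting anything.
order: List[str] = []
restart_serially(["magna100", "magna090"], order.append, lambda host: True)
print(order)
```

The point of the loop is that a failed restart stops the sequence, so the gateways that have not yet been touched stay up.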

Comment 6 Sébastien Han 2017-10-18 07:18:20 UTC
This will be in 3.0.3; the upstream release is here: https://github.com/ceph/ceph-ansible/releases/tag/v3.0.3

Ken, can you build a package? Thanks.

Comment 10 Sébastien Han 2017-10-25 12:46:32 UTC
Deploy RGWs, change something in ceph.conf with ceph_conf_overrides, run Ansible again, and look at the restart sequence. Then look at the process ages to make sure they all restarted serially.
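
The "process age" check could be automated along these lines. This is a minimal sketch, assuming the per-host start times have already been collected; `restarted_serially` and the one-second default `min_gap` are hypothetical choices, not something ceph-ansible defines. The sample timestamps are the ones reported later in this bug for magna100 and magna090.

```python
from datetime import datetime, timedelta
from typing import Dict


def restarted_serially(
    start_times: Dict[str, datetime],
    min_gap: timedelta = timedelta(seconds=1),  # assumed threshold, adjust as needed
) -> bool:
    """Return True if every pair of consecutive start times is at least min_gap apart."""
    ordered = sorted(start_times.values())
    return all(later - earlier >= min_gap for earlier, later in zip(ordered, ordered[1:]))


# Start times taken from the systemctl output in this report.
observed = {
    "magna100": datetime(2017, 10, 27, 12, 6, 19),
    "magna090": datetime(2017, 10, 27, 12, 6, 30),
}
print(restarted_serially(observed))
```

Identical start times on two hosts would make the function return False, which is the symptom this bug describes.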

Comment 12 Sébastien Han 2017-10-25 13:53:28 UTC
Looks like magna079 wasn't restarted, was it? 1h12min ago versus 16min?

Comment 14 Sébastien Han 2017-10-25 15:05:30 UTC
Password for the logs?

Comment 15 Sébastien Han 2017-10-25 15:07:05 UTC
OK, I didn't notice the line was truncated; looking now.

Comment 17 Guillaume Abrioux 2017-10-25 15:23:26 UTC
Hi Vidushi,

Can you paste the group_vars/* used for the deployment mentioned in c13, please?

Comment 19 Sébastien Han 2017-10-27 07:48:02 UTC
Please try again. If you look at this log, you will see that the restart works: https://2.jenkins.ceph.com/view/ceph-ansible-luminous-nightly/job/ceph-ansible-nightly-luminous-ansible2.3-centos7_cluster/28/consoleFull


Search for "restart ceph rgw daemon(s) - container" and you will see that ceph-rgw0 gets restarted.

Comment 20 Vidushi Mishra 2017-10-27 12:23:17 UTC
Hi Seb,

I re-ran the ansible-playbook. The playbook logs show that the RGWs were restarted, as shown below:

RUNNING HANDLER [ceph-defaults : copy rgw restart script] ************************************************************************************************************************************************************************************
ok: [magna090]

RUNNING HANDLER [ceph-defaults : restart ceph rgw daemon(s) - non container] *****************************************************************************************************************************************************************
changed: [magna090 -> magna100] => (item=magna100)
changed: [magna090 -> magna090] => (item=magna090)

According to systemctl status, the two RGWs restarted about 10-12 seconds apart. Output shown below:

--------------------------- console  o/p -----------------------

[root@magna100 ubuntu]#  systemctl status ceph-radosgw.service
● ceph-radosgw.service - Ceph rados gateway
   Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2017-10-27 12:06:19 UTC; 3min 4s ago
 Main PID: 29180 (radosgw)
   CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw.service
           └─29180 /usr/bin/radosgw -f --cluster ceph --name client.rgw.magna100 --setuser ceph --setgroup ceph

Oct 27 12:06:19 magna100 systemd[1]: Started Ceph rados gateway.
Oct 27 12:06:19 magna100 systemd[1]: Starting Ceph rados gateway...
Oct 27 12:06:19 magna100 radosgw[29180]: warning: line 29: 'host' in section 'client.rgw.magna100' redefined
Oct 27 12:06:19 magna100 radosgw[29180]: warning: line 30: 'keyring' in section 'client.rgw.magna100' redefined
Oct 27 12:06:19 magna100 radosgw[29180]: warning: line 31: 'log_file' in section 'client.rgw.magna100' redefined
[root@magna100 ubuntu]# 


[root@magna090 ceph-ansible]# systemctl status ceph-radosgw.service
● ceph-radosgw.service - Ceph rados gateway
   Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2017-10-27 12:06:30 UTC; 2min 57s ago
 Main PID: 17471 (radosgw)
   CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw.service
           └─17471 /usr/bin/radosgw -f --cluster ceph --name client.rgw.magna090 --setuser ceph --setgroup ceph

Oct 27 12:06:30 magna090 systemd[1]: Started Ceph rados gateway.
Oct 27 12:06:30 magna090 systemd[1]: Starting Ceph rados gateway...
[root@magna090 ceph-ansible]# 

-------------------------------------------------------------------
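
The gap between the two restarts can be read off the `Active:` lines above. The following is a minimal sketch, assuming the status line format shown in this output (a three-letter weekday followed by a UTC timestamp); `active_since` is a hypothetical helper, not part of systemd or ceph-ansible.

```python
import re
from datetime import datetime

# Matches e.g. "Active: active (running) since Fri 2017-10-27 12:06:19 UTC; ..."
ACTIVE_RE = re.compile(
    r"Active: active \(running\) since \w{3} (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC"
)


def active_since(status_output: str) -> datetime:
    """Extract the 'since' timestamp from systemctl status output (assumed UTC)."""
    match = ACTIVE_RE.search(status_output)
    if match is None:
        raise ValueError("no 'Active: active (running) since ...' line found")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")


magna100 = "   Active: active (running) since Fri 2017-10-27 12:06:19 UTC; 3min 4s ago"
magna090 = "   Active: active (running) since Fri 2017-10-27 12:06:30 UTC; 2min 57s ago"

gap = active_since(magna090) - active_since(magna100)
print(gap.total_seconds())  # 11.0
```

An 11-second gap matches the order in the handler output, where magna100 was restarted before magna090.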

Is this sufficient to verify this BZ? Do let me know.

Thanks,
Vidushi

Comment 21 Vidushi Mishra 2017-10-27 12:27:50 UTC
Also, please let us know the expected time difference between the restarts of the multiple RGWs.

Comment 22 Sébastien Han 2017-10-27 13:00:26 UTC
That's the expected behavior, and the results are good.
Thanks, please move this to VERIFIED.

Comment 26 errata-xmlrpc 2017-12-05 23:46:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387