Description of problem:
An RGW service restart is needed on all RGW clients after a Ceph cluster reboot.

Version-Release number of selected component (if applicable):
ceph version 10.2.2-21redhat1xenial

How reproducible:
Always

Steps to Reproduce:
1. Run RGW I/O on a two-way active-active multisite configuration.
2. Reboot the primary cluster.
3. After the primary comes back up, the RGW client has lost its connection to the cluster and needs a service restart.

Expected results:
Not sure, but this at least needs to be mentioned to the customer.

root@magna086:~# systemctl status ceph-radosgw.service
* ceph-radosgw.service - Ceph rados gateway
   Loaded: loaded (/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2016-08-01 09:37:43 UTC; 2 days ago
 Main PID: 3423 (radosgw)
   CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw.service
           `-3423 /usr/bin/radosgw -f --cluster master --name client.rgw.magna086 --setuser ceph --setgroup ceph

Aug 01 09:37:43 magna086 systemd[1]: Started Ceph rados gateway.
Aug 02 07:43:14 magna086 radosgw[3423]: 2016-08-02 07:43:14.837382 7f135effd700 -1 RGWWatcher::handle_error cookie 94141816127056 err (110) Connection timed out
Aug 02 07:43:14 magna086 radosgw[3423]: 2016-08-02 07:43:14.838232 7f135effd700 -1 RGWWatcher::handle_error cookie 94141816151104 err (110) Connection timed out
Aug 03 09:35:06 magna086 radosgw[3423]: 2016-08-03 09:35:06.263131 7f135effd700 -1 RGWWatcher::handle_error cookie 94141816111696 err (107) Transport endpoint is not connected
Aug 03 09:40:06 magna086 radosgw[3423]: 2016-08-03 09:40:06.295158 7f135effd700 -1 RGWWatcher::handle_error cookie 94141816148448 err (107) Transport endpoint is not connected
root@magna086:~#
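For anyone trying to spot affected gateways before restarting them, here is a minimal sketch that picks the RGWWatcher::handle_error lines out of journal output like the one above. The helper name find_watch_errors and the SAMPLE list are made up for illustration, and treating these errors as a sign that a restart is needed is only an assumption drawn from this report, not documented radosgw behavior.

```python
import re

# Matches the RGWWatcher error lines as they appear in the journal output
# quoted in this report. The pattern is an assumption based on those lines.
WATCH_ERROR = re.compile(
    r"RGWWatcher::handle_error cookie (?P<cookie>\d+) "
    r"err \((?P<errno>\d+)\) (?P<message>.+)$"
)


def find_watch_errors(lines):
    """Return (cookie, errno, message) for each RGWWatcher error line.

    Hypothetical helper: lines that do not match the pattern are skipped.
    """
    errors = []
    for line in lines:
        match = WATCH_ERROR.search(line)
        if match:
            errors.append(
                (
                    int(match["cookie"]),
                    int(match["errno"]),
                    match["message"].strip(),
                )
            )
    return errors


# Sample lines copied from the status output in this report.
SAMPLE = [
    "Aug 01 09:37:43 magna086 systemd[1]: Started Ceph rados gateway.",
    "Aug 02 07:43:14 magna086 radosgw[3423]: 2016-08-02 07:43:14.837382 "
    "7f135effd700 -1 RGWWatcher::handle_error cookie 94141816127056 "
    "err (110) Connection timed out",
    "Aug 03 09:35:06 magna086 radosgw[3423]: 2016-08-03 09:35:06.263131 "
    "7f135effd700 -1 RGWWatcher::handle_error cookie 94141816111696 "
    "err (107) Transport endpoint is not connected",
]

if __name__ == "__main__":
    for cookie, err, message in find_watch_errors(SAMPLE):
        print(cookie, err, message)
```

The same function could be fed lines from the gateway's journal or log file; how the lines are collected is left out because the unit and log names depend on the deployment.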
We'd like more information on the amount of delay required before the secondary/primary radosgw process can be successfully restarted.
Matt, who can provide that information?
(In reply to Ken Dreyer (Red Hat) from comment #6)
> Matt, who can provide that information?

I don't recall the need for this info--I'll coordinate w/Shilpa Monday.
Verified on ceph-10.2.5-27
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2017-0514.html