Bug 1552202 - RGW multi-site segfault received when 'rgw_run_sync_thread = False' is set in ceph.conf
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RGW-Multisite
Version: 3.0
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: z4
Target Release: 3.0
Assigned To: Matt Benjamin (redhat)
QA Contact: vidushi
Docs Contact:
Depends On:
Blocks: 1553254
Reported: 2018-03-06 11:44 EST by jquinn
Modified: 2018-07-11 14:11 EDT
CC: 12 users

See Also:
Fixed In Version: RHEL: ceph-12.2.4-15.el7cp Ubuntu: 12.2.4-19redhat1xenial
Doc Type: Bug Fix
Doc Text:
Previously, due to a programming error, Ceph RADOS Gateway (RGW) instances in zones configured for multi-site replication would crash if configured to disable sync ("rgw_run_sync_thread = false"). Therefore, multi-site replication environments could not start dedicated non-replication RGW instances. With this update, the "rgw_run_sync_thread" option can be used to configure RGW instances that will not participate in replication even if their zone is replicated. If this option is set for all active RGW instances in the zone, replication will not take place.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-07-11 14:11:08 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker                        ID                     Last Updated
Ceph Project Bug Tracker       20448                  2018-03-07 05:08 EST
Github                         ceph/ceph/pull/20769   2018-03-08 11:49 EST
Github                         ceph/ceph/pull/20932   2018-05-21 16:26 EDT
Red Hat Product Errata         RHSA-2018:2177         2018-07-11 14:11 EDT

Description jquinn 2018-03-06 11:44:06 EST
Description of problem: The RGW process receives a segfault when the 'rgw_run_sync_thread = False' option is set in ceph.conf for the RGW instance. In this case the customer is using this option in a containerized deployment of RGW.

This issue is known in http://tracker.ceph.com/issues/20448, but has not yet been resolved. 

The customer is looking to have 4 RGW instances in a multi-site config, but 2 of them will be dedicated to client requests and will not handle replication. This flag appears to be the only way to meet this requirement.
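
For illustration, a minimal ceph.conf sketch of that split, assuming two replicating and two client-facing instances (the section and host names here are made up):

[client.rgw.gateway-repl-1]
host = gateway-repl-1
# replication instance: sync threads stay enabled (the default)

[client.rgw.gateway-client-1]
host = gateway-client-1
# client-facing instance: do not run the multi-site sync threads
rgw_run_sync_thread = false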


Version-Release number of selected component (if applicable): 12.2.1-40


How reproducible: Every time


Steps to Reproduce:
1. Deploy an RGW instance (a multi-site config is not needed to reproduce).
2. Add rgw_run_sync_thread = False to ceph.conf for the RGW instance.
3. Restart the RGW service. (Steps 2 and 3 are sketched below.)
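
A hedged sketch of steps 2 and 3 on a plain (non-containerized) instance, using a made-up instance name:

# step 2: disable the sync thread for this instance
cat >> /etc/ceph/ceph.conf <<'EOF'
[client.rgw.gateway1]
rgw_run_sync_thread = false
EOF

# step 3: restart the gateway; on affected builds the daemon segfaults on startup
systemctl restart ceph-radosgw@rgw.gateway1.service
journalctl -u ceph-radosgw@rgw.gateway1.service -n 50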

Actual results:

[client.rgw.vm250-102.gsslab.pnq2.redhat.com]
debug_rgw = 20
osd_heartbeat_grace = 60 
host = vm250-102
keyring = /var/lib/ceph/radosgw/ceph-rgw.vm250-102/keyring
log file = /var/log/ceph/ceph-rgw-vm250-102.log
rgw frontends = civetweb port=10.74.250.102:8080 num_threads=100
rgw_run_sync_thread = False 


** Journalctl ** 

Mar 06 08:51:19 vm250-102.gsslab.pnq2.redhat.com systemd[1]: ceph-radosgw@rgw.vm250-102.service: main process exited, code=exited, status=1/FAILURE
Mar 06 08:51:20 vm250-102.gsslab.pnq2.redhat.com docker[4023]: Error response from daemon: No such container: ceph-rgw-vm250-102
Mar 06 08:51:20 vm250-102.gsslab.pnq2.redhat.com systemd[1]: Unit ceph-radosgw@rgw.vm250-102.service entered failed state.
Mar 06 08:51:20 vm250-102.gsslab.pnq2.redhat.com systemd[1]: ceph-radosgw@rgw.vm250-102.service failed.
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com systemd[1]: ceph-radosgw@rgw.vm250-102.service holdoff time over, scheduling restart.
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com systemd[1]: Starting Ceph RGW...
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com systemd-journal[16792]: Suppressed 955 messages from /system.slice/docker.service
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com dockerd-current[11747]: time="2018-03-06T08:51:30.086802327-05:00" level=error msg="Handler for POST /v1.24/containers/ceph-rgw-vm250-102/stop?t=10 returned error: No such container: ceph-r
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com dockerd-current[11747]: time="2018-03-06T08:51:30.086844695-05:00" level=error msg="Handler for POST /v1.24/containers/ceph-rgw-vm250-102/stop returned error: No such container: ceph-rgw-vm
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com systemd-journal[16792]: Suppressed 282 messages from /system.slice/system-ceph\x2dradosgw.slice
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com docker[4032]: Error response from daemon: No such container: ceph-rgw-vm250-102
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com dockerd-current[11747]: time="2018-03-06T08:51:30.113109222-05:00" level=error msg="Handler for DELETE /v1.24/containers/ceph-rgw-vm250-102 returned error: No such container: ceph-rgw-vm250
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com dockerd-current[11747]: time="2018-03-06T08:51:30.113140646-05:00" level=error msg="Handler for DELETE /v1.24/containers/ceph-rgw-vm250-102 returned error: No such container: ceph-rgw-vm250
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com docker[4036]: Error response from daemon: No such container: ceph-rgw-vm250-102
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com systemd[1]: Started Ceph RGW.
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com kernel: XFS (dm-3): Mounting V5 Filesystem
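
The segfault backtrace itself lands in the gateway log rather than the journal; one way to pull it, assuming the log path from the config above:

grep -B 2 -A 20 'Segmentation fault' /var/log/ceph/ceph-rgw-vm250-102.log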



[root@vm250-102 ~]# systemctl status ceph-radosgw@rgw.vm250-102.service
ceph-radosgw@rgw.vm250-102.service - Ceph RGW
   Loaded: loaded (/etc/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Tue 2018-03-06 08:52:40 EST; 3s ago
  Process: 5641 ExecStopPost=/usr/bin/docker stop ceph-rgw-vm250-102 (code=exited, status=1/FAILURE)
  Process: 5387 ExecStart=/usr/bin/docker run --rm --net=host --memory=1g --cpu-quota=100000 -v /var/lib/ceph:/var/lib/ceph -v /etc/ceph:/etc/ceph -e RGW_CIVETWEB_IP=10.74.250.102 -v /etc/localtime:/etc/localtime:ro -e CEPH_DAEMON=RGW -e CLUSTER=ceph -e RGW_CIVETWEB_PORT=8080 --name=ceph-rgw-vm250-102 registry.access.redhat.com/rhceph/rhceph-3-rhel7:latest (code=exited, status=1/FAILURE)
  Process: 5381 ExecStartPre=/usr/bin/docker rm ceph-rgw-vm250-102 (code=exited, status=1/FAILURE)
  Process: 5377 ExecStartPre=/usr/bin/docker stop ceph-rgw-vm250-102 (code=exited, status=1/FAILURE)
 Main PID: 5387 (code=exited, status=1/FAILURE)

Mar 06 08:52:40 vm250-102.gsslab.pnq2.redhat.com systemd[1]: Unit ceph-radosgw@rgw.vm250-102.service entered failed state.
Mar 06 08:52:40 vm250-102.gsslab.pnq2.redhat.com systemd[1]: ceph-radosgw@rgw.vm250-102.service failed.
[root@vm250-102 ~]# 



Expected results: The RGW process should start, and with this option set the instance should not perform replication.
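
Once the daemon starts, one hedged way to confirm the option took effect is the admin socket (the socket path is illustrative):

ceph daemon /var/run/ceph/ceph-client.rgw.vm250-102.asok config get rgw_run_sync_thread
# expected output: { "rgw_run_sync_thread": "false" }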


Additional info:
Comment 3 Orit Wasserman 2018-03-07 05:07:32 EST
upstream fix:
https://github.com/ceph/ceph/pull/20769
Comment 14 errata-xmlrpc 2018-07-11 14:11:08 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2177
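
On RHEL, a hedged sketch of picking up the fixed build once the advisory repositories are enabled:

yum update ceph-radosgw
rpm -q ceph-radosgw   # expect 12.2.4-15.el7cp or later, per "Fixed In Version" above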
