Description of problem:
The RGW process segfaults when the option 'rgw_run_sync_thread = False' is set in ceph.conf for the RGW instance. In this case the customer is using it for a containerized deployment of RGW. The issue is known upstream as http://tracker.ceph.com/issues/20448, but has not yet been resolved.

The customer wants to run 4 RGW instances in a multi-site config, with 2 of them dedicated to client requests and not handling replication. This flag appears to be the only way to meet that requirement.

Version-Release number of selected component (if applicable):
12.2.1-40

How reproducible:
Every time

Steps to Reproduce:
1. Deploy an RGW instance (a multi-site config is not needed to reproduce).
2. Add rgw_run_sync_thread = False to ceph.conf for the RGW instance.
3. Restart the RGW service.

Actual results:

[client.rgw.vm250-102.gsslab.pnq2.redhat.com]
debug_rgw = 20
osd_heartbeat_grace = 60
host = vm250-102
keyring = /var/lib/ceph/radosgw/ceph-rgw.vm250-102/keyring
log file = /var/log/ceph/ceph-rgw-vm250-102.log
rgw frontends = civetweb port=10.74.250.102:8080 num_threads=100
rgw_run_sync_thread = False

** Journalctl **
Mar 06 08:51:19 vm250-102.gsslab.pnq2.redhat.com systemd[1]: ceph-radosgw.service: main process exited, code=exited, status=1/FAILURE
Mar 06 08:51:20 vm250-102.gsslab.pnq2.redhat.com docker[4023]: Error response from daemon: No such container: ceph-rgw-vm250-102
Mar 06 08:51:20 vm250-102.gsslab.pnq2.redhat.com systemd[1]: Unit ceph-radosgw.service entered failed state.
Mar 06 08:51:20 vm250-102.gsslab.pnq2.redhat.com systemd[1]: ceph-radosgw.service failed.
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com systemd[1]: ceph-radosgw.service holdoff time over, scheduling restart.
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com systemd[1]: Starting Ceph RGW...
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com systemd-journal[16792]: Suppressed 955 messages from /system.slice/docker.service
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com dockerd-current[11747]: time="2018-03-06T08:51:30.086802327-05:00" level=error msg="Handler for POST /v1.24/containers/ceph-rgw-vm250-102/stop?t=10 returned error: No such container: ceph-r
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com dockerd-current[11747]: time="2018-03-06T08:51:30.086844695-05:00" level=error msg="Handler for POST /v1.24/containers/ceph-rgw-vm250-102/stop returned error: No such container: ceph-rgw-vm
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com systemd-journal[16792]: Suppressed 282 messages from /system.slice/system-ceph\x2dradosgw.slice
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com docker[4032]: Error response from daemon: No such container: ceph-rgw-vm250-102
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com dockerd-current[11747]: time="2018-03-06T08:51:30.113109222-05:00" level=error msg="Handler for DELETE /v1.24/containers/ceph-rgw-vm250-102 returned error: No such container: ceph-rgw-vm250
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com dockerd-current[11747]: time="2018-03-06T08:51:30.113140646-05:00" level=error msg="Handler for DELETE /v1.24/containers/ceph-rgw-vm250-102 returned error: No such container: ceph-rgw-vm250
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com docker[4036]: Error response from daemon: No such container: ceph-rgw-vm250-102
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com systemd[1]: Started Ceph RGW.
Mar 06 08:51:30 vm250-102.gsslab.pnq2.redhat.com kernel: XFS (dm-3): Mounting V5 Filesystem

[root@vm250-102 ~]# systemctl status ceph-radosgw.service
● ceph-radosgw.service - Ceph RGW
   Loaded: loaded (/etc/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Tue 2018-03-06 08:52:40 EST; 3s ago
  Process: 5641 ExecStopPost=/usr/bin/docker stop ceph-rgw-vm250-102 (code=exited, status=1/FAILURE)
  Process: 5387 ExecStart=/usr/bin/docker run --rm --net=host --memory=1g --cpu-quota=100000 -v /var/lib/ceph:/var/lib/ceph -v /etc/ceph:/etc/ceph -e RGW_CIVETWEB_IP=10.74.250.102 -v /etc/localtime:/etc/localtime:ro -e CEPH_DAEMON=RGW -e CLUSTER=ceph -e RGW_CIVETWEB_PORT=8080 --name=ceph-rgw-vm250-102 registry.access.redhat.com/rhceph/rhceph-3-rhel7:latest (code=exited, status=1/FAILURE)
  Process: 5381 ExecStartPre=/usr/bin/docker rm ceph-rgw-vm250-102 (code=exited, status=1/FAILURE)
  Process: 5377 ExecStartPre=/usr/bin/docker stop ceph-rgw-vm250-102 (code=exited, status=1/FAILURE)
 Main PID: 5387 (code=exited, status=1/FAILURE)

Mar 06 08:52:40 vm250-102.gsslab.pnq2.redhat.com systemd[1]: Unit ceph-radosgw.service entered failed state.
Mar 06 08:52:40 vm250-102.gsslab.pnq2.redhat.com systemd[1]: ceph-radosgw.service failed.
[root@vm250-102 ~]#

Expected results:
The RGW process should start, and with rgw_run_sync_thread = False set, this instance should not perform replication.

Additional info:
upstream fix: https://github.com/ceph/ceph/pull/20769
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2177
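
For anyone auditing hosts that have not yet picked up the erratum, the following is a minimal sketch (not part of the original report) that lists which ceph.conf sections set rgw_run_sync_thread to a false value, i.e. the setting that triggers the crash described above. The function name sections_with_sync_disabled and the set of strings treated as "false" are assumptions made for illustration. The sketch relies on Python's configparser, which does not strip inline comments by default, so a value followed by a trailing comment would not be matched.

```python
import configparser
from typing import List

# Assumption: these spellings are treated as "false" for a boolean option.
FALSE_VALUES = {"false", "0", "no", "off"}


def _normalise(name: str) -> str:
    # Ceph accepts spaces, dashes and underscores interchangeably in
    # option names, so fold them all to underscores before comparing.
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def sections_with_sync_disabled(conf_text: str) -> List[str]:
    """Return the section names that set rgw_run_sync_thread to false."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = _normalise  # applied to every option name on read
    parser.read_string(conf_text)

    hits = []
    for section in parser.sections():
        value = parser.get(section, "rgw_run_sync_thread", fallback=None)
        if value is not None and value.strip().lower() in FALSE_VALUES:
            hits.append(section)
    return hits
```

Passing the contents of /etc/ceph/ceph.conf from the affected host to this function would be expected to report the client.rgw.vm250-102.gsslab.pnq2.redhat.com section shown under "Actual results".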