Description of problem:

Increasing the number of RGW instances from "1" to "2" by changing the radosgw_num_instances parameter to "2" fails, with the following error logs for the new instances:

stdout: |-
  Socket file could not be found, which means Rados Gateway is not running. Showing ceph-rgw unit logs now:
  -- Logs begin at Mon 2020-06-29 06:58:57 EDT, end at Mon 2020-06-29 07:28:35 EDT. --
  Jun 29 07:27:00 ceph4osd3.aydin.lab systemd[1]: Failed to load environment files: No such file or directory
  Jun 29 07:27:00 ceph4osd3.aydin.lab systemd[1]: ceph-radosgw.rgw1.service failed to run 'start-pre' task: No such file or directory
  Jun 29 07:27:00 ceph4osd3.aydin.lab systemd[1]: Failed to start Ceph RGW.
  Jun 29 07:27:00 ceph4osd3.aydin.lab systemd[1]: Unit ceph-radosgw.rgw1.service entered failed state.
  Jun 29 07:27:00 ceph4osd3.aydin.lab systemd[1]: ceph-radosgw.rgw1.service failed.
  Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: ceph-radosgw.rgw1.service holdoff time over, scheduling restart.
  Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: Stopped Ceph RGW.
  Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: Failed to load environment files: No such file or directory
  Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: ceph-radosgw.rgw1.service failed to run 'start-pre' task: No such file or directory
  Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: Failed to start Ceph RGW.
  Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: Unit ceph-radosgw.rgw1.service entered failed state.
  Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: ceph-radosgw.rgw1.service failed.

Version-Release number of selected component (if applicable):
RHCS 4.1

How reproducible:
Always

Steps to Reproduce:
- Install the Ceph cluster without setting the radosgw_num_instances parameter in all.yml.
- Add the "radosgw_num_instances: 2" variable to all.yml and rerun the playbook.

Actual results:
- The new RGW instances fail to start with the "Socket file could not be found" error and the ceph-radosgw.rgw1.service unit logs shown above.

Expected results:
- The second instances are provisioned successfully on the first attempt.

Additional info:
- When running site-docker.yaml again without changing anything, it succeeds.
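For reference, the change that triggers the failure is a one-line edit to the ceph-ansible variables file; a minimal sketch, assuming the standard group_vars/all.yml layout (the path and default value of 1 are assumptions, not taken from the attached logs):

```yaml
# group_vars/all.yml (ceph-ansible)
# Run two radosgw instances per RGW node instead of one.
# On the first playbook run after this change, the additional
# ceph-radosgw.rgw1.service units fail as described above.
radosgw_num_instances: 2
```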
Created attachment 1699123 [details] ansible logs from management VM
Created attachment 1699124 [details] sosreport from one of the rgw nodes
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 4.1 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4144