Bug 1851906 - [Ceph] Increasing the number of RGW instances from 1 to 2 fails with "Failed to load environment files: No such file or directory"
Summary: [Ceph] Increasing the number of RGW instances from 1 to 2 fails with "Failed to load environment files: No such file or directory"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 4.1
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: z2
Target Release: 4.1
Assignee: Guillaume Abrioux
QA Contact: Sunil Kumar Nagaraju
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-06-29 11:42 UTC by Mustafa Aydın
Modified: 2020-09-30 17:26 UTC
CC List: 10 users

Fixed In Version: ceph-ansible-4.0.29-1.el8cp, ceph-ansible-4.0.29-1.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-30 17:26:19 UTC
Embargoed:


Attachments
- ansible logs from management VM (7.98 MB, application/gzip), 2020-06-29 11:56 UTC, Mustafa Aydın
- sosreport from one of the rgw nodes (11.22 MB, application/x-xz), 2020-06-29 12:01 UTC, Mustafa Aydın


Links
- GitHub, ceph/ceph-ansible pull 5487 (closed): rgw: fix multi instances scaleout, last updated 2021-01-12 15:53:39 UTC
- Red Hat Product Errata RHBA-2020:4144, last updated 2020-09-30 17:26:44 UTC

Description Mustafa Aydın 2020-06-29 11:42:20 UTC
Description of problem:

Increasing the number of RGW instances from 1 to 2 by changing the radosgw_num_instances parameter to 2 fails. The newly added instances log the following errors:

 stdout: |-
    Socket file  could not be found, which means Rados Gateway is not running. Showing ceph-rgw unit logs now:
    -- Logs begin at Mon 2020-06-29 06:58:57 EDT, end at Mon 2020-06-29 07:28:35 EDT. --
    Jun 29 07:27:00 ceph4osd3.aydin.lab systemd[1]: Failed to load environment files: No such file or directory
    Jun 29 07:27:00 ceph4osd3.aydin.lab systemd[1]: ceph-radosgw.rgw1.service failed to run 'start-pre' task: No such file or directory
    Jun 29 07:27:00 ceph4osd3.aydin.lab systemd[1]: Failed to start Ceph RGW.
    Jun 29 07:27:00 ceph4osd3.aydin.lab systemd[1]: Unit ceph-radosgw.rgw1.service entered failed state.
    Jun 29 07:27:00 ceph4osd3.aydin.lab systemd[1]: ceph-radosgw.rgw1.service failed.
    Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: ceph-radosgw.rgw1.service holdoff time over, scheduling restart.
    Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: Stopped Ceph RGW.
    Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: Failed to load environment files: No such file or directory
    Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: ceph-radosgw.rgw1.service failed to run 'start-pre' task: No such file or directory
    Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: Failed to start Ceph RGW.
    Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: Unit ceph-radosgw.rgw1.service entered failed state.
    Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: ceph-radosgw.rgw1.service failed.
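
For anyone triaging this: the "Failed to load environment files" message typically means the EnvironmentFile= referenced by the new instance's unit did not exist when systemd first tried to start it. A quick way to confirm on the affected node (the unit name is copied from the logs above; the data directory path is the usual Ceph default and may differ per deployment):

    # Show the unit definition, including its EnvironmentFile= line(s):
    systemctl cat ceph-radosgw.rgw1.service
    # Check whether the referenced environment file was actually created:
    ls -l /var/lib/ceph/radosgw/
    # Full unit history for the failing instance:
    journalctl -u ceph-radosgw.rgw1.service --no-pager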


Version-Release number of selected component (if applicable):
RHCS 4.1

How reproducible:
Always

Steps to Reproduce:
- Install the Ceph cluster without setting the radosgw_num_instances parameter in all.yml
- Add "radosgw_num_instances: 2" to all.yml and rerun the playbook, as sketched below
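
A minimal sketch of the reproduction, assuming the stock ceph-ansible layout (the group_vars/all.yml path and the inventory file name "hosts" are assumptions; the playbook name is taken from the additional info below):

    # Enable two RGW instances per gateway host after the initial deploy:
    echo 'radosgw_num_instances: 2' >> group_vars/all.yml
    # Rerun the playbook; the first run fails with the unit errors shown above:
    ansible-playbook -i hosts site-docker.yaml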
 

Actual results:

The same unit failure repeats, exactly as in the log excerpt in the description above: ceph-radosgw.rgw1.service fails to start with "Failed to load environment files: No such file or directory" and cycles through the systemd restart holdoff.

Expected results:

- The second instance is provisioned successfully on the first attempt

Additional info:
- Running the site-docker.yaml playbook again without changing anything succeeds; see the verification sketch below.
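
As a sanity check after the second run, the new instances can be confirmed on the gateway node (a sketch; the unit name pattern is taken from the logs above):

    # All radosgw units should now be active:
    systemctl list-units 'ceph-radosgw*' --no-pager
    # Two radosgw processes should be running on the host:
    pgrep -a radosgw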

Comment 1 Mustafa Aydın 2020-06-29 11:56:50 UTC
Created attachment 1699123
ansible logs from management VM

Comment 2 Mustafa Aydın 2020-06-29 12:01:37 UTC
Created attachment 1699124
sosreport from one of the rgw nodes

Comment 14 errata-xmlrpc 2020-09-30 17:26:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.1 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4144

