Bug 1851906

Summary: [Ceph] Increasing the number of RGW instances from 1 to 2 fails with "Failed to load environment files: No such file or directory"
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Mustafa Aydın <maydin>
Component: Ceph-Ansible
Assignee: Guillaume Abrioux <gabrioux>
Status: CLOSED ERRATA
QA Contact: Sunil Kumar Nagaraju <sunnagar>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.1
CC: aschoen, ceph-eng-bugs, gmeno, gsitlani, lithomas, nthomas, sunnagar, tserlin, vumrao, ykaul
Target Milestone: z2
Target Release: 4.1
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: ceph-ansible-4.0.29-1.el8cp, ceph-ansible-4.0.29-1.el7cp
Doc Type: If docs needed, set a value
Last Closed: 2020-09-30 17:26:19 UTC
Type: Bug
Attachments:
- ansible logs from management VM (flags: none)
- sosreport from one of the rgw nodes (flags: none)

Description Mustafa Aydın 2020-06-29 11:42:20 UTC
Description of problem:

Increasing the number of RGW instances from 1 to 2 by changing the radosgw_num_instances parameter to 2 fails. The logs show the following errors for the new instance:

 stdout: |-
    Socket file  could not be found, which means Rados Gateway is not running. Showing ceph-rgw unit logs now:
    -- Logs begin at Mon 2020-06-29 06:58:57 EDT, end at Mon 2020-06-29 07:28:35 EDT. --
    Jun 29 07:27:00 ceph4osd3.aydin.lab systemd[1]: Failed to load environment files: No such file or directory
    Jun 29 07:27:00 ceph4osd3.aydin.lab systemd[1]: ceph-radosgw.rgw1.service failed to run 'start-pre' task: No such file or directory
    Jun 29 07:27:00 ceph4osd3.aydin.lab systemd[1]: Failed to start Ceph RGW.
    Jun 29 07:27:00 ceph4osd3.aydin.lab systemd[1]: Unit ceph-radosgw.rgw1.service entered failed state.
    Jun 29 07:27:00 ceph4osd3.aydin.lab systemd[1]: ceph-radosgw.rgw1.service failed.
    Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: ceph-radosgw.rgw1.service holdoff time over, scheduling restart.
    Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: Stopped Ceph RGW.
    Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: Failed to load environment files: No such file or directory
    Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: ceph-radosgw.rgw1.service failed to run 'start-pre' task: No such file or directory
    Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: Failed to start Ceph RGW.
    Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: Unit ceph-radosgw.rgw1.service entered failed state.
    Jun 29 07:27:10 ceph4osd3.aydin.lab systemd[1]: ceph-radosgw.rgw1.service failed.


Version-Release number of selected component (if applicable):
RHCS 4.1

How reproducible:
Always

Steps to Reproduce:
- Install the Ceph cluster without setting the radosgw_num_instances parameter in all.yml
- Add the "radosgw_num_instances: 2" variable to all.yml
- Run the site-docker.yaml playbook again
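
For reference, the change in the second step might look like the fragment below. This is only a sketch: the group_vars/ location is an assumption (the report says only "all.yml"), and the comment reflects the usual meaning of the variable in ceph-ansible rather than anything stated in this report.

```yaml
# group_vars/all.yml (directory assumed; the report names only "all.yml")
# The variable was not set at all for the initial deployment.
# Setting it to 2 asks ceph-ansible for two RGW instances on each RGW node.
radosgw_num_instances: 2
```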
 

Actual results:

The playbook run fails for the new instance with the same systemd output shown under "Description of problem" above: ceph-radosgw.rgw1.service reports "Failed to load environment files: No such file or directory" and never starts.

Expected results:

- The second instance is provisioned successfully on the first attempt

Additional info:
- Running site-docker.yaml again without changing anything succeeds.

Comment 1 Mustafa Aydın 2020-06-29 11:56:50 UTC
Created attachment 1699123
ansible logs from management VM

Comment 2 Mustafa Aydın 2020-06-29 12:01:37 UTC
Created attachment 1699124
sosreport from one of the rgw nodes

Comment 14 errata-xmlrpc 2020-09-30 17:26:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.1 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4144