Bug 1271873 - Radosgw does not start on Rhel 7.1
Status: CLOSED WONTFIX
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RGW
Version: 1.2.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 1.3.2
Assigned To: Yehuda Sadeh
QA Contact: ceph-qe-bugs
Reported: 2015-10-14 20:11 EDT by Warren
Modified: 2017-07-30 11:43 EDT

Doc Type: Bug Fix
Last Closed: 2015-12-11 15:06:52 EST
Type: Bug


Attachments:
/var/log/messages output (1.21 KB, text/plain), 2015-10-15 23:03 EDT, Warren

Description Warren 2015-10-14 20:11:02 EDT
Description of problem:
sudo systemctl start radosgw
Warning: Unit file of radosgw.service changed on disk, 'systemctl daemon-reload' recommended.
Job for ceph-radosgw.service failed. See 'systemctl status ceph-radosgw.service' and 'journalctl -xn' for details.


Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 1.2.3, RHEL 7.1
ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)

How reproducible:
100% of the time

Steps to Reproduce:
1. Bring up ceph.
2. Follow the steps in: https://access.redhat.com/documentation/en/red-hat-ceph-storage/version-1.2.3/red-hat-ceph-storage-123-ceph-object-gateway-for-rhel-x86-64/chapter-1-install-ceph-object-gateway/
3. Follow the steps in: https://access.redhat.com/documentation/en/red-hat-ceph-storage/version-1.2.3/red-hat-ceph-storage-123-ceph-object-gateway-for-rhel-x86-64/chapter-2-configure-ceph-object-gateway/

When you reach step 8, 'sudo systemctl start radosgw', the error described above appears.

ps shows that radosgw is not running.

Actual results:
Above message, and radosgw is not running.

Expected results:
No error message, and radosgw is running.

Additional info:

sudo systemctl status ceph-radosgw.service -l showed:

ceph-radosgw.service - LSB: radosgw RESTful rados gateway
   Loaded: loaded (/etc/rc.d/init.d/ceph-radosgw)
   Active: failed (Result: exit-code) since Wed 2015-10-14 19:50:07 EDT; 2s ago
  Process: 25928 ExecStart=/etc/rc.d/init.d/ceph-radosgw start (code=exited, status=1/FAILURE)

Oct 14 19:50:06 magna070 systemd[1]: Starting LSB: radosgw RESTful rados gateway...
Oct 14 19:50:06 magna070 ceph-radosgw[25928]: Starting radosgw instance(s)...
Oct 14 19:50:06 magna070 ceph-radosgw[25928]: Running as unit run-25942.service.
Oct 14 19:50:06 magna070 ceph-radosgw[25928]: Starting client.radosgw.gateway...
Oct 14 19:50:07 magna070 ceph-radosgw[25928]: /bin/radosgw is not running.
Oct 14 19:50:07 magna070 systemd[1]: ceph-radosgw.service: control process exited, code=exited status=1
Oct 14 19:50:07 magna070 systemd[1]: Failed to start LSB: radosgw RESTful rados gateway.
Oct 14 19:50:07 magna070 systemd[1]: Unit ceph-radosgw.service entered failed state.

So I changed /etc/rc.d/init.d/ceph-radosgw to run /bin/bash -x, reran the start command, and got:

Oct 14 19:52:03 magna070 ceph-radosgw: + '[' '!' -e /var/log/radosgw/client.radosgw.gateway.log ']'
Oct 14 19:52:03 magna070 ceph-radosgw: + '[' 1 -eq 1 ']'
Oct 14 19:52:03 magna070 ceph-radosgw: + systemd-run -r sudo -u apache bash -c 'ulimit -n 32768; /bin/radosgw -n client.radosgw.gateway'
Oct 14 19:52:03 magna070 ceph-radosgw: Running as unit run-26498.service.
Oct 14 19:52:03 magna070 ceph-radosgw: + echo 'Starting client.radosgw.gateway...'
Oct 14 19:52:03 magna070 ceph-radosgw: Starting client.radosgw.gateway...
Oct 14 19:52:03 magna070 ceph-radosgw: + daemon_is_running /bin/radosgw
Oct 14 19:52:03 magna070 ceph-radosgw: + daemon=/bin/radosgw
Oct 14 19:52:03 magna070 ceph-radosgw: + sleep 1
Oct 14 19:52:03 magna070 systemd: Starting /bin/sudo -u apache bash -c ulimit -n 32768; /bin/radosgw -n client.radosgw.gateway...
Oct 14 19:52:03 magna070 systemd: Started /bin/sudo -u apache bash -c ulimit -n 32768; /bin/radosgw -n client.radosgw.gateway.
Oct 14 19:52:03 magna070 sudo: bash: line 0: ulimit: open files: cannot modify limit: Operation not permitted
Oct 14 19:52:04 magna070 ceph-radosgw: + pidof /bin/radosgw
Oct 14 19:52:04 magna070 ceph-radosgw: + echo '/bin/radosgw is not running.'
Oct 14 19:52:04 magna070 ceph-radosgw: /bin/radosgw is not running.
Oct 14 19:52:04 magna070 ceph-radosgw: + exit 1
Oct 14 19:52:04 magna070 systemd: ceph-radosgw.service: control process exited, code=exited status=1
Oct 14 19:52:04 magna070 systemd: Failed to start LSB: radosgw RESTful rados gateway.
Oct 14 19:52:04 magna070 systemd: Unit ceph-radosgw.service entered failed state.

So the ulimit command looks like the culprit. The error message does not appear when the command (sudo -u apache bash -c 'ulimit -n 32768; /bin/radosgw -n client.radosgw.gateway') is run manually, but radosgw is still not running afterwards.
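
One possible explanation for the "cannot modify limit" line in the trace (an assumption, not something confirmed in this report): an unprivileged user such as apache can raise the soft open-files limit only up to the hard limit it inherits, and the transient unit created by systemd-run may carry a hard limit below 32768. A minimal sketch of that check for the current shell follows; the function name nofile_within_hard_limit is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if TARGET does not exceed the hard
# open-files limit of the current shell, i.e. an unprivileged
# 'ulimit -n TARGET' would be permitted here.
nofile_within_hard_limit() {
    target=$1
    hard=$(ulimit -Hn)    # hard limit on open file descriptors
    [ "$hard" = "unlimited" ] || [ "$target" -le "$hard" ]
}

if nofile_within_hard_limit 32768; then
    echo "ulimit -n 32768 should be permitted (hard limit: $(ulimit -Hn))"
else
    echo "ulimit -n 32768 would fail for a non-root user (hard limit: $(ulimit -Hn))"
fi
```

Running ulimit -Hn in place of /bin/radosgw in the systemd-run line from the trace would show the limit the daemon actually inherits, which may differ from the limit in an interactive shell.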

I tried the upstream fix in https://github.com/ceph/ceph/pull/5325/files but that did not seem to help.
Comment 2 Ken Dreyer (Red Hat) 2015-10-15 11:44:25 EDT
The ulimit issue was fixed in bz 1184588 for EL6. Maybe it is still broken for EL7.

I logged into magna070 and changed DEFAULT_USER='apache' to DEFAULT_USER='root' (see https://bugzilla.redhat.com/show_bug.cgi?id=1214823).

Making that change allows /bin/radosgw to run, but it immediately stops, with the following log:

[root@magna070 ~]# cat /var/log/radosgw/client.radosgw.gateway.log
2015-10-15 11:36:09.859456 7f26b0be8880  0 ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7), process radosgw, pid 14909
2015-10-15 11:36:09.860129 7f26a93f7700 -1 Initialization timeout, failed to initialize

How do we get past this?
Comment 3 Warren 2015-10-15 23:03 EDT
Created attachment 1083478
/var/log/messages output

Output of /var/log/messages
Comment 4 Warren 2015-10-15 23:09:05 EDT
The failure in comment 2 may have been the result of an improperly configured rhsm.conf file, set up before the rh-common repo was enabled with subscription-manager. I believe that I have done this correctly on magna056.

On magna056, the radosgw start command still failed. /var/log/messages is in the previously added attachment. The problem still appears to be around the following command:

/bin/sudo -u root bash -c ulimit -n 32768; /bin/radosgw -n client.radosgw.gateway

This problem happens with DEFAULT_USER='apache' and also with DEFAULT_USER='root'.
Comment 5 Tamil 2015-10-16 12:43:51 EDT
Warren, it looks like the ceph-radosgw version (0.80.8-17) still doesn't match the ceph version (0.80.8-5), hence the error.


[ubuntu@magna056 ~]$ sudo rpm -qa | grep ceph
ceph-radosgw-0.80.8-17.el7cp.x86_64
ceph-osd-0.80.8-5.el7cp.x86_64
iozone-3.424-2_ceph.el7.x86_64
ceph-common-0.80.8-5.el7cp.x86_64
mod_fastcgi-2.4.7-1.ceph.el7.x86_64
ceph-0.80.8-5.el7cp.x86_64
ceph-mon-0.80.8-5.el7cp.x86_64
[ubuntu@magna056 ~]$
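
The mismatch in the listing above can be picked out mechanically. A minimal sketch, assuming package names in the name-version-release.arch form that rpm -qa prints; the function name ceph_version_releases is hypothetical, and the sample input is copied from the listing.

```shell
#!/bin/sh
# Hypothetical helper: read rpm -qa style lines on stdin and print the
# distinct version-release strings of packages whose name starts with "ceph-".
ceph_version_releases() {
    grep -E '^ceph-' |
        sed -E 's/^.*-([^-]+-[^-]+)\.[^.]+$/\1/' |
        sort -u
}

# Sample input taken from the rpm -qa output in this comment.
printf '%s\n' \
    ceph-radosgw-0.80.8-17.el7cp.x86_64 \
    ceph-osd-0.80.8-5.el7cp.x86_64 \
    ceph-common-0.80.8-5.el7cp.x86_64 \
    ceph-0.80.8-5.el7cp.x86_64 \
    ceph-mon-0.80.8-5.el7cp.x86_64 |
    ceph_version_releases
```

More than one line of output means the ceph packages are not all at the same version-release. For the sample, the function prints two: 0.80.8-17.el7cp (ceph-radosgw) and 0.80.8-5.el7cp (everything else). On a live host, the output of sudo rpm -qa can be piped into the function instead of the sample.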
Comment 6 Ken Dreyer (Red Hat) 2015-12-11 14:57:18 EST
We're not going to ship any more releases for RHCS 1.2.

In RHCS 1.3, RGW already runs as root, and we don't see the issue on RHCS 1.3...

I think we should close this bug as WONTFIX.
