Bug 1337287 - ceph-installer tasks can collide
Summary: ceph-installer tasks can collide
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat
Component: ceph-installer
Version: 2
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 2
Assignee: Andrew Schoen
QA Contact: ceph-qe-bugs
Depends On:
Reported: 2016-05-18 17:35 UTC by Ken Dreyer (Red Hat)
Modified: 2016-08-23 19:51 UTC (History)

Fixed In Version: RHEL: ceph-installer-1.0.11-1.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2016-08-23 19:51:04 UTC
Target Upstream Version:

External tracker:
System: Red Hat Product Errata
ID: RHEA-2016:1754
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: New packages: Red Hat Storage Console 2.0
Last Updated: 2017-04-18 19:09:06 UTC

Description Ken Dreyer (Red Hat) 2016-05-18 17:35:05 UTC
Description of problem:
Prior to v1.0.11, ceph-installer could start too many celery workers: on a system with multiple processors it started more than one worker process. This led to race conditions, because multiple tasks could run at the same time.

One way this manifests itself: an application submits an "/api/osd/install" task and then an "/api/osd/configure" task immediately afterwards. The "/api/osd/install" task runs in Worker-1 while the "/api/osd/configure" task runs in Worker-2, and Worker-2's task fails because Worker-1 has not yet finished installing the ceph-osd packages.

One workaround would be for client applications (USM) to *always* check the status of a task before submitting another task that depends on it. I'm not sure if USM always does this, so it's safer to just restrict the number of workers.
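
A rough client-side sketch of that workaround follows. It assumes that each submission yields a task identifier and that the task's status can be fetched as a dict with "ended" and "succeeded" keys. The fetch_status callable, the key names, and the timeout values are illustrative assumptions, not confirmed details of the ceph-installer API.

```python
import time


def wait_for_task(fetch_status, identifier, timeout=600.0, interval=2.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Block until the task identified by `identifier` has finished.

    `fetch_status` is a hypothetical callable that takes the identifier and
    returns the task's status as a dict. The "ended" and "succeeded" keys
    are assumed field names.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status(identifier)
        if status.get("ended"):
            # The task has finished; report whether it succeeded.
            return bool(status.get("succeeded"))
        if clock() >= deadline:
            raise TimeoutError("task %s did not finish within %s seconds"
                               % (identifier, timeout))
        sleep(interval)
```

A client would call wait_for_task() for the "/api/osd/install" task and submit the "/api/osd/configure" request only if it returned True.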

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start with a RHEL system with multiple processors (i.e., /proc/cpuinfo shows multiple processors)
2. sudo yum install ceph-installer
3. sudo systemctl status ceph-installer-celery

Actual results:
systemd shows that more than two celery PIDs are running, for example:

   CGroup: /system.slice/ceph-installer-celery.service
           ├─10088 /usr/bin/python /usr/bin/celery -A async worker --loglevel...
           ├─10180 /usr/bin/python /usr/bin/celery -A async worker --loglevel...
           └─10184 /usr/bin/python /usr/bin/celery -A async worker --loglevel...

Expected results:
systemd should always show only two celery PIDs:

   CGroup: /system.slice/ceph-installer-celery.service
           ├─15317 /usr/bin/python /usr/bin/celery -A async worker --loglevel...
           └─15334 /usr/bin/python /usr/bin/celery -A async worker --loglevel...
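
The PID count could also be checked by a script rather than by eye. The sketch below parses text in the format of the "systemctl status" output shown in this report and returns the PIDs of the listed celery processes. The function name and the regular expression are illustrative assumptions; the pattern accepts both the UTF-8 tree characters shown here and the ASCII fallback ("|-" and "`-").

```python
import re

# Matches process lines in the CGroup tree printed by `systemctl status`,
# e.g. "├─10088 /usr/bin/python /usr/bin/celery -A async worker ..."
_PROCESS_LINE = re.compile(r"^\s*(?:[├└]─|[|`]-)(\d+)\s+(.*)$")


def celery_pids(status_output):
    """Return the PIDs of celery processes listed in `systemctl status` output."""
    pids = []
    for line in status_output.splitlines():
        match = _PROCESS_LINE.match(line)
        # Keep only process lines whose command line mentions celery.
        if match and "celery" in match.group(2):
            pids.append(int(match.group(1)))
    return pids
```

With the fix in place, len(celery_pids(output)) should be 2; a larger count indicates the problem described in this report.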

Additional info:
This is fixed upstream in v1.0.11: http://docs.ceph.com/ceph-installer/docs/changelog.html#v1-0-11-2016-05-18

Comment 3 Ken Dreyer (Red Hat) 2016-07-29 13:05:11 UTC
Have not seen this in any smoke test for a while now -> VERIFIED

Comment 5 errata-xmlrpc 2016-08-23 19:51:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

