Bug 1337287

Summary: ceph-installer tasks can collide
Product: [Red Hat Storage] Red Hat Storage Console
Reporter: Ken Dreyer (Red Hat) <kdreyer>
Component: ceph-installer
Assignee: Andrew Schoen <aschoen>
Status: CLOSED ERRATA
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 2
CC: adeza, amaredia, aschoen, ceph-eng-bugs, nthomas, sankarshan, vakulkar
Target Milestone: ---
Target Release: 2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: RHEL: ceph-installer-1.0.11-1.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-23 19:51:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ken Dreyer (Red Hat) 2016-05-18 17:35:05 UTC
Description of problem:
Prior to v1.0.11, ceph-installer could start too many celery workers. This led to race conditions in which multiple tasks could run at the same time.

One way that this manifests itself is if an application submits an "/api/osd/install" task and then an "/api/osd/configure" task immediately afterwards. The "/api/osd/install" task will run in Worker-1, while the "/api/osd/configure" task will run in Worker-2, and Worker-2's task will error because Worker-1 has not yet finished installing the ceph-osd packages.

One workaround would be for client applications (USM) to *always* check a task's status before submitting another task that depends on it. I'm not sure whether USM always does this, so it is safer to simply restrict the number of workers.
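The client-side workaround described above amounts to polling a task until it leaves its in-progress state before submitting the dependent task. A minimal sketch follows; the endpoint comment and the state strings ("processing", "succeeded") are illustrative assumptions, not taken from the real ceph-installer API:

```python
import time

def wait_for_task(get_status, poll_interval=1.0, timeout=60.0):
    """Poll a task's status until it leaves the 'processing' state.

    get_status is a callable returning the task's current state string
    (e.g. whatever a task-status endpoint would report); the state
    values used here are hypothetical.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_status()
        if state != "processing":
            return state
        time.sleep(poll_interval)
    raise TimeoutError("task did not finish before the timeout")

# A client would then serialize dependent calls, e.g.:
#   submit /api/osd/install -> wait_for_task(...) -> submit /api/osd/configure
```

With this in place, "/api/osd/configure" would never be submitted while "/api/osd/install" is still running in another worker.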

Version-Release number of selected component (if applicable):
ceph-installer-1.0.10-1.el7scon

How reproducible:
always

Steps to Reproduce:
1. Start with a RHEL system with multiple processors (i.e., /proc/cpuinfo shows multiple processors)
2. sudo yum install ceph-installer
3. sudo systemctl status ceph-installer-celery

Actual results:
systemd shows that more than two celery PIDs are running, for example:

   CGroup: /system.slice/ceph-installer-celery.service
           ├─10088 /usr/bin/python /usr/bin/celery -A async worker --loglevel...
           ├─10180 /usr/bin/python /usr/bin/celery -A async worker --loglevel...
           └─10184 /usr/bin/python /usr/bin/celery -A async worker --loglevel...


Expected results:
systemd should always show only two celery PIDs:

   CGroup: /system.slice/ceph-installer-celery.service
           ├─15317 /usr/bin/python /usr/bin/celery -A async worker --loglevel...
           └─15334 /usr/bin/python /usr/bin/celery -A async worker --loglevel...
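One quick way to check the expected state is to count the celery worker processes in the service's cgroup. The sketch below greps a captured copy of the `systemctl status` output shown above; in practice you would pipe the live output instead (the pipeline itself is illustrative):

```shell
# Sample cgroup listing captured from `systemctl status ceph-installer-celery`
status_output='   CGroup: /system.slice/ceph-installer-celery.service
           ├─15317 /usr/bin/python /usr/bin/celery -A async worker --loglevel...
           └─15334 /usr/bin/python /usr/bin/celery -A async worker --loglevel...'

# Count the lines that launch a celery worker process
worker_count=$(printf '%s\n' "$status_output" | grep -c 'celery -A async worker')
echo "$worker_count"
```

On a fixed system this count should not exceed two.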


Additional info:
This is fixed upstream in v1.0.11: http://docs.ceph.com/ceph-installer/docs/changelog.html#v1-0-11-2016-05-18
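For reference, celery's standard mechanism for capping the number of worker processes is the `--concurrency` flag. Whether v1.0.11 uses exactly this invocation (or this drop-in path) is an assumption; see the upstream changelog above for the actual change:

```
# Hypothetical systemd drop-in, e.g.
# /etc/systemd/system/ceph-installer-celery.service.d/concurrency.conf
# (the shipped unit may set this directly instead)
[Service]
ExecStart=
ExecStart=/usr/bin/celery -A async worker --concurrency 1 --loglevel info
```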

Comment 3 Ken Dreyer (Red Hat) 2016-07-29 13:05:11 UTC
Have not seen this in any smoke test for a while now -> VERIFIED

Comment 5 errata-xmlrpc 2016-08-23 19:51:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1754