Bug 1590628

Summary: ceph-ansible fails at the ceph-create-keys removal stage
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Dan Macpherson <dmacpher>
Component: Ceph-Ansible
Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA
QA Contact: Yogev Rabl <yrabl>
Severity: medium
Docs Contact:
Priority: low
Version: 3.1
CC: anharris, aschoen, ceph-eng-bugs, gmeno, hnallurv, kdreyer, nthomas, sankarshan, seb, tserlin, yrabl
Target Milestone: rc
Target Release: 3.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: RHEL: ceph-ansible-3.1.0-0.1.rc12.el7cp Ubuntu: ceph-ansible_3.1.0~rc12-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-26 18:21:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Dan Macpherson 2018-06-13 05:06:56 UTC
Description of problem:
I was running a test upgrade of OSP12/Ceph 2 to OSP13/Ceph 3. I hit an issue at the Ceph Storage upgrade stage. The ceph-ansible playbook fails at the following task:

TASK [ceph-client : kill a dummy container that created pool(s)/key(s)] ********
Wednesday 13 June 2018  14:32:19 +1000 (0:00:00.028)       0:05:17.843 ******** 
fatal: [192.0.2.107]: FAILED! => {"changed": false, "cmd": ["docker", "rm", "-f", "ceph-create-keys"], "delta": "0:00:00.019216", "end": "2018-06-13 04:32:19.560333", "msg": "non-zero return code", "rc": 1, "start": "2018-06-13 04:32:19.541117", "stderr": "Error response from daemon: No such container: ceph-create-keys", "stderr_lines": ["Error response from daemon: No such container: ceph-create-keys"], "stdout": "", "stdout_lines": []}

I think the container is already removed as a result of the change introduced in this BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1568157

So when the "docker rm -f ceph-create-keys" command runs, it fails because the container no longer exists: the previous task now starts it with the newly introduced --rm option, which removes the container automatically once it exits.

I have tested without the "kill a dummy container" task, and it seems to work.
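
To illustrate the failure mode described above, here is a minimal Python sketch of a removal step that treats "the container is already gone" as success instead of an error. This is not the fix that went into ceph-ansible; the function names removal_succeeded and remove_container are hypothetical, and the only things taken from this report are the "docker rm -f ceph-create-keys" command and the "No such container" error text.

```python
import subprocess

# Error text taken from the stderr shown in the failed task output.
NO_SUCH_CONTAINER = "No such container"


def removal_succeeded(rc: int, stderr: str) -> bool:
    """Return True if the container is gone after 'docker rm -f'.

    A zero return code means docker removed it. A non-zero return code
    with "No such container" means it was already removed (for example
    because it was started with --rm), which is also acceptable.
    """
    return rc == 0 or NO_SUCH_CONTAINER in stderr


def remove_container(name: str) -> bool:
    """Hypothetical helper: run 'docker rm -f <name>' and tolerate absence."""
    result = subprocess.run(
        ["docker", "rm", "-f", name],
        capture_output=True,
        text=True,
    )
    return removal_succeeded(result.returncode, result.stderr)
```

With this logic, the result reported in the failed task (rc 1 with "No such container: ceph-create-keys" on stderr) would count as a successful cleanup, while any other non-zero result would still be treated as a failure.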

Version-Release number of selected component (if applicable):
3.1.0-0.1.rc8.el7cp

How reproducible:
Always

Steps to Reproduce:
1. Upgrade an OSP12 with Ceph 2 to OSP13 with Ceph 3
2. Get to the Ceph upgrade stage
3. Wait until the Ceph upgrade step fails
4. Check the output of the Mistral workflow execution and find the failed step (see above)

Actual results:
My face -> :(

Expected results:
My face -> :)

Additional info:

Comment 3 Ken Dreyer (Red Hat) 2018-07-23 21:33:12 UTC
The fix will be in the next upstream stable-3.1 tag after v3.1.0rc10.

Comment 11 errata-xmlrpc 2018-09-26 18:21:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819