Bug 1590628 - ceph-ansible fails at the ceph-create-keys removal stage
Summary: ceph-ansible fails at the ceph-create-keys removal stage
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: rc
Target Release: 3.1
Assignee: Sébastien Han
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-06-13 05:06 UTC by Dan Macpherson
Modified: 2019-10-24 05:38 UTC (History)

Fixed In Version: RHEL: ceph-ansible-3.1.0-0.1.rc12.el7cp Ubuntu: ceph-ansible_3.1.0~rc12-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-26 18:21:59 UTC
Embargoed:




Links
System                  ID                           Private  Priority  Status  Summary  Last Updated
Github                  ceph ceph-ansible pull 2830  0        None      None    None     2018-06-29 10:12:44 UTC
Red Hat Product Errata  RHBA-2018:2819               0        None      None    None     2018-09-26 18:23:08 UTC

Description Dan Macpherson 2018-06-13 05:06:56 UTC
Description of problem:
I was running a test upgrade of OSP12/Ceph 2 to OSP13/Ceph 3. I hit an issue at the Ceph Storage upgrade stage. The ceph-ansible playbook fails at the following task:

TASK [ceph-client : kill a dummy container that created pool(s)/key(s)] ********
Wednesday 13 June 2018  14:32:19 +1000 (0:00:00.028)       0:05:17.843 ******** 
fatal: [192.0.2.107]: FAILED! => {"changed": false, "cmd": ["docker", "rm", "-f", "ceph-create-keys"], "delta": "0:00:00.019216", "end": "2018-06-13 04:32:19.560333", "msg": "non-zero return code", "rc": 1, "start": "2018-06-13 04:32:19.541117", "stderr": "Error response from daemon: No such container: ceph-create-keys", "stderr_lines": ["Error response from daemon: No such container: ceph-create-keys"], "stdout": "", "stdout_lines": []}

I think the container has already been removed by the time this task runs, because of the change introduced with this BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1568157

The previous task now starts the container with the --rm option, so the container is deleted automatically when it exits. The "docker rm -f ceph-create-keys" command that follows then fails because there is no container left to remove.

I have tested with the "kill a dummy container" task removed, and the playbook seems to work.
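
To illustrate the failure mode described above, here is a minimal Python sketch of a removal check that treats an already-missing container as success. The function names removal_ok and remove_container are hypothetical and chosen only for this example; the rc and stderr values are the ones from the task output in this report. This is not the change made in the linked pull request, just one way to tolerate the "No such container" case.

```python
import subprocess


def removal_ok(rc: int, stderr: str) -> bool:
    """Return True if the container is gone after `docker rm -f`.

    Hypothetical helper for illustration only.
    """
    # rc == 0: docker removed the container itself.
    # "No such container": it was already gone, for example because it
    # was started with --rm and has exited.
    return rc == 0 or "No such container" in stderr


def remove_container(name: str) -> bool:
    """Run `docker rm -f NAME` and classify the result.

    Hypothetical wrapper; needs the docker CLI on PATH when called.
    """
    result = subprocess.run(
        ["docker", "rm", "-f", name],
        capture_output=True,
        text=True,
    )
    return removal_ok(result.returncode, result.stderr)


# rc and stderr as reported by the failing task in this bug.
reported_stderr = "Error response from daemon: No such container: ceph-create-keys"
print(removal_ok(1, reported_stderr))
```

With a check like this, the cleanup step is idempotent: running it when the container has already been removed is not treated as an error, while other non-zero exits still count as failures.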

Version-Release number of selected component (if applicable):
3.1.0-0.1.rc8.el7cp

How reproducible:
Always

Steps to Reproduce:
1. Upgrade an OSP12 with Ceph 2 to OSP13 with Ceph 3
2. Get to the Ceph upgrade stage
3. Wait until the Ceph upgrade step fails
4. Check the output of the mistral workflow execution and spot the failed step (see above)

Actual results:
My face -> :(

Expected results:
My face -> :)

Additional info:

Comment 3 Ken Dreyer (Red Hat) 2018-07-23 21:33:12 UTC
The fix will be in the next upstream stable-3.1 tag after v3.1.0rc10.

Comment 11 errata-xmlrpc 2018-09-26 18:21:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819

