Bug 1656935 - ceph-ansible: purge-cluster.yml fails when initiated second time
Summary: ceph-ansible: purge-cluster.yml fails when initiated second time
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 3.2
Hardware: Unspecified
OS: Linux
Target Milestone: rc
Target Release: 3.3
Assignee: Guillaume Abrioux
QA Contact: Vasishta
Docs Contact: John Brier
Depends On:
Blocks: 1722663
Reported: 2018-12-06 17:09 UTC by Tiffany Nguyen
Modified: 2019-06-20 22:39 UTC
CC: 18 users

Fixed In Version: RHEL: ceph-ansible-3.2.12-1.el7cp Ubuntu: ceph-ansible_3.2.12-2redhat1
Doc Type: Bug Fix
Doc Text:
.The `ceph-ansible purge-cluster.yml` playbook no longer fails when run against a cluster that has already been purged
Previously, the `ceph-ansible purge-cluster.yml` playbook failed when run against a cluster that had already been purged. This was because `ceph-volume` had been removed during the first run, and the command could no longer be found. With this update, the underlying issue has been fixed, and running `ceph-ansible purge-cluster.yml` for a second time no longer fails.
Clone Of:
Cloned To: 1722663
Last Closed: 2019-06-20 22:39:36 UTC
Target Upstream Version:

Attachments (Terms of Use)
ansible-playbook log (25.84 KB, text/plain), 2018-12-06 17:09 UTC, Tiffany Nguyen
File contains log snippets (642.37 KB, text/plain), 2019-04-24 10:54 UTC, Vasishta

External Trackers:
Github ceph/ceph-ansible pull 3436 (closed): purge-cluster: skip tasks that use ceph-volume if it's not installed (last updated 2020-10-30 11:21:34 UTC)
Github ceph/ceph-ansible pull 3552 (closed): purge-cluster: do not use sudo when checking if ceph-volume exists (last updated 2020-10-30 11:21:19 UTC)
Red Hat Product Errata RHSA-2019:0911 (last updated 2019-04-30 15:57:00 UTC)

Description Tiffany Nguyen 2018-12-06 17:09:09 UTC
Created attachment 1512212 [details]
ansible-playbook log

Description of problem:
purge-cluster.yml fails when initiated second time:

TASK [zap and destroy osds created by ceph-volume with devices] ********************************************************************************
Thursday 06 December 2018  16:30:57 +0000 (0:00:00.142)       0:00:14.177 ***** 
failed: [mero005] (item=/dev/sda) => {"changed": false, "cmd": "ceph-volume lvm zap --destroy /dev/sda", "item": "/dev/sda", "msg": "[Errno 2] No such file or directory", "rc": 2}

Version-Release number of selected component (if applicable):
ceph-ansible: 3.2.0-0.1.rc8.el7cp

Steps to Reproduce:
1. Deploy a cluster with 3.2.0-0.1.rc8.el7cp.
2. Run purge-cluster.yml twice: the first purge works as expected; the second fails at the "zap and destroy osds" task.

Expected results:
Task should be skipped

Comment 3 seb 2018-12-11 09:56:42 UTC
Does /dev/sda exist on your system?

Comment 4 Andrew Schoen 2018-12-11 15:04:46 UTC
I think this is happening because on the second purge ceph-volume doesn't exist on the system anymore. The playbook will need to be modified to skip that task if ceph-volume isn't installed.
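The skip logic described here can be sketched as a pair of tasks: first probe for the binary, then guard the zap task on the result. This is an illustrative sketch, not the actual PR 3436 diff; the task names match those quoted elsewhere in this bug, but the `ceph_volume_present` register variable and the `devices` list are assumptions.

```yaml
# Illustrative sketch only; ceph_volume_present and devices are hypothetical names.
- name: see if ceph-volume is installed
  shell: command -v ceph-volume    # 'command' is a shell builtin, so use the shell module
  register: ceph_volume_present
  changed_when: false
  failed_when: false

- name: zap and destroy osds created by ceph-volume with devices
  command: "ceph-volume lvm zap --destroy {{ item }}"
  with_items: "{{ devices }}"
  when: ceph_volume_present.rc == 0
```

With the probe's return code registered, the zap task is simply skipped on hosts where the binary has already been removed by a previous purge.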

Comment 5 seb 2018-12-11 15:26:42 UTC
If you have reached the point where packages have been uninstalled, then it's pointless to run purge a second time, but we can probably handle that error more elegantly.

Comment 6 Tiffany Nguyen 2018-12-13 17:53:41 UTC
With latest build 3.2.0-1.el7cp, I don't see the failure any more. It is now skipping the task:

TASK [zap and destroy osds created by ceph-volume with devices] ******************************************************
Thursday 13 December 2018  17:43:38 +0000 (0:00:00.160)       0:00:13.161 ***** 
skipping: [mero005] => (item=/dev/sda) 
skipping: [mero005] => (item=/dev/sdb) 
skipping: [mero005] => (item=/dev/sdc) 
skipping: [mero005] => (item=/dev/sdd)

Was the fix for this checked into the latest build?

Comment 8 Ken Dreyer (Red Hat) 2019-01-07 21:18:47 UTC
https://github.com/ceph/ceph-ansible/pull/3446 is now available in ceph-ansible v3.2.1.

Comment 13 Vasishta 2019-01-14 06:16:39 UTC

LVs are not removed from Ubuntu machines when the playbook is initiated as a non-root user, because the 'command' invocation in the task "see if ceph-volume is installed" (newly introduced as part of PR 3436) is not working as expected.

log - https://bugzilla.redhat.com/attachment.cgi?id=1520023

Ansible user: ubuntu
$ sudo cat /etc/sudoers.d/ubuntu
ubuntu ALL = (root) NOPASSWD:ALL
ubuntu@magna029:~$ ls -l /etc/sudoers.d/ubuntu
-r--r----- 1 root root 33 Jan 11 06:57 /etc/sudoers.d/ubuntu

When the 'command' check is run manually as the Ansible user:
$ sudo command -v ceph-volume
sudo: command: command not found
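The error above is a shell-builtin issue: `command` is a POSIX shell builtin, not a binary on disk, so `sudo`, which looks up an executable in PATH, cannot invoke it directly. A minimal demonstration (using `sh` as the probe target, since ceph-volume may not be installed on the machine running this):

```shell
# 'command -v' works inside a shell because 'command' is a builtin:
command -v sh
# ...but sudo execs a binary found in PATH, and there is no such binary
# named 'command', hence:
#   $ sudo command -v ceph-volume
#   sudo: command: command not found
# Dropping sudo (the approach of PR 3552), or wrapping the builtin in an
# explicit shell, avoids the error:
#   sudo sh -c 'command -v ceph-volume'
```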

Moving back to ASSIGNED state; request you to kindly look into this and let me know your views.

Vasishta shastry
QE, Ceph

Comment 16 Sébastien Han 2019-01-16 09:13:12 UTC
Yes, the fix is part of 3.2.3. Thanks

Comment 17 Vasishta 2019-01-16 09:59:10 UTC
Hi Andrew,

Request you to kindly provide your views on Comment 13.
The fix for this BZ is blocking us from verifying the fix for Bug 1653307.

Comment 25 Vasishta 2019-04-24 10:54:18 UTC
Created attachment 1558140 [details]
File contains log snippets

(Attachment contains failure log snippet, inventory with lvm_volumes argument, lsblk before second run, playbook log)

The playbook fails when initiated a second time if there are any existing LVs that were not part of the cluster.

Moving back to ASSIGNED state, reducing severity to low.

Vasishta Shastry
QE, Ceph

Comment 31 errata-xmlrpc 2019-04-30 15:56:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

