Bug 1656935 - ceph-ansible: purge-cluster.yml fails when initiated second time
Summary: ceph-ansible: purge-cluster.yml fails when initiated second time
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 3.2
Hardware: Unspecified
OS: Linux
Target Milestone: rc
Target Release: 3.3
Assignee: Guillaume Abrioux
QA Contact: Vasishta
Docs Contact: John Brier
Depends On:
Blocks: 1722663
Reported: 2018-12-06 17:09 UTC by Tiffany Nguyen
Modified: 2019-06-20 22:39 UTC
CC: 18 users

Fixed In Version: RHEL: ceph-ansible-3.2.12-1.el7cp Ubuntu: ceph-ansible_3.2.12-2redhat1
Doc Type: Bug Fix
Doc Text:
.The `ceph-ansible purge-cluster.yml` playbook no longer fails when run against a cluster that has already been purged
Previously, the `ceph-ansible purge-cluster.yml` playbook failed when run against a cluster that had already been purged. This was because `ceph-volume` had been removed during the first run, and the command could no longer be found. With this update, the underlying issue has been fixed, and running `ceph-ansible purge-cluster.yml` for a second time no longer fails.
Clone Of:
Cloned To: 1722663
Last Closed: 2019-06-20 22:39:36 UTC
Target Upstream Version:

Attachments (Terms of Use)
ansible-playbook log (25.84 KB, text/plain), 2018-12-06 17:09 UTC, Tiffany Nguyen
File contains log snippets (642.37 KB, text/plain), 2019-04-24 10:54 UTC, Vasishta

External Trackers:
Github ceph/ceph-ansible pull 3436 (closed): purge-cluster: skip tasks that use ceph-volume if it's not installed (last updated 2020-10-30 11:21:34 UTC)
Github ceph/ceph-ansible pull 3552 (closed): purge-cluster: do not use sudo when checking if ceph-volume exists (last updated 2020-10-30 11:21:19 UTC)
Red Hat Product Errata RHSA-2019:0911 (last updated 2019-04-30 15:57:00 UTC)

Description Tiffany Nguyen 2018-12-06 17:09:09 UTC
Created attachment 1512212 [details]
ansible-playbook log

Description of problem:
purge-cluster.yml fails when initiated second time:

TASK [zap and destroy osds created by ceph-volume with devices] ********************************************************************************
Thursday 06 December 2018  16:30:57 +0000 (0:00:00.142)       0:00:14.177 ***** 
failed: [mero005] (item=/dev/sda) => {"changed": false, "cmd": "ceph-volume lvm zap --destroy /dev/sda", "item": "/dev/sda", "msg": "[Errno 2] No such file or directory", "rc": 2}

Version-Release number of selected component (if applicable):
ceph-ansible: 3.2.0-0.1.rc8.el7cp

Steps to Reproduce:
1. Deploy a cluster with 3.2.0-0.1.rc8.el7cp.
2. Run purge-cluster.yml twice: the first purge works as expected; the second fails at the "zap and destroy osds" task.

Expected results:
Task should be skipped

Comment 3 seb 2018-12-11 09:56:42 UTC
Does /dev/sda exist on your system?

Comment 4 Andrew Schoen 2018-12-11 15:04:46 UTC
I think this is happening because on the second purge ceph-volume doesn't exist on the system anymore. The playbook will need to be modified to skip that task if ceph-volume isn't installed.
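The skip logic described here can be sketched as a pair of tasks: first probe for the binary, then guard the zap task on the result. This is an illustrative sketch, not the actual PR 3436 diff; the task names match those quoted elsewhere in this bug, but the `ceph_volume_present` register variable and the `devices` list are assumptions.

```yaml
# Illustrative sketch only; ceph_volume_present and devices are hypothetical names.
- name: see if ceph-volume is installed
  shell: command -v ceph-volume    # 'command' is a shell builtin, so use the shell module
  register: ceph_volume_present
  changed_when: false
  failed_when: false

- name: zap and destroy osds created by ceph-volume with devices
  command: "ceph-volume lvm zap --destroy {{ item }}"
  with_items: "{{ devices }}"
  when: ceph_volume_present.rc == 0
```

With the probe's return code registered, the zap task is simply skipped on hosts where the binary has already been removed by a previous purge.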

Comment 5 seb 2018-12-11 15:26:42 UTC
If you have reached the point where packages have been uninstalled, then it's pointless to run purge a second time, but we can probably handle that error more elegantly.

Comment 6 Tiffany Nguyen 2018-12-13 17:53:41 UTC
With latest build 3.2.0-1.el7cp, I don't see the failure any more. It is now skipping the task:

TASK [zap and destroy osds created by ceph-volume with devices] ******************************************************
Thursday 13 December 2018  17:43:38 +0000 (0:00:00.160)       0:00:13.161 ***** 
skipping: [mero005] => (item=/dev/sda) 
skipping: [mero005] => (item=/dev/sdb) 
skipping: [mero005] => (item=/dev/sdc) 
skipping: [mero005] => (item=/dev/sdd)

Was the fix for this checked into the latest build?

Comment 8 Ken Dreyer (Red Hat) 2019-01-07 21:18:47 UTC
https://github.com/ceph/ceph-ansible/pull/3446 is now available in ceph-ansible v3.2.1.

Comment 13 Vasishta 2019-01-14 06:16:39 UTC

LVs are not removed from Ubuntu machines when the playbook is initiated as a non-root user, because the 'command' invocation in the task "see if ceph-volume is installed" (newly introduced as part of PR 3436) is not working as expected.

log - https://bugzilla.redhat.com/attachment.cgi?id=1520023

Ansible user: ubuntu
$ sudo cat /etc/sudoers.d/ubuntu
ubuntu ALL = (root) NOPASSWD:ALL
ubuntu@magna029:~$ ls -l /etc/sudoers.d/ubuntu
-r--r----- 1 root root 33 Jan 11 06:57 /etc/sudoers.d/ubuntu

When the 'command' check is run manually as the Ansible user:
$ sudo command -v ceph-volume
sudo: command: command not found
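The error above is a shell-builtin issue: `command` is a POSIX shell builtin, not a binary on disk, so `sudo`, which looks up an executable in PATH, cannot invoke it directly. A minimal demonstration (using `sh` as the probe target, since ceph-volume may not be installed on the machine running this):

```shell
# 'command -v' works inside a shell because 'command' is a builtin:
command -v sh
# ...but sudo execs a binary found in PATH, and there is no such binary
# named 'command', hence:
#   $ sudo command -v ceph-volume
#   sudo: command: command not found
# Dropping sudo (the approach of PR 3552), or wrapping the builtin in an
# explicit shell, avoids the error:
#   sudo sh -c 'command -v ceph-volume'
```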

Moving back to ASSIGNED state; request you to kindly look into this and let me know your views.

Vasishta shastry
QE, Ceph

Comment 16 Sébastien Han 2019-01-16 09:13:12 UTC
Yes, the fix is part of 3.2.3. Thanks

Comment 17 Vasishta 2019-01-16 09:59:10 UTC
Hi Andrew,

Request you to kindly provide your views on Comment 13.
The fix for this BZ is blocking us from verifying the fix for Bug 1653307.

Comment 25 Vasishta 2019-04-24 10:54:18 UTC
Created attachment 1558140 [details]
File contains log snippets

(Attachment contains failure log snippet, inventory with lvm_volumes argument, lsblk before second run, playbook log)

The playbook fails when initiated a second time if there are any existing LVs that were not part of the cluster.

Moving back to ASSIGNED state, reducing severity to low.

Vasishta Shastry
QE, Ceph

Comment 31 errata-xmlrpc 2019-04-30 15:56:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

