Bug 1656935

Summary: ceph-ansible: purge-cluster.yml fails when initiated second time
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tiffany Nguyen <tunguyen>
Component: Ceph-Ansible
Assignee: Guillaume Abrioux <gabrioux>
Status: CLOSED NEXTRELEASE
QA Contact: Vasishta <vashastr>
Severity: low
Docs Contact: John Brier <jbrier>
Priority: low
Version: 3.2
CC: anharris, aschoen, ceph-eng-bugs, ceph-qe-bugs, edonnell, gabrioux, gmeno, hnallurv, jbrier, kdreyer, nthomas, pasik, sankarshan, seb, shan, tchandra, tserlin, tunguyen
Target Milestone: rc
Keywords: Reopened
Target Release: 3.3
Hardware: Unspecified
OS: Linux
Fixed In Version: RHEL: ceph-ansible-3.2.12-1.el7cp; Ubuntu: ceph-ansible_3.2.12-2redhat1
Doc Type: Bug Fix
Doc Text:
.The `ceph-ansible purge-cluster.yml` playbook no longer fails when run against a cluster that has already been purged
Previously, the `ceph-ansible purge-cluster.yml` playbook failed when run against a cluster that had already been purged. This was because `ceph-volume` had been removed during the first run, and the command could no longer be found. With this update, the underlying issue has been fixed, and running `ceph-ansible purge-cluster.yml` a second time no longer fails.
Clones: 1722663 (view as bug list)
Last Closed: 2019-06-20 22:39:36 UTC
Type: Bug
Bug Blocks: 1722663    
Attachments:
ansible-playbook log
File contains log snippets

Description Tiffany Nguyen 2018-12-06 17:09:09 UTC
Created attachment 1512212 [details]
ansible-playbook log

Description of problem:
purge-cluster.yml fails when initiated a second time:

TASK [zap and destroy osds created by ceph-volume with devices] ********************************************************************************
Thursday 06 December 2018  16:30:57 +0000 (0:00:00.142)       0:00:14.177 ***** 
failed: [mero005] (item=/dev/sda) => {"changed": false, "cmd": "ceph-volume lvm zap --destroy /dev/sda", "item": "/dev/sda", "msg": "[Errno 2] No such file or directory", "rc": 2}
....


Version-Release number of selected component (if applicable):
ceph-ansible: 3.2.0-0.1.rc8.el7cp

How reproducible:
1. Deploy a cluster with ceph-ansible 3.2.0-0.1.rc8.el7cp.
2. Run purge-cluster.yml twice: the first purge works as expected; the second fails at the "zap and destroy osds" task.

Expected results:
The task should be skipped.

Comment 3 seb 2018-12-11 09:56:42 UTC
Does /dev/sda exist on your system?

Comment 4 Andrew Schoen 2018-12-11 15:04:46 UTC
I think this is happening because on the second purge ceph-volume doesn't exist on the system anymore. The playbook will need to be modified to skip that task if ceph-volume isn't installed.
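For context, that kind of skip can be implemented by registering the result of a ceph-volume lookup and gating the zap task on it. The following is only a rough sketch of the approach, not the actual ceph-ansible change; the check task name, the registered variable, and the `devices` list are assumptions:

- name: check whether ceph-volume is available   # illustrative guard task
  shell: command -v ceph-volume                  # must run in a shell, since 'command' is a builtin
  register: ceph_volume_check
  failed_when: false                             # a missing binary should not fail the play
  changed_when: false

- name: zap and destroy osds created by ceph-volume with devices
  command: "ceph-volume lvm zap --destroy {{ item }}"
  with_items: "{{ devices }}"                    # assumed list of OSD devices, e.g. /dev/sda
  when: ceph_volume_check.rc == 0                # skipped when ceph-volume was not found

With a guard like this, a second purge run skips the zap instead of erroring out with "No such file or directory".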

Comment 5 seb 2018-12-11 15:26:42 UTC
If you have reached the point where the packages have been uninstalled, then it's pointless to run the purge a second time, but we can probably handle that error more elegantly.

Comment 6 Tiffany Nguyen 2018-12-13 17:53:41 UTC
With the latest build, 3.2.0-1.el7cp, I don't see the failure anymore. It now skips the task:

TASK [zap and destroy osds created by ceph-volume with devices] ******************************************************
Thursday 13 December 2018  17:43:38 +0000 (0:00:00.160)       0:00:13.161 ***** 
skipping: [mero005] => (item=/dev/sda) 
skipping: [mero005] => (item=/dev/sdb) 
skipping: [mero005] => (item=/dev/sdc) 
skipping: [mero005] => (item=/dev/sdd)

Did we check in the fix for this in the latest build?

Comment 8 Ken Dreyer (Red Hat) 2019-01-07 21:18:47 UTC
https://github.com/ceph/ceph-ansible/pull/3446 is now available in ceph-ansible v3.2.1.

Comment 13 Vasishta 2019-01-14 06:16:39 UTC
Hi,

LVM volumes are not removed from Ubuntu machines when the playbook is initiated as a non-root user, because the 'command' in the task "see if ceph-volume is installed" (newly introduced as part of PR 3436) is not working as expected.

log - https://bugzilla.redhat.com/attachment.cgi?id=1520023

Ansible user: ubuntu
$ sudo cat /etc/sudoers.d/ubuntu
ubuntu ALL = (root) NOPASSWD:ALL
ubuntu@magna029:~$ ls -l /etc/sudoers.d/ubuntu
-r--r----- 1 root root 33 Jan 11 06:57 /etc/sudoers.d/ubuntu

When I tried 'command' manually as the ansible user:
$ sudo command -v ceph-volume
sudo: command: command not found
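
That error comes from sudo itself: 'command' is a shell builtin, not an executable on disk, so sudo (and Ansible's command module, which does not spawn a shell) has nothing to exec. The check only works when run inside a shell, e.g. sudo sh -c 'command -v ceph-volume'. A sketch of how such a task could be written so it also behaves under become; the registered variable name is illustrative and this is not necessarily how the final fix was implemented:

- name: see if ceph-volume is installed
  shell: command -v ceph-volume        # the shell module provides a shell, so the builtin resolves
  become: true
  register: ceph_volume_present
  failed_when: false                   # absence of ceph-volume is an expected state during purge
  changed_when: false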

Moving back to ASSIGNED state. Request you to kindly look into this and let me know your views.

Regards,
Vasishta shastry
QE, Ceph

Comment 16 Sébastien Han 2019-01-16 09:13:12 UTC
Yes, the fix is part of 3.2.3. Thanks

Comment 17 Vasishta 2019-01-16 09:59:10 UTC
Hi Andrew,

Request you to kindly provide your views on Comment 13.
The fix for this BZ is blocking us from verifying the fix for Bug 1653307.

Comment 25 Vasishta 2019-04-24 10:54:18 UTC
Created attachment 1558140 [details]
File contains log snippets

(The attachment contains the failure log snippet, the inventory with the lvm_volumes argument, lsblk output before the second run, and the playbook log.)

The playbook fails when initiated a second time if there are any existing LVs that were not part of the cluster.
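
One way to make the purge more selective would be to act only on logical volumes carrying the lv_tags that ceph-volume sets (for example ceph.osd_id=...), leaving unrelated LVs untouched. A rough sketch under that assumption; this is not the actual ceph-ansible fix, and the task names and registered variable are illustrative:

- name: collect logical volumes created by ceph-volume
  # keep only LVs whose lv_tags mark them as ceph-volume OSD volumes
  shell: lvs --noheadings -o lv_path,lv_tags | awk '/ceph.osd_id=/ {print $1}'
  register: ceph_lvs
  changed_when: false

- name: remove only ceph-owned logical volumes
  command: "lvremove -f {{ item }}"
  with_items: "{{ ceph_lvs.stdout_lines }}"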

Moving back to ASSIGNED state, reducing severity to low.

Regards,
Vasishta Shastry
QE, Ceph

Comment 31 errata-xmlrpc 2019-04-30 15:56:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911