Bug 1418980 - [ceph-ansible]: purge cluster on ubuntu fails to purge client node.
Summary: [ceph-ansible]: purge cluster on ubuntu fails to purge client node.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat Storage
Component: ceph-ansible
Version: 2
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 2
Assignee: Andrew Schoen
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-02-03 10:45 UTC by Tejas
Modified: 2017-03-14 15:54 UTC
CC: 11 users

Fixed In Version: ceph-ansible-2.1.7-1.el7scon
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-14 15:54:14 UTC
Embargoed:


Attachments
ansible playbook log (19.04 KB, text/plain)
2017-02-03 10:45 UTC, Tejas


Links
System ID: Red Hat Product Errata RHSA-2017:0515
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: ansible and ceph-ansible security, bug fix, and enhancement update
Last Updated: 2017-04-18 21:12:31 UTC

Description Tejas 2017-02-03 10:45:14 UTC
Created attachment 1247397 [details]
ansible playbook log

Description of problem:
    I am running purge-cluster.yml on an Ubuntu cluster with 3 MONs, 3 OSDs, and 1 client node.

Seeing this failure on the client node:

TASK [detect init system] ******************************************************
changed: [magna031]
changed: [magna028]
changed: [magna058]
changed: [magna052]
changed: [magna046]
fatal: [magna061]: FAILED! => {"changed": false, "cmd": "ceph-detect-init", "failed": true, "msg": "[Errno 2] No such file or directory", "rc": 2}

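For context, the "detect init system" task in purge-cluster.yml simply shells out to ceph-detect-init on every targeted host. A minimal sketch of what such a task looks like (names approximate, not copied from the playbook):

    - name: detect init system
      command: ceph-detect-init
      register: init_system

When the binary is absent, as on this client node, the command module fails with rc 2 and "[Errno 2] No such file or directory", which matches the fatal output above.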

Version-Release number of selected component (if applicable):
ceph-ansible-2.1.6-1.el7scon.noarch

magna061:~# ceph -v
ceph version 10.2.5-12redhat1xenial (b765e9865c0ee5b5260cd08c65c7581258a5b3c1)


How reproducible:
Always

Steps to Reproduce:
1. Run purge-cluster.yml on an Ubuntu cluster.
2. The playbook fails on the client node.


Additional info:

When I ran ceph-detect-init manually on another Ubuntu cluster:

MON node:
root@magna063:~# ceph-detect-init
systemd

Client:
magna087:~# ceph-detect-init
The program 'ceph-detect-init' is currently not installed. You can install it by typing:
apt install ceph

Attaching the ansible playbook log to this bug.

Comment 3 Andrew Schoen 2017-02-03 12:19:41 UTC
Tejas,

Could you share your hosts file? How was this client node created? I'm looking at the purge-cluster.yml playbook and it doesn't operate against a 'clients' group.

Thanks,
Andrew
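
For reference, a ceph-ansible inventory typically defines host groups like the following (a hypothetical sketch; hostnames are placeholders, not the reporter's actual hosts file):

    [mons]
    mon1

    [osds]
    osd1

    [rgws]
    rgw1

    [clients]
    client1

Since the purge playbook at this point did not operate against a 'clients' group, a host appearing only under [clients] would not be expected to be touched by it.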

Comment 5 seb 2017-02-03 13:38:46 UTC
I don't see magna061 in your hosts file, and that is the one failing.

Comment 7 seb 2017-02-06 10:56:57 UTC
Ahhh, I understand now: magna061 is also a radosgw, which is why the task fails. This is annoying. I think we should force this package installation; I don't want to end up with another way to detect the init system. So yes, you need to install a package before purging the cluster, I know this is weird. I'll add a check and fail if the command is not present.

Ken, can we somehow make this package a dependency of the radosgw installation package?
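
A rough sketch of the guard seb describes, assuming a pre-flight check added to the purge playbook (task names and message are illustrative, not the merged patch):

    - name: check for ceph-detect-init
      shell: command -v ceph-detect-init
      register: detect_init_check
      failed_when: false
      changed_when: false

    - name: fail if ceph-detect-init is missing
      fail:
        msg: "ceph-detect-init is required to purge this node; install it first"
      when: detect_init_check.rc != 0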

Comment 9 Andrew Schoen 2017-02-06 12:04:08 UTC
(In reply to seb from comment #7)
> Ahhh, I understand now: magna061 is also a radosgw, which is why the task
> fails. This is annoying.

But we're purging rgw nodes in our upstream tests and have never run into this. Is this packaging issue specific to downstream only?

Also, there is still the issue of purge cluster not operating against a 'clients' group. I would not expect a dedicated 'client' node to be purged at all by this playbook. Do we need to add support for purging of client nodes for this release?
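
If client purging were added, a minimal sketch of what such a play could look like (the play name and package list are assumptions, not the eventual implementation):

    - name: purge ceph client nodes
      hosts: clients
      become: true
      tasks:
        - name: remove ceph client packages
          package:
            name: "{{ item }}"
            state: absent
          with_items:
            - ceph-common
            - ceph-fuse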

Comment 11 seb 2017-02-06 13:33:39 UTC
True, we haven't come across that in our upstream CI; maybe RHCS packages have different dependencies downstream?

I'd say yes, we should add support for purging clients as well.

Comment 12 seb 2017-02-06 13:54:14 UTC
FYI, upstream packages provide ceph-detect-init on the client, so we won't hit that issue in our upstream CI. The following gets installed:

[centos@ceph-client ~]$ rpm -qa | grep ceph
ceph-fuse-10.2.5-0.el7.x86_64
libcephfs1-10.2.5-0.el7.x86_64
ceph-common-10.2.5-0.el7.x86_64
ceph-selinux-10.2.5-0.el7.x86_64
ceph-release-1-1.el7.noarch
python-cephfs-10.2.5-0.el7.x86_64
ceph-base-10.2.5-0.el7.x86_64

I suspect this is due to the ceph-base package? Ken?
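
One quick way to confirm which package owns the binary is a standard rpm query, e.g. wrapped in an ad-hoc task (the path is the usual install location, an assumption here):

    - name: check which package provides ceph-detect-init
      command: rpm -qf /usr/bin/ceph-detect-init
      register: detect_init_owner
      changed_when: false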

Comment 13 seb 2017-02-06 14:01:53 UTC
upstream PR https://github.com/ceph/ceph-ansible/pull/1281

Comment 14 Ken Dreyer (Red Hat) 2017-02-06 15:05:42 UTC
ceph-detect-init ships in the ceph-base package. This is the case upstream and downstream.

ceph-radosgw does not depend on the ceph-base package. In fact, ceph-base is not present in the Tools repository; you need a Ceph product subscription to the Mon or OSD repos to get it.

Why would purging an RGW node require ceph-detect-init?

Comment 15 seb 2017-02-06 15:32:50 UTC
So we know whether it's, e.g., systemd, and then we can properly stop rgw.

Comment 16 Ken Dreyer (Red Hat) 2017-02-07 17:25:36 UTC
Since ceph-ansible only supports a limited set of distros (and even fewer in the downstream product: Ubuntu Xenial and RHEL 7), we should be able to determine whether the init system is upstart or systemd without ceph-detect-init.
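
A minimal sketch of that approach, deriving the init system from Ansible's distribution facts instead of ceph-detect-init (the mapping below is an assumption based on the distros named here; the actual patch in PR 1284 may differ):

    - name: set init system from distro facts
      set_fact:
        init_system: "{{ 'upstart' if (ansible_distribution == 'Ubuntu' and ansible_distribution_major_version|int < 15) else 'systemd' }}"

This covers RHEL 7 and Ubuntu Xenial (both systemd) while still returning upstart for older Ubuntu releases supported upstream.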

Comment 17 Christina Meno 2017-02-07 17:52:19 UTC
Sounds like we already do this in the rolling_update playbook. Andrew is working on a patch.

Comment 18 Andrew Schoen 2017-02-07 18:04:35 UTC
PR opened upstream: https://github.com/ceph/ceph-ansible/pull/1284

Comment 21 Tejas 2017-02-13 03:38:01 UTC
Verified on build:
ceph-ansible-2.1.7-1.el7scon.noarch

Comment 23 errata-xmlrpc 2017-03-14 15:54:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:0515

