Created attachment 1247397 [details]
ansible playbook log

Description of problem:
I am running purge_cluster.yml on a Ubuntu cluster with 3 MON, 3 OSD, and 1 client node. Seeing this failure on the client node:

TASK [detect init system] ******************************************************
changed: [magna031]
changed: [magna028]
changed: [magna058]
changed: [magna052]
changed: [magna046]
fatal: [magna061]: FAILED! => {"changed": false, "cmd": "ceph-detect-init", "failed": true, "msg": "[Errno 2] No such file or directory", "rc": 2}

Version-Release number of selected component (if applicable):
ceph-ansible-2.1.6-1.el7scon.noarch

magna061:~# ceph -v
ceph version 10.2.5-12redhat1xenial (b765e9865c0ee5b5260cd08c65c7581258a5b3c1)

How reproducible:
Always

Steps to Reproduce:
1. Run purge_cluster.yml on a Ubuntu cluster.
2. The playbook fails on the client node.

Additional info:
When I ran ceph-detect-init manually on another Ubuntu cluster:

MON node:
root@magna063:~# ceph-detect-init
systemd

Client:
magna087:~# ceph-detect-init
The program 'ceph-detect-init' is currently not installed. You can install it by typing:
apt install ceph

Attaching the ansible playbook log to this bug.
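For reference, the failing task boils down to invoking the ceph-detect-init binary on every host; roughly the following (a reconstruction from the log above, not the verbatim ceph-ansible source), so it fails with rc=2 on any node where the binary is not installed:

# Approximate reconstruction of the failing task, not verbatim
# ceph-ansible source: it shells out to ceph-detect-init.
- name: detect init system
  command: ceph-detect-init
  register: init_system
  changed_when: false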
Tejas,

Could you share your hosts file? How was this client node created? I'm looking at the purge-cluster.yml playbook and it doesn't operate against a 'clients' group.

Thanks,
Andrew
I don't see magna061 in your hosts file, and that's the node that is failing.
Ahhh, I understand now: magna061 is also a radosgw node; this is why the task fails. This is annoying. I think we should force this package installation; I don't want to end up with yet another way to detect the init system. So yes, you need to install a package before purging the cluster. I know this is weird. I'll add a check that fails if the command is not present.

Ken, can we somehow make this package a dependency of the radosgw installation package?
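Something along these lines (a rough sketch only; task names and the message wording are made up, the actual change is in the PR linked below):

# Hypothetical pre-flight check: bail out early with a clear message
# instead of failing mid-purge.
- name: check for ceph-detect-init
  shell: command -v ceph-detect-init
  register: ceph_detect_init_present
  failed_when: false
  changed_when: false

- name: fail if ceph-detect-init is missing
  fail:
    msg: "ceph-detect-init is required to purge this node, please install it first"
  when: ceph_detect_init_present.rc != 0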
See: https://github.com/ceph/ceph-ansible/pull/1281
(In reply to seb from comment #7)
> Ahhh I understand now, magna061 is also a radosgw this is why the task
> fails. This is annoying.

But we're purging rgw nodes in our upstream tests and have never run into this. Is this packaging issue specific to downstream only?

Also, there is still the issue of purge-cluster not operating against a 'clients' group. I would not expect a dedicated client node to be purged at all by this playbook. Do we need to add support for purging client nodes for this release?
True, we haven't come across that in our upstream CI; maybe the RHCS packages have different dependencies downstream? I'd say yes, we should add support for purging clients as well.
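Roughly what client purge support could look like (a sketch only; the play structure and package list are assumptions, the real change may differ):

# Hypothetical minimal client purge play for purge-cluster.yml:
# remove the client-side ceph packages from a 'clients' group.
- hosts: clients
  become: true
  tasks:
    - name: purge ceph client packages
      package:
        name: "{{ item }}"
        state: absent
      with_items:
        - ceph-common
        - ceph-fuse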
FYI, upstream packages have ceph-detect-init on the client, so we won't hit that issue in our upstream CI. The following gets installed:

[centos@ceph-client ~]$ rpm -qa | grep ceph
ceph-fuse-10.2.5-0.el7.x86_64
libcephfs1-10.2.5-0.el7.x86_64
ceph-common-10.2.5-0.el7.x86_64
ceph-selinux-10.2.5-0.el7.x86_64
ceph-release-1-1.el7.noarch
python-cephfs-10.2.5-0.el7.x86_64
ceph-base-10.2.5-0.el7.x86_64

I suspect this is due to the ceph-base package? Ken?
upstream PR https://github.com/ceph/ceph-ansible/pull/1281
ceph-detect-init ships in the ceph-base package; this is the case both upstream and downstream. ceph-radosgw does not depend on the ceph-base package. In fact, ceph-base is not present in the Tools repository; you need a Ceph product subscription to the MON or OSD repos to get it.

Why would purging an RGW node require ceph-detect-init?
So we know whether it is, e.g., systemd, and then we can properly stop the rgw service.
Since ceph-ansible only supports a limited set of distros (and even fewer in the downstream product: Ubuntu Xenial and RHEL 7), we should be able to determine whether the init system is upstart or systemd without ceph-detect-init.
Sounds like we already do this in the rolling_update playbook. Andrew is working on a patch.
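For illustration, a fact-based detection could look like this (a hypothetical sketch, not the contents of the patch; on Ansible >= 2.1 the ansible_service_mgr fact could also be used directly):

# Hypothetical sketch: infer the init system from distribution facts
# instead of shelling out to ceph-detect-init. Among the supported
# platforms, only Ubuntu releases older than 15.04 still use upstart.
- name: detect init system from facts
  set_fact:
    init_system: "{{ 'upstart' if (ansible_distribution == 'Ubuntu' and ansible_distribution_version | version_compare('15.04', '<')) else 'systemd' }}"

# Example of then stopping rgw properly, using the standard systemd
# unit naming for radosgw.
- name: stop rgw under systemd
  service:
    name: "ceph-radosgw@rgw.{{ ansible_hostname }}"
    state: stopped
  when: init_system == 'systemd'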
PR opened upstream: https://github.com/ceph/ceph-ansible/pull/1284
Verified on build: ceph-ansible-2.1.7-1.el7scon.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:0515