Description of problem:

In some cases a user may have a partition mounted under /var/lib/ceph, and purge-cluster.yml will try to umount it, which makes the purge fail. Only the OSD directories should be umounted.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
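For illustration of the expected behaviour (a sketch only, not the actual ceph-ansible task), the purge should unmount only the ceph-managed mounts below /var/lib/ceph and leave /var/lib/ceph itself, and any unrelated mounts, alone:

# Sketch: unmount only OSD (and osd-lockbox) mounts under /var/lib/ceph,
# never /var/lib/ceph itself or unrelated mounts such as a user's own partition.
awk '$2 ~ "^/var/lib/ceph/osd" {print $2}' /proc/mounts | while read -r mnt; do
    umount "$mnt"
done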
This is fixed in https://github.com/ceph/ceph-ansible/releases/tag/v3.0.43 and https://github.com/ceph/ceph-ansible/releases/tag/v3.1.0rc18.
Hi Sebastian,

Steps followed to verify:

0) Created a directory to mount inside /var/lib/ceph/:
   mkdir /var/lib/ceph/mntdir
1) Mounted a partition on it (a disk partition that is not part of the cluster):
   mount /dev/sdc1 /var/lib/ceph/mntdir
2) Purged the cluster (the playbook failed):

RUNNING HANDLER [remove data] **************************************************************
The full traceback is:
  File "/tmp/ansible_rai1lQ/ansible_module_file.py", line 278, in main
    shutil.rmtree(b_path, ignore_errors=False)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 256, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/usr/lib64/python2.7/shutil.py", line 254, in rmtree
    os.rmdir(path)
fatal: [magna021]: FAILED! => {
    "changed": false,
    "failed": true,
    "invocation": {
        "module_args": {
            "attributes": null,
            "backup": null,
            "content": null,
            "delimiter": null,
            "diff_peek": null,
            "directory_mode": null,
            "follow": false,
            "force": false,
            "group": null,
            "mode": null,
            "original_basename": null,
            "owner": null,
            "path": "/var/lib/ceph",
            "recurse": false,
            "regexp": null,
            "remote_src": null,
            "selevel": null,
            "serole": null,
            "setype": null,
            "seuser": null,
            "src": null,
            "state": "absent",
            "unsafe_writes": null,
            "validate": null
        }
    },
    "msg": "rmtree failed: [Errno 16] Device or resource busy: '/var/lib/ceph/mntdir'"
}

3) The purge should be successful; only the OSD dirs should be umounted.
4) Recreate the cluster; it should come up successfully.

Please let me know if you have any concerns with the steps. Thanks.
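For what it's worth, a quick way to spot unrelated mounts under /var/lib/ceph before running the purge (just an illustrative manual check, not part of the playbook):

# List everything currently mounted under /var/lib/ceph so stray mounts
# such as /var/lib/ceph/mntdir can be unmounted by hand first.
findmnt -rn -o TARGET,SOURCE | grep '^/var/lib/ceph' || echo "nothing mounted under /var/lib/ceph"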
That's not the right approach; /var/lib/ceph itself should be the mountpoint.
I have even tried mounting a partition at /var/lib/ceph itself and got this error while purging:

TASK [umount osd data partition] ***************************************************************
task path: /usr/share/ceph-ansible/purge-cluster.yml:283
Friday 24 August 2018  05:03:56 +0000 (0:00:02.419)       0:00:41.310 *********
...
failed: [magna021] (item=/var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c) => {
    "changed": true,
    "cmd": "umount /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c",
    "delta": "0:00:00.033318",
    "end": "2018-08-24 05:03:58.585417",
    "failed": true,
    "invocation": {
        "module_args": {
            "_raw_params": "umount /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c",
            "_uses_shell": true,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "warn": true
        }
    },
    "item": "/var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c",
    "msg": "non-zero return code",
    "rc": 32,
    "start": "2018-08-24 05:03:58.552099",
    "stderr": "umount: /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c: mountpoint not found",
    "stderr_lines": [
        "umount: /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c: mountpoint not found"
    ],
    "stdout": ""
}

If you ls /var/lib/ceph after mounting a partition at that location, you only see a lost+found dir; none of the files that were there before are visible. Hence the above error.
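The "mountpoint not found" failure is a consequence of the osd-lockbox path no longer being visible once another filesystem is mounted over /var/lib/ceph. A defensive pattern for this kind of cleanup (an illustration only, not the playbook's actual task) is to unmount only paths that really are mountpoints:

# Illustrative guard: skip paths that are not actually mounted, so a stale
# osd-lockbox entry does not fail the whole purge run.
dir=/var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c
if mountpoint -q "$dir"; then
    umount "$dir"
else
    echo "skipping $dir: not a mountpoint"
fi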
The mountpoint /var/lib/ceph must contain the files from the original /var/lib/ceph/ directory.
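In other words, to test with /var/lib/ceph itself as the mountpoint on a node that already has data there, the existing contents have to be carried over to the new filesystem first. A rough sketch, assuming /dev/sdc1 is a spare, unused partition and all ceph services on the node are stopped:

# Assumption: /dev/sdc1 is unused and no ceph daemon is running on the node.
mkfs.xfs /dev/sdc1
mount /dev/sdc1 /mnt
cp -a /var/lib/ceph/. /mnt/        # carry over the original contents, ownership intact
umount /mnt
mount /dev/sdc1 /var/lib/ceph      # the original files are visible again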
As discussed, followed the below steps to verify:

1) Created the /var/lib/ceph dir on nodeX and mounted a disk partition on it (the disk partition is not part of the ceph cluster).
2) Deployed a ceph cluster with ceph-ansible, with nodeX as one of the OSD nodes. The cluster deployed fine.
3) Purged the ceph cluster. purge-cluster.yml errored at TASK [remove ceph systemd unit files] --> RUNNING HANDLER [remove data]:
   "msg": "rmtree failed: [Errno 16] Device or resource busy: '/var/lib/ceph'"
   (the subsequent tasks worked fine)

Attaching logs.

version: ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch
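For reference, removing a directory that is itself a mountpoint always fails with EBUSY, even when it is empty; only its contents can be removed. This is easy to see on the affected node (illustration only, destructive in the purge context):

findmnt /var/lib/ceph          # shows the partition mounted there
rm -rf /var/lib/ceph/*         # emptying the directory works
rmdir /var/lib/ceph            # fails: Device or resource busy, because it is a mountpoint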
Followed the steps as per comment #12; purge works fine. Moving to verified state.

[ubuntu@magna097 ~]$ rpm -qa | grep ansible
ansible-2.4.6.0-1.el7ae.noarch
ceph-ansible-3.1.9-1.el7cp.noarch
Updated Doc Text from Known Issue to Bug Fix.
Code landed in ceph-ansible v3.1.3, and we shipped v3.1.5 in https://access.redhat.com/errata/RHBA-2018:2819. QE verified on ceph-ansible-3.1.9-1.el7cp. The latest available version is ceph-ansible-3.2.0-1.el7cp from http://access.redhat.com/errata/RHBA-2019:0020.