Bug 1615872 - purge cluster: do not umount /var/lib/ceph
Status: VERIFIED
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z1
Target Release: 3.1
Assigned To: leseb
QA Contact: subhash
Docs Contact: Aron Gunn
Keywords: TestOnly
Depends On:
Blocks: 1584264
Reported: 2018-08-14 08:59 EDT by leseb
Modified: 2018-10-18 09:44 EDT
CC: 15 users

See Also:
Fixed In Version: RHEL: ceph-ansible-3.1.3.el7cp Ubuntu: ceph-ansible_3.1.3-2redhat1
Doc Type: Bug Fix
Doc Text:
.Purging the cluster no longer unmounts a partition from /var/lib/ceph
Previously, if you mounted a partition to /var/lib/ceph, running the purge playbook caused a failure when it tried to unmount it. With this update, partitions mounted to /var/lib/ceph are not unmounted during a cluster purge.
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments


External Trackers
Tracker ID Priority Status Summary Last Updated
Github ceph/ceph-ansible/pull/3014 None None None 2018-08-14 08:59 EDT
Github ceph/ceph-ansible/pull/3068 None None None 2018-09-03 04:51 EDT

Description leseb 2018-08-14 08:59:38 EDT
Description of problem:

In some cases, a user may mount a partition to /var/lib/ceph, and the purge playbook will try to unmount it, which results in a failure. Only the OSD directories should be unmounted.
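
As an illustration only (a hedged sketch of the intended behaviour, not the actual ceph-ansible change): the purge should unmount only the per-OSD data directories and leave /var/lib/ceph itself alone, roughly:

# sketch -- paths assume the default /var/lib/ceph/osd/ceph-<id> layout
for osd_dir in /var/lib/ceph/osd/ceph-*; do
    [ -d "$osd_dir" ] && umount "$osd_dir"
done
# /var/lib/ceph (and any partition an admin mounted there) stays mounted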

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 8 subhash 2018-08-24 03:59:08 EDT
Hi Sebastian,


Steps followed to verify:
0) Created a directory to mount under /var/lib/ceph/:
mkdir /var/lib/ceph/mntdir
1) Mounted a partition on it (a disk partition which is not part of the cluster):
mount /dev/sdc1 /var/lib/ceph/mntdir
2) Purged the cluster (the playbook failed):

RUNNING HANDLER [remove data] **************************************************************
The full traceback is:
  File "/tmp/ansible_rai1lQ/ansible_module_file.py", line 278, in main
    shutil.rmtree(b_path, ignore_errors=False)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 256, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/usr/lib64/python2.7/shutil.py", line 254, in rmtree
    os.rmdir(path)

fatal: [magna021]: FAILED! => {
    "changed": false, 
    "failed": true, 
    "invocation": {
        "module_args": {
            "attributes": null, 
            "backup": null, 
            "content": null, 
            "delimiter": null, 
            "diff_peek": null, 
            "directory_mode": null, 
            "follow": false, 
            "force": false, 
            "group": null, 
            "mode": null, 
            "original_basename": null, 
            "owner": null, 
            "path": "/var/lib/ceph", 
            "recurse": false, 
            "regexp": null, 
            "remote_src": null, 
            "selevel": null, 
            "serole": null, 
            "setype": null, 
            "seuser": null, 
            "src": null, 
            "state": "absent", 
            "unsafe_writes": null, 
            "validate": null
        }
    }, 
    "msg": "rmtree failed: [Errno 16] Device or resource busy: '/var/lib/ceph/mntdir'"

3) The purge should be successful. Only the OSD directories should be unmounted.
4) Recreate the cluster. The cluster should come up successfully.
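
A quick way to check the expected outcome after the purge (mntdir is the directory from step 0; adjust the path for your setup):

findmnt /var/lib/ceph/mntdir     # should still show the partition mounted
mount | grep /var/lib/ceph       # no OSD data mounts should remain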

Please let me know if you have any concerns with these steps.
Thanks
Comment 9 leseb 2018-08-24 07:20:50 EDT
That's not the right approach; /var/lib/ceph itself should be the mountpoint.
Comment 10 subhash 2018-08-24 08:22:14 EDT
I also tried mounting a partition directly at /var/lib/ceph and got this error while purging.


TASK [umount osd data partition] ***************************************************************
task path: /usr/share/ceph-ansible/purge-cluster.yml:283
Friday 24 August 2018  05:03:56 +0000 (0:00:02.419)       0:00:41.310 *********
Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
<magna029> ESTABLISH SSH CONNECTION FOR USER: None
<magna028> ESTABLISH SSH CONNECTION FOR USER: None
<magna021> ESTABLISH SSH CONNECTION FOR USER: None
<magna028> SSH: EXEC ssh -vvv -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=30 -o ControlPath=/root/.ansible/cp/%h-%r-%p magna028 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-ijwlzcpfwvhdgjztrqeyktdgdxffjurf; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
<magna029> SSH: EXEC ssh -vvv -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=30 -o ControlPath=/root/.ansible/cp/%h-%r-%p magna029 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-sepaaedppavugaibmpfvqkczqadufrxv; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
<magna021> SSH: EXEC ssh -vvv -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=30 -o ControlPath=/root/.ansible/cp/%h-%r-%p magna021 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-trarpzyyqdpfrhouhisakoemvvxwauvc; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
<magna021> (1, '\n{"changed": true, "end": "2018-08-24 05:03:58.585417", "stdout": "", "cmd": "umount /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c", "failed": true, "delta": "0:00:00.033318", "stderr": "umount: /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c: mountpoint not found", "rc": 32, "invocation": {"module_args": {"warn": true, "executable": null, "_uses_shell": true, "_raw_params": "umount /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c", "removes": null, "creates": null, "chdir": null, "stdin": null}}, "start": "2018-08-24 05:03:58.552099", "msg": "non-zero return code"}\n', 'OpenSSH_7.4p1, OpenSSL 1.0.2k-fips  26 Jan 2017\r\ndebug1: Reading configuration data /root/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 8: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 12793\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 2\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 1\r\n')
failed: [magna021] (item=/var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c) => {
    "changed": true,
    "cmd": "umount /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c",
    "delta": "0:00:00.033318",
    "end": "2018-08-24 05:03:58.585417",
    "failed": true,
    "invocation": {
        "module_args": {
            "_raw_params": "umount /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c",
            "_uses_shell": true,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "warn": true
        }
    },
    "item": "/var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c",
    "msg": "non-zero return code",
    "rc": 32,
    "start": "2018-08-24 05:03:58.552099",
    "stderr": "umount: /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c: mountpoint not found",
    "stderr_lines": [
        "umount: /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c: mountpoint not found"
    ],
    "stdout": "",


If you run ls in /var/lib/ceph after mounting a partition at that location, you only see a lost+found directory; all of the existing files are no longer visible. Hence the error above.
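
For example (the device name is only an example; the partition carries a freshly created ext4 filesystem):

ls /var/lib/ceph              # shows bootstrap-osd, mon, osd, ...
mount /dev/sdc1 /var/lib/ceph
ls /var/lib/ceph              # now shows only lost+found; the original files are hidden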
Comment 11 leseb 2018-08-24 08:26:03 EDT
The mountpoint /var/lib/ceph must contain files from the original /var/lib/ceph/.
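
One way to achieve that on an existing node (illustrative only; the device name is an example) is to copy the current contents onto the new filesystem before switching the mount over:

# stop the Ceph daemons on the node first, then:
mount /dev/sdc1 /mnt
cp -a /var/lib/ceph/. /mnt/
umount /mnt
mount /dev/sdc1 /var/lib/ceph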
Comment 12 subhash 2018-08-24 17:05:58 EDT
As discussed, followed the below steps to verify:
1) Created the /var/lib/ceph directory on nodeX and mounted a disk partition on it (the disk partition is not part of the Ceph cluster).
2) Deployed the Ceph cluster with ceph-ansible (with the above nodeX as one of the OSD nodes). The cluster deployed fine.
3) Purged the Ceph cluster.
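
For reference, the setup in steps 1 and 2 looks roughly like this (device name and filesystem type are examples only):

mkfs.ext4 /dev/sdc1            # format a spare partition not used by Ceph
mkdir -p /var/lib/ceph
mount /dev/sdc1 /var/lib/ceph  # mount it before deploying the cluster
# ... then deploy with ceph-ansible and run purge-cluster.yml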

purge-cluster.yml errored at the task [remove ceph systemd unit files], while running the handler [remove data]: "msg": "rmtree failed: [Errno 16] Device or resource busy: '/var/lib/ceph'"
(the subsequent tasks worked fine)

Attaching logs

version: ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch
Comment 23 subhash 2018-10-18 09:15:59 EDT
Followed the steps as per comment #12; the purge works fine. Moving to VERIFIED state.

[ubuntu@magna097 ~]$ rpm -qa | grep ansible
ansible-2.4.6.0-1.el7ae.noarch
ceph-ansible-3.1.9-1.el7cp.noarch
Comment 24 John Brier 2018-10-18 09:44:47 EDT
Updated Doc Text from Known Issue to Bug Fix.
