Bug 1615872

Summary: purge cluster: do not umount /var/lib/ceph
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Sébastien Han <shan>
Component: Ceph-Ansible
Assignee: Sébastien Han <shan>
Status: CLOSED CURRENTRELEASE
QA Contact: subhash <vpoliset>
Severity: medium
Docs Contact: Aron Gunn <agunn>
Priority: medium
Version: 3.1
CC: agunn, anharris, aschoen, ceph-eng-bugs, ceph-qe-bugs, gabrioux, gmeno, hgurav, hnallurv, jbrier, kdreyer, nthomas, sankarshan, shan, tserlin
Target Milestone: z1
Keywords: TestOnly
Target Release: 3.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: RHEL: ceph-ansible-3.1.3.el7cp Ubuntu: ceph-ansible_3.1.3-2redhat1
Doc Type: Bug Fix
Doc Text:
.Purging the cluster no longer unmounts a partition from /var/lib/ceph
Previously, if you mounted a partition to /var/lib/ceph, running the purge playbook caused a failure when it tried to unmount it. With this update, partitions mounted to /var/lib/ceph are not unmounted during a cluster purge.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-08 17:26:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1584264    

Description Sébastien Han 2018-08-14 12:59:38 UTC
Description of problem:

In some cases, a user may mount a partition to /var/lib/ceph, and the purge playbook will try to unmount
it, which results in a failure. Only the OSD directories should be unmounted.
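
To illustrate the expected behaviour, here is a minimal shell sketch of the selective unmount (not the actual ceph-ansible task; the findmnt-based selection and the /var/lib/ceph/osd/* data path layout are assumptions based on the default OSD data directories):

# Sketch: unmount only the OSD data mountpoints under /var/lib/ceph,
# leaving /var/lib/ceph itself mounted even if it is a mountpoint.
for mnt in $(findmnt -rn -o TARGET | grep '^/var/lib/ceph/osd/'); do
    umount "$mnt"
done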

Comment 8 subhash 2018-08-24 07:59:08 UTC
Hi Sébastien,


Steps followed to verify:
0) Created a directory to mount in /var/lib/ceph/:
   mkdir /var/lib/ceph/mntdir
1) Mounted a partition (a disk partition that is not part of the cluster) under /var/lib/ceph:
   mount /dev/sdc1 /var/lib/ceph/mntdir
2) Purged the cluster (the playbook failed):

RUNNING HANDLER [remove data] **************************************************************
The full traceback is:
  File "/tmp/ansible_rai1lQ/ansible_module_file.py", line 278, in main
    shutil.rmtree(b_path, ignore_errors=False)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 256, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/usr/lib64/python2.7/shutil.py", line 254, in rmtree
    os.rmdir(path)

fatal: [magna021]: FAILED! => {
    "changed": false, 
    "failed": true, 
    "invocation": {
        "module_args": {
            "attributes": null, 
            "backup": null, 
            "content": null, 
            "delimiter": null, 
            "diff_peek": null, 
            "directory_mode": null, 
            "follow": false, 
            "force": false, 
            "group": null, 
            "mode": null, 
            "original_basename": null, 
            "owner": null, 
            "path": "/var/lib/ceph", 
            "recurse": false, 
            "regexp": null, 
            "remote_src": null, 
            "selevel": null, 
            "serole": null, 
            "setype": null, 
            "seuser": null, 
            "src": null, 
            "state": "absent", 
            "unsafe_writes": null, 
            "validate": null
        }
    }, 
    "msg": "rmtree failed: [Errno 16] Device or resource busy: '/var/lib/ceph/mntdir'"

3) The purge should be successful; only the OSD directories should be unmounted.
4) Recreate the cluster; the cluster should come up successfully.

Please let me know if you have any concerns with these steps.
Thanks

Comment 9 Sébastien Han 2018-08-24 11:20:50 UTC
That's not the right approach; /var/lib/ceph itself should be the mountpoint.

Comment 10 subhash 2018-08-24 12:22:14 UTC
I have also tried mounting a partition at /var/lib/ceph and got this error while purging.


TASK [umount osd data partition] ***************************************************************
task path: /usr/share/ceph-ansible/purge-cluster.yml:283
Friday 24 August 2018  05:03:56 +0000 (0:00:02.419)       0:00:41.310 *********
Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
<magna029> ESTABLISH SSH CONNECTION FOR USER: None
<magna028> ESTABLISH SSH CONNECTION FOR USER: None
<magna021> ESTABLISH SSH CONNECTION FOR USER: None
<magna028> SSH: EXEC ssh -vvv -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=30 -o ControlPath=/root/.ansible/cp/%h-%r-%p magna028 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-ijwlzcpfwvhdgjztrqeyktdgdxffjurf; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
<magna029> SSH: EXEC ssh -vvv -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=30 -o ControlPath=/root/.ansible/cp/%h-%r-%p magna029 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-sepaaedppavugaibmpfvqkczqadufrxv; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
<magna021> SSH: EXEC ssh -vvv -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=30 -o ControlPath=/root/.ansible/cp/%h-%r-%p magna021 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-trarpzyyqdpfrhouhisakoemvvxwauvc; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
<magna021> (1, '\n{"changed": true, "end": "2018-08-24 05:03:58.585417", "stdout": "", "cmd": "umount /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c", "failed": true, "delta": "0:00:00.033318", "stderr": "umount: /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c: mountpoint not found", "rc": 32, "invocation": {"module_args": {"warn": true, "executable": null, "_uses_shell": true, "_raw_params": "umount /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c", "removes": null, "creates": null, "chdir": null, "stdin": null}}, "start": "2018-08-24 05:03:58.552099", "msg": "non-zero return code"}\n', 'OpenSSH_7.4p1, OpenSSL 1.0.2k-fips  26 Jan 2017\r\ndebug1: Reading configuration data /root/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 8: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 12793\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 2\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 1\r\n')
failed: [magna021] (item=/var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c) => {
    "changed": true,
    "cmd": "umount /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c",
    "delta": "0:00:00.033318",
    "end": "2018-08-24 05:03:58.585417",
    "failed": true,
    "invocation": {
        "module_args": {
            "_raw_params": "umount /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c",
            "_uses_shell": true,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "warn": true
        }
    },
    "item": "/var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c",
    "msg": "non-zero return code",
    "rc": 32,
    "start": "2018-08-24 05:03:58.552099",
    "stderr": "umount: /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c: mountpoint not found",
    "stderr_lines": [
        "umount: /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c: mountpoint not found"
    ],
    "stdout": "",


If you run ls in /var/lib/ceph after mounting a partition at that location, you only get a lost+found
directory; all of the existing files are no longer visible, hence the error above.
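
For illustration, a minimal sketch of that masking behaviour, assuming the /dev/sdc1 partition from step 1 and an ext4 filesystem:

ls /var/lib/ceph                 # shows the existing ceph directories
mkfs.ext4 /dev/sdc1              # a fresh ext4 filesystem contains only lost+found
mount /dev/sdc1 /var/lib/ceph
ls /var/lib/ceph                 # now shows only lost+found; the original files
                                 # are hidden (not deleted) until the unmount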

Comment 11 Sébastien Han 2018-08-24 12:26:03 UTC
The mountpoint /var/lib/ceph must contain files from the original /var/lib/ceph/.
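
A minimal sketch of one way to do that (the device name is hypothetical, and any Ceph daemons on the node would need to be stopped first): copy the original contents onto the new filesystem before mounting it at /var/lib/ceph.

mkfs.ext4 /dev/sdc1
mount /dev/sdc1 /mnt
cp -a /var/lib/ceph/. /mnt/       # preserve the original contents and permissions
umount /mnt
mount /dev/sdc1 /var/lib/ceph     # now a mountpoint that still contains the
                                  # original /var/lib/ceph files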

Comment 12 subhash 2018-08-24 21:05:58 UTC
As discussed, I followed the steps below to verify:
1) Created a /var/lib/ceph directory on nodeX and mounted a disk partition on it (the disk partition is not part of the Ceph cluster).
2) Deployed the Ceph cluster with ceph-ansible (with nodeX as one of the OSD nodes). The cluster deployed fine.
3) Purged the Ceph cluster.

purge-cluster.yml errored at the task [remove ceph systemd unit files] --> running handler [remove data]:
"msg": "rmtree failed: [Errno 16] Device or resource busy: '/var/lib/ceph'"
(the subsequent tasks worked fine)

Attaching logs

version: ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch

Comment 23 subhash 2018-10-18 13:15:59 UTC
Followed the steps as per comment #12; the purge works fine. Moving to verified state.

[ubuntu@magna097 ~]$ rpm -qa | grep ansible
ansible-2.4.6.0-1.el7ae.noarch
ceph-ansible-3.1.9-1.el7cp.noarch
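
For reference, the verified flow from comment #12 boils down to roughly the following sketch (the device name is a placeholder; the purge-cluster.yml path matches the logs above, while the site.yml invocation is an assumption about the usual /usr/share/ceph-ansible layout):

# On the OSD node, before deployment:
mkdir -p /var/lib/ceph
mount /dev/sdc1 /var/lib/ceph       # a partition that is not part of the cluster

# From the ceph-ansible admin node:
cd /usr/share/ceph-ansible
ansible-playbook site.yml           # deploy the cluster
ansible-playbook purge-cluster.yml  # purge the cluster

# After the purge, /var/lib/ceph should still be mounted:
findmnt /var/lib/ceph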

Comment 24 John Brier 2018-10-18 13:44:47 UTC
Updated Doc Text from Known Issue to Bug Fix.

Comment 25 Ken Dreyer (Red Hat) 2019-01-08 17:26:42 UTC
The code landed in ceph-ansible v3.1.3; we shipped v3.1.5 in https://access.redhat.com/errata/RHBA-2018:2819

QE verified on ceph-ansible-3.1.9-1.el7cp. The latest available version is ceph-ansible-3.2.0-1.el7cp from http://access.redhat.com/errata/RHBA-2019:0020