Description of problem:

In some cases a user may have a partition mounted under /var/lib/ceph, and purge-cluster.yml will try to umount it, which makes the purge fail. Only the OSD directories should be umounted.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
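For illustration of the expected behaviour (a sketch only, not the actual ceph-ansible task), the purge should unmount only the ceph-managed mounts below /var/lib/ceph and leave /var/lib/ceph itself, and any unrelated mounts, alone:

# Sketch: unmount only OSD (and osd-lockbox) mounts under /var/lib/ceph,
# never /var/lib/ceph itself or unrelated mounts such as a user's own partition.
awk '$2 ~ "^/var/lib/ceph/osd" {print $2}' /proc/mounts | while read -r mnt; do
    umount "$mnt"
done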
This is fixed in https://github.com/ceph/ceph-ansible/releases/tag/v3.0.43 and https://github.com/ceph/ceph-ansible/releases/tag/v3.1.0rc18.
Hi Sebastian,

Steps followed to verify:

0) Created a directory to mount inside /var/lib/ceph/:
   mkdir /var/lib/ceph/mntdir
1) Mounted a partition on it (a disk partition that is not part of the cluster):
   mount /dev/sdc1 /var/lib/ceph/mntdir
2) Purged the cluster (the playbook failed):

RUNNING HANDLER [remove data] **************************************************************
The full traceback is:
  File "/tmp/ansible_rai1lQ/ansible_module_file.py", line 278, in main
    shutil.rmtree(b_path, ignore_errors=False)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 256, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/usr/lib64/python2.7/shutil.py", line 254, in rmtree
    os.rmdir(path)
fatal: [magna021]: FAILED! => {
    "changed": false,
    "failed": true,
    "invocation": {
        "module_args": {
            "attributes": null,
            "backup": null,
            "content": null,
            "delimiter": null,
            "diff_peek": null,
            "directory_mode": null,
            "follow": false,
            "force": false,
            "group": null,
            "mode": null,
            "original_basename": null,
            "owner": null,
            "path": "/var/lib/ceph",
            "recurse": false,
            "regexp": null,
            "remote_src": null,
            "selevel": null,
            "serole": null,
            "setype": null,
            "seuser": null,
            "src": null,
            "state": "absent",
            "unsafe_writes": null,
            "validate": null
        }
    },
    "msg": "rmtree failed: [Errno 16] Device or resource busy: '/var/lib/ceph/mntdir'"
}

3) The purge should be successful; only the OSD dirs should be umounted.
4) Recreate the cluster; it should come up successfully.

Please let me know if you have any concerns with the steps. Thanks.
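For what it's worth, a quick way to spot unrelated mounts under /var/lib/ceph before running the purge (just an illustrative manual check, not part of the playbook):

# List everything currently mounted under /var/lib/ceph so stray mounts
# such as /var/lib/ceph/mntdir can be unmounted by hand first.
findmnt -rn -o TARGET,SOURCE | grep '^/var/lib/ceph' || echo "nothing mounted under /var/lib/ceph"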
That's not the right approach; /var/lib/ceph itself should be the mountpoint.
I have even tried mounting a partition at /var/lib/ceph itself and got this error while purging:

TASK [umount osd data partition] ***************************************************************
task path: /usr/share/ceph-ansible/purge-cluster.yml:283
Friday 24 August 2018  05:03:56 +0000 (0:00:02.419)       0:00:41.310 *********
...
failed: [magna021] (item=/var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c) => {
    "changed": true,
    "cmd": "umount /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c",
    "delta": "0:00:00.033318",
    "end": "2018-08-24 05:03:58.585417",
    "failed": true,
    "invocation": {
        "module_args": {
            "_raw_params": "umount /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c",
            "_uses_shell": true,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "warn": true
        }
    },
    "item": "/var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c",
    "msg": "non-zero return code",
    "rc": 32,
    "start": "2018-08-24 05:03:58.552099",
    "stderr": "umount: /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c: mountpoint not found",
    "stderr_lines": [
        "umount: /var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c: mountpoint not found"
    ],
    "stdout": ""
}

If you ls /var/lib/ceph after mounting a partition at that location, you only see a lost+found dir; none of the files that were there before are visible. Hence the above error.
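The "mountpoint not found" failure is a consequence of the osd-lockbox path no longer being visible once another filesystem is mounted over /var/lib/ceph. A defensive pattern for this kind of cleanup (an illustration only, not the playbook's actual task) is to unmount only paths that really are mountpoints:

# Illustrative guard: skip paths that are not actually mounted, so a stale
# osd-lockbox entry does not fail the whole purge run.
dir=/var/lib/ceph/osd-lockbox/a9ae88fa-56ae-4025-8330-7e2fc36b875c
if mountpoint -q "$dir"; then
    umount "$dir"
else
    echo "skipping $dir: not a mountpoint"
fi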
The mountpoint /var/lib/ceph must contain the files from the original /var/lib/ceph/ directory.
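In other words, to test with /var/lib/ceph itself as the mountpoint on a node that already has data there, the existing contents have to be carried over to the new filesystem first. A rough sketch, assuming /dev/sdc1 is a spare, unused partition and all ceph services on the node are stopped:

# Assumption: /dev/sdc1 is unused and no ceph daemon is running on the node.
mkfs.xfs /dev/sdc1
mount /dev/sdc1 /mnt
cp -a /var/lib/ceph/. /mnt/        # carry over the original contents, ownership intact
umount /mnt
mount /dev/sdc1 /var/lib/ceph      # the original files are visible again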
As discussed, followed the below steps to verify:

1) Created the /var/lib/ceph dir on nodeX and mounted a disk partition on it (the disk partition is not part of the ceph cluster).
2) Deployed a ceph cluster with ceph-ansible, with nodeX as one of the OSD nodes. The cluster deployed fine.
3) Purged the ceph cluster. purge-cluster.yml errored at TASK [remove ceph systemd unit files] --> RUNNING HANDLER [remove data]:
   "msg": "rmtree failed: [Errno 16] Device or resource busy: '/var/lib/ceph'"
   (the subsequent tasks worked fine)

Attaching logs.

version: ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch
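For reference, removing a directory that is itself a mountpoint always fails with EBUSY, even when it is empty; only its contents can be removed. This is easy to see on the affected node (illustration only, destructive in the purge context):

findmnt /var/lib/ceph          # shows the partition mounted there
rm -rf /var/lib/ceph/*         # emptying the directory works
rmdir /var/lib/ceph            # fails: Device or resource busy, because it is a mountpoint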
Followed the steps as per comment #12; purge works fine. Moving to verified state.

[ubuntu@magna097 ~]$ rpm -qa | grep ansible
ansible-2.4.6.0-1.el7ae.noarch
ceph-ansible-3.1.9-1.el7cp.noarch
Updated Doc Text from Known Issue to Bug Fix.
Code landed in ceph-ansible v3.1.3, and we shipped v3.1.5 in https://access.redhat.com/errata/RHBA-2018:2819. QE verified on ceph-ansible-3.1.9-1.el7cp. The latest available version is ceph-ansible-3.2.0-1.el7cp from http://access.redhat.com/errata/RHBA-2019:0020.