Bug 1571252

Summary: uninstall.yml fails to remove /var/lib/docker if it's mounted on a separate filesystem
Product: OpenShift Container Platform Reporter: ggore
Component: Installer Assignee: Scott Dodson <sdodson>
Status: CLOSED WONTFIX QA Contact: sheng.lao <shlao>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.9.0 CC: aos-bugs, bleanhar, dapark, jokerman, jswensso, mgugino, mmariyan, mmccomas, sdodson
Target Milestone: ---   
Target Release: 3.10.z   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-11-29 16:53:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description ggore 2018-04-24 12:05:03 UTC
Description of problem:
/usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml tries to remove /var/lib/docker and fails with the error below, because /var/lib/docker is mounted on a separate filesystem.

The full traceback is:
  File "/tmp/ansible_5zSHkj/ansible_module_file.py", line 278, in main
    shutil.rmtree(b_path, ignore_errors=False)
  File "/usr/lib64/python2.7/shutil.py", line 256, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/usr/lib64/python2.7/shutil.py", line 254, in rmtree
    os.rmdir(path)
failed: [ose_master.example.com] (item=/var/lib/docker) => {
    "changed": false,
    "invocation": {
        "module_args": {
            "attributes": null,
            "backup": null,
            "content": null,
            "delimiter": null,
            "diff_peek": null,
            "directory_mode": null,
            "follow": false,
            "force": false,
            "group": null,
            "mode": null,
            "original_basename": null,
            "owner": null,
            "path": "/var/lib/docker",
            "recurse": false,
            "regexp": null,
            "remote_src": null,
            "selevel": null,
            "serole": null,
            "setype": null,
            "seuser": null,
            "src": null,
            "state": "absent",
            "unsafe_writes": null,
            "validate": null
        }
    },
    "item": "/var/lib/docker",
    "msg": "rmtree failed: [Errno 16] Device or resource busy: '/var/lib/docker'"
}
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.retry

PLAY RECAP *************************************************************************************************************************
ose_master.example.com : ok=38   changed=11   unreachable=0    failed=1


How reproducible:
Always

Steps to Reproduce:
1. Install OCP 3.9 with /var/lib/docker on a separate filesystem.
2. Run the uninstall.yml playbook.


Expected results:
uninstall.yml playbook should remove /var/lib/docker/*
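
A minimal sketch of the expected behaviour, written in Python because the failing Ansible file module is Python. It deletes everything inside a directory but leaves the directory itself in place, which avoids the rmdir call that returns "Device or resource busy" when the path is a mount point. The helper name remove_contents is hypothetical and is not part of openshift-ansible.

```python
import os
import shutil


def remove_contents(path):
    """Delete everything inside path but keep path itself.

    Hypothetical helper, not part of openshift-ansible. Keeping the
    directory avoids the rmdir() call that fails with EBUSY when path
    is a mount point.
    """
    removed = []
    for name in sorted(os.listdir(path)):
        child = os.path.join(path, name)
        if os.path.isdir(child) and not os.path.islink(child):
            # Real subdirectory: remove the whole tree below it.
            shutil.rmtree(child)
        else:
            # Regular file or symlink: remove only the entry itself.
            os.remove(child)
        removed.append(name)
    return removed
```

Children that are themselves mount points would still fail with the same error and would need to be unmounted first.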

Comment 2 Klaas Demter 2018-08-27 13:56:36 UTC
This issue also affects 3.10

Comment 5 Brenton Leanhardt 2018-08-28 13:09:17 UTC
It was decided that, by default, docker storage should not be removed at uninstall time.  To remove docker storage on 3.10, the openshift_uninstall_docker=True setting would need to be set.  This should ship with this week's release.

Comment 6 Scott Dodson 2018-08-28 14:06:26 UTC
openshift-ansible-3.10.19-1 and later should have those changes.

Comment 7 Klaas Demter 2018-08-29 06:07:10 UTC
If openshift-ansible-3.10.19-1 has those changes, then it should not be on QA but released. openshift-ansible-3.10.21-1.git.0.6446011.el7.noarch is the current version in the 3.10 repositories.

Comment 8 Klaas Demter 2018-08-29 06:32:49 UTC
Running ansible-playbook --extra-vars "openshift_uninstall_docker=True" -i inventories/openshift/openshift.yml /usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml

with openshift-ansible-3.10.21-1.git.0.6446011.el7.noarch

still gives me an error:
fatal: [hostname.domain.tld]: FAILED! => {"changed": true, "cmd": "rm -rf /var/lib/docker", "delta": "0:00:00.008434", "end": "2018-08-29 08:13:45.895883", "failed": true, "msg": "non-zero return code", "rc": 1, "start": "2018-08-29 08:13:45.887449", "stderr": "rm: cannot remove ‘/var/lib/docker/containers’: Device or resource busy\nrm: cannot remove ‘/var/lib/docker/devicemapper/mnt/dcf4231544c4f9d0900c7fa21cef1e5c160beb366d01ab0a1d2633729299e4b3’: Device or resource busy\nrm: cannot remove ‘/var/lib/docker/devicemapper/mnt/dbb4f7f800627cbdec29cb813a92ce6a97ff1d357746e79e5ea5292bd534eb41’: Device or resource busy", "stderr_lines": ["rm: cannot remove ‘/var/lib/docker/containers’: Device or resource busy", "rm: cannot remove ‘/var/lib/docker/devicemapper/mnt/dcf4231544c4f9d0900c7fa21cef1e5c160beb366d01ab0a1d2633729299e4b3’: Device or resource busy", "rm: cannot remove ‘/var/lib/docker/devicemapper/mnt/dbb4f7f800627cbdec29cb813a92ce6a97ff1d357746e79e5ea5292bd534eb41’: Device or resource busy"], "stdout": "", 


Some more details in support case 02169842
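
The paths reported as busy above may be mount points below /var/lib/docker; that is an assumption, the output does not confirm it. A rough way to check is to list every mount at or below the directory, deepest first, from text in /proc/mounts format. The function name mounts_under is hypothetical.

```python
def mounts_under(path, mounts_text):
    """Return mount points at or below path, deepest first.

    mounts_text is text in /proc/mounts format: device, mount point,
    filesystem type, options, ... separated by whitespace.
    Hypothetical helper for illustration only.
    """
    base = path.rstrip("/") or "/"
    prefix = "/" if base == "/" else base + "/"
    found = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # /proc/mounts escapes spaces in paths as \040.
        mount_point = fields[1].replace("\\040", " ")
        if mount_point == base or mount_point.startswith(prefix):
            found.add(mount_point)
    # Deepest paths first, so nested mounts can be unmounted before parents.
    return sorted(found, key=lambda p: (-p.count("/"), p))
```

On a Linux host this could be called as mounts_under("/var/lib/docker", open("/proc/mounts").read()).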

Comment 9 Klaas Demter 2018-08-29 06:49:18 UTC
This only happens on my masters in a multi-master setup.

Comment 12 sheng.lao 2018-08-30 05:34:56 UTC
Test version: openshift-ansible-3.10.41-1 with openshift_uninstall_docker=True
Test result: Failed 

Verification Procedures:
# docker-storage-setup --reset
INFO: Found an already configured thin pool /dev/mapper/rhel-docker--pool in /etc/sysconfig/docker-storage
  Logical volume "docker-pool" successfully removed

# rm -rf /var/lib/docker
rm: cannot remove ‘/var/lib/docker’: Device or resource busy

# mount |grep /var/lib/docker
/dev/mapper/dockervg-dockerlv on /var/lib/docker type xfs (rw,relatime,seclabel,attr2,inode64,prjquota)

It works as follows:
# umount /var/lib/docker 
# rm -rf /var/lib/docker
#
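
The manual steps above can be sketched in Python as: unmount the path if it is a mount point, then delete the tree. The helper name unmount_and_remove and its dry_run parameter are hypothetical; this is not what uninstall.yml does, and unmounting requires root.

```python
import os
import shutil
import subprocess


def unmount_and_remove(path, dry_run=False):
    """Unmount path if it is a mount point, then delete the tree.

    Hypothetical helper mirroring the manual umount + rm -rf steps.
    With dry_run=True nothing is changed; the planned steps are
    returned so they can be reviewed first.
    """
    steps = []
    if os.path.ismount(path):
        steps.append("umount")
        if not dry_run:
            # Raises CalledProcessError if the filesystem is still in use.
            subprocess.check_call(["umount", path])
    steps.append("rmtree")
    if not dry_run:
        shutil.rmtree(path)
    return steps
```

Calling it with dry_run=True first shows whether an unmount would be attempted. Any /etc/fstab entry for the filesystem is left untouched.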

Comment 14 Michael Gugino 2018-11-29 16:53:26 UTC
We don't intend to support every possible scenario of uninstalling docker on various systems.  If docker uninstall does not meet your needs, please do not use it and uninstall/reconfigure in a manner that matches your environment.