Description of problem:
When we run the uninstall.yml playbook with the following command to uninstall OCP and Docker storage, the uninstall completes properly.

    ansible-playbook -e "openshift_uninstall_docker=True" -i inventories/openshift/openshift.yml /usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml

The output of that run is attached to this Bugzilla. When I then attempt to reinstall the OCP cluster, running prerequisites.yml first and then deploy_cluster.yml, the playbook fails with "No such file or directory" for the Docker files that were removed by uninstall.yml. The other verbose logs are attached as well.

Version-Release number of the following components:

    [quicklab@master-0 ~]$ rpm -q openshift-ansible
    openshift-ansible-3.10.47-1.git.0.95bc2d2.el7_5.noarch
    [quicklab@master-0 ~]$ rpm -q ansible
    ansible-2.4.6.0-1.el7ae.noarch
    [quicklab@master-0 ~]$ ansible --version
    ansible 2.4.6.0
      config file = /home/quicklab/ansible.cfg
      configured module search path = [u'/home/quicklab/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
      ansible python module location = /usr/lib/python2.7/site-packages/ansible
      executable location = /usr/bin/ansible
      python version = 2.7.5 (default, May 31 2018, 09:41:32) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]
    [quicklab@master-0 ~]$

How reproducible:

Steps to Reproduce:
1. Configure a 3.10 cluster and run uninstall.yml with the variable 'openshift_uninstall_docker=True'.
2. Once the uninstall completes, reinstall OCP on the same cluster.

Actual results:

    fatal: [xx.xx.xx.xx]: FAILED! => {
        "changed": true,
        "cmd": [
            "docker",
            "images",
            "-q",
            "registry.access.redhat.com/openshift3/ose-node:v3.10"
        ],
        "delta": "0:00:00.007388",
        "end": "2018-10-02 08:38:57.558805",
        "failed": true,
        "invocation": {
            "module_args": {
                "_raw_params": "docker images -q \"registry.access.redhat.com/openshift3/ose-node:v3.10\"",
                "_uses_shell": false,
                "chdir": null,
                "creates": null,
                "executable": null,
                "removes": null,
                "stdin": null,
                "warn": true
            }
        },
        "msg": "non-zero return code",
        "rc": 1,
        "start": "2018-10-02 08:38:57.551417",
        "stderr": "/bin/docker: line 2: /etc/sysconfig/docker: No such file or directory",
        "stderr_lines": [
            "/bin/docker: line 2: /etc/sysconfig/docker: No such file or directory"
        ],
        "stdout": "",
        "stdout_lines": []
    }

Expected results:
The install should succeed if the cluster was correctly uninstalled.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
All the logs are attached here.
The purpose of the uninstall playbooks is to "Remove as little as possible after failing to install openshift to enable retrying the install". Some steps of the install might not be idempotent because installing OpenShift is so complex. We make no attempt to restore a host to any condition other than being ready for another OpenShift install attempt. Optionally, you can choose to uninstall Docker in a heavy-handed manner. Docker storage setup is a prerequisite for running any playbooks, so we don't know what the end user's Docker configuration is, because we didn't set it up. These playbooks are not meant for a production cluster. You can use them for POCs and for iterating on a test set of machines while you work out your initial install process.
I ran into this exact scenario. The following workaround worked for me: "yum -y reinstall docker-common", which reinstalled the /etc/sysconfig/docker file.
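For anyone applying the same workaround by script, here is a minimal sketch that first checks whether the file is really missing. The function names docker_sysconfig_state and restore_docker_sysconfig are hypothetical helpers made up for this comment and are not part of openshift-ansible; the only commands taken from this thread are the file path and the yum invocation.

```shell
# Hypothetical helper, not part of openshift-ansible.
# Prints "present" or "missing" for the Docker sysconfig file.
# The path argument defaults to the file named in the error message.
docker_sysconfig_state() {
    conf="${1:-/etc/sysconfig/docker}"
    if [ -f "$conf" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# Hypothetical helper: run the workaround from this comment only when
# the file is missing. Needs root, since it calls yum.
restore_docker_sysconfig() {
    conf="${1:-/etc/sysconfig/docker}"
    if [ "$(docker_sysconfig_state "$conf")" = "missing" ]; then
        yum -y reinstall docker-common
    fi
}
```

Calling restore_docker_sysconfig with no argument on the affected host should be equivalent to the manual workaround, and it does nothing when the file already exists.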
PR Created in 3.11: https://github.com/openshift/openshift-ansible/pull/10810
Fixed in openshift-ansible-3.11.60-1 and later.
1. Reproduce the failure

    # rpm -qa |grep openshift-ansible
    openshift-ansible-3.11.56-1.git.0.59f0535.el7.noarch.rpm

After the uninstall.yml playbook finished with openshift_uninstall_docker enabled, I checked with:

    # rpm -qa |grep docker
    docker-1.13.1-88.git07f3374.el7.x86_64

Re-install OCP with:

    # ansible-playbook -i inventory.yml playbooks/prerequisites.yml

The docker service then reports this error:

    # systemctl status docker
    Jan 25 01:51:30 xxx dockerd-current[69936]: /usr/bin/docker-containerd: line 2: /etc/sysconfig/docker: No such file or directory

2. Verify the patch

Check in the uninstalled environment:

    # rpm -qa |grep openshift-ansible
    openshift-ansible-3.11.73-1.git.0.89d3763.el7.noarch.rpm

Re-install OCP and get the same error:

    # systemctl status docker
    Jan 25 03:23:35 xxx dockerd-current[64749]: /usr/bin/docker-containerd: line 2: /etc/sysconfig/docker: No such file or directory
    # rpm -qf /etc/sysconfig/docker
    docker-common-1.13.1-88.git07f3374.el7.x86_64

3. Result: Failed, because the installer also needs to uninstall the docker-common RPM.
Need to update the task referenced in https://github.com/openshift/openshift-ansible/pull/10810 to consider all installed docker packages, for example docker-common.
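To illustrate what "all installed docker packages" could mean on a host, here is a rough sketch that filters a package listing by name. It is not the task from the pull request, and filter_docker_packages is a hypothetical name used only for this example; the matching rule (name is "docker" or starts with "docker-") is an assumption about which packages should count.

```shell
# Hypothetical helper, not the task changed in the pull request.
# Reads "rpm -qa" style package names on stdin and prints those whose
# name is "docker" or starts with "docker-".
filter_docker_packages() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E '^docker(-|$)' || true
}
```

On a host, `rpm -qa | filter_docker_packages` would list both docker and docker-common if they are installed, which is the set the uninstall task would need to cover.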
https://github.com/openshift/openshift-ansible/pull/11104
Verified with: openshift-ansible-playbooks-3.11.82-1.git.0.f29227a.el7.noarch.rpm
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0326