Bug 1635254 - [3.11] uninstall removes all the docker related files and reinstall on the same cluster fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Russell Teague
QA Contact: Qin Ping
URL:
Whiteboard:
Depends On:
Blocks: 1655684
 
Reported: 2018-10-02 13:15 UTC by Jatan Malde
Modified: 2019-02-20 14:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Not all docker related packages were removed during uninstall.
Consequence: Docker was not reinstalled properly during install, causing docker CLI tasks to fail.
Fix: All related docker packages were added to the uninstall.
Result: Reinstall succeeds after running the uninstall playbook.
Clone Of:
: 1655684 (view as bug list)
Environment:
Last Closed: 2019-02-20 14:11:01 UTC


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0326 None None None 2019-02-20 14:11:07 UTC

Description Jatan Malde 2018-10-02 13:15:06 UTC
Description of problem:

When we run the uninstall.yml playbook with the following command to uninstall OCP and docker storage, the uninstall completes properly:

ansible-playbook -e "openshift_uninstall_docker=True" -i inventories/openshift/openshift.yml /usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml

The output of that run is attached to this Bugzilla.

When I then attempt to reinstall the OCP cluster, running prerequisites.yml first and then deploy_cluster.yml, the playbook fails because it cannot find docker files ("No such file or directory") that were removed by uninstall.yml.

The verbose logs are attached as well.

Version-Release number of the following components:
[quicklab@master-0 ~]$ rpm -q openshift-ansible
openshift-ansible-3.10.47-1.git.0.95bc2d2.el7_5.noarch
[quicklab@master-0 ~]$ rpm -q ansible
ansible-2.4.6.0-1.el7ae.noarch
[quicklab@master-0 ~]$ ansible --version
ansible 2.4.6.0
  config file = /home/quicklab/ansible.cfg
  configured module search path = [u'/home/quicklab/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May 31 2018, 09:41:32) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]
[quicklab@master-0 ~]$ 

How reproducible:

Steps to Reproduce:
1. Configure a 3.10 cluster and run the uninstall.yml with variable 'openshift_uninstall_docker=True'
2. Once the uninstall completes, reinstall OCP on the same cluster.
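
The two steps above can be sketched as a command sequence. This is only an illustration: the uninstall command and inventory path are taken from the description, while the location of prerequisites.yml and deploy_cluster.yml under the same playbooks directory, and the helper name reproduce_commands, are assumptions. The function prints the commands rather than running them.

```shell
#!/bin/sh
# Assumed defaults; override via the environment to match your setup.
PLAYBOOK_DIR="${PLAYBOOK_DIR:-/usr/share/ansible/openshift-ansible/playbooks}"
INVENTORY="${INVENTORY:-inventories/openshift/openshift.yml}"

# Hypothetical helper: print the uninstall and reinstall commands in order.
reproduce_commands() {
    # Step 1: uninstall OCP together with docker.
    echo "ansible-playbook -e openshift_uninstall_docker=True -i $INVENTORY $PLAYBOOK_DIR/adhoc/uninstall.yml"
    # Step 2: reinstall on the same hosts.
    echo "ansible-playbook -i $INVENTORY $PLAYBOOK_DIR/prerequisites.yml"
    echo "ansible-playbook -i $INVENTORY $PLAYBOOK_DIR/deploy_cluster.yml"
}
```

Calling reproduce_commands prints the three commands for review before any of them is run by hand.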

Actual results:

fatal: [xx.xx.xx.xx]: FAILED! => {
    "changed": true, 
    "cmd": [
        "docker", 
        "images", 
        "-q", 
        "registry.access.redhat.com/openshift3/ose-node:v3.10"
    ], 
    "delta": "0:00:00.007388", 
    "end": "2018-10-02 08:38:57.558805", 
    "failed": true, 
    "invocation": {
        "module_args": {
            "_raw_params": "docker images -q \"registry.access.redhat.com/openshift3/ose-node:v3.10\"", 
            "_uses_shell": false, 
            "chdir": null, 
            "creates": null, 
            "executable": null, 
            "removes": null, 
            "stdin": null, 
            "warn": true
        }
    }, 
    "msg": "non-zero return code", 
    "rc": 1, 
    "start": "2018-10-02 08:38:57.551417", 
    "stderr": "/bin/docker: line 2: /etc/sysconfig/docker: No such file or directory", 
    "stderr_lines": [
        "/bin/docker: line 2: /etc/sysconfig/docker: No such file or directory"
    ], 
    "stdout": "", 
    "stdout_lines": []
}


Expected results:
The install should succeed if the cluster was correctly uninstalled.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
Attaching all the logs in here.

Comment 8 Michael Gugino 2018-10-05 16:30:04 UTC
The purpose of the uninstall playbooks is to "Remove as little as possible after failing to install openshift to enable retrying the install".

Some steps during install might not be idempotent because of the complexity of installing openshift.  We make no attempt to restore a host to any condition other than being ready for another openshift install attempt.

Optionally, you can choose to uninstall docker in a heavy-handed manner.  Docker storage setup is a prerequisite before running any playbooks, so we don't know what the end user's docker configuration is because we didn't set it up.

These playbooks are not meant for a production cluster.  You can use them for POCs and iterating on a test set of machines if you are trying to get your initial install process figured out.

Comment 10 tripletrk 2018-11-11 23:29:19 UTC
I ran into this exact scenario. The following workaround worked for me: 
"yum -y reinstall docker-common", which reinstalled the /etc/sysconfig/docker file.

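The workaround in comment 10 could be wrapped in a guard so that the reinstall only happens when the file is actually missing. This is a minimal sketch, not part of the installer: the function names and the DRY_RUN switch are hypothetical, and only the path and the yum command come from this report.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the docker sysconfig file is missing.
# The path defaults to the file named in the error message.
docker_sysconfig_missing() {
    conf="${1:-/etc/sysconfig/docker}"
    [ ! -f "$conf" ]
}

# Apply the comment 10 workaround only when the file is absent.
# DRY_RUN=1 (the default in this sketch) prints the command instead of running it.
restore_docker_sysconfig() {
    conf="${1:-/etc/sysconfig/docker}"
    if docker_sysconfig_missing "$conf"; then
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: yum -y reinstall docker-common"
        else
            yum -y reinstall docker-common
        fi
    else
        echo "$conf present, nothing to do"
    fi
}
```

Running restore_docker_sysconfig with DRY_RUN=0 as root would perform the reinstall; with the default it only reports what it would do.
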
Comment 11 Michael Gugino 2018-12-03 17:02:35 UTC
PR Created in 3.11: https://github.com/openshift/openshift-ansible/pull/10810

Comment 12 Scott Dodson 2019-01-24 14:53:56 UTC
In openshift-ansible-3.11.60-1 and later.

Comment 13 sheng.lao 2019-01-25 08:24:38 UTC
1. Reproduce the failure
# rpm -qa |grep openshift-ansible
openshift-ansible-3.11.56-1.git.0.59f0535.el7.noarch.rpm

After the uninstall.yml playbook finishes with openshift_uninstall_docker enabled, I check with the command:
# rpm -qa |grep docker
docker-1.13.1-88.git07f3374.el7.x86_64

Re-install OCP with commands:
# ansible-playbook -i inventory.yml playbooks/prerequisites.yml 

And get error messages from docker service:
# systemctl status docker
Jan 25 01:51:30 xxx dockerd-current[69936]: /usr/bin/docker-containerd: line 2: /etc/sysconfig/docker: No such file or directory

2. Verify the Patch
Check in the uninstalled environment
# rpm -qa |grep openshift-ansible
openshift-ansible-3.11.73-1.git.0.89d3763.el7.noarch.rpm

Re-install OCP and get the same error:
# systemctl status docker
Jan 25 03:23:35 xxx dockerd-current[64749]: /usr/bin/docker-containerd: line 2: /etc/sysconfig/docker: No such file or directory

# rpm -qf /etc/sysconfig/docker
docker-common-1.13.1-88.git07f3374.el7.x86_64


3. Result: Failed, because the installer needs to uninstall the docker-common rpm

Comment 14 Scott Dodson 2019-01-31 14:42:44 UTC
Need to update the task referenced in https://github.com/openshift/openshift-ansible/pull/10810 to consider all installed docker packages, for example docker-common.
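
One way to "consider all installed docker packages" is to enumerate them from the rpm database instead of naming them one by one. The sketch below is an assumption about how that could look in shell and is not the task from the pull request. The function names and the name pattern (exactly "docker", or a name starting with "docker-") are illustrative; the report itself only names docker and docker-common.

```shell
#!/bin/sh
# Hypothetical filter: read package names on stdin and keep only those that
# are exactly "docker" or start with "docker-" (e.g. docker-common).
filter_docker_packages() {
    # "|| true" keeps the pipeline from failing when nothing matches.
    grep -E '^docker(-|$)' || true
}

# List the names of all installed packages that match the filter.
list_installed_docker_packages() {
    rpm -qa --qf '%{NAME}\n' | filter_docker_packages | sort -u
}
```

The resulting list could then be passed to a removal command, for example `yum -y remove $(list_installed_docker_packages)`, after reviewing what it contains.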

Comment 17 Qin Ping 2019-02-11 07:46:45 UTC
Verified with:
openshift-ansible-playbooks-3.11.82-1.git.0.f29227a.el7.noarch.rpm

Comment 19 errata-xmlrpc 2019-02-20 14:11:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0326

