
Bug 1380317

Summary: OpenShift upgrade fails while trying to upgrade docker on etcd servers
Product: OpenShift Container Platform
Reporter: Jaspreet Kaur <jkaur>
Component: Cluster Version Operator
Assignee: Devan Goodwin <dgoodwin>
Status: CLOSED ERRATA
QA Contact: Anping Li <anli>
Severity: high
Docs Contact:
Priority: high
Version: 3.2.0
CC: aos-bugs, jkaur, jokerman, mmagnani, mmccomas
Target Milestone: ---
Target Release: 3.2.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The upgrade procedure assumed the docker RPM would be available in the repositories on standalone etcd nodes.
Consequence: The upgrade could fail if the etcd node in question did not have the extras repository enabled.
Fix: The upgrade no longer checks which version of docker is available if docker is not installed at all.
Result: The upgrade now proceeds on etcd nodes that do not have docker installed or available in their repositories.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-27 16:13:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jaspreet Kaur 2016-09-29 09:41:50 UTC
Description of problem: The OpenShift cluster fails to upgrade when running the upgrade playbook from 3.2.0 to 3.2.1.

The playbook fails with the error: "Upgrade cannot continue. The following hosts did not complete etcd backup."

Earlier in the process, another error is also present:

The conditional check 'avail_docker_version.stdout | version_compare('1.10','<') and docker_version is not defined' failed. The error was: Version comparison: LooseVersion instance has no attribute 'version'

The error appears to have been in '/usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/docker/upgrade_check.yml': line 28, column 3, but may be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- fail

However, the NFS server only serves NFS and does not provide docker functionality.
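
For reference, the version_compare filter is backed by distutils LooseVersion; when nothing is registered in avail_docker_version.stdout (no docker package installed or available from the enabled repositories), LooseVersion('') never parses a version, and the comparison raises exactly the AttributeError quoted above. A minimal, hedged Ansible task that triggers the same filter error (illustrative only, not taken from the playbook):

# Illustrative only: an empty left-hand value makes version_compare blow up,
# because LooseVersion('') has no parsed .version attribute to compare.
- name: Reproduce the failing conditional with an empty version string
  fail:
    msg: "docker older than 1.10 is available"
  when: "'' | version_compare('1.10', '<')"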

 

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 4 Scott Dodson 2016-10-04 15:19:49 UTC
We believe this is fixed in the 3.3 playbooks; we'll investigate fixing it for the 3.2.x playbooks.

Comment 5 Devan Goodwin 2016-10-12 12:26:59 UTC
Jaspreet, could you confirm something for me? Looking at your inventory, your nfs host is listed as an etcd server, but you comment that it only provides NFS functionality.

Did you intend to have etcd running on the nfs host as well?

Is there anything unusual about the nfs host? Does it not have access to docker rpm?

Comment 10 Devan Goodwin 2016-10-14 12:32:34 UTC
Workarounds available:

1. Ensure rhel-7-server-extras-rpms is enabled on the standalone etcd hosts.
2. Specify docker_upgrade=false on the relevant hosts in the inventory [etcd] group (see the example below).
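
A hedged example of the second workaround as it might look in the inventory (hostnames are placeholders):

# Placeholders; substitute the actual standalone etcd hosts.
[etcd]
etcd1.example.com docker_upgrade=false
etcd2.example.com docker_upgrade=false

# Or set it once for the whole group:
[etcd:vars]
docker_upgrade=false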

Steps to reproduce for QE: 

1. Install an RPM environment, any version of 3.2, with standalone etcd hosts.
2. Disable rhel-7-server-extras-rpms on the standalone etcd hosts.
3. Run playbooks/byo/openshift-cluster/upgrades/v3_2/upgrade.yml (command sketch below).

You should see the failed avail_docker check on those hosts.
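
A hedged sketch of the commands behind steps 2 and 3 (the inventory path is a placeholder):

# Step 2, on each standalone etcd host: drop the extras repo.
subscription-manager repos --disable=rhel-7-server-extras-rpms

# Step 3, from the machine running openshift-ansible; /path/to/hosts is a placeholder.
ansible-playbook -i /path/to/hosts \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_2/upgrade.yml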



We will fix this by skipping the available-docker check when docker is not already installed, and release the fix to the 3.2 streams shortly.
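
A minimal Ansible sketch of that guard (this is not the shipped patch; the task names and the repoquery call are illustrative assumptions):

# Hedged sketch: skip the availability check entirely when docker is not
# installed, so a missing extras repo cannot break the run.
- name: Check whether docker is installed
  command: rpm -q docker
  register: pkg_check
  failed_when: false         # a non-zero rc just means docker is absent
  changed_when: false

- name: Determine the docker version available in the enabled repositories
  command: repoquery --queryformat '%{version}' docker
  register: avail_docker_version
  when: pkg_check.rc == 0    # only relevant if docker is already installed

- name: Fail if the available docker is older than 1.10
  fail:
    msg: "This playbook requires access to docker 1.10 or later"
  when:
    - pkg_check.rc == 0
    - avail_docker_version.stdout != ''
    - avail_docker_version.stdout | version_compare('1.10', '<')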

Comment 13 Anping Li 2016-10-21 05:13:05 UTC
Docker can't be upgraded to 1.10.3 even if we set docker_upgrade=true and docker_version=1.10.3 in the inventory.


1. The upgrade failed on the docker condition check.

TASK [docker : Fail if Docker version requested but downgrade is required] *****
skipping: [openshift-121.lab.eng.nay.redhat.com] => {
    "changed": false,
    "skip_reason": "Conditional check failed",
    "skipped": true
}

TASK [docker : Error out if attempting to upgrade Docker across the 1.10 boundary] ***
fatal: [openshift-121.lab.eng.nay.redhat.com]: FAILED! => {
    "changed": false,
    "failed": true
}

MSG:

Cannot upgrade Docker to >= 1.10, please upgrade or remove Docker manually, or use the Docker upgrade playbook if OpenShift is already installed.
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade_nodes.retry

Comment 15 Devan Goodwin 2016-10-21 13:52:48 UTC
The report is missing a little info; however, I was able to reproduce it by rebuilding. When possible, please attach ansible.log, the exact command run, and the openshift-ansible version, as well as the inventory above.

I think it's safe to call this a separate bug. The original issue here can be reproduced without forcing a 1.10 boundary upgrade; it would occur just from checking what docker version was available on an etcd host without the extras repo enabled.

I have some thoughts on the new issue, but I can save those for discussion there. This should be low priority, as Docker 1.10 became a requirement back in the 3.2.x releases, and we document that you should run the latest 3.2.x before upgrading to 3.3. Fixing it may be a little complicated.

Comment 17 Anping Li 2016-10-24 01:16:54 UTC
Got it, moving to Verified.

Comment 19 errata-xmlrpc 2016-10-27 16:13:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:2122