Bug 1380317
| Summary: | OpenShift upgrade fails while trying to upgrade docker on etcd servers | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jaspreet Kaur <jkaur> |
| Component: | Cluster Version Operator | Assignee: | Devan Goodwin <dgoodwin> |
| Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.2.0 | CC: | aos-bugs, jkaur, jokerman, mmagnani, mmccomas |
| Target Milestone: | --- | | |
| Target Release: | 3.2.1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | (see below) | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-10-27 16:13:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Doc Text:

- Cause: The upgrade procedure assumed the docker rpm would be available in repositories on standalone etcd nodes.
- Consequence: The upgrade could fail if the etcd node in question did not have the extras repository enabled.
- Fix: The upgrade no longer checks what version of docker is available if docker is not installed at all.
- Result: The upgrade now proceeds on etcd nodes which do not have docker installed, or available in their repositories.
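The fix described above is essentially a guard on the version check. A minimal Ansible sketch, assuming hypothetical task names and commands (the actual openshift-ansible tasks may differ):

```yaml
# Sketch only: skip the available-docker-version check entirely on hosts
# where docker is not installed (e.g. standalone etcd nodes without the
# rhel-7-server-extras-rpms repository enabled).
- name: Check whether docker is installed
  command: rpm -q docker
  register: docker_installed
  failed_when: false
  changed_when: false

- name: Determine the docker version available in the enabled repositories
  command: repoquery --qf '%{version}' docker
  register: avail_docker_version
  when: docker_installed.rc == 0

- name: Fail if only docker < 1.10 is available and no version was requested
  fail:
    msg: Only docker versions older than 1.10 are available in the enabled repositories.
  when:
    - docker_installed.rc == 0
    - avail_docker_version.stdout | version_compare('1.10', '<')
    - docker_version is not defined
```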
We believe that this is fixed in the 3.3 playbooks; we'll investigate fixing this for the 3.2.x playbooks.

Jaspreet, could you confirm something for me? Looking at your inventory, your nfs host is listed as an etcd server, but you comment that it only provides nfs functionality. Did you intend to have etcd running on the nfs host as well? Is there anything unusual about the nfs host? Does it not have access to the docker rpm?

Workarounds available:

1. Ensure rhel-7-server-extras-rpms is enabled on standalone etcd hosts.
2. Specify docker_upgrade=false on the relevant hosts in the inventory [etcd] group (see the inventory sketch after this comment).

Steps to reproduce for QE:

1. Install an rpm environment, any version of 3.2, with standalone etcd hosts.
2. Disable rhel-7-server-extras-rpms on the standalone etcd hosts.
3. Run playbooks/byo/openshift-cluster/upgrades/v3_2/upgrade.yml. You should see the failed avail_docker check on those hosts.

We will fix this to skip the avail docker check if docker is not already installed, and release it to the 3.2 streams shortly.
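A sketch of the second workaround, assuming a typical BYO inventory (the host name is a placeholder):

```ini
# Hypothetical inventory excerpt: docker_upgrade=false on the standalone
# etcd host makes the upgrade playbook leave docker on that host alone.
[etcd]
etcd1.example.com docker_upgrade=false
```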
Docker can't be upgraded to 1.10.3 even if we set docker_upgrade=true and docker_version=1.10.3 in the inventory.

1. The upgrade failed at the docker condition check:

```
TASK [docker : Fail if Docker version requested but downgrade is required] *****
skipping: [openshift-121.lab.eng.nay.redhat.com] => {
    "changed": false,
    "skip_reason": "Conditional check failed",
    "skipped": true
}

TASK [docker : Error out if attempting to upgrade Docker across the 1.10 boundary] ***
fatal: [openshift-121.lab.eng.nay.redhat.com]: FAILED! => {
    "changed": false,
    "failed": true
}

MSG:

Cannot upgrade Docker to >= 1.10, please upgrade or remove Docker manually, or use the Docker upgrade playbook if OpenShift is already installed.

to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade_nodes.retry
```
The report is missing a little info; however, I was able to reproduce by rebuilding. When possible, please attach ansible.log, the exact command run, and the openshift-ansible version, as well as the inventory above.

I think it's safe to call this a separate bug; the original issue here can be reproduced without forcing a 1.10 boundary upgrade. It would occur just from checking what docker was available on an etcd host without the extras repo enabled. I have some thoughts on the new issue, but I can save those for discussion there. This should be low priority, as Docker 1.10 became a requirement back in the 3.2.x releases, and we document that you should run the latest 3.2.x before upgrading to 3.3. Fixing it may be a little complicated.

Got it, moving to Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:2122
Description of problem:

The OpenShift cluster fails to upgrade from 3.2.0 to 3.2.1 with the upgrade playbook. It fails with the error:

```
Upgrade cannot continue. The following hosts did not complete etcd backup.
```

Earlier in the process, this error is also present (see the reconstruction sketch after this description):

```
The conditional check 'avail_docker_version.stdout | version_compare('1.10','<') and docker_version is not defined' failed. The error was: Version comparison: LooseVersion instance has no attribute 'version'

The error appears to have been in '/usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/docker/upgrade_check.yml': line 28, column 3, but may be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- fail
```

However, the nfs server only serves nfs and does not provide docker functionality.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
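For context, a hedged reconstruction of the failing pattern in upgrade_check.yml (not the verbatim upstream task): when the docker rpm is not available in any enabled repository, avail_docker_version.stdout is empty, and version_compare on an empty string fails inside LooseVersion instead of evaluating to false, producing the attribute error quoted above.

```yaml
# Illustrative reconstruction only: with avail_docker_version.stdout == "",
# the version_compare filter raises "LooseVersion instance has no attribute
# 'version'" rather than returning a boolean.
- fail:
    msg: This upgrade requires Docker 1.10 or later to be available.
  when: avail_docker_version.stdout | version_compare('1.10','<') and docker_version is not defined
```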