Bug 1391608
Summary: | Upgrade Playbook from 3.3.0.35 to 3.3.1.3 failed on checking embedded etcd on multi-master/etcd environment | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Eric Jones <erjones> |
Component: | Cluster Version Operator | Assignee: | Devan Goodwin <dgoodwin> |
Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> |
Severity: | medium | Docs Contact: | |
Priority: | high | | |
Version: | 3.3.0 | CC: | anli, aos-bugs, bleanhar, dgoodwin, jialiu, jokerman, mmccomas |
Target Milestone: | --- | Keywords: | Unconfirmed |
Target Release: | 3.3.1 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | openshift-ansible-3.3.50-1.git.0.5bdbeaa.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2016-11-15 19:11:02 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Attachments: | | | |
Description
Eric Jones
2016-11-03 15:46:27 UTC
I think I have found a reproducer: if I install a cluster, remove /etc/ansible/facts.d/openshift.fact on each master, and then try to re-run a 3_3 upgrade, it fails with exactly this error. It appears openshift_master_etcd_hosts is not being set during upgrade, but the error is hidden if the cached fact is still present from running the original config.yml during cluster setup. It looks as though their fact cache was removed, or they somehow hit a way for the cached value to disappear. Working on a fix now.

If the customer uses the config.yml playbook (used for installation) for continued maintenance, it looks like re-running it will regenerate the facts cache, after which the upgrade should complete. However, my understanding is that customers seldom use this playbook for ongoing maintenance.

Proposed fix: https://github.com/openshift/openshift-ansible/pull/2730

Steps to reproduce for QE:

    ansible masters -i ./hosts -a "rm /etc/ansible/facts.d/openshift.fact"

I'm not 100% sure how the customer hit this, but I believe the above step is the best way to reproduce this bug. The problem likely cannot affect embedded etcd deployments, or deployments with etcd on entirely separate hosts. I believe it will only trigger when etcd is colocated on the masters.

We found the issue was master facts not being fully loaded and defaulting to embedded etcd true, which causes the etcd fact loading to fail due to a missing file (as it's not actually embedded etcd).

Fix: https://github.com/openshift/openshift-ansible/pull/2730

I have tested on containerized co-located etcd, rpm embedded etcd, rpm separate etcd hosts, and rpm co-located etcd.

Created attachment 1220260
Ansible hosts and ansible logs
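
For readers following the root cause in this report (master facts missing from the local cache, so the etcd mode defaults to embedded), here is a minimal toy model in Python. It is not code from openshift-ansible and is not necessarily what the linked pull request does; the function names and the `master`/`etcd_hosts` dictionary layout are hypothetical and chosen only for illustration.

```python
# Toy model only: these names and this fact layout are assumptions,
# not taken from openshift-ansible.

def embedded_etcd_from_cache(cached_facts):
    """Decide 'embedded etcd' using only the locally cached facts.

    If the cache has no etcd host list (for example, because
    openshift.fact was removed), this falls back to embedded=True.
    """
    etcd_hosts = cached_facts.get("master", {}).get("etcd_hosts", [])
    return len(etcd_hosts) == 0


def embedded_etcd_with_inventory(cached_facts, inventory_etcd_hosts):
    """Same decision, but fall back to the inventory's etcd hosts
    when the cache does not provide any."""
    etcd_hosts = (
        cached_facts.get("master", {}).get("etcd_hosts")
        or inventory_etcd_hosts
    )
    return len(etcd_hosts) == 0


if __name__ == "__main__":
    empty_cache = {}  # models a master whose fact cache was deleted
    inventory = ["master1", "master2", "master3"]  # co-located etcd hosts

    # Cache-only logic wrongly concludes etcd is embedded.
    print(embedded_etcd_from_cache(empty_cache))
    # With the inventory consulted, the external etcd hosts are seen.
    print(embedded_etcd_with_inventory(empty_cache, inventory))
```

In this sketch, a cluster with etcd co-located on the masters is misclassified as embedded whenever the cache is empty, which matches the reporter's observation that the error stays hidden while the cached fact from the original config.yml run is still present.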
Upgrade failed.
AnsibleUndefinedVariable: 'dict object' has no attribute 'debug_level'
fatal: [openshift-190.lab.eng.nay.redhat.com]: FAILED! => {
"changed": false,
"failed": true
}
MSG:
AnsibleUndefinedVariable: 'dict object' has no attribute 'debug_level'
to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade.retry
It works well on atomic-openshift-utils-3.4.25-1.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:2778