Aleks, could you check the contents of /etc/ansible/facts.d/openshift.fact and paste its contents somewhere? Would you be able to re-try with this change applied manually? I think it might fix it:

$ git diff
diff --git a/playbooks/common/openshift-cluster/upgrades/pre.yml b/playbooks/common/openshift-cluster/upgrades/pre.yml
index 5c8803e..7b0066b 100644
--- a/playbooks/common/openshift-cluster/upgrades/pre.yml
+++ b/playbooks/common/openshift-cluster/upgrades/pre.yml
@@ -232,10 +232,23 @@
 ###############################################################################
 # Backup etcd
 ###############################################################################
+# If facts cache were for some reason deleted, this fact may not be set, and if not set
+# it will always default to true. This causes problems for the etcd data dir fact detection
+# so we must first make sure this is set correctly before attempting the backup.
+- name: Set master embedded_etcd fact
+  hosts: oo_masters_to_config
+  roles:
+  - openshift_facts
+  tasks:
+  - openshift_facts:
+      role: master
+      local_facts:
+        embedded_etcd: "{{ groups.oo_etcd_to_config | length == 0 }}"
+
 - name: Backup etcd
   hosts: etcd_hosts_to_backup
   vars:
-    embedded_etcd: "{{ hostvars[groups.oo_first_master.0].openshift.master.embedded_etcd }}"
+    embedded_etcd: "{{ groups.oo_etcd_to_config | default([]) | length == 0 }}"
     timestamp: "{{ lookup('pipe', 'date +%Y%m%d%H%M%S') }}"
   roles:
   - openshift_facts
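For context, the "groups.oo_etcd_to_config | length == 0" test keys off whether the inventory declares dedicated etcd hosts; to my understanding openshift-ansible derives oo_etcd_to_config from the [etcd] inventory group. A minimal sketch of an inventory where embedded_etcd would correctly evaluate to false (hostnames hypothetical):

[OSEv3:children]
masters
etcd
nodes

[masters]
master1.example.com

[etcd]
etcd1.example.com
etcd2.example.com
etcd3.example.com

[nodes]
node1.example.com

With an empty or absent [etcd] group the expression evaluates to true and the backup logic treats etcd as embedded in the master.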
[root@itsrv1554 ~]# downloads/jq-linux64 < /etc/ansible/facts.d/openshift.fact
{
  "node": {},
  "docker": {
    "blocked_registries": [],
    "hosted_registry_network": "172.30.0.0/16",
    "insecure_registries": [],
    "additional_registries": []
  },
  "master": {
    "ha": true
  },
  "common": {
    "generate_no_proxy_hosts": true,
    "cluster_id": "default",
    "is_containerized": false,
    "deployment_type": "openshift-enterprise"
  }
}

I have added this snippet to the playbook, but the etcd backup runs just fine?! Tonight I will run the update again and update the ticket. Is anyone available on the weekend? This is a priority 1 ticket because our production is affected by this issue.
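Note that the "master" section above only carries "ha"; the facts the upgrade expects are not in the cache. A quick way to check whether a given fact key exists anywhere in the file (my suggestion, assuming jq 1.5+ builtins; swap in "debug_level" to test that key) is:

downloads/jq-linux64 '[.. | objects | has("embedded_etcd")] | any' < /etc/ansible/facts.d/openshift.fact

This prints true or false; against the fact file above it would print false, matching the missing fact.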
Slight correction to the above patch, an additional line for the debug level: https://gist.github.com/dgoodwin/d06c8f89f5d78349166f4f26509b20b0
Aleks: The additional line I just added to the gist linked in comment #3 sets debug_level as well. embedded_etcd was one way this surfaced; QE later found a problem specifically with debug_level when they fully removed the fact cache prior to upgrade.

I believe this is related to problems that arise when the fact cache is either deleted between cluster setup and upgrade, or simply doesn't contain something we expect it to (perhaps because the version originally installed was older; we haven't yet pinned down how that is possible). In any case, the real problem is that the upgrade assumed certain facts were set without explicitly making sure they were, so it would only pass if the cache was present and contained the expected values. The above change ensures the relevant master facts are set, specifically the two that caused failures in upgrade.

I will be around at times and monitoring my email. Please just make sure to disregard my patch in comment #1 and use the gist in comment #3.

Proposed PR: https://github.com/openshift/openshift-ansible/pull/2826
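In case the gist link rots, this is roughly the shape of the play it describes. Treat it as a sketch: the debug_level default of 2 and the exact variable names are assumptions on my part, so prefer the gist and the PR if they differ.

- name: Set master facts needed by upgrade
  hosts: oo_masters_to_config
  roles:
  - openshift_facts
  tasks:
  - openshift_facts:
      role: master
      local_facts:
        # Recompute instead of trusting the possibly-missing fact cache.
        embedded_etcd: "{{ groups.oo_etcd_to_config | default([]) | length == 0 }}"
        # Assumed default of 2; the gist may source this differently.
        debug_level: "{{ openshift_master_debug_level | default(2) }}"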
I can reproduce the original issue reliably: install an HA cluster with the latest 3.2 installer (openshift-ansible-3.2.42-1), then on each master edit /etc/ansible/facts.d/openshift.fact and remove all 3 of the "debug_level" keys. Re-running the 3.2 upgrade then fails:

TASK [Create the master api service env file] **********************************
Friday 18 November 2016 13:54:07 -0400 (0:00:01.011) 0:06:12.699 *******
fatal: [ec2-75-101-226-223.compute-1.amazonaws.com]: FAILED! => {"changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute 'debug_level'"}
fatal: [ec2-107-20-5-117.compute-1.amazonaws.com]: FAILED! => {"changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute 'debug_level'"}
fatal: [ec2-54-165-253-95.compute-1.amazonaws.com]: FAILED! => {"changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute 'debug_level'"}

Re-trying with the fix from comment #3 and the pull request referenced above, the upgrade completes successfully. I am very confident this will fix the customer's issue; it was likely caused by installing with an older 3.2 release from before these facts existed. I will be submitting a separate PR against master to significantly reduce the complexity in how these variables are used.
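For anyone else reproducing this, stripping the debug_level keys can be scripted rather than edited by hand. A sketch, assuming jq 1.5+ for the del(.. | .key?) idiom; back up the fact file first:

# On each master: save a copy, then remove every debug_level key recursively.
cp /etc/ansible/facts.d/openshift.fact /root/openshift.fact.bak
jq 'del(.. | .debug_level?)' /root/openshift.fact.bak > /etc/ansible/facts.d/openshift.fact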
WOW, unbelievable. The update now ran successfully 8-O. Finally, after ~2 1/2 months, the bug was found and fixed.

Best regards
Aleks
The Ansible playbook works. Verified with ansible-2.2.0.0-1.el7.noarch and openshift-ansible-3.2.43-1.git.0.fe29bec.el7.noarch.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:2814