Bug 1431077

Summary: fatal excluder error when upgrading atomic OCP
Product: OpenShift Container Platform Reporter: Anping Li <anli>
Component: Cluster Version Operator Assignee: Jan Chaloupka <jchaloup>
Status: CLOSED ERRATA QA Contact: Anping Li <anli>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.5.0 CC: anli, aos-bugs, jokerman, mmccomas
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The upgrade plays upgrade the excluders on Atomic Host (AH).
Consequence: The plays fail because excluders are not supported on AH.
Fix: Skip the excluders on AH.
Result: The excluders are skipped on AH and the plays no longer fail.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-04-12 19:03:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Anping Li 2017-03-10 10:19:08 UTC
Description of problem:
Fatal messages are reported by the excluder tasks.

TASK [Docker excluder version detected] ****************************************
fatal: [10.8.175.65]: FAILED! => {
    "failed": true
}

MSG:

the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'stdout'

The error appears to have been in '/usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/pre/validate_excluder.yml': line 13, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.5.28-1.git.0.103513e.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install OCP 3.4
2. Upgrade to v3.5
   ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_5/upgrade.yml

Actual results:
The upgrade failed with fatal messages from the excluder tasks.

TASK [Docker excluder version detected] ****************************************
fatal: [10.8.175.65]: FAILED! => {
    "failed": true
}

MSG:

the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'stdout'

The error appears to have been in '/usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/pre/validate_excluder.yml': line 13, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:


- name: Docker excluder version detected
  ^ here

fatal: [10.8.173.61]: FAILED! => {
    "failed": true
}

MSG:

the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'stdout'

The error appears to have been in '/usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/pre/validate_excluder.yml': line 13, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

<--snip-->
<--snip-->

TASK [fail] ********************************************************************
fatal: [localhost]: FAILED! => {
    "changed": false,
    "failed": true
}

MSG:

Upgrade cannot continue. The following hosts did not complete etcd backup: 10.8.173.61,10.8.173.40,10.8.175.65
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_5/upgrade.retry

PLAY RECAP *********************************************************************
10.8.173.147               : ok=30   changed=1    unreachable=0    failed=1
10.8.173.25                : ok=30   changed=1    unreachable=0    failed=1
10.8.173.40                : ok=31   changed=1    unreachable=0    failed=1
10.8.173.44                : ok=30   changed=1    unreachable=0    failed=1
10.8.173.61                : ok=34   changed=1    unreachable=0    failed=1
10.8.175.236               : ok=30   changed=1    unreachable=0    failed=1
10.8.175.4                 : ok=115  changed=11   unreachable=0    failed=1
10.8.175.65                : ok=31   changed=1    unreachable=0    failed=1
10.8.175.82                : ok=30   changed=1    unreachable=0    failed=1
localhost                  : ok=30   changed=0    unreachable=0    failed=1

Expected results:


Additional info:

Comment 1 Jan Chaloupka 2017-03-10 14:37:46 UTC
Anping Li, can you provide steps to reproduce and the entire ansible tasks log?

Comment 2 Jan Chaloupka 2017-03-10 16:15:24 UTC
I cannot reproduce it with the latest commit on the master branch with [1] applied. I will run another installation and upgrade scenario once [1] and its dependent PR [2] are merged.

[1] https://github.com/openshift/openshift-ansible/pull/3620
[2] https://github.com/openshift/openshift-ansible/pull/3610

Comment 4 Jan Chaloupka 2017-03-10 16:45:57 UTC
Can you check whether the excluders are available in your repositories at all? Just before you run the upgrade, can you run:

# yum update atomic-openshift-docker-excluder atomic-openshift-excluder

without the -y option? This is just to verify that the excluders can be updated to 3.5.
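
Before running the interactive yum update above, it may help to record which excluder packages are currently installed, so they can be compared with what yum offers. The following is only a minimal sketch: the function name list_excluders and the PKG_QUERY override are hypothetical (PKG_QUERY exists only so the loop can be exercised on a machine without rpm), and the query defaults to "rpm -q".

```shell
#!/bin/sh
# Sketch: report whether the two excluder packages from the comment above
# are installed. PKG_QUERY is a hypothetical override for the query command;
# it defaults to "rpm -q", which prints name-version-release on success.
list_excluders() {
    query="${PKG_QUERY:-rpm -q}"
    for pkg in atomic-openshift-docker-excluder atomic-openshift-excluder; do
        # $query is intentionally unquoted so "rpm -q" splits into two words.
        if installed="$($query "$pkg" 2>/dev/null)"; then
            echo "installed: $installed"
        else
            echo "missing: $pkg"
        fi
    done
}

list_excluders
```

A "missing" line for both packages would suggest that the host has no excluders to update in the first place.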

Comment 6 Jan Chaloupka 2017-03-13 11:56:56 UTC
Upstream PR: https://github.com/openshift/openshift-ansible/pull/3631

Excluders are not supported on AH yet.
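
The skip described in this comment can be sketched in shell. This is not the change made in the upstream PR, which edits the Ansible plays; it only illustrates the idea. It assumes that an Atomic Host can be recognized by the marker file /run/ostree-booted, and the function names is_atomic_host and excluder_action are hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether the excluder checks should run on this host.
# Assumption: an OSTree-based (Atomic) host has the marker file
# /run/ostree-booted. The optional argument overrides the marker path.
is_atomic_host() {
    marker="${1:-/run/ostree-booted}"
    [ -e "$marker" ]
}

# Prints "skip" on an Atomic Host and "check" on any other host.
excluder_action() {
    if is_atomic_host "${1:-}"; then
        echo "skip"
    else
        echo "check"
    fi
}

excluder_action
```

On a host where the marker file exists the script prints "skip", which corresponds to leaving the excluder tasks out of the upgrade.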

Comment 7 Anping Li 2017-03-14 07:04:36 UTC
Verified; the upgrade passes with openshift-ansible-3.5.32.

Comment 9 errata-xmlrpc 2017-04-12 19:03:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0903