Bug 1315564 - upgrade to ose3.2 failed on Atomic Hosts
Summary: upgrade to ose3.2 failed on Atomic Hosts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Brenton Leanhardt
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-03-08 05:41 UTC by Anping Li
Modified: 2016-05-12 16:31 UTC
CC: 6 users

Fixed In Version: openshift-ansible-3.0.57-1.git.0.c633ce7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-12 16:31:39 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2016:1064 0 normal SHIPPED_LIVE Important: Red Hat OpenShift Enterprise 3.2 security, bug fix, and enhancement update 2016-05-12 20:19:17 UTC

Description Anping Li 2016-03-08 05:41:25 UTC
Description of problem:
When migrating to the next release (3.2) on Atomic Host, an "unknown clean all" error was reported because the yum package manager is not available on this platform.
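The "unknown clean all" string in the log suggests the task templates the detected package manager fact into a raw command. A hypothetical reconstruction of such a task (the task name and exact layout are illustrative, not taken from the playbook source):

```yaml
# Hypothetical reconstruction: Ansible's ansible_pkg_mgr fact resolves
# to "unknown" on Atomic Host, where no supported package manager is
# detected. The shell then tries to execute a binary literally named
# "unknown", which fails with ENOENT ([Errno 2], rc=2), matching the
# "unknown clean all" failure in the log below.
- name: Clean package metadata
  command: "{{ ansible_pkg_mgr }} clean all"
```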

Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.0.54

How reproducible:
always

Steps to Reproduce:
1. Install containerized OSE 3.1 on Atomic Host.
2. Upgrade to OSE 3.2:
ansible-playbook -i config/atomicose /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_1_to_v3_2/upgrade.yml -vvv| tee upgrade.log
Actual results:
<10.14.6.126> ESTABLISH CONNECTION FOR USER: root
<10.14.6.126> REMOTE_MODULE command unknown clean all
<10.14.6.126> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.14.6.126 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039 && echo $HOME/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039'
EXEC previous known host file not found for 10.14.6.126
<10.14.6.126> PUT /tmp/tmp21fABE TO /root/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039/command
<10.14.6.126> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.14.6.126 /bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python /root/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039/command; rm -rf /root/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039/ >/dev/null 2>&1'
EXEC previous known host file not found for 10.14.6.126
failed: [10.14.6.126] => {"cmd": "unknown clean all", "failed": true, "rc": 2}
msg: [Errno 2] No such file or directory

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/root/upgrade.retry

10.14.6.120                : ok=32   changed=2    unreachable=0    failed=1
10.14.6.126                : ok=32   changed=2    unreachable=0    failed=1
localhost                  : ok=7    changed=0    unreachable=0    failed=0

Expected results:


Additional info:

Comment 1 Brenton Leanhardt 2016-03-08 18:29:32 UTC
https://github.com/openshift/openshift-ansible/pull/1566
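The linked PR presumably guards the yum-based tasks behind a containerized-host check. A minimal sketch of that pattern (the fact name and task wording are assumptions, not copied from the PR):

```yaml
# Hypothetical sketch: skip package-manager tasks on containerized
# (Atomic) hosts, where the yum binary does not exist. The
# openshift.common.is_containerized fact is assumed here for
# illustration.
- name: Clean yum metadata
  command: yum clean all
  when: not openshift.common.is_containerized | bool
```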

Comment 3 Anping Li 2016-03-09 07:02:08 UTC
The yum-based Docker upgrade task was still called:

TASK: [Upgrade Docker] ******************************************************** 
<10.14.6.120> ESTABLISH CONNECTION FOR USER: root
<10.14.6.120> REMOTE_MODULE command unknown update -y docker
<10.14.6.120> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.14.6.120 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100 && echo $HOME/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100'
EXEC previous known host file not found for 10.14.6.120
<10.14.6.120> PUT /tmp/tmpdbJiPJ TO /root/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100/command
<10.14.6.120> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.14.6.120 /bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python /root/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100/command; rm -rf /root/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100/ >/dev/null 2>&1'
EXEC previous known host file not found for 10.14.6.120
failed: [10.14.6.120] => {"cmd": "unknown update -y docker", "failed": true, "rc": 2}
msg: [Errno 2] No such file or directory

FATAL: all hosts have already failed -- aborting

PLAY RECAP ******************************************************************** 
           to retry, use: --limit @/root/upgrade.retry

10.14.6.120                : ok=54   changed=7    unreachable=0    failed=1   
localhost                  : ok=13   changed=0    unreachable=0    failed=0

Comment 4 Brenton Leanhardt 2016-03-09 17:33:38 UTC
Another great catch.  This should fix it: https://github.com/openshift/openshift-ansible/pull/1576
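The same guard pattern presumably applies to the Docker upgrade step: on RPM-based hosts Docker is upgraded as a package, while on Atomic hosts it ships with the OS tree and must be excluded. A sketch under that assumption (fact name illustrative):

```yaml
# Hypothetical sketch: only run the yum-based Docker upgrade on
# non-Atomic hosts; on Atomic Host the docker binary is part of the
# immutable OS image and is updated via the OS tree instead.
- name: Upgrade Docker
  command: yum update -y docker
  when: not openshift.common.is_atomic | bool
```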

Comment 6 Anping Li 2016-03-14 11:33:23 UTC
Not sure what caused this error; attached the logs and the inventory file.

TASK: [Determine available versions] ****************************************** 
changed: [oseatomic-node1.example.com]
changed: [oseatomic-master1.example.com]

TASK: [set_fact ] ************************************************************* 
ok: [oseatomic-master1.example.com]
fatal: [oseatomic-node1.example.com] => Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ansible/runner/__init__.py", line 586, in _executor
    exec_rc = self._executor_internal(host, new_stdin)
  File "/usr/lib/python2.7/site-packages/ansible/runner/__init__.py", line 789, in _executor_internal
    return self._executor_internal_inner(host, self.module_name, self.module_args, inject, port, complex_args=complex_args)
  File "/usr/lib/python2.7/site-packages/ansible/runner/__init__.py", line 1013, in _executor_internal_inner
    complex_args = template.template(self.basedir, complex_args, inject, fail_on_undefined=self.error_on_undefined_vars)
  File "/usr/lib/python2.7/site-packages/ansible/utils/template.py", line 140, in template
    d[k] = template(basedir, v, templatevars, lookup_fatal, depth, expand_lists, convert_bare, fail_on_undefined, filter_fatal)
  File "/usr/lib/python2.7/site-packages/ansible/utils/template.py", line 124, in template
    varname = template_from_string(basedir, varname, templatevars, fail_on_undefined)
  File "/usr/lib/python2.7/site-packages/ansible/utils/template.py", line 382, in template_from_string
    res = jinja2.utils.concat(rf)
  File "<template>", line 9, in root
  File "/usr/lib64/python2.7/site-packages/yaml/__init__.py", line 93, in safe_load
    return load(stream, SafeLoader)
  File "/usr/lib64/python2.7/site-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/usr/lib64/python2.7/site-packages/yaml/constructor.py", line 37, in get_single_data
    node = self.get_single_node()
  File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 127, in compose_mapping_node
    while not self.check_event(MappingEndEvent):
  File "/usr/lib64/python2.7/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/usr/lib64/python2.7/site-packages/yaml/parser.py", line 428, in parse_block_mapping_key
    if self.check_token(KeyToken):
  File "/usr/lib64/python2.7/site-packages/yaml/scanner.py", line 116, in check_token
    self.fetch_more_tokens()
  File "/usr/lib64/python2.7/site-packages/yaml/scanner.py", line 220, in fetch_more_tokens
    return self.fetch_value()
  File "/usr/lib64/python2.7/site-packages/yaml/scanner.py", line 580, in fetch_value
    self.get_mark())
ScannerError: mapping values are not allowed here
  in "<string>", line 1, column 39:
     ... response from daemon: no such id: atomic-openshift-node


[root@anli config]# cat oseatomic 
[OSEv3:children]
masters
nodes

[OSEv3:vars]
ansible_ssh_user=root
openshift_use_openshift_sdn=true
deployment_type=openshift-enterprise
osm_default_subdomain=miniaomic.example.com
openshift_master_identity_providers=[{'name': 'allow_all', 'login': 'true', 'challenge': 'true', 'kind': 'AllowAllPasswordIdentityProvider'}]

[masters]
oseatomic-master1.example.com

[nodes]
oseatomic-master1.example.com  openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_hostname=oseatomic-master1.example.com openshift_public_hostname=oseatomic-master1.example.com openshift_schedulable=true
oseatomic-node1.example.com  openshift_node_labels="{'region': 'primary', 'zone': 'west'}" openshift_hostname=oseatomic-node1.example.com openshift_public_hostname=oseatomic-node1.example.com



Comment 8 Brenton Leanhardt 2016-03-14 21:11:34 UTC
https://github.com/openshift/openshift-ansible/pull/1599

Comment 9 Brenton Leanhardt 2016-03-14 21:19:16 UTC
For this bug and Bug #1315563 it seems your node container wasn't running.  I've added an additional check to start the node in case it wasn't running for some reason.  I'm not convinced this will be the only error checking we'll need to add, but we definitely need to figure out why your atomic-openshift-node container couldn't be found.
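The added check described above can be sketched as an Ansible handler that ensures the containerized node service is up before the playbook inspects its container (service and fact names are assumptions for illustration):

```yaml
# Hypothetical sketch: start the containerized node service if it is
# not already running, so that a later "docker inspect" of the
# atomic-openshift-node container does not fail with "no such id".
- name: Ensure the node service is started
  service:
    name: atomic-openshift-node
    state: started
  when: openshift.common.is_containerized | bool
```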

Comment 11 Anping Li 2016-03-15 11:13:22 UTC
It works well now, so moving the bug to VERIFIED.

Comment 13 errata-xmlrpc 2016-05-12 16:31:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064

