Bug 1315564 - upgrade to ose3.2 failed on Atomic Hosts
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Upgrade
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Brenton Leanhardt
QA Contact: Anping Li
Depends On:
Blocks:
Reported: 2016-03-08 00:41 EST by Anping Li
Modified: 2016-05-12 12:31 EDT (History)
CC: 6 users

See Also:
Fixed In Version: openshift-ansible-3.0.57-1.git.0.c633ce7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-12 12:31:39 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Anping Li 2016-03-08 00:41:25 EST
Description of problem:
When migrating to the next release (3.2) on an Atomic Host, "unknown clean all" was reported because the yum package manager does not exist on this platform.
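The root cause can be sketched as follows. This is an illustrative helper, not the openshift-ansible implementation: on an Atomic Host the ostree marker file exists and yum is unavailable, so yum-based upgrade tasks have to be skipped (the marker path and function name are assumptions for the sketch).

```python
import os

def pick_pkg_mgr(marker="/run/ostree-booted"):
    """Return the package manager to use, or None on an Atomic Host.

    On Atomic Host the ostree marker file exists and yum is not
    available, so yum-based tasks must be skipped and container
    images updated instead. Illustrative sketch only, not the
    openshift-ansible implementation.
    """
    if os.path.exists(marker):
        return None  # containerized host: no yum
    return "yum"
```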

Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.0.54

How reproducible:
always

Steps to Reproduce:
1. Install containerized OSE 3.1 on an Atomic Host.
2. Upgrade to OSE 3.2:
ansible-playbook -i config/atomicose /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_1_to_v3_2/upgrade.yml -vvv| tee upgrade.log
Actual results:
<10.14.6.126> ESTABLISH CONNECTION FOR USER: root
<10.14.6.126> REMOTE_MODULE command unknown clean all
<10.14.6.126> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.14.6.126 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039 && echo $HOME/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039'
EXEC previous known host file not found for 10.14.6.126
<10.14.6.126> PUT /tmp/tmp21fABE TO /root/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039/command
<10.14.6.126> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.14.6.126 /bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python /root/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039/command; rm -rf /root/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039/ >/dev/null 2>&1'
EXEC previous known host file not found for 10.14.6.126
failed: [10.14.6.126] => {"cmd": "unknown clean all", "failed": true, "rc": 2}
msg: [Errno 2] No such file or directory

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/root/upgrade.retry

10.14.6.120                : ok=32   changed=2    unreachable=0    failed=1
10.14.6.126                : ok=32   changed=2    unreachable=0    failed=1
localhost                  : ok=7    changed=0    unreachable=0    failed=0

Expected results:
The upgrade to OSE 3.2 completes successfully on Atomic Hosts.

Additional info:
Comment 1 Brenton Leanhardt 2016-03-08 13:29:32 EST
https://github.com/openshift/openshift-ansible/pull/1566
Comment 3 Anping Li 2016-03-09 02:02:08 EST
The yum-based Docker upgrade was still called:

TASK: [Upgrade Docker] ******************************************************** 
<10.14.6.120> ESTABLISH CONNECTION FOR USER: root
<10.14.6.120> REMOTE_MODULE command unknown update -y docker
<10.14.6.120> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.14.6.120 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100 && echo $HOME/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100'
EXEC previous known host file not found for 10.14.6.120
<10.14.6.120> PUT /tmp/tmpdbJiPJ TO /root/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100/command
<10.14.6.120> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.14.6.120 /bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python /root/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100/command; rm -rf /root/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100/ >/dev/null 2>&1'
EXEC previous known host file not found for 10.14.6.120
failed: [10.14.6.120] => {"cmd": "unknown update -y docker", "failed": true, "rc": 2}
msg: [Errno 2] No such file or directory

FATAL: all hosts have already failed -- aborting

PLAY RECAP ******************************************************************** 
           to retry, use: --limit @/root/upgrade.retry

10.14.6.120                : ok=54   changed=7    unreachable=0    failed=1   
localhost                  : ok=13   changed=0    unreachable=0    failed=0
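The failing task could be guarded roughly like this. This is a sketch only, not the content of the linked PR; the variable name and condition are assumptions for illustration:

```yaml
# Hypothetical guard: skip the yum-based Docker upgrade on Atomic Hosts,
# where Docker is part of the OS tree and is updated via the OS image
# rather than yum. Variable name is illustrative, not from the PR.
- name: Upgrade Docker
  command: "{{ ansible_pkg_mgr }} update -y docker"
  when: not openshift_is_atomic | default(false) | bool
```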
Comment 4 Brenton Leanhardt 2016-03-09 12:33:38 EST
Another great catch.  This should fix it: https://github.com/openshift/openshift-ansible/pull/1576
Comment 6 Anping Li 2016-03-14 07:33:23 EDT
Not sure what caused this error; the logs and inventory file are attached.

TASK: [Determine available versions] ****************************************** 
changed: [oseatomic-node1.example.com]
changed: [oseatomic-master1.example.com]

TASK: [set_fact ] ************************************************************* 
ok: [oseatomic-master1.example.com]
fatal: [oseatomic-node1.example.com] => Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ansible/runner/__init__.py", line 586, in _executor
    exec_rc = self._executor_internal(host, new_stdin)
  File "/usr/lib/python2.7/site-packages/ansible/runner/__init__.py", line 789, in _executor_internal
    return self._executor_internal_inner(host, self.module_name, self.module_args, inject, port, complex_args=complex_args)
  File "/usr/lib/python2.7/site-packages/ansible/runner/__init__.py", line 1013, in _executor_internal_inner
    complex_args = template.template(self.basedir, complex_args, inject, fail_on_undefined=self.error_on_undefined_vars)
  File "/usr/lib/python2.7/site-packages/ansible/utils/template.py", line 140, in template
    d[k] = template(basedir, v, templatevars, lookup_fatal, depth, expand_lists, convert_bare, fail_on_undefined, filter_fatal)
  File "/usr/lib/python2.7/site-packages/ansible/utils/template.py", line 124, in template
    varname = template_from_string(basedir, varname, templatevars, fail_on_undefined)
  File "/usr/lib/python2.7/site-packages/ansible/utils/template.py", line 382, in template_from_string
    res = jinja2.utils.concat(rf)
  File "<template>", line 9, in root
  File "/usr/lib64/python2.7/site-packages/yaml/__init__.py", line 93, in safe_load
    return load(stream, SafeLoader)
  File "/usr/lib64/python2.7/site-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/usr/lib64/python2.7/site-packages/yaml/constructor.py", line 37, in get_single_data
    node = self.get_single_node()
  File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 127, in compose_mapping_node
    while not self.check_event(MappingEndEvent):
  File "/usr/lib64/python2.7/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/usr/lib64/python2.7/site-packages/yaml/parser.py", line 428, in parse_block_mapping_key
    if self.check_token(KeyToken):
  File "/usr/lib64/python2.7/site-packages/yaml/scanner.py", line 116, in check_token
    self.fetch_more_tokens()
  File "/usr/lib64/python2.7/site-packages/yaml/scanner.py", line 220, in fetch_more_tokens
    return self.fetch_value()
  File "/usr/lib64/python2.7/site-packages/yaml/scanner.py", line 580, in fetch_value
    self.get_mark())
ScannerError: mapping values are not allowed here
  in "<string>", line 1, column 39:
     ... response from daemon: no such id: atomic-openshift-node
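The ScannerError above is reproducible outside Ansible. The task appears to have fed a Docker daemon error string through `yaml.safe_load`, and the second `": "` in that string is invalid YAML at exactly column 39 (assuming PyYAML, the parser shown in the traceback):

```python
import yaml  # PyYAML, the parser shown in the traceback above

# The daemon error string that ended up being parsed as YAML.
msg = "Error response from daemon: no such id: atomic-openshift-node"
try:
    yaml.safe_load(msg)
except yaml.scanner.ScannerError as err:
    # PyYAML rejects the second ": " inside a plain scalar value.
    print(err.problem)  # mapping values are not allowed here
```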


[root@anli config]# cat oseatomic 
[OSEv3:children]
masters
nodes

[OSEv3:vars]
ansible_ssh_user=root
openshift_use_openshift_sdn=true
deployment_type=openshift-enterprise
osm_default_subdomain=miniaomic.example.com
openshift_master_identity_providers=[{'name': 'allow_all', 'login': 'true', 'challenge': 'true', 'kind': 'AllowAllPasswordIdentityProvider'}]

[masters]
oseatomic-master1.example.com

[nodes]
oseatomic-master1.example.com  openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_hostname=oseatomic-master1.example.com openshift_public_hostname=oseatomic-master1.example.com openshift_schedulable=true
oseatomic-node1.example.com  openshift_node_labels="{'region': 'primary', 'zone': 'west'}" openshift_hostname=oseatomic-node1.example.com openshift_public_hostname=oseatomic-node1.example.com


Comment 8 Brenton Leanhardt 2016-03-14 17:11:34 EDT
https://github.com/openshift/openshift-ansible/pull/1599
Comment 9 Brenton Leanhardt 2016-03-14 17:19:16 EDT
For this bug and Bug #1315563, it seems your node container wasn't running.  I've added an additional check to start the node in case it wasn't running for some reason.  I'm not convinced this will be the only error checking we'll need to add, but we definitely need to figure out why your atomic-openshift-node container couldn't be found.
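A minimal version of such a container check might look like this. This is a sketch only, not the implementation in the linked PR; the command string and the injectable `runner` parameter are assumptions for illustration:

```python
import subprocess

def container_exists(name, runner=subprocess.run):
    """Return True if `docker inspect <name>` succeeds.

    `docker inspect` exits non-zero with a "no such id" style error when
    the container is missing, which is the failure mode seen in comment 6.
    `runner` is injectable so the check can be tested without Docker.
    """
    result = runner(["docker", "inspect", name],
                    capture_output=True, text=True)
    return result.returncode == 0
```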
Comment 11 Anping Li 2016-03-15 07:13:22 EDT
It works well now, so moving the bug to VERIFIED.
Comment 13 errata-xmlrpc 2016-05-12 12:31:39 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064
