| Summary: | upgrade to ose3.2 failed on Atomic Hosts | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Anping Li <anli> |
| Component: | Cluster Version Operator | Assignee: | Brenton Leanhardt <bleanhar> |
| Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.2.0 | CC: | anli, aos-bugs, bleanhar, jokerman, mmccomas, tdawson |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openshift-ansible-3.0.57-1.git.0.c633ce7 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-05-12 16:31:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
The yum-based docker upgrade task was still called on the Atomic Hosts:
TASK: [Upgrade Docker] ********************************************************
<10.14.6.120> ESTABLISH CONNECTION FOR USER: root
<10.14.6.120> REMOTE_MODULE command unknown update -y docker
<10.14.6.120> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.14.6.120 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100 && echo $HOME/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100'
EXEC previous known host file not found for 10.14.6.120
<10.14.6.120> PUT /tmp/tmpdbJiPJ TO /root/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100/command
<10.14.6.120> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.14.6.120 /bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python /root/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100/command; rm -rf /root/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100/ >/dev/null 2>&1'
EXEC previous known host file not found for 10.14.6.120
failed: [10.14.6.120] => {"cmd": "unknown update -y docker", "failed": true, "rc": 2}
msg: [Errno 2] No such file or directory
FATAL: all hosts have already failed -- aborting
PLAY RECAP ********************************************************************
to retry, use: --limit @/root/upgrade.retry
10.14.6.120 : ok=54 changed=7 unreachable=0 failed=1
localhost : ok=13 changed=0 unreachable=0 failed=0
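The failing command in the log is literally `unknown update -y docker`: the playbook templates the host's package manager into a shell task, and on Atomic Host (where there is no yum/dnf) the fact appears to resolve to the string "unknown". A minimal sketch of the kind of guard the playbook needs, assuming the common heuristic of checking for the ostree marker file (the helper names here are hypothetical, not from openshift-ansible):

```python
import os
import subprocess

def is_atomic_host():
    # Assumption: ostree-based systems such as Atomic Host create this
    # marker file at boot, so its presence identifies the platform.
    return os.path.exists("/run/ostree-booted")

def upgrade_docker():
    """Upgrade docker via yum, but only on hosts that actually have yum."""
    if is_atomic_host():
        # On Atomic Host the docker package is part of the ostree image;
        # a yum transaction is neither available nor appropriate here.
        print("Atomic Host detected; skipping yum-based docker upgrade")
        return
    subprocess.check_call(["yum", "update", "-y", "docker"])
```

In openshift-ansible itself the equivalent fix would be a `when:` condition on the upgrade task rather than Python, but the control flow is the same.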
Another great catch. This should fix it: https://github.com/openshift/openshift-ansible/pull/1576

Not sure what caused this next error; the logs and inventory file are attached.
TASK: [Determine available versions] ******************************************
changed: [oseatomic-node1.example.com]
changed: [oseatomic-master1.example.com]
TASK: [set_fact ] *************************************************************
ok: [oseatomic-master1.example.com]
fatal: [oseatomic-node1.example.com] => Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ansible/runner/__init__.py", line 586, in _executor
exec_rc = self._executor_internal(host, new_stdin)
File "/usr/lib/python2.7/site-packages/ansible/runner/__init__.py", line 789, in _executor_internal
return self._executor_internal_inner(host, self.module_name, self.module_args, inject, port, complex_args=complex_args)
File "/usr/lib/python2.7/site-packages/ansible/runner/__init__.py", line 1013, in _executor_internal_inner
complex_args = template.template(self.basedir, complex_args, inject, fail_on_undefined=self.error_on_undefined_vars)
File "/usr/lib/python2.7/site-packages/ansible/utils/template.py", line 140, in template
d[k] = template(basedir, v, templatevars, lookup_fatal, depth, expand_lists, convert_bare, fail_on_undefined, filter_fatal)
File "/usr/lib/python2.7/site-packages/ansible/utils/template.py", line 124, in template
varname = template_from_string(basedir, varname, templatevars, fail_on_undefined)
File "/usr/lib/python2.7/site-packages/ansible/utils/template.py", line 382, in template_from_string
res = jinja2.utils.concat(rf)
File "<template>", line 9, in root
File "/usr/lib64/python2.7/site-packages/yaml/__init__.py", line 93, in safe_load
return load(stream, SafeLoader)
File "/usr/lib64/python2.7/site-packages/yaml/__init__.py", line 71, in load
return loader.get_single_data()
File "/usr/lib64/python2.7/site-packages/yaml/constructor.py", line 37, in get_single_data
node = self.get_single_node()
File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 36, in get_single_node
document = self.compose_document()
File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 55, in compose_document
node = self.compose_node(None, None)
File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 84, in compose_node
node = self.compose_mapping_node(anchor)
File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 127, in compose_mapping_node
while not self.check_event(MappingEndEvent):
File "/usr/lib64/python2.7/site-packages/yaml/parser.py", line 98, in check_event
self.current_event = self.state()
File "/usr/lib64/python2.7/site-packages/yaml/parser.py", line 428, in parse_block_mapping_key
if self.check_token(KeyToken):
File "/usr/lib64/python2.7/site-packages/yaml/scanner.py", line 116, in check_token
self.fetch_more_tokens()
File "/usr/lib64/python2.7/site-packages/yaml/scanner.py", line 220, in fetch_more_tokens
return self.fetch_value()
File "/usr/lib64/python2.7/site-packages/yaml/scanner.py", line 580, in fetch_value
self.get_mark())
ScannerError: mapping values are not allowed here
in "<string>", line 1, column 39:
... response from daemon: no such id: atomic-openshift-node
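The traceback shows the task template feeding the docker daemon's error string into `yaml.safe_load`. A message like "no such id: atomic-openshift-node" contains a second ": " and is not valid YAML, which produces exactly the ScannerError above. A minimal reproduction (the message is taken from the log; assume PyYAML is installed):

```python
import yaml

# The daemon error string captured in the failing task's output:
msg = "Error response from daemon: no such id: atomic-openshift-node"

try:
    yaml.safe_load(msg)
except yaml.scanner.ScannerError as e:
    # PyYAML rejects the second ": " because it looks like a nested
    # mapping key where only a scalar value is allowed.
    print(e.problem)  # mapping values are not allowed here
```

So the underlying failure is not a YAML bug at all: the `docker inspect`-style lookup failed because the atomic-openshift-node container did not exist, and the raw error text was then parsed as YAML.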
[root@anli config]# cat oseatomic
[OSEv3:children]
masters
nodes
[OSEv3:vars]
ansible_ssh_user=root
openshift_use_openshift_sdn=true
deployment_type=openshift-enterprise
osm_default_subdomain=miniaomic.example.com
openshift_master_identity_providers=[{'name': 'allow_all', 'login': 'true', 'challenge': 'true', 'kind': 'AllowAllPasswordIdentityProvider'}]
[masters]
oseatomic-master1.example.com
[nodes]
oseatomic-master1.example.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_hostname=oseatomic-master1.example.com openshift_public_hostname=oseatomic-master1.example.com openshift_schedulable=true
oseatomic-node1.example.com openshift_node_labels="{'region': 'primary', 'zone': 'west'}" openshift_hostname=oseatomic-node1.example.com openshift_public_hostname=oseatomic-node1.example.com
For this bug and Bug #1315563 it seems your node container wasn't running. I've added an additional check to start the node in case it wasn't running for some reason. I'm not convinced this will be the only error checking we'll need to add, but we definitely need to figure out why your atomic-openshift-node container couldn't be found.

It works well now, so moving the bug to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064
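The safeguard described in the comment above, verifying that the node container exists before querying it and starting it otherwise, can be sketched as follows. This is an illustration only: the container name comes from the failing log, but the helper functions and the systemd unit name are assumptions, not the actual openshift-ansible change:

```python
import subprocess

CONTAINER = "atomic-openshift-node"  # name from the "no such id" error

def container_running(name):
    """Return True if `docker inspect` reports the container as running."""
    try:
        result = subprocess.run(
            ["docker", "inspect", "--format", "{{.State.Running}}", name],
            capture_output=True, text=True,
        )
    except FileNotFoundError:
        # docker CLI not present on this machine
        return False
    return result.returncode == 0 and result.stdout.strip() == "true"

def ensure_node_running():
    # Mirror of the added check: start the node service if its container
    # can't be found, instead of failing later with "no such id".
    if not container_running(CONTAINER):
        subprocess.check_call(["systemctl", "start", CONTAINER])
```

With a check like this in place, the version-detection task only runs once the container is guaranteed to exist, so its output is a real version string rather than a daemon error.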
Description of problem:
Migrating to the next release, 3.2, on Atomic Host fails: "unknown clean all" is reported because the yum package manager is not available on this platform.

Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.0.54

How reproducible:
Always

Steps to Reproduce:
1. Install containerized ose3.1 on Atomic Host.
2. Upgrade to ose 3.2:
ansible-playbook -i config/atomicose /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_1_to_v3_2/upgrade.yml -vvv | tee upgrade.log

Actual results:
<10.14.6.126> ESTABLISH CONNECTION FOR USER: root
<10.14.6.126> REMOTE_MODULE command unknown clean all
<10.14.6.126> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.14.6.126 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039 && echo $HOME/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039'
EXEC previous known host file not found for 10.14.6.126
<10.14.6.126> PUT /tmp/tmp21fABE TO /root/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039/command
<10.14.6.126> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.14.6.126 /bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python /root/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039/command; rm -rf /root/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039/ >/dev/null 2>&1'
EXEC previous known host file not found for 10.14.6.126
failed: [10.14.6.126] => {"cmd": "unknown clean all", "failed": true, "rc": 2}
msg: [Errno 2] No such file or directory
FATAL: all hosts have already failed -- aborting
PLAY RECAP ********************************************************************
to retry, use: --limit @/root/upgrade.retry
10.14.6.120 : ok=32 changed=2 unreachable=0 failed=1
10.14.6.126 : ok=32 changed=2 unreachable=0 failed=1
localhost : ok=7 changed=0 unreachable=0 failed=0

Expected results:
The upgrade succeeds; no yum commands are attempted on Atomic Host.

Additional info: