Description of problem:
When migrating to the next release, 3.2, on Atomic Host, "unknown clean all" is reported because the yum package manager is not available on this platform.

Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.0.54

How reproducible:
Always

Steps to Reproduce:
1. Install containerized OSE 3.1 on Atomic Host.
2. Upgrade to OSE 3.2:
ansible-playbook -i config/atomicose /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_1_to_v3_2/upgrade.yml -vvv | tee upgrade.log

Actual results:
<10.14.6.126> ESTABLISH CONNECTION FOR USER: root
<10.14.6.126> REMOTE_MODULE command unknown clean all
<10.14.6.126> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.14.6.126 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039 && echo $HOME/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039'
EXEC previous known host file not found for 10.14.6.126
<10.14.6.126> PUT /tmp/tmp21fABE TO /root/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039/command
<10.14.6.126> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.14.6.126 /bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python /root/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039/command; rm -rf /root/.ansible/tmp/ansible-tmp-1457392903.37-85935774016039/ >/dev/null 2>&1'
EXEC previous known host file not found for 10.14.6.126
failed: [10.14.6.126] => {"cmd": "unknown clean all", "failed": true, "rc": 2}
msg: [Errno 2] No such file or directory

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/root/upgrade.retry

10.14.6.120                : ok=32   changed=2    unreachable=0    failed=1
10.14.6.126                : ok=32   changed=2    unreachable=0    failed=1
localhost                  : ok=7    changed=0    unreachable=0    failed=0

Expected results:
The upgrade completes without invoking yum on Atomic Host.

Additional info:
https://github.com/openshift/openshift-ansible/pull/1566
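The "unknown" in the failing command is Ansible's ansible_pkg_mgr fact, which resolves to "unknown" on Atomic Host because neither yum nor dnf is present, so any task that shells out to the package manager breaks there. A minimal sketch of the kind of guard needed (hypothetical task, not the actual change from the PR; assumes the openshift.common.is_containerized fact that openshift-ansible derives during fact gathering):

# Hypothetical sketch, not the actual change from the PR above.
- name: Clean package metadata (rpm-based hosts only)
  command: "{{ ansible_pkg_mgr }} clean all"
  when: not openshift.common.is_containerized | bool

On containerized/Atomic hosts the task is simply skipped, since there is no yum metadata to clean.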
The yum-based Docker upgrade was called, which also fails on Atomic Host:

TASK: [Upgrade Docker] ********************************************************
<10.14.6.120> ESTABLISH CONNECTION FOR USER: root
<10.14.6.120> REMOTE_MODULE command unknown update -y docker
<10.14.6.120> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.14.6.120 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100 && echo $HOME/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100'
EXEC previous known host file not found for 10.14.6.120
<10.14.6.120> PUT /tmp/tmpdbJiPJ TO /root/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100/command
<10.14.6.120> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.14.6.120 /bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python /root/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100/command; rm -rf /root/.ansible/tmp/ansible-tmp-1457506547.78-171897095479100/ >/dev/null 2>&1'
EXEC previous known host file not found for 10.14.6.120
failed: [10.14.6.120] => {"cmd": "unknown update -y docker", "failed": true, "rc": 2}
msg: [Errno 2] No such file or directory

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/root/upgrade.retry

10.14.6.120                : ok=54   changed=7    unreachable=0    failed=1
localhost                  : ok=13   changed=0    unreachable=0    failed=0
Another great catch. This should fix it: https://github.com/openshift/openshift-ansible/pull/1576
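This is the same class of failure. A sketch of the analogous guard for the Docker task (hypothetical, not the actual change from the PR; on Atomic Host docker ships as part of the OSTree image and is upgraded with the host, not through yum):

# Hypothetical sketch, not the actual change from the PR above.
- name: Upgrade Docker (rpm-based hosts only)
  command: "{{ ansible_pkg_mgr }} update -y docker"
  when: not openshift.common.is_containerized | bool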
Not sure what caused this error; the logs and inventory file are attached.

TASK: [Determine available versions] ******************************************
changed: [oseatomic-node1.example.com]
changed: [oseatomic-master1.example.com]

TASK: [set_fact ] *************************************************************
ok: [oseatomic-master1.example.com]
fatal: [oseatomic-node1.example.com] => Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ansible/runner/__init__.py", line 586, in _executor
    exec_rc = self._executor_internal(host, new_stdin)
  File "/usr/lib/python2.7/site-packages/ansible/runner/__init__.py", line 789, in _executor_internal
    return self._executor_internal_inner(host, self.module_name, self.module_args, inject, port, complex_args=complex_args)
  File "/usr/lib/python2.7/site-packages/ansible/runner/__init__.py", line 1013, in _executor_internal_inner
    complex_args = template.template(self.basedir, complex_args, inject, fail_on_undefined=self.error_on_undefined_vars)
  File "/usr/lib/python2.7/site-packages/ansible/utils/template.py", line 140, in template
    d[k] = template(basedir, v, templatevars, lookup_fatal, depth, expand_lists, convert_bare, fail_on_undefined, filter_fatal)
  File "/usr/lib/python2.7/site-packages/ansible/utils/template.py", line 124, in template
    varname = template_from_string(basedir, varname, templatevars, fail_on_undefined)
  File "/usr/lib/python2.7/site-packages/ansible/utils/template.py", line 382, in template_from_string
    res = jinja2.utils.concat(rf)
  File "<template>", line 9, in root
  File "/usr/lib64/python2.7/site-packages/yaml/__init__.py", line 93, in safe_load
    return load(stream, SafeLoader)
  File "/usr/lib64/python2.7/site-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/usr/lib64/python2.7/site-packages/yaml/constructor.py", line 37, in get_single_data
    node = self.get_single_node()
  File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/usr/lib64/python2.7/site-packages/yaml/composer.py", line 127, in compose_mapping_node
    while not self.check_event(MappingEndEvent):
  File "/usr/lib64/python2.7/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/usr/lib64/python2.7/site-packages/yaml/parser.py", line 428, in parse_block_mapping_key
    if self.check_token(KeyToken):
  File "/usr/lib64/python2.7/site-packages/yaml/scanner.py", line 116, in check_token
    self.fetch_more_tokens()
  File "/usr/lib64/python2.7/site-packages/yaml/scanner.py", line 220, in fetch_more_tokens
    return self.fetch_value()
  File "/usr/lib64/python2.7/site-packages/yaml/scanner.py", line 580, in fetch_value
    self.get_mark())
ScannerError: mapping values are not allowed here
  in "<string>", line 1, column 39:
    ...
response from daemon: no such id: atomic-openshift-node

[root@anli config]# cat oseatomic
[OSEv3:children]
masters
nodes

[OSEv3:vars]
ansible_ssh_user=root
openshift_use_openshift_sdn=true
deployment_type=openshift-enterprise
osm_default_subdomain=miniaomic.example.com
openshift_master_identity_providers=[{'name': 'allow_all', 'login': 'true', 'challenge': 'true', 'kind': 'AllowAllPasswordIdentityProvider'}]

[masters]
oseatomic-master1.example.com

[nodes]
oseatomic-master1.example.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_hostname=oseatomic-master1.example.com openshift_public_hostname=oseatomic-master1.example.com openshift_schedulable=true
oseatomic-node1.example.com openshift_node_labels="{'region': 'primary', 'zone': 'west'}" openshift_hostname=oseatomic-node1.example.com openshift_public_hostname=oseatomic-node1.example.com
https://github.com/openshift/openshift-ansible/pull/1599
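For context, the traceback above is yaml.safe_load failing inside a template: the output of the "Determine available versions" command is run through a YAML parse in the set_fact task, and when that output is not valid YAML (for example an error string containing a colon, such as the docker "no such id" message), the scanner aborts at the first stray colon. A hedged sketch of a defensive version of that pattern (hypothetical task and variable names, not the actual change in the PR):

# Hypothetical sketch, not the actual change from the PR above.
- name: Determine available versions
  command: /usr/local/bin/list-openshift-versions   # hypothetical command
  register: versions_out
  failed_when: false

- name: Parse the output as YAML only when the command succeeded and produced output
  set_fact:
    available_versions: "{{ versions_out.stdout | from_yaml }}"
  when: versions_out.rc == 0 and (versions_out.stdout | trim) != ''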
For this bug and Bug #1315563 it seems your node container wasn't running. I've added an additional check to start the node in case it wasn't running for some reason. I'm not convinced this will be the only error checking we'll need to add, but we definitely need to figure out why your atomic-openshift-node container couldn't be found.
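A rough sketch of the kind of check described (hypothetical task, not the actual change from the PR above), using the systemd unit that the containerized install creates for the node:

# Hypothetical sketch, not the actual change from the PR above.
- name: Ensure the containerized node service is running before it is inspected
  service:
    name: atomic-openshift-node
    state: started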
It works well now, so moving the bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2016:1064