Description of problem:
Upgrade failed to import image streams on a containerized native HA environment. The image files can be found:

# ll /usr/share/openshift/examples/image-streams/image-streams-rhel7.json
-rw-r--r--. 1 root root 14085 Apr 1 22:33 /usr/share/openshift/examples/image-streams/image-streams-rhel7.json
#oc v3.1.1.6-33-g81eabcc
kubernetes v1.1.0-origin-1107-g4c8e6f4

Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.0.69-1.git.0.c818db9.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install native HA containerized OSE 3.1 on RHEL
2. Upgrade to OSE 3.2
3.

Actual results:
TASK: [openshift_examples | Import RHEL streams] ******************************
<ha2-master2.example.com> ESTABLISH CONNECTION FOR USER: root
<ha2-master2.example.com> REMOTE_MODULE command oc create -n openshift -f /usr/share/openshift/examples/image-streams/image-streams-rhel7.json
<ha2-master2.example.com> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 ha2-master2.example.com /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1459521200.13-23709753680840 && echo $HOME/.ansible/tmp/ansible-tmp-1459521200.13-23709753680840'
<ha2-master2.example.com> PUT /tmp/tmpf4VYgx TO /root/.ansible/tmp/ansible-tmp-1459521200.13-23709753680840/command
<ha2-master2.example.com> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 ha2-master2.example.com /bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python /root/.ansible/tmp/ansible-tmp-1459521200.13-23709753680840/command; rm -rf /root/.ansible/tmp/ansible-tmp-1459521200.13-23709753680840/ >/dev/null 2>&1'
failed: [ha2-master2.example.com] => {"changed": false, "cmd": ["oc", "create", "-n", "openshift", "-f", "/usr/share/openshift/examples/image-streams/image-streams-rhel7.json"], "delta": "0:00:11.891157", "end": "2016-04-01 22:33:31.599988", "failed": true, "failed_when_result": true, "rc": 1, "start": "2016-04-01 22:33:19.708831", "stdout_lines": [], "warnings": []}
stderr:
================================================================================
ATTENTION: You are running oc via a wrapper around 'docker run openshift3/ose'.
This wrapper is intended only to be used to bootstrap an environment. Please
install client tools on another host once you have granted cluster-admin
privileges to a user.
See https://docs.openshift.com/enterprise/latest/cli_reference/get_started_cli.html
================================================================================
the path "/usr/share/openshift/examples/image-streams/image-streams-rhel7.json" does not exist

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/root/upgrade.retry

ha2-master.example.com     : ok=6    changed=1    unreachable=0    failed=0
ha2-master1.example.com    : ok=97   changed=16   unreachable=0    failed=0
ha2-master2.example.com    : ok=216  changed=35   unreachable=0    failed=1
ha2-master3.example.com    : ok=181  changed=31   unreachable=0    failed=0
ha2-node1.example.com      : ok=84   changed=13   unreachable=0    failed=0
ha2-node2.example.com      : ok=84   changed=13   unreachable=0    failed=0
localhost                  : ok=41   changed=0    unreachable=0    failed=0

Expected results:

Additional info:
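One thing worth noting: the ATTENTION banner says `oc` here is a wrapper around `docker run openshift3/ose`, so the path in `oc create -f` is resolved inside the container, not on the host. If that path is not mounted into the container, the file can exist on the host yet be invisible to `oc`. A minimal sketch of that failure mode (hypothetical; temporary directories stand in for the host and the container's root filesystem, no docker required):

```shell
# Hypothetical illustration: a file present on the host is absent at the
# same absolute path from the "container's" point of view unless it is
# bind-mounted or copied in. $CONTAINER_ROOT stands in for the container
# filesystem; the JSON content is a placeholder.
HOST_DIR=$(mktemp -d)
HOST_FILE="$HOST_DIR/image-streams-rhel7.json"
echo '{"kind": "ImageStreamList"}' > "$HOST_FILE"

CONTAINER_ROOT=$(mktemp -d)   # stand-in for the openshift3/ose container root

# The host sees the file:
ls -l "$HOST_FILE"

# The "container" view of the same absolute path is empty:
if [ -f "$CONTAINER_ROOT$HOST_FILE" ]; then
    echo "container: file exists"
else
    echo "container: the path \"$HOST_FILE\" does not exist"
fi
```

If this is what happened, running the real client binary on the first master (or ensuring the wrapper bind-mounts /usr/share/openshift/examples) would sidestep it; whether the wrapper mounts that directory in this build is something I haven't verified.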
I'm not really sure how this happened. Can you upload your inventory and the entire ansible log from the run? I see that ha2-master2.example.com is where the job failed. The way this works is that the example files are copied only to the first master, and the oc commands then run there. I'm worried that things are somehow confused in your environment and the example files were uploaded to master1 while the commands ran on master2. I've been running all my local tests in a multi-master environment, so I'm betting you're hitting some sort of edge case.
Looking at the output (and the fact that ha2-master2 has the largest number of tasks), it looks like ha2-master2 is the host that was considered oo_first_master for the run. That said, I agree that it looks like it might be an inventory-related issue, or possibly another issue during the run. Could you also include the full log output of the ansible run?
Created attachment 1143838 [details]
Upgrade failed to import image streams

I commented out the first master in the inventory, but I don't think that is the root cause.

[root@anli config]# cat hostnative
[OSEv3:children]
masters
nodes
etcd
lb
nfs

[OSEv3:vars]
ansible_ssh_user=root
openshift_use_openshift_sdn=true
deployment_type=openshift-enterprise
osm_default_subdomain=ha2.example.com
openshift_master_identity_providers=[{'name': 'allow_all', 'login': 'true', 'challenge': 'true', 'kind': 'AllowAllPasswordIdentityProvider'}]
openshift_set_hostname=True
os_sdn_network_plugin_name=redhat/openshift-ovs-multitenant
cli_docker_additional_registries=virt-openshift-05.lab.eng.nay.redhat.com:5000
cli_docker_insecure_registries=virt-openshift-05.lab.eng.nay.redhat.com:5000
openshift_docker_additional_registries=virt-openshift-05.lab.eng.nay.redhat.com:5000
openshift_docker_insecure_registries=virt-openshift-05.lab.eng.nay.redhat.com:5000
#openshift_rolling_restart_mode=system
openshift_hosted_registry_storage_kind=nfs
openshift_hosted_registry_storage_nfs_directory=/var/export/
openshift_hosted_registry_storage_nfs_options='*(rw,sync,all_squash)'
openshift_hosted_registry_storage_volume_name=registry
openshift_hosted_registry_storage_volume_size=2G
openshift_master_cluster_method=native
openshift_master_cluster_hostname=ha2-master.example.com
openshift_master_cluster_public_hostname=ha2-master.example.com

[masters]
#ha2-master1.example.com
ha2-master2.example.com
ha2-master3.example.com

[etcd]
ha2-master1.example.com
ha2-master2.example.com
ha2-master3.example.com

[nodes]
ha2-master1.example.com openshift_node_labels="{'region': 'idle', 'zone': 'default'}" openshift_hostname=ha2-master1.example.com openshift_public_hostname=ha2-master1.example.com
ha2-master2.example.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_hostname=ha2-master2.example.com openshift_public_hostname=ha2-master2.example.com openshift_schedulable=true
ha2-master3.example.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_hostname=ha2-master3.example.com openshift_public_hostname=ha2-master3.example.com openshift_schedulable=true
ha2-node1.example.com openshift_node_labels="{'region': 'primary', 'zone': 'west'}" openshift_hostname=ha2-node1.example.com openshift_public_hostname=ha2-node1.example.com
ha2-node2.example.com openshift_node_labels="{'region': 'primary', 'zone': 'east'}" openshift_hostname=ha2-node2.example.com openshift_public_hostname=ha2-node2.example.com

[lb]
ha2-master.example.com

[nfs]
ha2-master1.example.com
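For what it's worth, as far as I can tell openshift-ansible takes oo_first_master from the first entry in the [masters] group, which would explain why ha2-master2 ran the most tasks in the recap above once ha2-master1 was commented out. A minimal illustration (hostnames taken from the inventory above; the inline annotations are mine):

```ini
[masters]
#ha2-master1.example.com   ; commented out, so it is skipped...
ha2-master2.example.com    ; ...and this host becomes oo_first_master
ha2-master3.example.com
```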
I had tried running "oc create -f /usr/share/openshift/examples/image-streams/image-streams-rhel7.json" manually and got the same error. Unfortunately, the environment wasn't kept, so I'm not sure what happened.
I plan to investigate this more tomorrow. Is this still happening? I've never seen it happen. I've installed dozens of multi-master and all-in-one environments in the last week, so it's a bit of a mystery right now.
I never hit this again, so I am downgrading the severity. It is OK to close it if we can't find the root cause within a short time.