Bug 1323218

Summary: Upgrade failed to import image-streams on nativeha env
Product: OpenShift Container Platform
Reporter: Anping Li <anli>
Component: Cluster Version Operator
Assignee: Brenton Leanhardt <bleanhar>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Anping Li <anli>
Severity: low
Priority: medium
Version: 3.2.0
CC: anli, aos-bugs, bleanhar, jokerman, mmccomas
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2016-05-12 15:23:14 UTC
Type: Bug
Attachments:
  Upgrade failed to import images stream

Description Anping Li 2016-04-01 14:54:58 UTC
Description of problem:
Upgrade failed to import image-streams on a containerized native-HA environment, even though the image-stream file exists on the host:

# ll /usr/share/openshift/examples/image-streams/image-streams-rhel7.json
-rw-r--r--. 1 root root 14085 Apr  1 22:33 /usr/share/openshift/examples/image-streams/image-streams-rhel7.json
# oc version
oc v3.1.1.6-33-g81eabcc
kubernetes v1.1.0-origin-1107-g4c8e6f4


Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.0.69-1.git.0.c818db9.el7.noarch


How reproducible:
always

Steps to Reproduce:
1. Install native-HA containerized OSE 3.1 on RHEL
2. Upgrade to OSE 3.2

Actual results:

TASK: [openshift_examples | Import RHEL streams] ******************************
<ha2-master2.example.com> ESTABLISH CONNECTION FOR USER: root
<ha2-master2.example.com> REMOTE_MODULE command oc create -n openshift -f /usr/share/openshift/examples/image-streams/image-streams-rhel7.json
<ha2-master2.example.com> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 ha2-master2.example.com /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1459521200.13-23709753680840 && echo $HOME/.ansible/tmp/ansible-tmp-1459521200.13-23709753680840'
<ha2-master2.example.com> PUT /tmp/tmpf4VYgx TO /root/.ansible/tmp/ansible-tmp-1459521200.13-23709753680840/command
<ha2-master2.example.com> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 ha2-master2.example.com /bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python /root/.ansible/tmp/ansible-tmp-1459521200.13-23709753680840/command; rm -rf /root/.ansible/tmp/ansible-tmp-1459521200.13-23709753680840/ >/dev/null 2>&1'
failed: [ha2-master2.example.com] => {"changed": false, "cmd": ["oc", "create", "-n", "openshift", "-f", "/usr/share/openshift/examples/image-streams/image-streams-rhel7.json"], "delta": "0:00:11.891157", "end": "2016-04-01 22:33:31.599988", "failed": true, "failed_when_result": true, "rc": 1, "start": "2016-04-01 22:33:19.708831", "stdout_lines": [], "warnings": []}
stderr:
================================================================================
ATTENTION: You are running oc via a wrapper around 'docker run openshift3/ose'.
This wrapper is intended only to be used to bootstrap an environment. Please
install client tools on another host once you have granted cluster-admin
privileges to a user.
See https://docs.openshift.com/enterprise/latest/cli_reference/get_started_cli.html
=================================================================================

the path "/usr/share/openshift/examples/image-streams/image-streams-rhel7.json" does not exist

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/root/upgrade.retry

ha2-master.example.com     : ok=6    changed=1    unreachable=0    failed=0
ha2-master1.example.com    : ok=97   changed=16   unreachable=0    failed=0
ha2-master2.example.com    : ok=216  changed=35   unreachable=0    failed=1
ha2-master3.example.com    : ok=181  changed=31   unreachable=0    failed=0
ha2-node1.example.com      : ok=84   changed=13   unreachable=0    failed=0
ha2-node2.example.com      : ok=84   changed=13   unreachable=0    failed=0
localhost                  : ok=41   changed=0    unreachable=0    failed=0
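The wrapper warning in the stderr above suggests the likely mechanism: the containerized `oc` client can only see paths that are shared into its container, so a file that exists on the host can still report "does not exist" to the client. A minimal sketch of that failure mode, simulated with plain directories instead of docker (the scratch paths and the copy step are illustrative; a real wrapper would bind-mount the directory with `docker run -v`):

```shell
# Simulate a container whose filesystem root differs from the host's.
host_file=/tmp/imgstream-demo/image-streams-rhel7.json
mkdir -p "$(dirname "$host_file")" && echo '{}' > "$host_file"

container_root=/tmp/imgstream-demo-rootfs
mkdir -p "$container_root"

# Inside the "container", the host path is not visible...
test -e "$container_root$host_file" && echo visible || echo "does not exist"

# ...until it is explicitly shared in (here a copy; docker would use -v).
mkdir -p "$container_root$(dirname "$host_file")"
cp "$host_file" "$container_root$host_file"
test -e "$container_root$host_file" && echo visible || echo "does not exist"
```

The first check prints "does not exist", the second "visible", which matches the observed symptom: the file is present on the host but invisible to the wrapped client.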

Expected results:


Additional info:

Comment 1 Brenton Leanhardt 2016-04-04 14:55:24 UTC
I'm not really sure how this happened.  Can you upload your inventory and the entire ansible log from the run?

I see that ha2-master2.example.com is where the job failed.  The way this works is that the example files are copied only to the first master and then the oc commands run there.  I'm worried that somehow things are confused in your environment and the example files were uploaded to master1 yet the commands ran on master2.

I've been running all my local tests in a multi-master environment, so I'm betting you're hitting some sort of edge case.
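The mismatch described above (example files landing on one master while the `oc` command runs on another) could be cross-checked with a sketch like the following; the hostnames are the ones from this report, and the ssh options are only there so unreachable hosts fail fast:

```shell
# Sketch: check which masters actually have the example file on disk.
# Hostnames are taken from this report; adjust for other environments.
path=/usr/share/openshift/examples/image-streams/image-streams-rhel7.json
for h in ha2-master1.example.com ha2-master2.example.com ha2-master3.example.com; do
  printf '%s: ' "$h"
  ssh -o BatchMode=yes -o ConnectTimeout=3 "$h" "ls -l $path" 2>/dev/null \
    || echo "missing or unreachable"
done
```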

Comment 2 Jason DeTiberus 2016-04-05 02:11:30 UTC
Looking at the output (and the fact that ha2-master2 has the largest number of tasks), it looks like ha2-master2 is the host that was considered oo_first_master for the run.

That said, I agree that it looks like it might be an inventory-related issue, or possibly another issue during the run. Could you also include the full log output of the ansible run?

Comment 3 Anping Li 2016-04-05 13:21:15 UTC
Created attachment 1143838 [details]
Upgrade failed to import images stream

I commented out the first master in the inventory, but I don't think that is the root cause.

[root@anli config]# cat hostnative
[OSEv3:children]
masters
nodes
etcd
lb
nfs

[OSEv3:vars]
ansible_ssh_user=root
openshift_use_openshift_sdn=true
deployment_type=openshift-enterprise
osm_default_subdomain=ha2.example.com
openshift_master_identity_providers=[{'name': 'allow_all', 'login': 'true', 'challenge': 'true', 'kind': 'AllowAllPasswordIdentityProvider'}]
openshift_set_hostname=True
os_sdn_network_plugin_name=redhat/openshift-ovs-multitenant


cli_docker_additional_registries=virt-openshift-05.lab.eng.nay.redhat.com:5000
cli_docker_insecure_registries=virt-openshift-05.lab.eng.nay.redhat.com:5000
openshift_docker_additional_registries=virt-openshift-05.lab.eng.nay.redhat.com:5000
openshift_docker_insecure_registries=virt-openshift-05.lab.eng.nay.redhat.com:5000
#openshift_rolling_restart_mode=system

openshift_hosted_registry_storage_kind=nfs
openshift_hosted_registry_storage_nfs_directory=/var/export/
openshift_hosted_registry_storage_nfs_options='*(rw,sync,all_squash)'
openshift_hosted_registry_storage_volume_name=registry
openshift_hosted_registry_storage_volume_size=2G

openshift_master_cluster_method=native
openshift_master_cluster_hostname=ha2-master.example.com
openshift_master_cluster_public_hostname=ha2-master.example.com

[masters]
#ha2-master1.example.com 
ha2-master2.example.com
ha2-master3.example.com

[etcd]
ha2-master1.example.com
ha2-master2.example.com
ha2-master3.example.com

[nodes]
ha2-master1.example.com  openshift_node_labels="{'region': 'idle', 'zone': 'default'}" openshift_hostname=ha2-master1.example.com openshift_public_hostname=ha2-master1.example.com
ha2-master2.example.com  openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_hostname=ha2-master2.example.com openshift_public_hostname=ha2-master2.example.com openshift_schedulable=true
ha2-master3.example.com  openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_hostname=ha2-master3.example.com openshift_public_hostname=ha2-master3.example.com openshift_schedulable=true
ha2-node1.example.com  openshift_node_labels="{'region': 'primary', 'zone': 'west'}" openshift_hostname=ha2-node1.example.com openshift_public_hostname=ha2-node1.example.com
ha2-node2.example.com  openshift_node_labels="{'region': 'primary', 'zone': 'east'}" openshift_hostname=ha2-node2.example.com openshift_public_hostname=ha2-node2.example.com

[lb]
ha2-master.example.com

[nfs]
ha2-master1.example.co

Comment 4 Anping Li 2016-04-05 13:24:49 UTC
I had tried "oc create -f /usr/share/openshift/examples/image-streams/image-streams-rhel7.json" manually and got the same error. Unfortunately, the environment wasn't kept, so I'm not sure what happened.
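Since the environment was lost, a small capture helper could preserve the relevant state the next time this reproduces. This is only a sketch: the path is the one from this report, `oc` may resolve to the containerized wrapper script, and the function deliberately records both the host-side view of the file and the failing command's stderr:

```shell
# Sketch of a capture helper for the next reproduction.
capture() {
  path=/usr/share/openshift/examples/image-streams/image-streams-rhel7.json
  hostname                                 # which master the task ran on
  ls -l "$path"                            # host-side view of the file
  command -v oc                            # wrapper script vs. real binary
  oc create -n openshift -f "$path" 2>&1   # the failing command with stderr
}
```

Comparing `ls -l "$path"` on the host with the error from the wrapped `oc create` would confirm or rule out the container-visibility explanation.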

Comment 5 Brenton Leanhardt 2016-04-12 21:15:05 UTC
I plan to investigate this more tomorrow. Is this still happening? I've never seen it myself, and I've installed dozens of multi-master and all-in-one environments in the last week, so it's a bit of a mystery right now.

Comment 6 Anping Li 2016-04-13 10:56:46 UTC
I have never hit this again, so I am downgrading the severity. It is OK to close it if we can't find the root cause within a short time.