Bug 1316765

Summary: Upgrade hang on native ha atomic host
Product: OpenShift Container Platform
Reporter: Anping Li <anli>
Component: Cluster Version Operator
Assignee: Andrew Butcher <abutcher>
Status: CLOSED WORKSFORME
QA Contact: Anping Li <anli>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.2.0
CC: anli, aos-bugs, bleanhar, jokerman, mmccomas
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-16 10:26:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                               Flags
Upgrade hang when run openshift_facts    none

Description Anping Li 2016-03-11 02:56:02 UTC
Created attachment 1135089 [details]
Upgrade hang when run openshift_facts

Description of problem:
The Ansible playbook runs very slowly on Atomic Host. During an upgrade of a native HA environment, it hung for at least 16 hours before I aborted it.
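One way to separate general slowness from a true hang is to time a trivial ad-hoc task against the same inventory. A minimal sketch, assuming the config/pacerhel inventory from the reproduction steps below:

    # Hypothetical sanity check: if this returns quickly, SSH connectivity
    # and Python on the managed hosts are fine, and the hang is more likely
    # inside a specific module such as openshift_facts.
    time ansible -i config/pacerhel all -m ping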



Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.0.55

How reproducible:
one time

Steps to Reproduce:
1. Set up native HA on Atomic Host (three masters and two nodes); a minimal inventory sketch follows below.
2. ansible-playbook -i config/pacerhel /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_1_to_v3_2/upgrade.yml
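
For reference, a minimal sketch of what the config/pacerhel inventory might look like for this native HA topology. The master hostnames, cluster hostname, and variables are taken from the openshift_facts output below; the node hostnames are hypothetical:

    [OSEv3:children]
    masters
    nodes
    etcd

    [OSEv3:vars]
    ansible_ssh_user=root
    deployment_type=openshift-enterprise
    containerized=true
    openshift_master_cluster_method=native
    openshift_master_cluster_hostname=master.example.com
    openshift_master_cluster_public_hostname=master.example.com

    [masters]
    master1.example.com
    master2.example.com
    master3.example.com

    [etcd]
    master1.example.com
    master2.example.com
    master3.example.com

    [nodes]
    master1.example.com
    master2.example.com
    master3.example.com
    # Hypothetical node hostnames; the report does not name the two nodes.
    node1.example.com
    node2.example.com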

Actual results:
It seems the playbook hangs while executing openshift_facts:

<master3.example.com> ESTABLISH CONNECTION FOR USER: root
<master3.example.com> REMOTE_MODULE openshift_facts role=common
<master3.example.com> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 master3.example.com /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1457602252.56-273790639350362 && echo $HOME/.ansible/tmp/ansible-tmp-1457602252.56-273790639350362'
<master3.example.com> PUT /tmp/tmplsaMof TO /root/.ansible/tmp/ansible-tmp-1457602252.56-273790639350362/openshift_facts
<master3.example.com> EXEC ssh -C -tt -v -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 master3.example.com /bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python /root/.ansible/tmp/ansible-tmp-1457602252.56-273790639350362/openshift_facts; rm -rf /root/.ansible/tmp/ansible-tmp-1457602252.56-273790639350362/ >/dev/null 2>&1'
ok: [master3.example.com] => {"ansible_facts": {"openshift": {"common": {"admin_binary": "/usr/local/bin/oadm", "all_hostnames": ["kubernetes.default", "10.66.79.88", "kubernetes.default.svc.cluster.local", "kubernetes", "openshift.default", "openshift.default.svc", "172.30.0.1", "master.example.com", "master3.example.com", "openshift.default.svc.cluster.local", "kubernetes.default.svc", "openshift"], "cli_image": "openshift3/ose", "client_binary": "/usr/local/bin/oc", "cluster_id": "default", "config_base": "/etc/origin", "data_dir": "/var/lib/origin", "debug_level": "2", "deployment_type": "openshift-enterprise", "dns_domain": "cluster.local", "docker_additional_registries": ["registry.access.redhat.com"], "docker_blocked_registries": [], "docker_insecure_registries": [], "examples_content_version": "v1.1", "hostname": "master3.example.com", "image_tag": "v3.1.1.6", "install_examples": "True", "internal_hostnames": ["kubernetes.default", "10.66.79.88", "kubernetes.default.svc.cluster.local", "kubernetes", "openshift.default", "openshift.default.svc", "172.30.0.1", "master3.example.com", "openshift.default.svc.cluster.local", "kubernetes.default.svc", "openshift"], "ip": "10.66.79.88", "is_atomic": true, "is_containerized": "true", "public_hostname": "master3.example.com", "public_ip": "10.66.79.88", "sdn_network_plugin_name": "redhat/openshift-ovs-subnet", "service_type": "atomic-openshift", "use_cluster_metrics": false, "use_flannel": false, "use_manageiq": true, "use_nuage": false, "use_openshift_sdn": true, "version": "3.1.1.6-16-g5327e56", "version_gte_3_1_1_or_1_1_1": true, "version_gte_3_1_or_1_1": true, "version_gte_3_2_or_1_2": false}, "current_config": {"roles": ["node", "master", "etcd", "hosted"]}, "etcd": {"etcd_data_dir": "/var/lib/etcd/", "etcd_image": "registry.access.redhat.com/rhel7/etcd"}, "hosted": {"registry": {"storage": {"access_modes": ["ReadWriteMany"], "create_pv": true, "host": "nfs.example.com", "kind": "nfs", "nfs": {"directory": "/var/
export/paceatomic", "options": "*(rw,root_squash)"}, "volume": {"name": "registry", "size": "2G"}}}}, "master": {"access_token_max_seconds": 86400, "api_port": "8443", "api_url": "https://master.example.com:8443", "api_use_ssl": true, "auth_token_max_seconds": 500, "bind_addr": "0.0.0.0", "cluster_hostname": "master.example.com", "cluster_method": "native", "cluster_public_hostname": "master.example.com", "console_path": "/console", "console_port": "8443", "console_url": "https://master.example.com:8443/console", "console_use_ssl": true, "controllers_port": "8444", "debug_level": "2", "default_node_selector": "", "default_subdomain": "paceatomic.example.com", "dns_port": "53", "embedded_dns": true, "embedded_etcd": false, "embedded_kube": true, "etcd_hosts": ["master1.example.com", "master2.example.com", "master3.example.com"], "etcd_port": "2379", "etcd_urls": ["https://master1.example.com:2379", "https://master2.example.com:2379", "https://master3.example.com:2379"], "etcd_use_ssl": true, "identity_providers": [{"challenge": "true", "kind": "AllowAllPasswordIdentityProvider", "login": "true", "name": "allow_all"}], "infra_nodes": ["master2.example.com", "master3.example.com"], "loopback_api_url": "https://master3.example.com:8443", "loopback_cluster_name": "master3-example-com:8443", "loopback_context_name": "default/master3-example-com:8443/system:openshift-master", "loopback_user": "system:openshift-master/master-example-com:8443", "master_count": "3", "master_image": "openshift3/ose", "mcs_allocator_range": "s0:/2", "mcs_labels_per_project": 5, "named_certificates": [], "oauth_grant_method": "auto", "portal_net": "172.30.0.0/16", "project_request_message": "", "project_request_template": "", "public_api_url": "https://master.example.com:8443", "public_console_url": "https://master.example.com:8443/console", "registry_selector": "region=infra", "registry_url": "openshift3/ose-${component}:${version}", "router_selector": "region=infra", "sdn_cluster_network_cidr"
: "10.1.0.0/16", "sdn_host_subnet_length": "8", "session_auth_secrets": ["5eglpcd69dSK2vNP+Bvz1cwTED14kL8M"], "session_encryption_secrets": ["rp1Ly9a0MvA9D3eXSpaW8rdN0cqbP3B4"], "session_max_seconds": 3600, "session_name": "ssn", "session_secrets_file": "/etc/origin/master/session-secrets.yaml", "uid_allocator_range": "1000000000-1999999999/10000"}, "node": {"annotations": {}, "debug_level": "2", "iptables_sync_period": "5s", "labels": {"region": "infra", "zone": "default"}, "node_image": "openshift3/node", "ovs_image": "openshift3/openvswitch", "portal_net": "172.30.0.0/16", "proxy_mode": "iptables", "registry_url": "openshift3/ose-${component}:${version}", "schedulable": "true", "sdn_mtu": "1450", "set_node_ip": false, "storage_plugin_deps": ["ceph", "glusterfs", "iscsi"]}}}, "changed": false}
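If this recurs, one hedged way to locate the hang is to inspect both ends of the connection shown in the log above; the paths come from the EXEC/PUT lines, and the commands are plain coreutils:

    # On master3.example.com: is the copied module still running, and is its
    # temporary directory lingering?
    ps -ef | grep '[o]penshift_facts'
    ls -ld /root/.ansible/tmp/ansible-tmp-*

    # On the control host: a stale SSH ControlMaster socket (ControlPath from
    # the ssh options above) can also stall new connections.
    ls -l /root/.ansible/cp/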

Expected results:


Additional info:

Comment 1 Brenton Leanhardt 2016-03-15 14:16:47 UTC
Is this reproducible? If so, could you create an environment for us to debug? I haven't seen this exact problem in my testing.

Comment 2 Anping Li 2016-03-16 10:26:19 UTC
I can't reproduce this issue, so I will close it. If I hit it again, I will reopen it and keep the environment for debugging.