Description of problem:
Failed to upgrade the cluster environment to v3.1 (3 masters controlled by pacemaker + 3 etcd + 2 nodes).

Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.0.8-1.git.0.59ae79c.el7aos.noarch

Steps to Reproduce:
1. Set up the cluster environment.
2. Upgrade the cluster environment to v3.1.

Actual results:
The following error messages were printed:

<--skip --->
TASK: [Upgrade master configuration] ******************************************
skipping: [master1.example.com]
fatal: [master2.example.com] => error while evaluating conditional: deployment_type in ['openshift-enterprise', 'atomic-enterprise'] and g_aos_versions.curr_version | version_compare('3.1', '>=')
fatal: [master3.example.com] => error while evaluating conditional: deployment_type in ['openshift-enterprise', 'atomic-enterprise'] and g_aos_versions.curr_version | version_compare('3.1', '>=')
<---skip--->
<---skip--->
fatal: [master1.example.com] => One or more undefined variables: 'dict object' has no attribute 'master_cert_subdir'

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/root/upgrade.retry

localhost                  : ok=3    changed=0    unreachable=0    failed=0
master1.example.com        : ok=43   changed=7    unreachable=1    failed=0
master2.example.com        : ok=23   changed=4    unreachable=1    failed=0
master3.example.com        : ok=23   changed=4    unreachable=1    failed=0
node1.example.com          : ok=5    changed=0    unreachable=0    failed=0
node2.example.com          : ok=5    changed=0    unreachable=0    failed=0

Expected results:
The upgrade should succeed.

Additional info:
Proposed fix is here: https://github.com/openshift/openshift-ansible/pull/870
PR #870 has been closed in favor of #839 which has been merged into master.
The upgrade failed when checking 'pcs status':

TASK: [openshift_master_cluster | Test if cluster is already configured] ******
fatal: [master1.example.com] => error while evaluating conditional: openshift.master.cluster_method == "pacemaker"

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/root/upgrade.retry

localhost                  : ok=9    changed=0    unreachable=0    failed=0
master1.example.com        : ok=65   changed=19   unreachable=1    failed=0
master2.example.com        : ok=38   changed=12   unreachable=0    failed=0
master3.example.com        : ok=38   changed=12   unreachable=0    failed=0
node1.example.com          : ok=15   changed=3    unreachable=0    failed=0
node2.example.com          : ok=15   changed=3    unreachable=0    failed=0

After that, I checked the PCS status and found the following:

[root@master1 ~]# pc status
-bash: pc: command not found
[root@master1 ~]# pcs status
Error: cluster is not currently running on this node
[root@master1 ~]# ps -ef|grep pcs
root       622     1  0 13:19 ?        00:00:00 /bin/sh /usr/lib/pcsd/pcsd start
root       681   622  0 13:19 ?        00:00:00 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
root       684   681  0 13:19 ?        00:00:01 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
root     26430 26387  0 14:36 pts/0    00:00:00 grep --color=auto pcs
I'm unable to reproduce this issue with the master branch of openshift-ansible and my pacemaker HA upgrade completes successfully. Can you verify that your cluster is up and running prior to upgrade and that you have the latest ansible code?

I am launching the playbook like this:

ansible-playbook ~/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_0_to_v3_1/upgrade.yml

Prior to beginning the upgrade 'pcs status' should indicate that the cluster is started.

# pcs status
Cluster name: openshift_master
Last updated: Thu Nov 12 09:34:52 2015		Last change: Thu Nov 12 09:12:26 2015 by root via crm_resource on master4.example.com
Stack: corosync
Current DC: master5.example.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
3 nodes and 2 resources configured

Online: [ master4.example.com master5.example.com master6.example.com ]

Full list of resources:

 Resource Group: atomic-openshift-master
     virtual-ip	(ocf::heartbeat:IPaddr2):	Started master4.example.com
     master	(systemd:atomic-openshift-master):	Started master4.example.com

PCSD Status:
  master4.example.com: Online
  master5.example.com: Online
  master6.example.com: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled

Additionally, ansible should not have an issue evaluating that conditional as the inventory variable is checked when verifying that the upgrade can proceed. If multiple masters are configured and openshift_master_cluster_method is not set to "pacemaker" the upgrade will fail with the following message:

PLAY [Verify upgrade can proceed] *********************************************

TASK: [fail ] *****************************************************************
failed: [master4.example.com] => {"failed": true}
msg: openshift_master_cluster_method must be set to 'pacemaker'
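For anyone who wants to script the "cluster is started" check before launching the playbook, here is a minimal sketch. It is not part of openshift-ansible; the function names are hypothetical, and it assumes the 'pcs status' output looks like the two samples in this report (the "Daemon Status" lines when the cluster is running, and the "cluster is not currently running on this node" error when it is not).

```python
import subprocess

# Error text taken from the failing 'pcs status' run earlier in this report.
NOT_RUNNING = "cluster is not currently running on this node"
# Daemons listed under "Daemon Status" in the healthy sample output.
REQUIRED_DAEMONS = ("corosync", "pacemaker")


def cluster_is_started(status_text):
    """Return True if 'pcs status' output shows corosync and pacemaker active.

    Hypothetical helper: it only inspects the text, so it relies on the
    output format shown in this report.
    """
    if NOT_RUNNING in status_text:
        return False
    daemons = {}
    for line in status_text.splitlines():
        name, sep, state = line.strip().partition(":")
        if sep and name in REQUIRED_DAEMONS:
            daemons[name] = state.strip()
    # A state such as "active/enabled" counts as started; a missing or
    # "inactive/..." entry does not.
    return all(daemons.get(d, "").startswith("active") for d in REQUIRED_DAEMONS)


def read_pcs_status():
    """Run 'pcs status' and return stdout and stderr combined as text."""
    try:
        result = subprocess.run(
            ["pcs", "status"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except FileNotFoundError:
        # pcs is not installed on this host; treat as "no cluster".
        return ""
    return result.stdout
```

Calling cluster_is_started(read_pcs_status()) on each master before the upgrade should return True; with the output from master1 above it would return False.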
Proposed fix is here: https://github.com/openshift/openshift-ansible/pull/892
Verified; the upgrade passes with the latest build.