Description of problem:
I'm running the first step to upgrade overcloud nodes, from L->M. I'm using 3 baremetal controllers, 1 baremetal compute.

After performing the first step:

openstack overcloud deploy --force-postconfig --control-scale 3 --compute-scale 1 --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e /home/stack/network-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-aodh.yaml \
  --ntp-server 135.248.16.241

I'm getting this failure:

  "status": "FAILED",
  "server_id": "413b675e-2106-443d-9719-08e292196672",
  "config_id": "3306a226-7a45-46ed-8d2e-142ec8b4157c",
  "output_values": {
    "deploy_stdout": " Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier]\n Clone Set: openstack-ceilometer-alarm-evaluator-clone [openstack-ceilometer-alarm-evaluator]\nERROR: cluster finished transition but openstack-ceilometer-alarm-evaluator was not in stopped state, exiting.\n",
    "deploy_stderr": "",
    "deploy_status_code": 1
  },
  "creation_time": "2017-06-23T13:05:51",
  "updated_time": "2017-06-23T13:06:34",
  "input_values": {},
  "action": "CREATE",
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1",

On the controller, I can see the ceilometer resources are not started:

[heat-admin@overcloud-controller-2 ~]$ sudo pcs status
Cluster name: tripleo_cluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: overcloud-controller-0 (version 1.1.15-11.el7_3.4-e174ec8) - partition with quorum
Last updated: Fri Jun 23 13:40:36 2017 Last change: Fri Jun 23 13:06:31 2017 by root via crm_resource on overcloud-controller-0

3 nodes and 112 resources configured: 3 resources DISABLED and 0 BLOCKED from being started due to failures

Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Full list of resources:

 ip-172.17.1.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
 ip-172.17.4.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 ip-10.5.195.91 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
 ip-172.17.3.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
 ip-172.31.255.111 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
 ip-172.17.1.11 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
 Master/Slave Set: redis-master [redis]
     Masters: [ overcloud-controller-0 ]
     Slaves: [ overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: mongod-clone [mongod]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: memcached-clone [memcached]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-l3-agent-clone [neutron-l3-agent]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier]
     openstack-ceilometer-alarm-notifier (systemd:openstack-ceilometer-alarm-notifier): FAILED overcloud-controller-1
     openstack-ceilometer-alarm-notifier (systemd:openstack-ceilometer-alarm-notifier): FAILED overcloud-controller-2
     Started: [ overcloud-controller-0 ]
 Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-heat-api-clone [openstack-heat-api]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-api-clone [openstack-nova-api]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-glance-api-clone [openstack-glance-api]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: delay-clone [delay]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: httpd-clone [httpd]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-keystone-clone [openstack-keystone]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-server-clone [neutron-server]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-alarm-evaluator-clone [openstack-ceilometer-alarm-evaluator]
     openstack-ceilometer-alarm-evaluator (systemd:openstack-ceilometer-alarm-evaluator): FAILED overcloud-controller-1 (disabled)
     openstack-ceilometer-alarm-evaluator (systemd:openstack-ceilometer-alarm-evaluator): FAILED overcloud-controller-2 (disabled)
     Started: [ overcloud-controller-0 ]
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 openstack-cinder-volume (systemd:openstack-cinder-volume): Started overcloud-controller-0
 Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Failed Actions:
* openstack-ceilometer-alarm-notifier_monitor_60000 on overcloud-controller-1 'not running' (7): call=488, status=complete, exitreason='none',
    last-rc-change='Fri Jun 23 13:07:30 2017', queued=0ms, exec=0ms
* openstack-ceilometer-alarm-evaluator_monitor_60000 on overcloud-controller-1 'not running' (7): call=486, status=complete, exitreason='none',
    last-rc-change='Fri Jun 23 13:07:28 2017', queued=0ms, exec=0ms
* openstack-ceilometer-alarm-evaluator_monitor_60000 on overcloud-controller-2 'not running' (7): call=507, status=complete, exitreason='none',
    last-rc-change='Fri Jun 23 13:07:28 2017', queued=0ms, exec=0ms
* openstack-ceilometer-alarm-notifier_monitor_60000 on overcloud-controller-2 'not running' (7): call=509, status=complete, exitreason='none',
    last-rc-change='Fri Jun 23 13:07:30 2017', queued=0ms, exec=0ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

sudo systemctl | grep openstack-ceilometer
openstack-ceilometer-api.service            loaded active running Cluster Controlled openstack-ceilometer-api
openstack-ceilometer-central.service        loaded active running Cluster Controlled openstack-ceilometer-central
openstack-ceilometer-collector.service      loaded active running Cluster Controlled openstack-ceilometer-collector
openstack-ceilometer-notification.service   loaded active running Cluster Controlled openstack-ceilometer-notification
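For triage, the Failed Actions entries in the pcs status output above can be grouped by resource to see which nodes are affected. This is only a rough sketch: the regular expression is inferred from the four entries in this report and is an assumption rather than a documented pcs format, and failed_actions_by_resource is a made-up helper name.

```python
import re
from collections import defaultdict

# Assumed entry shape, inferred from this report only:
#   * <resource>_<operation>_<interval> on <node> '<reason>' (<rc>): ...
FAILED_ACTION = re.compile(
    r"\*\s+(?P<resource>\S+)_(?P<op>[a-z]+)_(?P<interval>\d+)"
    r" on (?P<node>\S+) '(?P<reason>[^']*)' \((?P<rc>\d+)\)"
)

# Two entries copied from the status output in this report.
SAMPLE = (
    "* openstack-ceilometer-alarm-notifier_monitor_60000 on overcloud-controller-1 "
    "'not running' (7): call=488, status=complete, exitreason='none'\n"
    "* openstack-ceilometer-alarm-evaluator_monitor_60000 on overcloud-controller-2 "
    "'not running' (7): call=507, status=complete, exitreason='none'\n"
)


def failed_actions_by_resource(pcs_status_text):
    """Return {resource: sorted node names} for every failed action found in the text."""
    nodes = defaultdict(set)
    # finditer scans the whole text, so it also copes with output pasted on one line.
    for match in FAILED_ACTION.finditer(pcs_status_text):
        nodes[match.group("resource")].add(match.group("node"))
    return {resource: sorted(found) for resource, found in nodes.items()}


print(failed_actions_by_resource(SAMPLE))
```

Fed the full status text, this should list controller-1 and controller-2 for both alarm resources, which matches the FAILED lines in the resource list.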
Created attachment 1291099: os-collect-config from controller02
Created attachment 1291106: os-collect-config from controller0
Created attachment 1291108: deployed.json on controller00
Created attachment 1291135: sosreport from controller00
OK, so we may have a race condition here that leads to a failure to disable the openstack-ceilometer-alarm-evaluator resource. During the switch to aodh we remove the openstack-ceilometer-alarm package on all three controller nodes. If the package removal happens on any of the non-bootstrap controllers before the pcs resource is disabled, the disable operation will fail on the non-bootstrap node where the package has already been removed (as the service can't be stopped there).
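One way to picture the ordering that avoids this race is sketched below. It is not the actual patch; it only illustrates the idea that every node waits for the resource to be stopped before removing the package. All names here (remove_package_when_stopped and the three callables) are hypothetical hooks, and the timeout values are arbitrary assumptions.

```python
import time


def remove_package_when_stopped(is_bootstrap, disable_resource, is_stopped_everywhere,
                                remove_package, timeout=600.0, poll_interval=5.0,
                                sleep=time.sleep, clock=time.monotonic):
    """Remove the package only after the cluster resource is stopped on all nodes.

    disable_resource, is_stopped_everywhere and remove_package are hypothetical
    callables supplied by the caller. Returns True if the package was removed,
    False if the resource did not stop before the timeout.
    """
    if is_bootstrap:
        # Only the bootstrap node asks the cluster to disable the resource.
        disable_resource()

    deadline = clock() + timeout
    while not is_stopped_everywhere():
        if clock() >= deadline:
            # Leave the package installed so the service can still be stopped later.
            return False
        sleep(poll_interval)

    # Safe point: nothing is running from the package any more.
    remove_package()
    return True
```

Non-bootstrap nodes would call this with is_bootstrap=False, so they block on the stopped state instead of racing ahead with the package removal.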
Thanks for the patch! After applying it manually to the undercloud, I could successfully perform the aodh upgrade step.