Bug 1389040
Summary: [OSP-Director-10] Upgrade from OSP 9 to OSP 10 fails because of stopped resources on the cluster (during: UPGRADE CONTROLLER AND BLOCKSTORAGE).
| Field | Value |
|---|---|
| Product | Red Hat OpenStack |
| Component | openstack-tripleo-heat-templates |
| Version | 10.0 (Newton) |
| Target Release | 10.0 (Newton) |
| Target Milestone | rc |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | urgent |
| Keywords | Triaged |
| Reporter | mlammon |
| Assignee | Michele Baldessari <michele> |
| QA Contact | Omri Hochman <ohochman> |
| CC | dbecker, ipilcher, jcoufal, jschluet, mandreou, mburns, michele, mlammon, morazi, ohochman, rhel-osp-director-maint, sasha, sathlang |
| Fixed In Version | openstack-tripleo-heat-templates-5.1.0-3.el7ost |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| Last Closed | 2016-12-14 16:25:56 UTC |
| Attachments | heat-engine.log, full messages |
Description
mlammon
2016-10-26 17:23:55 UTC
Additional information: [root@controller-2 ~]# pcs status Cluster name: tripleo_cluster Stack: corosync Current DC: controller-0 (version 1.1.15-11.el7-e174ec8) - partition with quorum Last updated: Wed Oct 26 15:59:01 2016 Last change: Wed Oct 26 09:02:07 2016 by root via cibadmin on controller-0 3 nodes and 124 resources configured Online: [ controller-0 controller-1 controller-2 ] Full list of resources: ip-172.17.1.10 (ocf::heartbeat:IPaddr2): Started controller-0 ip-192.0.2.6 (ocf::heartbeat:IPaddr2): Started controller-1 ip-172.17.4.10 (ocf::heartbeat:IPaddr2): Started controller-2 Clone Set: haproxy-clone [haproxy] Started: [ controller-0 controller-1 controller-2 ] Master/Slave Set: galera-master [galera] Masters: [ controller-0 controller-1 controller-2 ] Clone Set: memcached-clone [memcached] Started: [ controller-0 controller-1 controller-2 ] ip-172.17.3.10 (ocf::heartbeat:IPaddr2): Started controller-0 ip-10.0.0.101 (ocf::heartbeat:IPaddr2): Started controller-1 Clone Set: rabbitmq-clone [rabbitmq] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-core-clone [openstack-core] Started: [ controller-0 controller-1 controller-2 ] Master/Slave Set: redis-master [redis] Masters: [ controller-1 ] Slaves: [ controller-0 controller-2 ] ip-172.17.1.11 (ocf::heartbeat:IPaddr2): Started controller-2 Clone Set: mongod-clone [mongod] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-aodh-evaluator-clone [openstack-aodh-evaluator] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler] Started: [ controller-0 controller-1 controller-2 ] Clone Set: neutron-l3-agent-clone [neutron-l3-agent] Started: [ controller-0 controller-1 controller-2 ] Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup] Started: [ controller-0 controller-1 controller-2 ] Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup] Started: [ controller-0 controller-1 controller-2 ] openstack-cinder-volume (systemd:openstack-cinder-volume): Started controller-0 Clone Set: openstack-heat-engine-clone [openstack-heat-engine] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-aodh-listener-clone [openstack-aodh-listener] Started: [ controller-0 controller-1 controller-2 ] Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-gnocchi-metricd-clone [openstack-gnocchi-metricd] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-aodh-notifier-clone [openstack-aodh-notifier] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-heat-api-clone [openstack-heat-api] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-glance-api-clone [openstack-glance-api] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-nova-api-clone [openstack-nova-api] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-sahara-api-clone [openstack-sahara-api] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-heat-api-cloudwatch-clone 
[openstack-heat-api-cloudwatch] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-sahara-engine-clone [openstack-sahara-engine] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-glance-registry-clone [openstack-glance-registry] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-gnocchi-statsd-clone [openstack-gnocchi-statsd] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-cinder-api-clone [openstack-cinder-api] Started: [ controller-0 controller-1 controller-2 ] Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent] Started: [ controller-0 controller-1 controller-2 ] Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy] Started: [ controller-0 controller-1 controller-2 ] Clone Set: delay-clone [delay] Started: [ controller-0 controller-1 controller-2 ] Clone Set: neutron-server-clone [neutron-server] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central] Started: [ controller-0 controller-1 controller-2 ] Clone Set: httpd-clone [httpd] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor] Started: [ controller-0 controller-1 controller-2 ] Failed Actions: * memcached_monitor_60000 on controller-2 'not running' (7): call=93, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:38:23 2016', queued=0ms, exec=0ms * mongod_monitor_60000 on controller-2 'not running' (7): call=244, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:38:13 2016', queued=0ms, exec=0ms * openstack-aodh-evaluator_monitor_60000 on controller-2 'not running' (7): call=333, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:39:27 2016', queued=0ms, exec=0ms * openstack-heat-api_monitor_60000 on controller-2 'not running' (7): call=278, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:39:40 2016', queued=0ms, exec=0ms * openstack-nova-api_monitor_60000 on controller-2 'not running' (7): call=305, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:39:42 2016', queued=0ms, exec=0ms * openstack-nova-consoleauth_monitor_60000 on controller-2 'not running' (7): call=295, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:39:57 2016', queued=0ms, exec=0ms * openstack-ceilometer-notification_monitor_60000 on controller-2 'not running' (7): call=270, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:39:36 2016', queued=0ms, exec=0ms * neutron-openvswitch-agent_monitor_60000 on controller-2 'OCF_PENDING' (196): call=310, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:38:01 2016', queued=0ms, exec=0ms * openstack-nova-novncproxy_monitor_60000 on controller-2 'not running' (7): call=302, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:39:24 2016', queued=0ms, exec=0ms * neutron-server_monitor_60000 on controller-2 'not running' (7): call=303, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:38:24 2016', queued=0ms, exec=0ms * memcached_monitor_60000 on 
controller-1 'not running' (7): call=83, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:38:24 2016', queued=0ms, exec=0ms * mongod_monitor_60000 on controller-1 'not running' (7): call=240, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:38:27 2016', queued=0ms, exec=0ms * openstack-aodh-evaluator_monitor_60000 on controller-1 'not running' (7): call=337, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:39:28 2016', queued=0ms, exec=0ms * openstack-aodh-listener_monitor_60000 on controller-1 'not running' (7): call=340, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:39:30 2016', queued=0ms, exec=0ms * openstack-aodh-notifier_monitor_60000 on controller-1 'not running' (7): call=341, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:39:30 2016', queued=0ms, exec=0ms * openstack-heat-api_monitor_60000 on controller-1 'not running' (7): call=284, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:39:39 2016', queued=0ms, exec=0ms * openstack-nova-api_monitor_60000 on controller-1 'OCF_PENDING' (196): call=309, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:39:36 2016', queued=0ms, exec=0ms * openstack-ceilometer-notification_monitor_60000 on controller-1 'not running' (7): call=274, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:39:36 2016', queued=0ms, exec=0ms * neutron-dhcp-agent_monitor_60000 on controller-1 'not running' (7): call=316, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:39:07 2016', queued=0ms, exec=0ms * neutron-openvswitch-agent_monitor_60000 on controller-1 'not running' (7): call=314, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:39:00 2016', queued=0ms, exec=0ms * neutron-server_monitor_60000 on controller-1 'not running' (7): call=305, status=complete, exitreason='none', last-rc-change='Wed Oct 26 09:39:18 2016', queued=0ms, exec=0ms Daemon Status: corosync: active/enabled pacemaker: active/enabled pcsd: active/enabled This was the pcs status just before we "START UPGRADE CONTROLLER AND BLOCKSTORAGE" step take from console of jenkins job 09:18:39 09:18:39 Stack overcloud UPDATE_COMPLETE 09:18:39 09:18:39 Overcloud Endpoint: http://10.0.0.101:5000/v2.0 09:18:39 Overcloud Deployed 09:18:39 clean_up DeployOvercloud: 09:18:39 END return value: 0 09:18:42 Cluster name: tripleo_cluster 09:18:42 Stack: corosync 09:18:42 Current DC: controller-0 (version 1.1.15-11.el7-e174ec8) - partition with quorum 09:18:42 Last updated: Wed Oct 26 09:18:41 2016 Last change: Wed Oct 26 09:02:07 2016 by root via cibadmin on controller-0 09:18:42 09:18:42 3 nodes and 124 resources configured 09:18:42 09:18:42 Online: [ controller-0 controller-1 controller-2 ] 09:18:42 09:18:42 Full list of resources: 09:18:42 09:18:42 ip-172.17.1.10 (ocf::heartbeat:IPaddr2): Started controller-0 09:18:42 ip-192.0.2.6 (ocf::heartbeat:IPaddr2): Started controller-1 09:18:42 ip-172.17.4.10 (ocf::heartbeat:IPaddr2): Started controller-2 09:18:42 Clone Set: haproxy-clone [haproxy] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Master/Slave Set: galera-master [galera] 09:18:42 Masters: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: memcached-clone [memcached] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 ip-172.17.3.10 (ocf::heartbeat:IPaddr2): Started controller-0 09:18:42 ip-10.0.0.101 (ocf::heartbeat:IPaddr2): Started controller-1 09:18:42 Clone Set: rabbitmq-clone 
[rabbitmq] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-core-clone [openstack-core] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Master/Slave Set: redis-master [redis] 09:18:42 Masters: [ controller-1 ] 09:18:42 Slaves: [ controller-0 controller-2 ] 09:18:42 ip-172.17.1.11 (ocf::heartbeat:IPaddr2): Started controller-2 09:18:42 Clone Set: mongod-clone [mongod] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-aodh-evaluator-clone [openstack-aodh-evaluator] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: neutron-l3-agent-clone [neutron-l3-agent] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 openstack-cinder-volume (systemd:openstack-cinder-volume): Started controller-0 09:18:42 Clone Set: openstack-heat-engine-clone [openstack-heat-engine] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-aodh-listener-clone [openstack-aodh-listener] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-gnocchi-metricd-clone [openstack-gnocchi-metricd] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-aodh-notifier-clone [openstack-aodh-notifier] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-heat-api-clone [openstack-heat-api] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-glance-api-clone [openstack-glance-api] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-nova-api-clone [openstack-nova-api] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-sahara-api-clone [openstack-sahara-api] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-sahara-engine-clone [openstack-sahara-engine] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-glance-registry-clone [openstack-glance-registry] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-gnocchi-statsd-clone [openstack-gnocchi-statsd] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification] 09:18:42 
Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-cinder-api-clone [openstack-cinder-api] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: delay-clone [delay] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: neutron-server-clone [neutron-server] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: httpd-clone [httpd] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor] 09:18:42 Started: [ controller-0 controller-1 controller-2 ] 09:18:42 09:18:42 Daemon Status: 09:18:42 corosync: active/enabled 09:18:42 pacemaker: active/enabled 09:18:42 pcsd: active/enabled 09:18:46 Checking stack status 09:18:47 WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead 09:18:47 +--------------------------------------+------------+-----------------+----------------------+----------------------+ 09:18:47 | id | stack_name | stack_status | creation_time | updated_time | 09:18:47 +--------------------------------------+------------+-----------------+----------------------+----------------------+ 09:18:47 | ee358f54-d537-4fed-be1b-615f50013a07 | overcloud | UPDATE_COMPLETE | 2016-10-26T06:28:33Z | 2016-10-26T09:06:42Z | 09:18:47 +--------------------------------------+------------+-----------------+----------------------+----------------------+ 09:18:48 ### FINISH INIT COMMAND ### 09:18:51 ### START UPGRADE CONTROLLER AND BLOCKSTORAGE ### From checking the logs of the job - there were no issues or failed pcs_resources before the starting of 'UPGRADE CONTROLLER AND BLOCKSTORAGE ' . the pcs issue began during the step. the original jenkins-job console can be found here : https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/Director/view/9.0/job/infrared_deploy_9.0_3_control_1_compute_1ceph_no-UCSSL_no-OCSSL_then_update_RHEL_7.3_upgrade_10/23/consoleFull Created attachment 1214385 [details]
heat-engine.log
After cleaning up the failed resources and restarting the Pacemaker cluster, another attempt to re-run the upgrade from the same point failed, with strange errors in the messages log (full messages file attached): http://pastebin.test.redhat.com/424646 Created attachment 1214419 [details]
full messages
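(The report does not record the exact recovery commands used before the retry. A plausible sequence on a RHEL 7 Pacemaker cluster, assuming the standard pcs CLI, would be something like the following sketch; it is illustrative only and not taken from the report.)

```bash
# Hypothetical recovery sketch: clear the failed monitor actions and
# restart the cluster before re-running the upgrade. Run as root on
# any controller that is a cluster member.
pcs status               # inspect which resources show Failed Actions
pcs resource cleanup     # clear the failure history for all resources
pcs cluster stop --all   # stop pacemaker/corosync on all controllers
pcs cluster start --all  # start the cluster again on all controllers
pcs status               # verify every resource reports Started/Masters
```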
Do we have an idea what might be the root cause here? Is this something new what suddenly started happening or is it some new use cases which we did not test before? Looking at the environment we try to stop openstack-swift-proxy on one of the controller: [root@controller-0 ~]# systemctl status -l openstack-swift-proxy ● openstack-swift-proxy.service - OpenStack Object Storage (swift) - Proxy Server Loaded: loaded (/usr/lib/systemd/system/openstack-swift-proxy.service; enabled; vendor preset: disabled) Active: active (running) since Wed 2016-10-26 08:24:41 UTC; 1 day 6h ago Main PID: 2256 (swift-proxy-ser) CGroup: /system.slice/openstack-swift-proxy.service └─2256 /usr/bin/python2 /usr/bin/swift-proxy-server /etc/swift/proxy-server.conf Oct 27 15:08:17 controller-0.localdomain proxy-server[2256]: ERROR with Account server 172.17.4.13:6002/d1 re: Trying to GET /v1/AUTH_1075f161175442f99fad2e1efc031d26: Connectio d (txn: txe1be088ca13549a998af6-0058121861) (client_ip: 172.17.3.15) Oct 27 15:08:17 controller-0.localdomain proxy-server[2256]: ERROR with Account server 172.17.4.14:6002/d1 re: Trying to GET /v1/AUTH_1075f161175442f99fad2e1efc031d26: Connectio d (txn: txe1be088ca13549a998af6-0058121861) (client_ip: 172.17.3.15) Oct 27 15:08:17 controller-0.localdomain proxy-server[2256]: 172.17.3.15 172.17.3.15 27/Oct/2016/15/08/17 GET /v1/AUTH_1075f161175442f99fad2e1efc031d26%3Fformat%3Djson HTTP/1.0 thon-swiftclient-3.0.0 e8db84cc646c4b7e... - 2 - txe1be088ca13549a998af6-0058121861 - 0.0080 - - 1477580897.777108908 1477580897.785095930 - Oct 27 15:08:17 controller-0.localdomain proxy-server[2256]: Deferring reject downstream Oct 27 15:08:17 controller-0.localdomain proxy-server[2256]: - - 27/Oct/2016/15/08/17 HEAD /v1/AUTH_900da63cc23d4700ad38384e4dc052b1 HTTP/1.0 204 - Swift - - - - tx35b5347c7cd74 -0058121861 - 0.0071 RL - 1477580897.795131922 1477580897.802206993 - Oct 27 15:08:17 controller-0.localdomain proxy-server[2256]: 172.17.3.15 172.17.3.15 27/Oct/2016/15/08/17 GET /v1/AUTH_900da63cc23d4700ad38384e4dc052b1%3Fformat%3Djson HTTP/1.0 thon-swiftclient-3.0.0 e8db84cc646c4b7e... - 2 - tx35b5347c7cd740fa89046-0058121861 - 0.0077 - - 1477580897.804646015 1477580897.812345982 - Oct 27 15:08:18 controller-0.localdomain proxy-server[2256]: ERROR with Account server 172.17.4.14:6002/d1 re: Trying to HEAD /v1/AUTH_1075f161175442f99fad2e1efc031d26: Connecti ed (txn: txd6d67a1f17444699b4305-0058121862) (client_ip: 172.17.3.15) Oct 27 15:08:18 controller-0.localdomain proxy-server[2256]: ERROR with Account server 172.17.4.13:6002/d1 re: Trying to HEAD /v1/AUTH_1075f161175442f99fad2e1efc031d26: Connecti ed (txn: txd6d67a1f17444699b4305-0058121862) (client_ip: 172.17.3.15) Oct 27 15:08:18 controller-0.localdomain proxy-server[2256]: 172.17.3.15 172.17.3.15 27/Oct/2016/15/08/18 HEAD /v1/AUTH_1075f161175442f99fad2e1efc031d26 HTTP/1.0 204 - python-sw t-3.0.0 e8db84cc646c4b7e... - - - txd6d67a1f17444699b4305-0058121862 - 0.0079 - - 1477580898.147253036 1477580898.155152082 - Oct 27 15:08:18 controller-0.localdomain proxy-server[2256]: 172.17.3.15 172.17.3.15 27/Oct/2016/15/08/18 HEAD /v1/AUTH_900da63cc23d4700ad38384e4dc052b1 HTTP/1.0 204 - python-sw t-3.0.0 e8db84cc646c4b7e... 
- - - tx9b1741fe46ee4089922cb-0058121862 - 0.0074 - - 1477580898.164133072 1477580898.171533108 - It tries to get to swift on the other controllers, but there, the swift-proxy are indeed not listening (stopped) : [heat-admin@controller-1 ~]$ sudo -i [root@controller-1 ~]# netstat -pant | grep 600 -> nothing the same on the controller-2. Reproduce this issue on my BM (but with **SSL on Overcloud) I think it possible that the reason of this issue is - overcloud with SSL. (going to check on that ) This parameter should be added to SSL upgrade: PublicVirtualFixedIPs: [{'ip_address':'192.168.200.180'}]. Don't put 192.168.200.180, but the ip of the public and admin endpoint. This should be part of your local network yaml file. According Ben Nemec we should also add : -e /home/stack/ssl-heat-templates/environments/tls-endpoints-public-ip.yaml Add this ^ on-top of the deployment-command that is being used for the upgrade. @omri I think you were going to verify if this was environmental as in comment #16 ... do we still need this BZ or can we close it? Latest test on 08 NOV 2016 has failed to upgrade. [stack@undercloud-0 ~]$ heat resource-list -n5 overcloud | grep -v COMPLETE WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead +--------------------------------------------+---------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+-----------------+----------------------+----------------------------------------------------------------------------------------------------------------------+ | resource_name | physical_resource_id | resource_type | resource_status | updated_time | stack_name | +--------------------------------------------+---------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+-----------------+----------------------+----------------------------------------------------------------------------------------------------------------------+ | UpdateWorkflow | 74a716e2-23b1-4463-b286-958ced24b0f8 | OS::TripleO::Tasks::UpdateWorkflow | UPDATE_FAILED | 2016-11-08T21:13:40Z | overcloud | | 0 | ad8123ed-b137-4132-bc17-de5f63594268 | OS::Heat::SoftwareDeployment | CREATE_FAILED | 2016-11-08T21:21:56Z | overcloud-UpdateWorkflow-r53etdnzrvoc-ControllerPacemakerUpgradeDeployment_Step1-j72cf6jpqnfr | | 1 | 3b652369-9c83-498e-9d47-cf1c6456458f | OS::Heat::SoftwareDeployment | CREATE_FAILED | 2016-11-08T21:21:56Z | overcloud-UpdateWorkflow-r53etdnzrvoc-ControllerPacemakerUpgradeDeployment_Step1-j72cf6jpqnfr | | ControllerPacemakerUpgradeDeployment_Step1 | 7b904d09-3e0e-4099-9e3d-3185c0ca49d1 | OS::Heat::SoftwareDeploymentGroup | CREATE_FAILED | 2016-11-08T21:21:56Z | overcloud-UpdateWorkflow-r53etdnzrvoc | | 2 | e5e0e6d6-eb1e-4aed-9a98-729233da648d | OS::Heat::SoftwareDeployment | CREATE_FAILED | 2016-11-08T21:21:57Z | overcloud-UpdateWorkflow-r53etdnzrvoc-ControllerPacemakerUpgradeDeployment_Step1-j72cf6jpqnfr | 
+--------------------------------------------+---------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+-----------------+----------------------+----------------------------------------------------------------------------------------------------------------------+ ---------+-------------+--------------------+-------------+ [stack@undercloud-0 ~]$ heat deployment-show e5e0e6d6-eb1e-4aed-9a98-729233da648d WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead { "status": "IN_PROGRESS", "server_id": "7caf37c8-3daf-4b78-a256-768a82679876", "config_id": "cc3c4825-4758-49e1-975a-de6d134cb3c6", "output_values": null, "creation_time": "2016-11-08T21:21:59Z", "input_values": { "update_identifier": "", "deploy_identifier": "1478639139" }, "action": "CREATE", "status_reason": "Deploy data available", "id": "e5e0e6d6-eb1e-4aed-9a98-729233da648d" } [root@controller-2 ~]# pcs status Cluster name: tripleo_cluster Stack: corosync Current DC: controller-2 (version 1.1.15-11.el7-e174ec8) - partition with quorum Last updated: Tue Nov 8 21:47:12 2016 Last change: Tue Nov 8 20:43:48 2016 by root via cibadmin on controller-0 3 nodes and 124 resources configured Online: [ controller-0 controller-1 controller-2 ] Full list of resources: ip-fd00.fd00.fd00.4000..10 (ocf::heartbeat:IPaddr2): Started controller-0 ip-192.0.2.6 (ocf::heartbeat:IPaddr2): Started controller-1 Clone Set: haproxy-clone [haproxy] Started: [ controller-0 controller-1 controller-2 ] Master/Slave Set: galera-master [galera] Masters: [ controller-0 controller-1 controller-2 ] Clone Set: memcached-clone [memcached] Started: [ controller-0 controller-1 controller-2 ] ip-2620.52.0.13b8.5054.ff.fe3e.1 (ocf::heartbeat:IPaddr2): Started controller-2 Clone Set: rabbitmq-clone [rabbitmq] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-core-clone [openstack-core] Started: [ controller-0 controller-1 controller-2 ] Master/Slave Set: redis-master [redis] Masters: [ controller-0 ] Slaves: [ controller-1 controller-2 ] ip-fd00.fd00.fd00.3000..10 (ocf::heartbeat:IPaddr2): Started controller-0 ip-fd00.fd00.fd00.2000..10 (ocf::heartbeat:IPaddr2): Started controller-1 ip-fd00.fd00.fd00.2000..11 (ocf::heartbeat:IPaddr2): Started controller-2 Clone Set: mongod-clone [mongod] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-aodh-evaluator-clone [openstack-aodh-evaluator] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler] Started: [ controller-0 controller-1 controller-2 ] Clone Set: neutron-l3-agent-clone [neutron-l3-agent] Started: [ controller-0 controller-1 controller-2 ] Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup] Started: [ controller-0 controller-1 controller-2 ] Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup] Started: [ controller-0 controller-1 controller-2 ] openstack-cinder-volume (systemd:openstack-cinder-volume): Started controller-0 Clone Set: openstack-heat-engine-clone [openstack-heat-engine] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-aodh-listener-clone [openstack-aodh-listener] Started: [ controller-0 controller-1 controller-2 ] Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent] Started: [ controller-0 controller-1 controller-2 ] Clone Set: 
openstack-gnocchi-metricd-clone [openstack-gnocchi-metricd] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-aodh-notifier-clone [openstack-aodh-notifier] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-heat-api-clone [openstack-heat-api] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-glance-api-clone [openstack-glance-api] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-nova-api-clone [openstack-nova-api] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-sahara-api-clone [openstack-sahara-api] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-sahara-engine-clone [openstack-sahara-engine] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-glance-registry-clone [openstack-glance-registry] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-gnocchi-statsd-clone [openstack-gnocchi-statsd] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-cinder-api-clone [openstack-cinder-api] Started: [ controller-0 controller-1 controller-2 ] Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent] Started: [ controller-0 controller-1 controller-2 ] Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy] Started: [ controller-0 controller-1 controller-2 ] Clone Set: delay-clone [delay] Started: [ controller-0 controller-1 controller-2 ] Clone Set: neutron-server-clone [neutron-server] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central] Started: [ controller-0 controller-1 controller-2 ] Clone Set: httpd-clone [httpd] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn] Started: [ controller-0 controller-1 controller-2 ] Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor] Started: [ controller-0 controller-1 controller-2 ] Failed Actions: * memcached_monitor_60000 on controller-1 'not running' (7): call=33, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:24:10 2016', queued=0ms, exec=0ms * mongod_monitor_60000 on controller-1 'not running' (7): call=78, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:24:04 2016', queued=0ms, exec=0ms * openstack-aodh-evaluator_monitor_60000 on controller-1 'not running' (7): call=341, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:25:09 2016', queued=0ms, exec=0ms * openstack-aodh-listener_monitor_60000 on controller-1 'not running' (7): call=344, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:25:11 2016', queued=0ms, exec=0ms * openstack-aodh-notifier_monitor_60000 
on controller-1 'not running' (7): call=345, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:25:12 2016', queued=0ms, exec=0ms * openstack-nova-api_monitor_60000 on controller-1 'not running' (7): call=200, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:24:54 2016', queued=0ms, exec=0ms * openstack-nova-consoleauth_monitor_60000 on controller-1 'OCF_PENDING' (196): call=207, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:25:01 2016', queued=0ms, exec=0ms * neutron-dhcp-agent_monitor_60000 on controller-1 'OCF_PENDING' (196): call=263, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:23:27 2016', queued=0ms, exec=0ms * neutron-openvswitch-agent_monitor_60000 on controller-1 'not running' (7): call=270, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:24:30 2016', queued=0ms, exec=0ms * neutron-server_monitor_60000 on controller-1 'not running' (7): call=304, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:24:27 2016', queued=0ms, exec=0ms * httpd_monitor_60000 on controller-1 'not running' (7): call=306, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:23:27 2016', queued=0ms, exec=0ms * memcached_monitor_60000 on controller-2 'not running' (7): call=35, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:24:10 2016', queued=0ms, exec=0ms * mongod_monitor_60000 on controller-2 'not running' (7): call=79, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:24:04 2016', queued=0ms, exec=0ms * openstack-aodh-evaluator_monitor_60000 on controller-2 'not running' (7): call=344, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:25:09 2016', queued=0ms, exec=0ms * openstack-aodh-listener_monitor_60000 on controller-2 'not running' (7): call=347, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:25:11 2016', queued=0ms, exec=0ms * openstack-aodh-notifier_monitor_60000 on controller-2 'not running' (7): call=348, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:25:11 2016', queued=0ms, exec=0ms * neutron-dhcp-agent_monitor_60000 on controller-2 'OCF_PENDING' (196): call=266, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:23:26 2016', queued=0ms, exec=0ms * neutron-openvswitch-agent_monitor_60000 on controller-2 'not running' (7): call=273, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:24:30 2016', queued=0ms, exec=0ms * neutron-server_monitor_60000 on controller-2 'not running' (7): call=307, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:24:27 2016', queued=0ms, exec=0ms * httpd_monitor_60000 on controller-2 'not running' (7): call=309, status=complete, exitreason='none', last-rc-change='Tue Nov 8 21:23:27 2016', queued=0ms, exec=0ms Daemon Status: corosync: active/enabled pacemaker: active/enabled pcsd: active/enabled [root@controller-2 ~]# pcs status | grep -i stopped -B2 [root@controller-2 ~]# pcs status | grep -i unmanaged -B2 [stack@undercloud-0 ~]$ heat deployment-show e5e0e6d6-eb1e-4aed-9a98-729233da648d WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead { "status": "IN_PROGRESS", "server_id": "7caf37c8-3daf-4b78-a256-768a82679876", "config_id": "cc3c4825-4758-49e1-975a-de6d134cb3c6", "output_values": null, "creation_time": "2016-11-08T21:21:59Z", "input_values": { "update_identifier": "", "deploy_identifier": "1478639139" }, "action": "CREATE", "status_reason": "Deploy data available", "id": 
"e5e0e6d6-eb1e-4aed-9a98-729233da648d" } same command later in time.. maybe 15 minutes [stack@undercloud-0 ~]$ heat deployment-show e5e0e6d6-eb1e-4aed-9a98-729233da648d WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead { "status": "FAILED", "server_id": "7caf37c8-3daf-4b78-a256-768a82679876", "config_id": "cc3c4825-4758-49e1-975a-de6d134cb3c6", "output_values": { "deploy_stdout": "mysql upgrade required: 0\nTue Nov 8 21:23:18 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop httpd\nTue Nov 8 21:23:20 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop memcached\nTue Nov 8 21:23:20 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop mongod\nTue Nov 8 21:23:20 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop neutron-dhcp-agent\nTue Nov 8 21:23:28 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop neutron-l3-agent\nTue Nov 8 21:23:37 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop neutron-metadata-agent\nTue Nov 8 21:23:38 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop neutron-netns-cleanup\nTue Nov 8 21:23:38 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop neutron-openvswitch-agent\nTue Nov 8 21:23:39 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop neutron-ovs-cleanup\nTue Nov 8 21:23:39 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop neutron-server\nTue Nov 8 21:24:12 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-aodh-evaluator\nTue Nov 8 21:24:31 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-aodh-listener\nTue Nov 8 21:24:32 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-aodh-notifier\nTue Nov 8 21:24:33 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-ceilometer-central\nTue Nov 8 21:24:33 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-ceilometer-collector\nTue Nov 8 21:24:34 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-ceilometer-notification\nTue Nov 8 21:24:47 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-cinder-api\nTue Nov 8 21:24:47 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-cinder-scheduler\nTue Nov 8 21:25:04 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-glance-api\nTue Nov 8 21:25:04 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-glance-registry\nTue Nov 8 21:25:04 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-gnocchi-metricd\nTue Nov 8 21:25:05 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop 
openstack-gnocchi-statsd\nTue Nov 8 21:25:05 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-heat-api-cfn\nTue Nov 8 21:25:06 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-heat-api\nTue Nov 8 21:25:06 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-heat-api-cloudwatch\nTue Nov 8 21:25:06 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-heat-engine\nTue Nov 8 21:25:06 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-nova-api\nTue Nov 8 21:25:06 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-nova-conductor\nTue Nov 8 21:25:07 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-nova-consoleauth\nTue Nov 8 21:25:15 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-nova-novncproxy\nTue Nov 8 21:25:15 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-nova-scheduler\nTue Nov 8 21:25:15 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-sahara-api\nTue Nov 8 21:25:15 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-sahara-engine\nTue Nov 8 21:25:16 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-account-auditor.service\nTue Nov 8 21:25:16 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-account-reaper.service\nTue Nov 8 21:25:16 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-account-replicator.service\nTue Nov 8 21:25:16 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-account.service\nTue Nov 8 21:25:16 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-container-auditor.service\nTue Nov 8 21:25:16 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-container-replicator.service\nTue Nov 8 21:25:17 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-container-updater.service\nTue Nov 8 21:25:17 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-container.service\nTue Nov 8 21:25:17 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-object-auditor.service\nTue Nov 8 21:25:17 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-object-replicator.service\nTue Nov 8 21:25:17 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-object-updater.service\nTue Nov 8 21:25:17 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-object.service\nTue Nov 8 21:25:17 UTC 2016 
cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-proxy.service\nTue Nov 8 21:25:18 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-account-reaper\nTue Nov 8 21:25:18 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-account-replicator\nTue Nov 8 21:25:18 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-account\nTue Nov 8 21:25:18 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-container-auditor\nTue Nov 8 21:25:18 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-container-replicator\nTue Nov 8 21:25:18 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-container-updater\nTue Nov 8 21:25:19 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-container\nTue Nov 8 21:25:19 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-object-auditor\nTue Nov 8 21:25:19 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-object-replicator\nTue Nov 8 21:25:19 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-object-updater\nTue Nov 8 21:25:19 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop openstack-swift-object\nTue Nov 8 21:25:19 UTC 2016 cc3c4825-4758-49e1-975a-de6d134cb3c6 tripleo-upgrade controller-2 Going to systemctl stop 
openstack-swift-proxy\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nERROR: cluster shutdown timed out\n", "deploy_stderr": "", "deploy_status_code": 1 }, "creation_time": "2016-11-08T21:21:59Z", "updated_time": "2016-11-08T21:55:23Z", "input_values": { "update_identifier": "", "deploy_identifier": "1478639139" }, "action": "CREATE", "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1", "id": "e5e0e6d6-eb1e-4aed-9a98-729233da648d" } Please see also SOS REPORT So here is the reason for the failure: Currently when we call the major-upgrade step we do the following: """ ... if [[ -n $(is_bootstrap_node) ]]; then check_clean_cluster fi ... 
if [[ -n $(is_bootstrap_node) ]]; then migrate_full_to_ng_ha fi ... for service in $(services_to_migrate); do manage_systemd_service stop "${service%%-clone}" ... done """

The problem with the above code is that it is open to the following race condition:

1. The code runs first on a non-bootstrap controller node, so we start stopping a bunch of services.
2. Pacemaker notices that the services are down and marks them as stopped.
3. The code then runs on the bootstrap node (controller-0), where the check_clean_cluster function fails and exits.
4. Eventually the script on the non-bootstrap controller node also times out and exits, because the cluster never shuts down (the shutdown was never actually started, since step 3 failed).

I attached a review that is fairly simple in concept: split major_upgrade_controller_pacemaker_1 in two, so we can guarantee that no systemd service is stopped before anything else. It is very simple in concept (split the file in two, so we move to 6 steps instead of 5), but it is an invasive change. Happy to discuss alternative approaches.

Mike, I prepared the Newton backport in https://review.openstack.org/#/c/395460/. Can you give it a test as soon as you can? Since the change is simple but invasive, I'd like to get as much feedback as possible.

Thanks,
Michele

Fix landed in stable/newton with https://review.openstack.org/#/c/395460/; moving to POST.

Deployed RHOS 9 using the latest puddle (2016-11-19.4), removed the local patch from the upgrade tooling to confirm the fix had landed, then upgraded 9 -> 10. This issue is no longer seen.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html
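For reference, a minimal bash sketch of the split Michele describes, reusing the helper names quoted above (is_bootstrap_node, check_clean_cluster, migrate_full_to_ng_ha, services_to_migrate, manage_systemd_service). This is only an illustration of the idea, not the actual tripleo-heat-templates patch; the real change splits the major_upgrade_controller_pacemaker_1 script into separate Heat deployment steps.

```bash
#!/bin/bash
# Hypothetical sketch of the two-phase split. Phase 1 performs only the
# cluster sanity check and the HA migration on the bootstrap node; phase 2
# (delivered as a separate deployment step, so Heat only starts it after
# phase 1 has completed on every controller) stops the systemd services.
# This ordering removes the race: check_clean_cluster can no longer run
# after another node has already begun stopping services.
set -eu

upgrade_controller_pacemaker_phase1() {
    # Runs on every controller, but only the bootstrap node does real work.
    if [[ -n $(is_bootstrap_node) ]]; then
        check_clean_cluster      # fail early while all services are still up
        migrate_full_to_ng_ha
    fi
}

upgrade_controller_pacemaker_phase2() {
    # Only reached once phase 1 has finished on all nodes.
    for service in $(services_to_migrate); do
        manage_systemd_service stop "${service%%-clone}"
    done
}
```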