Description of problem:
Doing an osp9/osp10 upgrade, at the controller upgrade step:
overcloud-UpdateWorkflow-q7u6tnoyd3ib-ControllerPacemakerUpgradeDeployment_Step1-pe3jsj7qivgb/546e7c5b-f286-4152-937e-f42469f949c9

The swift-proxy on the bootstrap node does not seem to be able to shut down:

Wed Nov 2 22:30:38 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account-reaper.service
Wed Nov 2 22:30:38 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account-replicator.service
Wed Nov 2 22:30:38 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account.service
Wed Nov 2 22:30:39 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container-auditor.service
Wed Nov 2 22:30:39 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container-replicator.service
Wed Nov 2 22:30:39 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container-updater.service
Wed Nov 2 22:30:39 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container.service
Wed Nov 2 22:30:39 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-auditor.service
Wed Nov 2 22:30:39 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-replicator.service
Wed Nov 2 22:30:40 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-updater.service
Wed Nov 2 22:30:40 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object.service
Wed Nov 2 22:30:40 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-proxy.service
Wed Nov 2 22:30:40 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account-reaper
Wed Nov 2 22:30:40 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account-replicator
Wed Nov 2 22:30:40 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account
Wed Nov 2 22:30:41 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container-auditor
Wed Nov 2 22:30:41 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container-replicator
Wed Nov 2 22:30:41 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container-updater
Wed Nov 2 22:30:41 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container
Wed Nov 2 22:30:41 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-auditor
Wed Nov 2 22:30:41 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-replicator
Wed Nov 2 22:30:41 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-updater
Wed Nov 2 22:30:42 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object
Wed Nov 2 22:30:42 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-proxy
active
active
active
active
...
ERROR: cluster shutdown timed out

On the node we have:

● openstack-swift-proxy.service - OpenStack Object Storage (swift) - Proxy Server
   Loaded: loaded (/usr/lib/systemd/system/openstack-swift-proxy.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-11-02 20:39:03 UTC; 13h ago
 Main PID: 18232 (swift-proxy-ser)
   CGroup: /system.slice/openstack-swift-proxy.service
           └─18232 /usr/bin/python2 /usr/bin/swift-proxy-server /etc/swift/proxy-server.conf

Nov 03 09:41:19 overcloud-controller-0.localdomain proxy-server[18232]: 10.19.105.15 10.19.105.15 03/Nov/2016/09/41/19 HEAD /v1/AUTH_e8c55b13b9744c38b9aea63e708bd04c HTTP/1.0 204 - python-swiftclient-3.0.0 eb02e5e9f4a941b9... - - - tx5a18d681ea4844aab54d6-00581b063f - 0.0119 - - 1478166079.913808107 1478166079.925674915 -
Nov 03 09:41:20 overcloud-controller-0.localdomain proxy-server[18232]: ERROR with Account server 192.168.200.13:6002/d1 re: Trying to GET /v1/AUTH_6a582ff797e841df89762ba3a63a1de3: Connection refused (txn: tx3fef621123214658b235e-00581b0640) (client_ip: 10.19.105.15)
Nov 03 09:41:20 overcloud-controller-0.localdomain proxy-server[18232]: ERROR with Account server 192.168.200.14:6002/d1 re: Trying to GET /v1/AUTH_6a582ff797e841df89762ba3a63a1de3: Connection refused (txn: tx3fef621123214658b235e-00581b0640) (client_ip: 10.19.105.15)
Nov 03 09:41:20 overcloud-controller-0.localdomain proxy-server[18232]: 10.19.105.15 10.19.105.15 03/Nov/2016/09/41/20 GET /v1/AUTH_6a582ff797e841df89762ba3a63a1de3%3Fformat%3Djson HTTP/1.0 200 - python-swiftclient-3.0.0 eb02e5e9f4a941b9... - 2 - tx3fef621123214658b235e-00581b0640 - 0.0118 - - 1478166080.043324947 1478166080.055134058 -
Nov 03 09:41:20 overcloud-controller-0.localdomain proxy-server[18232]: ERROR with Account server 192.168.200.13:6002/d1 re: Trying to GET /v1/AUTH_e8c55b13b9744c38b9aea63e708bd04c: Connection refused (txn: tx548df06c7135404aa6a85-00581b0640) (client_ip: 10.19.105.15)
Nov 03 09:41:20 overcloud-controller-0.localdomain proxy-server[18232]: ERROR with Account server 192.168.200.14:6002/d1 re: Trying to GET /v1/AUTH_e8c55b13b9744c38b9aea63e708bd04c: Connection refused (txn: tx548df06c7135404aa6a85-00581b0640) (client_ip: 10.19.105.15)
Nov 03 09:41:20 overcloud-controller-0.localdomain proxy-server[18232]: 10.19.105.15 10.19.105.15 03/Nov/2016/09/41/20 GET /v1/AUTH_e8c55b13b9744c38b9aea63e708bd04c%3Fformat%3Djson HTTP/1.0 200 - python-swiftclient-3.0.0 eb02e5e9f4a941b9... - 2 - tx548df06c7135404aa6a85-00581b0640 - 0.0111 - - 1478166080.066014051 1478166080.077146053 -
Nov 03 09:41:24 overcloud-controller-0.localdomain proxy-server[18232]: ERROR with Account server 192.168.200.14:6002/d1 re: Trying to GET /v1/AUTH_6a582ff797e841df89762ba3a63a1de3: Connection refused (txn: txb170e18c344b4912ad0cb-00581b0644) (client_ip: 10.19.105.13)
Nov 03 09:41:24 overcloud-controller-0.localdomain proxy-server[18232]: ERROR with Account server 192.168.200.13:6002/d1 re: Trying to GET /v1/AUTH_6a582ff797e841df89762ba3a63a1de3: Connection refused (txn: txb170e18c344b4912ad0cb-00581b0644) (client_ip: 10.19.105.13)
Nov 03 09:41:24 overcloud-controller-0.localdomain proxy-server[18232]: 10.19.105.13 10.19.105.13 03/Nov/2016/09/41/24 GET /v1/AUTH_6a582ff797e841df89762ba3a63a1de3%3Fformat%3Djson HTTP/1.0 200 - python-swiftclient-3.1.0 0ae5a6d614fc4e63... - 2 - txb170e18c344b4912ad0cb-00581b0644 - 0.0129 - - 1478166084.167977095 1478166084.180881023 -

Version-Release number of selected component (if applicable):
This is with the latest puddle from Nov 2nd, 2016.

How reproducible:
Sometimes. It happened twice, but I also got a successful controller upgrade using the same puddle.

Steps to Reproduce:
1. osp9/10 upgrade
2. controller upgrade step
Maybe we should make sure that swift-proxy is shut down before swift-account?
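The ordering suggested here could be sketched roughly as follows. This is only an illustration of the idea, not the fix that eventually landed and not the code of the real upgrade script: the function names (`stop_and_wait`, `stop_swift_in_order`), the `SYSTEMCTL` override, and the 60-second default timeout are all assumptions made for the example. Only the unit names come from the log above.

```shell
#!/usr/bin/env bash
# Sketch only: stop the Swift proxy first, then the storage services,
# waiting for each unit to stop reporting "active" before moving on.
# SYSTEMCTL can be overridden to exercise the logic without systemd.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Unit names taken from the log above; this ordering is the assumption.
SWIFT_UNITS="openstack-swift-proxy openstack-swift-account openstack-swift-container openstack-swift-object"

# stop_and_wait UNIT [TIMEOUT_SECONDS]
# Returns 0 once the unit is no longer active, 1 if the timeout expires.
stop_and_wait() {
    unit="$1"
    timeout="${2:-60}"
    waited=0
    echo "Going to systemctl stop ${unit}"
    "$SYSTEMCTL" stop "$unit"
    while "$SYSTEMCTL" is-active --quiet "$unit"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "ERROR: ${unit} still active after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# stop_swift_in_order [TIMEOUT_SECONDS]
# Stops every unit in SWIFT_UNITS in order, aborting on the first timeout.
stop_swift_in_order() {
    for u in $SWIFT_UNITS; do
        stop_and_wait "$u" "${1:-60}" || return 1
    done
}
```

Calling `stop_swift_in_order 120` would stop the proxy first and give each unit up to two minutes. With this ordering the proxy is already down before the account servers go away, so it should not log "Connection refused" against them in the meantime.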
As discussed on Lifecycle scrum today, it seems this issue should be fixed by some combination of the two reviews linked above. Only one of them has landed in stable/newton at the time of writing, so I am holding off on POST. We agreed to let this run through QA, and if we continue to hit it we can revisit. Moving to ASSIGNED for now.
I just hit this issue doing an OSP9 to OSP10 upgrade. I am following the steps in the RHOSP10 Director training. Running the latest 10 puddle from rhos-release.
https://gitlab.cee.redhat.com/roxenham/director-osp10/blob/master/content/lab5-upgrades.md

openstack overcloud deploy --templates --ntp-server 10.16.255.1 \
  --control-scale 1 --compute-scale 2 --neutron-tunnel-types vxlan --neutron-network-type vxlan \
  --control-flavor control --compute-flavor compute -e \
  /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker.yaml

[stack@undercloud ~]$ heat deployment-show 2d751553-9c01-4618-859a-3de8d0122728
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
"status": "FAILED",
"server_id": "85e19444-7d3e-429b-ad24-df8fc1a27731",
"config_id": "ef6ff4ac-448c-4c6f-b63e-8f873718c862",
"output_values": {
"deploy_stdout": "mysql upgrade required: 0\nWed Nov 16 18:54:58 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop httpd\nWed Nov 16 18:54:59 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop memcached\nWed Nov 16 18:54:59 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop mongod\nWed Nov 16 18:55:00 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop neutron-dhcp-agent\nWed Nov 16 18:55:09 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop neutron-l3-agent\nWed Nov 16 18:55:15 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop neutron-metadata-agent\nWed Nov 16 18:55:15 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop neutron-netns-cleanup\nWed Nov 16 18:55:15 UTC 2016 
ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop neutron-openvswitch-agent\nWed Nov 16 18:55:16 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop neutron-ovs-cleanup\nWed Nov 16 18:55:16 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop neutron-server\nWed Nov 16 18:55:45 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-aodh-evaluator\nWed Nov 16 18:55:46 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-aodh-listener\nWed Nov 16 18:55:47 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-aodh-notifier\nWed Nov 16 18:55:48 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-ceilometer-central\nWed Nov 16 18:59:34 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-ceilometer-collector\nWed Nov 16 18:59:34 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-ceilometer-notification\nWed Nov 16 18:59:34 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-cinder-api\nWed Nov 16 18:59:34 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-cinder-scheduler\nWed Nov 16 18:59:34 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-glance-api\nWed Nov 16 18:59:34 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-glance-registry\nWed Nov 16 
18:59:34 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-gnocchi-metricd\nWed Nov 16 18:59:34 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-gnocchi-statsd\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-heat-api-cfn\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-heat-api\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-heat-api-cloudwatch\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-heat-engine\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-nova-api\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-nova-conductor\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-nova-consoleauth\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-nova-novncproxy\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-nova-scheduler\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-sahara-api\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-sahara-engine\nWed Nov 
16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account-reaper\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account-replicator\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container-auditor\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container-replicator\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container-updater\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-auditor\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-replicator\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-updater\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop 
openstack-swift-proxy\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\na
ctive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nactive\nERROR: cluster shutdown timed out\n", "deploy_stderr": "", "deploy_status_code": 1 }, "creation_time": "2016-11-16T15:54:22Z", "updated_time": "2016-11-16T16:30:12Z", "input_values": { "update_identifier": "", "deploy_identifier": "1479310036" }, "action": "CREATE", "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1", "id": "2d751553-9c01-4618-859a-3de8d0122728" } State of failed resources in Pacemaker Failed Actions: * memcached_monitor_60000 on overcloud-controller-2 'not running' (7): call=556, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:55:04 2016', queued=0ms, exec=0ms * mongod_monitor_60000 on overcloud-controller-2 'not running' (7): call=590, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:55:09 2016', queued=0ms, exec=0ms * openstack-aodh-evaluator_monitor_60000 on overcloud-controller-2 'not running' (7): call=741, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:56:21 2016', queued=0ms, exec=0ms * openstack-aodh-listener_monitor_60000 on overcloud-controller-2 'not running' (7): 
call=742, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:56:21 2016', queued=0ms, exec=0ms
* openstack-aodh-notifier_monitor_60000 on overcloud-controller-2 'not running' (7): call=743, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:56:21 2016', queued=0ms, exec=0ms
* neutron-dhcp-agent_monitor_60000 on overcloud-controller-2 'not running' (7): call=763, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:55:31 2016', queued=0ms, exec=0ms
* neutron-openvswitch-agent_monitor_60000 on overcloud-controller-2 'not running' (7): call=764, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:55:31 2016', queued=0ms, exec=0ms
* neutron-server_monitor_60000 on overcloud-controller-2 'not running' (7): call=765, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:56:32 2016', queued=0ms, exec=0ms
* openstack-ceilometer-central_monitor_60000 on overcloud-controller-2 'OCF_PENDING' (196): call=766, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:56:32 2016', queued=0ms, exec=0ms
* httpd_monitor_60000 on overcloud-controller-2 'not running' (7): call=767, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:55:32 2016', queued=0ms, exec=0ms
* memcached_monitor_60000 on overcloud-controller-1 'not running' (7): call=560, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:55:04 2016', queued=0ms, exec=0ms
* mongod_monitor_60000 on overcloud-controller-1 'not running' (7): call=589, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:55:09 2016', queued=0ms, exec=0ms
* openstack-aodh-evaluator_monitor_60000 on overcloud-controller-1 'not running' (7): call=748, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:56:21 2016', queued=0ms, exec=0ms
* openstack-aodh-listener_monitor_60000 on overcloud-controller-1 'not running' (7): call=751, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:56:21 2016', queued=0ms, exec=0ms
* openstack-aodh-notifier_monitor_60000 on overcloud-controller-1 'not running' (7): call=753, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:56:21 2016', queued=0ms, exec=0ms
* neutron-dhcp-agent_monitor_60000 on overcloud-controller-1 'not running' (7): call=755, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:55:21 2016', queued=0ms, exec=0ms
* neutron-openvswitch-agent_monitor_60000 on overcloud-controller-1 'not running' (7): call=756, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:55:21 2016', queued=0ms, exec=0ms
* neutron-server_monitor_60000 on overcloud-controller-1 'not running' (7): call=759, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:56:21 2016', queued=0ms, exec=0ms
* openstack-ceilometer-central_monitor_60000 on overcloud-controller-1 'OCF_PENDING' (196): call=783, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:56:32 2016', queued=0ms, exec=0ms
* httpd_monitor_60000 on overcloud-controller-1 'not running' (7): call=784, status=complete, exitreason='none', last-rc-change='Wed Nov 16 18:55:32 2016', queued=0ms, exec=0ms
Apparently you just have to run this step over and over again until it succeeds.

Note that in our RHOSP10-director class, I am the only one that went through the upgrade with 3 controllers and had this issue. All the rest did single controllers.

On my 5th run, I finally got an update complete.

2016-11-16 19:01:17Z [overcloud-UpdateWorkflow-fha4bedulma3]: UPDATE_COMPLETE Stack UPDATE completed successfully
2016-11-16 19:01:18Z [UpdateWorkflow]: UPDATE_COMPLETE state changed
2016-11-16 19:01:28Z [overcloud]: UPDATE_COMPLETE Stack UPDATE completed successfully

Stack overcloud UPDATE_COMPLETE
Started Mistral Workflow. Execution ID: b42b2c3d-db1c-4442-9a66-8fc7e0f03dfd
Overcloud Endpoint: http://172.16.0.30:5000/v2.0
Overcloud Deployed
Jacob pointed out that I made a mistake when I entered comment 5. For that comment I copied the deploy command from the lab document, which specifies one controller. The command I was actually running, which kept failing and eventually succeeded, had a control scale of 3.

#!/usr/bin/env bash
cd ~
source stackrc
openstack overcloud deploy --templates --ntp-server 10.16.255.1 \
  --control-scale 3 --compute-scale 2 --neutron-tunnel-types vxlan --neutron-network-type vxlan \
  --control-flavor control --compute-flavor compute -e \
  /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker.yaml
(In reply to Kevin Jones from comment #7)
> Apparently you just have to run this step over and over again until it
> succeeds.

heh :) not quite, although to be fair, since the steps are idempotent, when any one fails the standard procedure is indeed to recover and re-run that step.

>
> Note that in our RHOSP10-director class, I am the only one that went through
> the upgrade with 3 controllers and had this issue. All the rest did single
> controllers.
>

Thanks for your testing, Kevin. I think you are hitting a race condition; I will follow up with another comment to clear the needinfo.
Hi Jacob and Kevin,

To be clear, Kevin hits this bug in a 3-controller HA setup, i.e. comment #5 is only wrong in that '--control-scale 1' should be '--control-scale 3', because Kevin does indeed have a controller HA setup (based on comment #8 and comment #9).

In this case it sounds to me like he *may* be hitting BZ 1389040 - disregard the 'SSL' in the title. See especially BZ 1389040#c11 and BZ 1389040#c22, which have an explanation of the race and the fix Michele put out at https://review.openstack.org/#/c/395460/

The fix for that BZ landed in Newton but not yet in the puddle, which is why we still carry it at https://gitlab.cee.redhat.com/sathlang/ospd-9-to-10-upgrade/blob/master/README.md - this is the working doc the Lifecycle team uses to document the *current* upgrade procedure, which means carrying any patches that haven't landed in the puddle yet (you can see, for example, the application of that patch to the environment in https://gitlab.cee.redhat.com/sathlang/ospd-9-to-10-upgrade/blob/master/README.md#patches-workaround-1 ).

So try applying this on your undercloud before starting any of the upgrade (i.e. before upgrade init):

# controller and block storage upgrade pcs disruption: BZ 1389040
curl https://review.openstack.org/changes/395460/revisions/current/patch?download | \
    base64 -d | \
    sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1

Hope that helps, please check with that fix applied - note though that if you have trouble applying it, it may be that it landed in the puddle last night (I haven't checked yet this morning).

thanks, marios
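Since the patch may or may not already be in the installed templates, one way to avoid a confusing failure is to test it with a dry run first. The following is a minimal sketch under some assumptions: the patch has been decoded and saved to a local file beforehand (instead of being piped straight from curl), GNU patch is in use, and the function name `apply_patch_if_needed` is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: apply a patch unless it is already present in the target tree.
# apply_patch_if_needed PATCH_FILE TARGET_DIR
# Prints "already applied", "applied", or "does not apply".
apply_patch_if_needed() {
    patch_file="$1"
    target_dir="$2"
    # If the reversed patch applies cleanly, the change is already in the tree.
    if patch -d "$target_dir" -p1 -R --dry-run -s -f < "$patch_file" >/dev/null 2>&1; then
        echo "already applied"
    # Otherwise check that the forward patch applies before touching any file.
    elif patch -d "$target_dir" -p1 --dry-run -s -f < "$patch_file" >/dev/null 2>&1; then
        patch -d "$target_dir" -p1 -s < "$patch_file" && echo "applied"
    else
        echo "does not apply"
        return 1
    fi
}
```

For the case above this might be used as `sudo bash -c '. ./sketch.sh; apply_patch_if_needed /tmp/395460.patch /usr/share/openstack-tripleo-heat-templates/'`, where both `sketch.sh` and `/tmp/395460.patch` are hypothetical file names.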
Updating the patch to the stable/newton patch, which has landed.
According to our records, the fixes for this have been included in already released packages:

puppet-tripleo-5.4.0-4.el7ost
openstack-tripleo-heat-templates-5.1.0-7.el7ost
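To check whether an undercloud already carries these builds, something along the following lines could be used. It is a rough sketch: the function names are invented for the example, and the comparison relies on GNU `sort -V`, which only approximates RPM's own version ordering (it is adequate for the simple version-release strings listed above, but it is not a replacement for a real RPM version comparison).

```shell
#!/usr/bin/env bash
# Sketch only: compare installed package builds against the fixed builds.

# version_at_least INSTALLED REQUIRED
# True when INSTALLED sorts at or after REQUIRED under "sort -V" ordering.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_fixed_packages
# Queries rpm for each package named in the comment above and reports
# whether the installed build is at least the fixed one.
check_fixed_packages() {
    status=0
    while read -r name required; do
        if ! installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' "$name" 2>/dev/null); then
            echo "$name: not installed"
            status=1
        elif version_at_least "$installed" "$required"; then
            echo "$name: $installed (has the fix)"
        else
            echo "$name: $installed (older than $required)"
            status=1
        fi
    done <<EOF
puppet-tripleo 5.4.0-4.el7ost
openstack-tripleo-heat-templates 5.1.0-7.el7ost
EOF
    return $status
}
```

Running `check_fixed_packages` on the undercloud prints one line per package and returns non-zero if either package is missing or older than the fixed build.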
Unable to reproduce with latest z3 build.