Bug 1391447 - OSP-9/10 upgrade sometimes fails to shutdown swift-proxy at controller upgrade step.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: Marios Andreou
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-03 10:50 UTC by Sofer Athlan-Guyot
Modified: 2017-05-18 17:56 UTC (History)

Fixed In Version: puppet-tripleo-5.4.0-4.el7ost, openstack-tripleo-heat-templates-5.1.0-7.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-18 17:56:55 UTC
Target Upstream Version:
Embargoed:




Links
System            ID      Private  Priority  Status  Summary  Last Updated
OpenStack gerrit  393760  0        None      None    None     2016-11-04 16:56:31 UTC
OpenStack gerrit  393770  0        None      None    None     2017-01-12 21:25:54 UTC

Description Sofer Athlan-Guyot 2016-11-03 10:50:26 UTC
Description of problem: During an OSP 9 to OSP 10 upgrade, at the controller upgrade step:

 overcloud-UpdateWorkflow-q7u6tnoyd3ib-ControllerPacemakerUpgradeDeployment_Step1-pe3jsj7qivgb/546e7c5b-f286-4152-937e-f42469f949c9

The swift-proxy service on the bootstrap node does not seem to be able to shut down:
Wed Nov  2 22:30:38 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account-reaper.service
Wed Nov  2 22:30:38 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account-replicator.service
Wed Nov  2 22:30:38 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account.service
Wed Nov  2 22:30:39 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container-auditor.service
Wed Nov  2 22:30:39 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container-replicator.service
Wed Nov  2 22:30:39 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container-updater.service
Wed Nov  2 22:30:39 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container.service
Wed Nov  2 22:30:39 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-auditor.service
Wed Nov  2 22:30:39 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-replicator.service
Wed Nov  2 22:30:40 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-updater.service
Wed Nov  2 22:30:40 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object.service
Wed Nov  2 22:30:40 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-proxy.service
Wed Nov  2 22:30:40 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account-reaper
Wed Nov  2 22:30:40 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account-replicator
Wed Nov  2 22:30:40 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account
Wed Nov  2 22:30:41 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container-auditor
Wed Nov  2 22:30:41 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container-replicator
Wed Nov  2 22:30:41 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container-updater
Wed Nov  2 22:30:41 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container
Wed Nov  2 22:30:41 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-auditor
Wed Nov  2 22:30:41 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-replicator
Wed Nov  2 22:30:41 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-updater
Wed Nov  2 22:30:42 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object
Wed Nov  2 22:30:42 UTC 2016 16b52e4a-98f7-498b-9b96-d5e111fc1dd2 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-proxy
active
active
active
active
...
ERROR: cluster shutdown timed out
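
The repeated "active" lines followed by the timeout error are consistent with a loop that polls the unit state until it stops being "active" or a deadline passes. The following is only a minimal sketch of such a loop, not the actual tripleo-upgrade script: the function name wait_until_inactive, the STATE_CMD override, and the default timeout and interval are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: poll a unit until it is no longer "active" or a deadline passes.
# STATE_CMD is an assumption added so the loop can be exercised without systemd.
STATE_CMD=${STATE_CMD:-"systemctl is-active"}

wait_until_inactive() {
    local unit=$1 timeout=${2:-600} interval=${3:-1}
    local deadline=$(( SECONDS + timeout ))
    local state
    while :; do
        # "systemctl is-active" prints the state and exits non-zero when not active
        state=$($STATE_CMD "$unit" 2>/dev/null) || true
        if [ "$state" != "active" ]; then
            return 0
        fi
        echo "$state"
        if (( SECONDS >= deadline )); then
            echo "ERROR: cluster shutdown timed out"
            return 1
        fi
        sleep "$interval"
    done
}
```

Called as "wait_until_inactive openstack-swift-proxy 600", a service that never stops would print "active" once per interval and then fail, which is the pattern seen in the log above.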

On the node we have

      ● openstack-swift-proxy.service - OpenStack Object Storage (swift) - Proxy Server
         Loaded: loaded (/usr/lib/systemd/system/openstack-swift-proxy.service; enabled; vendor preset: disabled)
         Active: active (running) since Wed 2016-11-02 20:39:03 UTC; 13h ago
       Main PID: 18232 (swift-proxy-ser)
         CGroup: /system.slice/openstack-swift-proxy.service
                 └─18232 /usr/bin/python2 /usr/bin/swift-proxy-server /etc/swift/proxy-server.conf

      Nov 03 09:41:19 overcloud-controller-0.localdomain proxy-server[18232]: 10.19.105.15 10.19.105.15 03/Nov/2016/09/41/19 HEAD /v1/AUTH_e8c55b13b9744c38b9aea63e708bd04c HTTP/1.0 204 - python-swiftclient-3.0.0 eb02e5e9f4a941b9... - - - tx5a18d681ea4844aab54d6-00581b063f - 0.0119 - - 1478166079.913808107 1478166079.925674915 -
      Nov 03 09:41:20 overcloud-controller-0.localdomain proxy-server[18232]: ERROR with Account server 192.168.200.13:6002/d1 re: Trying to GET /v1/AUTH_6a582ff797e841df89762ba3a63a1de3: Connection refused (txn: tx3fef621123214658b235e-00581b0640) (client_ip: 10.19.105.15)
      Nov 03 09:41:20 overcloud-controller-0.localdomain proxy-server[18232]: ERROR with Account server 192.168.200.14:6002/d1 re: Trying to GET /v1/AUTH_6a582ff797e841df89762ba3a63a1de3: Connection refused (txn: tx3fef621123214658b235e-00581b0640) (client_ip: 10.19.105.15)
      Nov 03 09:41:20 overcloud-controller-0.localdomain proxy-server[18232]: 10.19.105.15 10.19.105.15 03/Nov/2016/09/41/20 GET /v1/AUTH_6a582ff797e841df89762ba3a63a1de3%3Fformat%3Djson HTTP/1.0 200 - python-swiftclient-3.0.0 eb02e5e9f4a941b9... - 2 - tx3fef621123214658b235e-00581b0640 - 0.0118 - - 1478166080.043324947 1478166080.055134058 -
      Nov 03 09:41:20 overcloud-controller-0.localdomain proxy-server[18232]: ERROR with Account server 192.168.200.13:6002/d1 re: Trying to GET /v1/AUTH_e8c55b13b9744c38b9aea63e708bd04c: Connection refused (txn: tx548df06c7135404aa6a85-00581b0640) (client_ip: 10.19.105.15)
      Nov 03 09:41:20 overcloud-controller-0.localdomain proxy-server[18232]: ERROR with Account server 192.168.200.14:6002/d1 re: Trying to GET /v1/AUTH_e8c55b13b9744c38b9aea63e708bd04c: Connection refused (txn: tx548df06c7135404aa6a85-00581b0640) (client_ip: 10.19.105.15)
      Nov 03 09:41:20 overcloud-controller-0.localdomain proxy-server[18232]: 10.19.105.15 10.19.105.15 03/Nov/2016/09/41/20 GET /v1/AUTH_e8c55b13b9744c38b9aea63e708bd04c%3Fformat%3Djson HTTP/1.0 200 - python-swiftclient-3.0.0 eb02e5e9f4a941b9... - 2 - tx548df06c7135404aa6a85-00581b0640 - 0.0111 - - 1478166080.066014051 1478166080.077146053 -
      Nov 03 09:41:24 overcloud-controller-0.localdomain proxy-server[18232]: ERROR with Account server 192.168.200.14:6002/d1 re: Trying to GET /v1/AUTH_6a582ff797e841df89762ba3a63a1de3: Connection refused (txn: txb170e18c344b4912ad0cb-00581b0644) (client_ip: 10.19.105.13)
      Nov 03 09:41:24 overcloud-controller-0.localdomain proxy-server[18232]: ERROR with Account server 192.168.200.13:6002/d1 re: Trying to GET /v1/AUTH_6a582ff797e841df89762ba3a63a1de3: Connection refused (txn: txb170e18c344b4912ad0cb-00581b0644) (client_ip: 10.19.105.13)
      Nov 03 09:41:24 overcloud-controller-0.localdomain proxy-server[18232]: 10.19.105.13 10.19.105.13 03/Nov/2016/09/41/24 GET /v1/AUTH_6a582ff797e841df89762ba3a63a1de3%3Fformat%3Djson HTTP/1.0 200 - python-swiftclient-3.1.0 0ae5a6d614fc4e63... - 2 - txb170e18c344b4912ad0cb-00581b0644 - 0.0129 - - 1478166084.167977095 1478166084.180881023 -


Version-Release number of selected component (if applicable): This is with the latest puddle from Nov 2nd, 2016.


How reproducible: Sometimes. It happened twice, but I have also had a successful controller upgrade using the same puddle.


Steps to Reproduce:
1. osp9/10 upgrade
2. controller upgrade step

Comment 1 Sofer Athlan-Guyot 2016-11-03 10:56:22 UTC
Maybe make sure that swift-proxy is shut down before swift-account?
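
The ordering idea in this comment could be sketched as follows. This only illustrates the suggestion and is not the fix that was eventually merged; the helper name order_proxy_first is hypothetical, and whether reordering alone would avoid the timeout is not established anywhere in this report.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: emit the proxy service first, then the remaining
# services in their original relative order, so the proxy is stopped
# before the account/container/object backends it talks to.
order_proxy_first() {
    local svc rest=()
    for svc in "$@"; do
        case "$svc" in
            *-proxy|*-proxy.service) echo "$svc" ;;
            *) rest+=("$svc") ;;
        esac
    done
    if (( ${#rest[@]} )); then
        printf '%s\n' "${rest[@]}"
    fi
}
```

A caller could then loop over the output, for example "for s in $(order_proxy_first openstack-swift-account openstack-swift-proxy); do systemctl stop "$s"; done".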

Comment 4 Marios Andreou 2016-11-04 16:56:32 UTC
As discussed on the Lifecycle scrum today, it seems this issue should be fixed by some combination of the two reviews linked above. Only one of them has landed in stable/newton at the time of writing, so I am holding off on POST.

We agreed we'd let this run through QA and if we continue to hit it we can revisit.

Moving to ASSIGNED for now.

Comment 5 Kevin Jones 2016-11-16 20:17:09 UTC
I just hit this issue doing an OSP9 to OSP10 upgrade. I am following the steps in the RHOSP10 Director training.

Running latest 10 puddle from rhos-release.

https://gitlab.cee.redhat.com/roxenham/director-osp10/blob/master/content/lab5-upgrades.md

openstack overcloud deploy --templates --ntp-server 10.16.255.1 \
    --control-scale 1 --compute-scale 2 --neutron-tunnel-types vxlan --neutron-network-type vxlan \
    --control-flavor control --compute-flavor compute -e \
    /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker.yaml

[stack@undercloud ~]$ heat deployment-show 2d751553-9c01-4618-859a-3de8d0122728
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "FAILED",
  "server_id": "85e19444-7d3e-429b-ad24-df8fc1a27731",
  "config_id": "ef6ff4ac-448c-4c6f-b63e-8f873718c862",
  "output_values": {
    "deploy_stdout": "mysql upgrade required: 0\nWed Nov 16 18:54:58 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop httpd\nWed Nov 16 18:54:59 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop memcached\nWed Nov 16 18:54:59 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop mongod\nWed Nov 16 18:55:00 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop neutron-dhcp-agent\nWed Nov 16 18:55:09 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop neutron-l3-agent\nWed Nov 16 18:55:15 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop neutron-metadata-agent\nWed Nov 16 18:55:15 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop neutron-netns-cleanup\nWed Nov 16 18:55:15 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop neutron-openvswitch-agent\nWed Nov 16 18:55:16 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop neutron-ovs-cleanup\nWed Nov 16 18:55:16 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop neutron-server\nWed Nov 16 18:55:45 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-aodh-evaluator\nWed Nov 16 18:55:46 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-aodh-listener\nWed Nov 16 18:55:47 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-aodh-notifier\nWed Nov 16 18:55:48 
UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-ceilometer-central\nWed Nov 16 18:59:34 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-ceilometer-collector\nWed Nov 16 18:59:34 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-ceilometer-notification\nWed Nov 16 18:59:34 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-cinder-api\nWed Nov 16 18:59:34 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-cinder-scheduler\nWed Nov 16 18:59:34 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-glance-api\nWed Nov 16 18:59:34 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-glance-registry\nWed Nov 16 18:59:34 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-gnocchi-metricd\nWed Nov 16 18:59:34 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-gnocchi-statsd\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-heat-api-cfn\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-heat-api\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-heat-api-cloudwatch\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop 
openstack-heat-engine\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-nova-api\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-nova-conductor\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-nova-consoleauth\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-nova-novncproxy\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-nova-scheduler\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-sahara-api\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-sahara-engine\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account-reaper\nWed Nov 16 18:59:35 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account-replicator\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-account\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container-auditor\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container-replicator\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade 
overcloud-controller-1 Going to systemctl stop openstack-swift-container-updater\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-container\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-auditor\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-replicator\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object-updater\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop openstack-swift-object\nWed Nov 16 18:59:36 UTC 2016 ef6ff4ac-448c-4c6f-b63e-8f873718c862 tripleo-upgrade overcloud-controller-1 Going to systemctl stop 
openstack-swift-proxy\nactive\nactive\nactive\nactive\n...\nERROR: cluster shutdown timed out\n",
    "deploy_stderr": "",
    "deploy_status_code": 1
  },
  "creation_time": "2016-11-16T15:54:22Z",
  "updated_time": "2016-11-16T16:30:12Z",
  "input_values": {
    "update_identifier": "",
    "deploy_identifier": "1479310036"
  },
  "action": "CREATE",
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1",
  "id": "2d751553-9c01-4618-859a-3de8d0122728"
}


State of failed resources in Pacemaker


Failed Actions:
* memcached_monitor_60000 on overcloud-controller-2 'not running' (7): call=556, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:55:04 2016', queued=0ms, exec=0ms
* mongod_monitor_60000 on overcloud-controller-2 'not running' (7): call=590, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:55:09 2016', queued=0ms, exec=0ms
* openstack-aodh-evaluator_monitor_60000 on overcloud-controller-2 'not running' (7): call=741, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:56:21 2016', queued=0ms, exec=0ms
* openstack-aodh-listener_monitor_60000 on overcloud-controller-2 'not running' (7): call=742, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:56:21 2016', queued=0ms, exec=0ms
* openstack-aodh-notifier_monitor_60000 on overcloud-controller-2 'not running' (7): call=743, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:56:21 2016', queued=0ms, exec=0ms
* neutron-dhcp-agent_monitor_60000 on overcloud-controller-2 'not running' (7): call=763, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:55:31 2016', queued=0ms, exec=0ms
* neutron-openvswitch-agent_monitor_60000 on overcloud-controller-2 'not running' (7): call=764, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:55:31 2016', queued=0ms, exec=0ms
* neutron-server_monitor_60000 on overcloud-controller-2 'not running' (7): call=765, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:56:32 2016', queued=0ms, exec=0ms
* openstack-ceilometer-central_monitor_60000 on overcloud-controller-2 'OCF_PENDING' (196): call=766, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:56:32 2016', queued=0ms, exec=0ms
* httpd_monitor_60000 on overcloud-controller-2 'not running' (7): call=767, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:55:32 2016', queued=0ms, exec=0ms
* memcached_monitor_60000 on overcloud-controller-1 'not running' (7): call=560, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:55:04 2016', queued=0ms, exec=0ms
* mongod_monitor_60000 on overcloud-controller-1 'not running' (7): call=589, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:55:09 2016', queued=0ms, exec=0ms
* openstack-aodh-evaluator_monitor_60000 on overcloud-controller-1 'not running' (7): call=748, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:56:21 2016', queued=0ms, exec=0ms
* openstack-aodh-listener_monitor_60000 on overcloud-controller-1 'not running' (7): call=751, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:56:21 2016', queued=0ms, exec=0ms
* openstack-aodh-notifier_monitor_60000 on overcloud-controller-1 'not running' (7): call=753, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:56:21 2016', queued=0ms, exec=0ms
* neutron-dhcp-agent_monitor_60000 on overcloud-controller-1 'not running' (7): call=755, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:55:21 2016', queued=0ms, exec=0ms
* neutron-openvswitch-agent_monitor_60000 on overcloud-controller-1 'not running' (7): call=756, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:55:21 2016', queued=0ms, exec=0ms
* neutron-server_monitor_60000 on overcloud-controller-1 'not running' (7): call=759, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:56:21 2016', queued=0ms, exec=0ms
* openstack-ceilometer-central_monitor_60000 on overcloud-controller-1 'OCF_PENDING' (196): call=783, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:56:32 2016', queued=0ms, exec=0ms
* httpd_monitor_60000 on overcloud-controller-1 'not running' (7): call=784, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 18:55:32 2016', queued=0ms, exec=0ms

Comment 7 Kevin Jones 2016-11-16 22:06:07 UTC
Apparently you just have to run this step over and over again until it succeeds.

Note that in our RHOSP10-director class, I am the only one that went through the upgrade with 3 controllers and had this issue. All the rest did single controllers.

On my 5th run, I finally got an update complete.

2016-11-16 19:01:17Z [overcloud-UpdateWorkflow-fha4bedulma3]: UPDATE_COMPLETE  Stack UPDATE completed successfully
2016-11-16 19:01:18Z [UpdateWorkflow]: UPDATE_COMPLETE  state changed
2016-11-16 19:01:28Z [overcloud]: UPDATE_COMPLETE  Stack UPDATE completed successfully

 Stack overcloud UPDATE_COMPLETE

Started Mistral Workflow. Execution ID: b42b2c3d-db1c-4442-9a66-8fc7e0f03dfd
Overcloud Endpoint: http://172.16.0.30:5000/v2.0
Overcloud Deployed

Comment 9 Kevin Jones 2016-11-16 22:13:37 UTC
Jacob pointed out that I made a mistake when I entered comment 5. For that comment I copied the deploy command from the lab document which specifies one controller.

The command I was running that was failing and eventually succeeded had control scale of 3.

#!/usr/bin/env bash

cd ~

source stackrc

openstack overcloud deploy --templates --ntp-server 10.16.255.1 \
    --control-scale 3 --compute-scale 2 --neutron-tunnel-types vxlan --neutron-network-type vxlan \
    --control-flavor control --compute-flavor compute -e \
    /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker.yaml

Comment 10 Marios Andreou 2016-11-17 07:43:06 UTC
(In reply to Kevin Jones from comment #7)
> Apparently you just have to run this step over and over again until it
> succeeds.

heh :) not quite, although to be fair, since the steps are idempotent, the standard procedure when any one of them fails is indeed to recover and re-run that step.

> 
> Note that in our RHOSP10-director class, I am the only one that went through
> the upgrade with 3 controllers and had this issue. All the rest did single
> controllers.
> 

Thanks for your testing, Kevin. I think you are hitting a race condition; I will follow up with another comment to clear the needinfo.
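
The "re-run that step" procedure mentioned in this comment can be sketched as a bounded retry wrapper. This is an illustration under assumptions: retry_step is a hypothetical helper, it is only safe because the steps are described as idempotent, and any manual recovery needed between attempts is not modelled here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: re-run an idempotent step up to a fixed number of times.
# Usage: retry_step MAX_ATTEMPTS command [args...]
retry_step() {
    local max=$1
    shift
    local attempt
    for (( attempt = 1; attempt <= max; attempt++ )); do
        if "$@"; then
            echo "step succeeded on attempt $attempt"
            return 0
        fi
        echo "attempt $attempt failed" >&2
    done
    return 1
}
```

For example, "retry_step 5 ./upgrade-controllers.sh" would re-run a (hypothetical) script wrapping the deploy command from comment 9 until it exits with status 0 or five attempts have been used.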

Comment 11 Marios Andreou 2016-11-17 07:50:02 UTC
Hi Jacob and Kevin,

To be clear, Kevin hits this bug in a 3-controller HA setup, i.e. comment #5 is only wrong in that '--control-scale 1' should be '--control-scale 3', because Kevin does have a controller HA setup (based on comment #8 and comment #9).

In this case it sounds to me like he *may* be hitting BZ 1389040 (disregard the 'SSL' in the title). See especially comments BZ 1389040#c11 and BZ 1389040#c22 for an explanation of the race and the fix Michele put out at https://review.openstack.org/#/c/395460/

The fix for that BZ ^ landed in Newton but is not yet in the puddle, which is why we still carry it at https://gitlab.cee.redhat.com/sathlang/ospd-9-to-10-upgrade/blob/master/README.md - this is the working doc the Lifecycle team uses to document the *current* upgrade procedure, which means carrying any patches that have not landed in the puddle yet (you can see, for example, that patch being applied to the environment at https://gitlab.cee.redhat.com/sathlang/ospd-9-to-10-upgrade/blob/master/README.md#patches-workaround-1 ).

So try applying this on your undercloud before starting any of the upgrade (i.e. before upgrade init):

# controller and block storage upgrade pcs disruption: BZ 1389040
curl 'https://review.openstack.org/changes/395460/revisions/current/patch?download' | \
    base64 -d | \
    sudo patch  -d /usr/share/openstack-tripleo-heat-templates/ -p1

Hope that helps. Please check with that fix applied. Note, though, that if you have trouble applying it, it may be that it landed in the puddle last night (I haven't checked yet this morning).

thanks, marios
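
Because the patch may already be present in a newer puddle, it can help to check before applying it. The following is a sketch only, assuming GNU patch and a patch file that has already been downloaded and base64-decoded to a local path; the function name apply_patch_once is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: apply a patch only if it is not already applied.
# Usage: apply_patch_once TARGET_DIR PATCH_FILE
apply_patch_once() {
    local dir=$1 patch_file=$2
    # If the patch reverses cleanly in a dry run, the tree already contains it.
    if patch -d "$dir" -p1 -R --dry-run -s -f < "$patch_file" >/dev/null 2>&1; then
        echo "already applied"
        return 0
    fi
    # Otherwise apply it for real only if a forward dry run succeeds.
    if patch -d "$dir" -p1 --dry-run -s -f < "$patch_file" >/dev/null 2>&1; then
        patch -d "$dir" -p1 -s < "$patch_file" && echo "applied"
        return
    fi
    echo "does not apply cleanly" >&2
    return 1
}
```

With the decoded patch saved as, say, /tmp/395460.patch (an assumed path), this would be called as "sudo bash -c" around "apply_patch_once /usr/share/openstack-tripleo-heat-templates/ /tmp/395460.patch", and it would report "already applied" on a puddle that already carries the fix.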

Comment 16 Jon Schlueter 2017-01-12 21:25:54 UTC
Updating the linked patch to the stable/newton patch, which has landed.

Comment 17 Jon Schlueter 2017-01-12 21:28:55 UTC
According to our records, the fixes for this have been included in already released packages:

puppet-tripleo-5.4.0-4.el7ost
openstack-tripleo-heat-templates-5.1.0-7.el7ost

Comment 20 Amit Ugol 2017-04-05 10:15:34 UTC
Unable to reproduce with latest z3 build.

