Bug 1466725

Summary: M->N upgrade being stuck on major-upgrade-ceilometer-wsgi-mitaka-newton.yaml
Product: Red Hat OpenStack
Reporter: Yolanda Robla <yroblamo>
Component: openstack-tripleo
Assignee: Sofer Athlan-Guyot <sathlang>
Status: CLOSED DUPLICATE
QA Contact: Marius Cornea <mcornea>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 10.0 (Newton)
CC: mburns, rhel-osp-director-maint, sathlang, yroblamo
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-06-30 12:05:08 UTC
Type: Bug

Description Yolanda Robla 2017-06-30 10:32:02 UTC
Description of problem:

When executing the following command to upgrade ceilometer:

 openstack overcloud deploy --templates \
   --control-scale 3 --compute-scale 1 \
   -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
   -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
   -e /home/stack/network-environment.yaml \
   -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-ceilometer-wsgi-mitaka-newton.yaml \
   --ntp-server 135.248.16.241

It failed right away with:

heat deployment-show d32f7b76-7c28-484d-82b4-9a77789b46c5
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "FAILED", 
  "server_id": "728f11dd-1d78-4675-b007-549390f8d34d", 
  "config_id": "65098308-e0d3-4676-9d14-30cee9348512", 
  "output_values": {
    "deploy_stdout": "Thu Jun 29 15:27:03 UTC 2017 65098308-e0d3-4676-9d14-30cee9348512 tripleo-upgrade overcloud-controller-0 Going to pcs resource disable openstack-ceilometer-api\nThu Jun 29 15:27:04 UTC 2017 65098308-e0d3-4676-9d14-30cee9348512 tripleo-upgrade overcloud-controller-0 Node is bootstrap checking openstack-ceilometer-api to be stopped here\nERROR: cluster remained unstable for more than 600 seconds, exiting.\n", 
    "deploy_stderr": "", 
    "deploy_status_code": 1
  }, 
  "creation_time": "2017-06-29T14:49:23Z", 
  "updated_time": "2017-06-29T15:01:22Z", 
  "input_values": {}, 
  "action": "CREATE", 
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1", 
  "id": "d32f7b76-7c28-484d-82b4-9a77789b46c5"
}
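
The deploy_stdout field above is a single JSON string with escaped newlines, which makes the actual error easy to miss. A minimal sketch of unpacking it follows, assuming the JSON has been saved without the leading WARNING line. The RAW sample is an abridged copy of the record above (timestamps and IDs trimmed), and the summarize helper is a hypothetical name used only for illustration.

```python
import json

# Abridged copy of the deployment record shown above. The raw string keeps
# the "\n" escapes intact so that json.loads() decodes them, not Python.
RAW = r'''
{
  "status": "FAILED",
  "output_values": {
    "deploy_stdout": "Going to pcs resource disable openstack-ceilometer-api\nNode is bootstrap checking openstack-ceilometer-api to be stopped here\nERROR: cluster remained unstable for more than 600 seconds, exiting.\n",
    "deploy_stderr": "",
    "deploy_status_code": 1
  },
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1"
}
'''


def summarize(raw):
    """Hypothetical helper: split deploy_stdout into lines and pick out errors."""
    record = json.loads(raw)
    outputs = record.get("output_values", {})
    stdout_lines = outputs.get("deploy_stdout", "").splitlines()
    return {
        "status": record.get("status"),
        "exit_code": outputs.get("deploy_status_code"),
        "stdout_lines": stdout_lines,
        # Lines the upgrade script itself flagged as errors.
        "errors": [line for line in stdout_lines if line.startswith("ERROR")],
    }


if __name__ == "__main__":
    summary = summarize(RAW)
    print(summary["status"], summary["exit_code"])
    for line in summary["errors"]:
        print(line)
```

Run against the record above, this would surface the "cluster remained unstable for more than 600 seconds" line as the only error, alongside the FAILED status and exit code 1.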

The ceilometer services timed out while being stopped.

However, I logged in to the controllers manually and ran a pcs resource cleanup, and the ceilometer services now show as stopped.
However, if I re-run the ceilometer upgrade command, heat does not seem to do anything. It fails fast without reporting any new error; I only see the same entries in heat stack-list and heat deployment-list.
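
One way to confirm that a re-run created or touched no deployments is to compare two snapshots of the deployment list, taken before and after. The sketch below assumes each snapshot is a list of records carrying the "id" and "updated_time" keys seen in the deployment-show output above; the actual column names of the list command may differ, and changed_deployments is a hypothetical helper.

```python
def changed_deployments(before, after):
    """Return ids that are new in `after` or whose updated_time differs.

    Both arguments are lists of dicts with "id" and "updated_time" keys
    (an assumption modelled on the deployment-show record above).
    """
    old = {d["id"]: d.get("updated_time") for d in before}
    return sorted(
        d["id"]
        for d in after
        if d["id"] not in old or old[d["id"]] != d.get("updated_time")
    )


if __name__ == "__main__":
    # Snapshot built from the failed deployment reported above.
    before = [
        {"id": "d32f7b76-7c28-484d-82b4-9a77789b46c5",
         "updated_time": "2017-06-29T15:01:22Z"},
    ]
    # Identical snapshot after the re-run: nothing was created or updated.
    after = [dict(d) for d in before]
    print(changed_deployments(before, after))
```

An empty result matches the behaviour described here: the re-run left the existing entries as they were.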

Also, journalctl -u os-collect-config on the controller shows that heat is not doing anything.

Heat engine log: https://da.gd/2Qhu -> https://paste.fedoraproject.org/paste/5LAi56pk9mSvMhECEsQ-Yw/

Comment 3 Sofer Athlan-Guyot 2017-06-30 12:05:08 UTC
Hi Yolanda,

The last I heard, 10z4 is planned for 16-Aug (Release 10z4), but Mike Burns may have more current info (mburns on #rhos-mgt).

I'm closing this one as a duplicate of bug 1443186.

*** This bug has been marked as a duplicate of bug 1443186 ***