Bug 1330885

Summary: openstack overcloud deploy always fails after configuring instance HA
Product: Red Hat OpenStack
Reporter: jwang
Component: rhosp-director
Assignee: Angus Thomas <athomas>
Status: CLOSED NOTABUG
QA Contact: Arik Chernetsky <achernet>
Severity: medium
Priority: high
Version: 7.0 (Kilo)
CC: aschultz, dbecker, ipilcher, jniu, mburns, mcornea, morazi, pliu, raywang, rhel-osp-director-maint, rscarazz
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2017-10-03 15:36:58 UTC
Type: Bug
Bug Blocks: 1342229    

Description jwang 2016-04-27 08:43:08 UTC
Description of problem:
openstack overcloud deploy always fails after instance HA has been configured.

Version-Release number of selected component (if applicable):
$ rpm -qa | grep openstack-triple
openstack-tripleo-image-elements-0.9.6-10.el7ost.noarch
openstack-tripleo-0.0.7-0.1.1664e566.el7ost.noarch
openstack-tripleo-common-0.0.1.dev6-6.git49b57eb.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1-5.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-123.el7ost.noarch

How reproducible:
Always, when 'openstack overcloud deploy' is run after configuring instance HA.

Steps to Reproduce:
1. openstack overcloud deploy
2. configure instance HA manually
3. openstack overcloud deploy

Actual results:
openstack overcloud deploy fails

Expected results:
openstack overcloud deploy succeeds

Additional info:
sh -x overcloud-deployment-cmri-extra-config.sh
+ openstack overcloud deploy --templates templates/openstack-tripleo-heat-templates/ -e templates/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e templates/network-environment.yaml -e templates/firstboot/firstboot.yaml -e templates/extra-config.yaml --control-flavor control --compute-flavor compute --control-scale 2 --compute-scale 2 --ceph-storage-scale 0 --neutron-network-type vlan --neutron-bridge-mappings datacentre:br-ex,tenantplane:br-tenant --neutron-network-vlan-ranges datacentre,tenantplane:3011:3020 --neutron-disable-tunneling --ntp-server 172.16.0.1
Deploying templates in the directory /home/stack/templates/openstack-tripleo-heat-templates
Stack failed with status: resources.ControllerNodesPostDeployment: resources.ControllerPostPuppet: resources.ControllerPostPuppetRestartDeployment: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
ERROR: openstack Heat Stack update failed.

real    21m6.692s
user    0m1.344s
sys     0m0.210s

heat resource-list -n3 overcloud | grep -v COMP                                                     Wed Apr 27 05:54:07 2016

+-----------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+-----------------------------------------------+
| resource_name                                 | physical_resource_id                          | resource_type                                     | resource_status | updated_time         | parent_resource                               |
+-----------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+-----------------------------------------------+
| ControllerNodesPostDeployment                 | 33562114-2be8-4a97-9688-b28a8765505c          | OS::TripleO::ControllerPostDeployment             | UPDATE_FAILED   | 2016-04-27T07:57:15Z |                                               |
| ControllerPostPuppet                          | 25962495-a15d-455a-a5a7-d0bd8938f496          | OS::TripleO::Tasks::ControllerPostPuppet          | UPDATE_FAILED   | 2016-04-27T08:06:19Z | ControllerNodesPostDeployment                 |
| ControllerPostPuppetRestartDeployment         | 13d302e1-864e-4c7c-9ec8-bf81fe5ed28f          | OS::Heat::SoftwareDeployments                     | UPDATE_FAILED   | 2016-04-27T08:07:13Z | ControllerPostPuppet                          |
| 0                                             | 17eb8c45-06ed-4924-8a3c-4b367db1b2c4          | OS::Heat::SoftwareDeployment                      | CREATE_FAILED   | 2016-04-27T08:07:15Z | ControllerPostPuppetRestartDeployment         |
+-----------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+-----------------------------------------------+

heat deployment-output-show 17eb8c45-06ed-4924-8a3c-4b367db1b2c4 deploy_stderr | tail -20 
+ ((  1461741002 < 1461741006  ))
++ pcs status --full
++ grep openstack-keystone
++ grep -v Clone
+ node_states='     openstack-keystone  (systemd:openstack-keystone):   Started overcloud-controller-1
     openstack-keystone (systemd:openstack-keystone):   Started overcloud-controller-0
     openstack-keystone (systemd:openstack-keystone):   Stopped
     openstack-keystone (systemd:openstack-keystone):   Stopped'
+ echo '     openstack-keystone (systemd:openstack-keystone):   Started overcloud-controller-1
     openstack-keystone (systemd:openstack-keystone):   Started overcloud-controller-0
     openstack-keystone (systemd:openstack-keystone):   Stopped
     openstack-keystone (systemd:openstack-keystone):   Stopped'
+ grep -q Stopped
+ echo 'openstack-keystone not yet started, sleeping 3 seconds.'
+ sleep 3
++ date +%s
+ ((  1461741006 < 1461741006  ))
+ echo 'openstack-keystone never started after 300 seconds'
+ exit 1

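The cause is clear: once instance HA is configured, openstack-keystone is Started on the 2 controller nodes and Stopped on the 2 compute nodes, which is the normal state for that setup. The post-deployment check, however, greps the 'pcs status --full' output for any "Stopped" entry, so it keeps waiting and exits with status 1 after 300 seconds.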
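
The behaviour of the check can be reproduced offline against the captured output. The following is only a minimal sketch, not the actual script shipped in openstack-tripleo-heat-templates: the helper names has_stopped and count_state are hypothetical, and the sample text is the keystone portion of the deploy_stderr output above with the whitespace normalized.

```shell
# Keystone resource states as captured in deploy_stderr (whitespace normalized).
node_states='openstack-keystone (systemd:openstack-keystone): Started overcloud-controller-1
openstack-keystone (systemd:openstack-keystone): Started overcloud-controller-0
openstack-keystone (systemd:openstack-keystone): Stopped
openstack-keystone (systemd:openstack-keystone): Stopped'

# Hypothetical helper: succeeds (exit 0) if any resource instance is Stopped,
# mirroring the 'grep -q Stopped' test visible in the trace.
has_stopped() {
    printf '%s\n' "$1" | grep -q Stopped
}

# Hypothetical helper: prints how many lines of $1 contain the state in $2.
count_state() {
    printf '%s\n' "$1" | grep -c "$2"
}

if has_stopped "$node_states"; then
    echo "check keeps waiting: $(count_state "$node_states" Started) Started, $(count_state "$node_states" Stopped) Stopped"
fi
```

Because the two Stopped instances belong to the compute nodes and never start there, a check of this shape cannot pass while instance HA is configured, regardless of how long it waits.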

$ nova list 
+--------------------------------------+------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                   | Status | Task State | Power State | Networks              |
+--------------------------------------+------------------------+--------+------------+-------------+-----------------------+
| 96b99175-7472-4b68-b6c5-eccc8390a7f8 | overcloud-compute-0    | ACTIVE | -          | Running     | ctlplane=172.16.0.114 |
| 914074ca-113f-4413-9bb8-a86063ab25aa | overcloud-compute-1    | ACTIVE | -          | Running     | ctlplane=172.16.0.113 |
| 9e2eeb4b-3a9e-47f9-a9b5-933e656a2bd2 | overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=172.16.0.112 |
| 3a1dd5be-4406-4a67-9b94-211fd9fdeb63 | overcloud-controller-1 | ACTIVE | -          | Running     | ctlplane=172.16.0.115 |
+--------------------------------------+------------------------+--------+------------+-------------+-----------------------+

[stack@undercloud ~]$ ssh heat-admin@172.16.0.112
[heat-admin@overcloud-controller-0 ~]$ sudo su - 
[root@overcloud-controller-0 ~]# pcs status | grep keystone -A2 
 Clone Set: openstack-keystone-clone [openstack-keystone]
     Started: [ overcloud-controller-0 overcloud-controller-1 ]
     Stopped: [ overcloud-compute-0 overcloud-compute-1 ]

Comment 2 Raoul Scarazzini 2017-10-03 15:36:58 UTC
Hi,
as you can see from the instance HA documentation [1], the requirements state that "No overcloud stack updates will be run following the configuration of instance HA".
This means that if you want to run upgrades or updates after applying those steps, you first need to remove the entire instance HA configuration.
From OSP8 onwards this can be done via Ansible playbooks [2]; in OSP7 it is still a manual process.

[1] https://access.redhat.com/articles/1544823
[2] https://access.redhat.com/documentation/en/red-hat-openstack-platform/8/single/high-availability-for-compute-instances/
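
As a guard against re-running the deploy while instance HA is still in place, a pre-flight check along the following lines could be used. This is a rough sketch under stated assumptions, not a documented procedure: instance_ha_configured is a hypothetical helper, the heuristic (compute nodes listed under "Stopped:" in a clone set) is inferred only from the pcs status output in this report, and the node naming pattern overcloud-compute-N is taken from the nova list above.

```shell
# Hypothetical helper: reads 'pcs status' text on stdin and succeeds when a
# compute node shows up in a "Stopped: [ ... ]" list, which in this report
# only happens after instance HA has been configured (heuristic assumption).
instance_ha_configured() {
    grep -Eq 'Stopped: \[.*overcloud-compute-[0-9]+'
}

# Sample taken from the pcs status output on overcloud-controller-0 above.
sample=' Clone Set: openstack-keystone-clone [openstack-keystone]
     Started: [ overcloud-controller-0 overcloud-controller-1 ]
     Stopped: [ overcloud-compute-0 overcloud-compute-1 ]'

if printf '%s\n' "$sample" | instance_ha_configured; then
    echo "instance HA appears to be configured; remove it before running a stack update"
fi
```

On a live controller the sample text would be replaced by the output of 'pcs status' piped into the helper.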