Description of problem:

I am doing a 3 ctrl + 1 compute node deployment with an external SSL-enabled load balancer on IPv4. After the deployment is done, all the pacemaker Heat resources are stopped, thus the Heat service is not accessible.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.6-119.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:

openstack overcloud deploy --templates ~/templates/my-overcloud \
-e ~/templates/my-overcloud/environments/network-isolation.yaml \
-e ~/templates/network-environment.yaml \
-e ~/templates/enable-tls-external-lb.yaml \
-e ~/templates/inject-trust-anchor.yaml \
-e ~/templates/my-overcloud/environments/external-loadbalancer-vip.yaml \
-e ~/templates/external-lb.yaml \
-e /home/stack/templates/firstboot-environment.yaml \
--control-scale 3 \
--compute-scale 1 \
--ntp-server 10.5.26.10 \
--libvirt-type qemu

stack@instack:~>>> cat ~/templates/network-environment.yaml
resource_registry:
  OS::TripleO::BlockStorage::Net::SoftwareConfig: /home/stack/templates/nic-configs/cinder-storage.yaml
  OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/templates/nic-configs/compute.yaml
  OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/templates/nic-configs/controller.yaml
  OS::TripleO::ObjectStorage::Net::SoftwareConfig: /home/stack/templates/nic-configs/swift-storage.yaml
  OS::TripleO::CephStorage::Net::SoftwareConfig: /home/stack/templates/nic-configs/ceph-storage.yaml

parameter_defaults:
  InternalApiNetCidr: 172.16.20.0/24
  StorageNetCidr: 172.16.21.0/24
  StorageMgmtNetCidr: 172.16.19.0/24
  TenantNetCidr: 172.16.22.0/24
  ExternalNetCidr: 172.16.23.0/24
  InternalApiAllocationPools: [{'start': '172.16.20.10', 'end': '172.16.20.100'}]
  StorageAllocationPools: [{'start': '172.16.21.10', 'end': '172.16.21.100'}]
  StorageMgmtAllocationPools: [{'start': '172.16.19.10', 'end': '172.16.19.100'}]
  TenantAllocationPools: [{'start': '172.16.22.10', 'end': '172.16.22.100'}]
  ExternalAllocationPools: [{'start': '172.16.23.10', 'end': '172.16.23.100'}]
  ExternalInterfaceDefaultRoute: 172.16.23.251
  NeutronExternalNetworkBridge: "''"
  ControlPlaneSubnetCidr: "24"
  ControlPlaneDefaultRoute: 192.0.2.1
  EC2MetadataIp: 192.0.2.1
  DnsServers: ["10.16.36.29","10.11.5.19"]
  CloudName: rxtx.ro

stack@instack:~>>> cat ~/templates/external-lb.yaml
parameters:
  ServiceNetMap:
    NeutronTenantNetwork: tenant
    CeilometerApiNetwork: internal_api
    MongoDbNetwork: internal_api
    CinderApiNetwork: internal_api
    CinderIscsiNetwork: storage
    GlanceApiNetwork: storage
    GlanceRegistryNetwork: internal_api
    KeystoneAdminApiNetwork: internal_api
    KeystonePublicApiNetwork: internal_api
    NeutronApiNetwork: internal_api
    HeatApiNetwork: internal_api
    NovaApiNetwork: internal_api
    NovaMetadataNetwork: internal_api
    NovaVncProxyNetwork: internal_api
    SwiftMgmtNetwork: storage_mgmt
    SwiftProxyNetwork: storage
    HorizonNetwork: internal_api
    MemcachedNetwork: internal_api
    RabbitMqNetwork: internal_api
    RedisNetwork: internal_api
    MysqlNetwork: internal_api
    CephClusterNetwork: storage_mgmt
    CephPublicNetwork: storage
    ControllerHostnameResolveNetwork: internal_api
    ComputeHostnameResolveNetwork: internal_api
    BlockStorageHostnameResolveNetwork: internal_api
    ObjectStorageHostnameResolveNetwork: internal_api
    CephStorageHostnameResolveNetwork: storage

parameter_defaults:
  ControlPlaneIP: 192.0.2.250
  ExternalNetworkVip: 172.16.23.250
  InternalApiNetworkVip: 172.16.20.250
  StorageNetworkVip: 172.16.21.250
  StorageMgmtNetworkVip: 172.16.19.250
  ServiceVips:
    redis: 172.16.20.249
  ControllerIPs:
    external_cidr: "24"
    internal_api_cidr: "24"
    storage_cidr: "24"
    storage_mgmt_cidr: "24"
    tenant_cidr: "24"
    external:
    - 172.16.23.150
    - 172.16.23.151
    - 172.16.23.152
    internal_api:
    - 172.16.20.150
    - 172.16.20.151
    - 172.16.20.152
    storage:
    - 172.16.21.150
    - 172.16.21.151
    - 172.16.21.152
    storage_mgmt:
    - 172.16.19.150
    - 172.16.19.151
    - 172.16.19.152
    tenant:
    - 172.16.22.150
    - 172.16.22.151
    - 172.16.22.152

Actual results:

[root@overcloud-controller-0 ~]# pcs status | grep -A1 heat
 Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
--
 Clone Set: openstack-heat-api-clone [openstack-heat-api]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
--
 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
--
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Expected results:
The heat services are running.
I think this is related to BZ#1306623, as ceilometer fails to start and there is the following constraint:

start openstack-ceilometer-notification-clone then start openstack-heat-api-clone (kind:Mandatory)
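(That ordering constraint can be confirmed on a controller with something like the following — generic pcs usage, not commands quoted from this report:

pcs constraint order show | grep heat
pcs constraint show --full | grep -i heat
)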
I'm going to close this as a duplicate of bz1306623 then; feel free to reopen it if it turns out that is not the cause.

*** This bug has been marked as a duplicate of bug 1306623 ***
Reopening this one - after the deployment finished, heat-engine failed to start:

[root@overcloud-controller-2 ~]# pcs status | grep -A1 heat-engine
 Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
--
* openstack-heat-engine_start_0 on overcloud-controller-1 'not running' (7): call=392, status=complete, exitreason='none', last-rc-change='Fri Feb 12 18:35:14 2016', queued=0ms, exec=2086ms
* openstack-heat-engine_start_0 on overcloud-controller-2 'not running' (7): call=381, status=complete, exitreason='none', last-rc-change='Fri Feb 12 18:35:14 2016', queued=0ms, exec=2058ms
* openstack-heat-engine_start_0 on overcloud-controller-0 'not running' (7): call=397, status=complete, exitreason='none', last-rc-change='Fri Feb 12 18:35:14 2016', queued=0ms, exec=2022ms

I uploaded the sosreports to the location mentioned in comment#1.
Workaround to get it started:

pcs resource cleanup openstack-heat-engine
pcs resource restart openstack-heat-engine
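(A side note, my addition rather than something stated in the thread: the cleanup step clears the resource's failcount and failed-operation history, which is what allows pacemaker to attempt the start again. The before/after state can be checked with:

pcs resource failcount show openstack-heat-engine
)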
Been poking at the sos reports for this. It isn't, afaics, related to BZ#1306623 as per comment 3, since ceilometer is running here and only heat-engine isn't. There is another similar 'heat-engine crashed' bz at 1307125 which I thought might be related, but heat-engine fails for a different reason here (not because of a DBConnectionError). Also, to clarify: the description says "all the pacemaker Heat resources are stopped", but afaics only heat-engine is stopped. I *think* there may be an issue with the ssl certificates; more info below.

From both controller-0 and controller-1 I see a problem with cloud-init:

Feb 12 17:59:20 overcloud-controller-1.localdomain cloud-init[2926]: [CLOUDINIT] stages.py[DEBUG]: Running module ssh-authkey-fingerprints (<module 'cloudinit.config.cc_ssh_authkey_fingerprints' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_ssh_authkey_fingerprints.pyc'>) with frequency once-per-instance
Feb 12 17:59:20 overcloud-controller-1.localdomain cloud-init[2926]: [CLOUDINIT] util.py[DEBUG]: Writing to /var/lib/cloud/instances/4515569f-12ad-498e-bb0a-853fff2c8c0b/sem/config_ssh_authkey_fingerprints - wb: [420] 20 bytes
Feb 12 17:59:20 overcloud-controller-1.localdomain cloud-init[2926]: [CLOUDINIT] util.py[DEBUG]: Restoring selinux mode for /var/lib/cloud/instances/4515569f-12ad-498e-bb0a-853fff2c8c0b/sem/config_ssh_authkey_fingerprints (recursive=False)
Feb 12 17:59:20 overcloud-controller-1.localdomain cloud-init[2926]: [CLOUDINIT] util.py[DEBUG]: Restoring selinux mode for /var/lib/cloud/instances/4515569f-12ad-498e-bb0a-853fff2c8c0b/sem/config_ssh_authkey_fingerprints (recursive=False)
Feb 12 17:59:20 overcloud-controller-1.localdomain cloud-init[2926]: [CLOUDINIT] helpers.py[DEBUG]: Running config-ssh-authkey-fingerprints using lock (<FileLock using file '/var/lib/cloud/instances/4515569f-12ad-498e-bb0a-853fff2c8c0b/sem/config_ssh_authkey_fingerprints'>)
Feb 12 17:59:20 overcloud-controller-1.localdomain cloud-init[2926]: [CLOUDINIT] util.py[DEBUG]: Reading from /etc/ssh/sshd_config (quiet=False)
Feb 12 17:59:20 overcloud-controller-1.localdomain cloud-init[2926]: [CLOUDINIT] util.py[DEBUG]: Read 4359 bytes from /etc/ssh/sshd_config
Feb 12 17:59:20 overcloud-controller-1.localdomain cloud-init[2926]: [CLOUDINIT] util.py[DEBUG]: Restoring selinux mode for /home/heat-admin/.ssh (recursive=True)
Feb 12 17:59:20 overcloud-controller-1.localdomain cloud-init[2926]: [CLOUDINIT] util.py[DEBUG]: Reading from /home/heat-admin/.ssh/authorized_keys (quiet=False)
Feb 12 17:59:20 overcloud-controller-1.localdomain cloud-init[2926]: [CLOUDINIT] util.py[DEBUG]: Read 407 bytes from /home/heat-admin/.ssh/authorized_keys
Feb 12 17:59:20 overcloud-controller-1.localdomain cloud-init[2926]: 2016-02-12 12:59:20,144 - util.py[WARNING]: Running module ssh-authkey-fingerprints (<module 'cloudinit.config.cc_ssh_authkey_fingerprints' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_ssh_authkey_fingerprints.pyc'>) failed
Feb 12 17:59:20 overcloud-controller-1.localdomain cloud-init[2926]: [CLOUDINIT] util.py[WARNING]: Running module ssh-authkey-fingerprints (<module 'cloudinit.config.cc_ssh_authkey_fingerprints' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_ssh_authkey_fingerprints.pyc'>) failed
Feb 12 17:59:20 overcloud-controller-1.localdomain cloud-init[2926]: [CLOUDINIT] util.py[DEBUG]: Running module ssh-authkey-fingerprints (<module 'cloudinit.config.cc_ssh_authkey_fingerprints' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_ssh_authkey_fingerprints.pyc'>) failed

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cloudinit/stages.py", line 660, in _run_modules
    cc.run(run_name, mod.handle, func_args, freq=freq)
  File "/usr/lib/python2.7/site-packages/cloudinit/cloud.py", line 63, in run
    return self._runners.run(name, functor, args, freq, clear_on_fail)
  File "/usr/lib/python2.7/site-packages/cloudinit/helpers.py", line 197, in run
    results = functor(*args)
  File "/usr/lib/python2.7/site-packages/cloudinit/config/cc_ssh_authkey_fingerprints.py", line 105, in handle
    key_entries, hash_meth)
  File "/usr/lib/python2.7/site-packages/cloudinit/config/cc_ssh_authkey_fingerprints.py", line 91, in _pprint_key_entries
    stderr=False, console=True)
  File "/usr/lib/python2.7/site-packages/cloudinit/util.py", line 346, in multi_log
    wfh.flush()
IOError: [Errno 5] Input/output error

==============================================

From pcs status, *only* heat-engine is down (Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]). There is nothing interesting in heat-engine.log itself - its entirety looks like:

2016-02-12 18:18:36.947 1244 WARNING heat.common.config [-] stack_user_domain_id or stack_user_domain_name not set in heat.conf falling back to using default
2016-02-12 18:18:38.506 1244 WARNING oslo_config.cfg [req-88da6c7f-af50-48e6-b4ef-51e0628b94af - -] Option "db_backend" from group "DEFAULT" is deprecated. Use option "backend" from group "database".
2016-02-12 18:24:29.207 19065 WARNING oslo_config.cfg [req-7656b9c2-49ea-4f8d-95c4-360911b4be0e - -] Option "db_backend" from group "DEFAULT" is deprecated. Use option "backend" from group "database".
2016-02-12 18:35:20.780 20580 WARNING oslo_config.cfg [req-429ee551-ea41-4ff4-aefa-7c99a8347d9b - -] Option "db_backend" from group "DEFAULT" is deprecated. Use option "backend" from group "database".
But in heat-api.log there is an error like:

2016-02-12 18:38:59.951 20289 ERROR heat.common.wsgi [req-14f06bed-09fb-49ac-a44a-f8443152c4a2 c57fcbb893e2455bb9192861a8785083 1dea66563aba4c51be0618285f906fe8] Unexpected error occurred serving API: Timed out waiting for a reply to message ID 96fbcd0f8f9a4206bac74f2881f358ab

==============================================

From os-collect-config, heat-config fails with 'unable to load certificate...':

Feb 12 18:01:44 overcloud-controller-0.localdomain os-collect-config[8315]: [2016-02-12 13:01:44,327] (heat-config) [INFO] {"key_modulus": "d41d8cd98f00b204e9800998ecf8427e\n", "deploy_stdout": "", "deploy_stderr": "unable to load certificate\n139749505574816:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:703:Expecting: TRUSTED CERTIFICATE\nunable to load Private Key\n140472523294624:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:703:Expecting: ANY PRIVATE KEY\n", "chain_md5sum": "68b329da9893e34099c7d8ad5cb9c940 /etc/pki/tls/private/overcloud_endpoint.pem\n", "cert_modulus": "d41d8cd98f00b204e9800998ecf8427e\n", "deploy_status_code": 0}
Feb 12 18:01:44 overcloud-controller-0.localdomain os-collect-config[8315]: 139749505574816:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:703:Expecting: TRUSTED CERTIFICATE
Feb 12 18:01:44 overcloud-controller-0.localdomain os-collect-config[8315]: unable to load Private Key
Feb 12 18:01:44 overcloud-controller-0.localdomain os-collect-config[8315]: 140472523294624:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:703:Expecting: ANY PRIVATE KEY
Feb 12 18:01:44 overcloud-controller-0.localdomain os-collect-config[8315]: [2016-02-12 13:01:44,32

@mcornea, is the TLS cert being set correctly with

-e ~/templates/enable-tls-external-lb.yaml \
-e ~/templates/inject-trust-anchor.yaml \

(i.e. can you sanity-check the certificate data being passed)?
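(For what it's worth, those checksums already tell the story: d41d8cd98f00b204e9800998ecf8427e is the md5sum of an empty string, and 68b329da9893e34099c7d8ad5cb9c940 is the md5sum of a single newline, so overcloud_endpoint.pem effectively contains no certificate or key. A quick manual check on a controller would be something along these lines — generic openssl usage, not commands from the sosreport:

openssl x509 -noout -modulus -in /etc/pki/tls/private/overcloud_endpoint.pem
openssl rsa -noout -modulus -in /etc/pki/tls/private/overcloud_endpoint.pem
md5sum /etc/pki/tls/private/overcloud_endpoint.pem

On a correctly deployed node the two moduli should match each other and be non-empty.)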
Indeed, I am passing an empty string for the certificates and key in enable-tls-external-lb.yaml:

parameter_defaults:
  SSLCertificate: ""
  SSLIntermediateCertificate: ""
  SSLKey: ""

This is intended though, as the certificates get configured on the external load balancer, and communication between the external load balancer and the controllers is unencrypted. I can try a deployment passing all the certs and key data in enable-tls-external-lb.yaml to rule out whether it is caused by this. Will update asap.
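(One way to double-check that split — TLS terminated on the LB, plain HTTP behind it — is to hit the public and internal keystone endpoints directly. This is an illustrative sketch: the hostname and controller IP are taken from the environment files above, and the ports from the EndpointMap posted later in this bug:

curl -vk https://rxtx.ro:13000/        # public endpoint, terminated on the external LB
curl -v http://172.16.20.150:5000/     # keystone on a controller internal_api IP, plain HTTP
)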
I am assuming enable-tls-external-lb.yaml is based on the enable-tls.yaml environment in tree, and that probably won't work. The _only_ thing that needs to be set for external load balancer SSL is the EndpointMap. The rest of enable-tls.yaml is either meaningless or downright harmful. The environment file should look something like this:

parameter_defaults:
  EndpointMap:
    [endpoint map entries from enable-tls.yaml]

Nothing else should be needed there. The only other thing that might need to be passed is inject-trust-anchor.yaml, if the certificate is self-signed. In that case the included environment file can be used as-is (with the appropriate value filled in, of course).
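(To make the shape concrete — a sketch only, with two example entries borrowed from the full map Marius posts in a later comment, not a complete file:

parameter_defaults:
  EndpointMap:
    KeystoneAdmin: {protocol: 'http', port: '35357', host: 'IP_ADDRESS'}
    KeystonePublic: {protocol: 'https', port: '13000', host: 'CLOUDNAME'}
    # ...remaining entries as in enable-tls.yaml...

Note there is no SSLCertificate/SSLKey and no resource_registry override in this sketch.)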
Added release note doctext explaining how this should work.
Trying again without the collision due to not refreshing the page...
Passing the enable-tls.yaml below results in:

Deploying templates in the directory /home/stack/templates/my-overcloud
Stack failed with status: Resource CREATE failed: resources[1]: resources.Controller.Property error: resources.NodeTLSData.properties: Property SSLKey not assigned
ERROR: openstack Heat Stack create failed.

parameter_defaults:
  EndpointMap:
    CeilometerAdmin: {protocol: 'http', port: '8777', host: 'IP_ADDRESS'}
    CeilometerInternal: {protocol: 'http', port: '8777', host: 'IP_ADDRESS'}
    CeilometerPublic: {protocol: 'https', port: '13777', host: 'CLOUDNAME'}
    CinderAdmin: {protocol: 'http', port: '8776', host: 'IP_ADDRESS'}
    CinderInternal: {protocol: 'http', port: '8776', host: 'IP_ADDRESS'}
    CinderPublic: {protocol: 'https', port: '13776', host: 'CLOUDNAME'}
    GlanceAdmin: {protocol: 'http', port: '9292', host: 'IP_ADDRESS'}
    GlanceInternal: {protocol: 'http', port: '9292', host: 'IP_ADDRESS'}
    GlancePublic: {protocol: 'https', port: '13292', host: 'CLOUDNAME'}
    GlanceRegistryAdmin: {protocol: 'http', port: '9191', host: 'IP_ADDRESS'}
    GlanceRegistryInternal: {protocol: 'http', port: '9191', host: 'IP_ADDRESS'}
    GlanceRegistryPublic: {protocol: 'https', port: '9191', host: 'IP_ADDRESS'} # Not set on the loadbalancer yet.
    HeatAdmin: {protocol: 'http', port: '8004', host: 'IP_ADDRESS'}
    HeatInternal: {protocol: 'http', port: '8004', host: 'IP_ADDRESS'}
    HeatPublic: {protocol: 'https', port: '13004', host: 'CLOUDNAME'}
    HorizonPublic: {protocol: 'https', port: '443', host: 'CLOUDNAME'}
    KeystoneAdmin: {protocol: 'http', port: '35357', host: 'IP_ADDRESS'}
    KeystoneInternal: {protocol: 'http', port: '5000', host: 'IP_ADDRESS'}
    KeystonePublic: {protocol: 'https', port: '13000', host: 'CLOUDNAME'}
    NeutronAdmin: {protocol: 'http', port: '9696', host: 'IP_ADDRESS'}
    NeutronInternal: {protocol: 'http', port: '9696', host: 'IP_ADDRESS'}
    NeutronPublic: {protocol: 'https', port: '13696', host: 'CLOUDNAME'}
    NovaAdmin: {protocol: 'http', port: '8774', host: 'IP_ADDRESS'}
    NovaInternal: {protocol: 'http', port: '8774', host: 'IP_ADDRESS'}
    NovaPublic: {protocol: 'https', port: '13774', host: 'CLOUDNAME'}
    NovaEC2Admin: {protocol: 'http', port: '8773', host: 'IP_ADDRESS'}
    NovaEC2Internal: {protocol: 'http', port: '8773', host: 'IP_ADDRESS'}
    NovaEC2Public: {protocol: 'https', port: '13773', host: 'CLOUDNAME'}
    NovaVNCProxyAdmin: {protocol: 'http', port: '6080', host: 'IP_ADDRESS'}
    NovaVNCProxyInternal: {protocol: 'http', port: '6080', host: 'IP_ADDRESS'}
    NovaVNCProxyPublic: {protocol: 'https', port: '13080', host: 'CLOUDNAME'}
    SwiftAdmin: {protocol: 'http', port: '8080', host: 'IP_ADDRESS'}
    SwiftInternal: {protocol: 'http', port: '8080', host: 'IP_ADDRESS'}
    SwiftPublic: {protocol: 'https', port: '13808', host: 'CLOUDNAME'}

resource_registry:
  OS::TripleO::NodeTLSData: /home/stack/templates/my-overcloud/puppet/extraconfig/tls/tls-cert-inject.yaml
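(A hedged reading of that failure, inferred from the error message rather than stated anywhere in this bug: the resource_registry section above still maps OS::TripleO::NodeTLSData to tls-cert-inject.yaml, and that template is what demands the SSLKey property. Keeping the EndpointMap but dropping the mapping — so OS::TripleO::NodeTLSData keeps its in-tree default — should avoid the 'Property SSLKey not assigned' error:

parameter_defaults:
  EndpointMap:
    # ...same entries as above...
# no resource_registry override for OS::TripleO::NodeTLSData; the default
# is kept, so no SSLKey is required
)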
Hi Marius,

Does the workaround from Comment 5 still work?

Angus
(In reply to Angus Thomas from comment #17)
> Hi Marius,
>
> Does the workaround from Comment 5 still work?
>
> Angus

Not when using an enable-tls.yaml, according to comment#16. It fails early, so the overcloud doesn't get deployed. I'm currently trying to pass all the certificates and key in the enable-tls.yaml environment and will get back with the result.
I did some more testing and reached the conclusion that the issue was caused by the undersized virtual host the environment was running on. I ran the same tests on beefier hardware and wasn't able to reproduce it, so I'm closing this as not a bug. The discussion about the right contents of enable-tls.yaml remains open, but let's move it to the docs BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1307045
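(If anyone hits this on similarly small hosts: one quick way to corroborate the undersized-host theory — my suggestion, not something checked in this thread — is to look for OOM-killer activity and memory pressure on the controllers around the time of the failed start:

grep -i 'out of memory' /var/log/messages
dmesg | grep -i oom
free -m
)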
Since the customer case this was reopened for is now closed, I'm going to re-close the bug.
The issue is happening on a controller with 32GB of memory and an SSD. I guess pcs tries to start openstack-heat-engine at the wrong time, when not everything is ready yet.

The pcs status (from CI, not from a live system):

+ pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-0 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Apr 13 10:12:43 2018
Last change: Fri Apr 13 09:40:52 2018 by root via cibadmin on controller-0

1 node configured
42 resources configured

Online: [ controller-0 ]

Full list of resources:

 ip-172.17.1.10 (ocf::heartbeat:IPaddr2): Started controller-0
 ip-172.17.4.10 (ocf::heartbeat:IPaddr2): Started controller-0
 ip-192.168.24.6 (ocf::heartbeat:IPaddr2): Started controller-0
 Clone Set: haproxy-clone [haproxy]
     Started: [ controller-0 ]
 ip-172.17.3.10 (ocf::heartbeat:IPaddr2): Started controller-0
 ip-10.0.0.101 (ocf::heartbeat:IPaddr2): Started controller-0
 Master/Slave Set: galera-master [galera]
     Masters: [ controller-0 ]
 ip-172.17.1.11 (ocf::heartbeat:IPaddr2): Started controller-0
 Master/Slave Set: redis-master [redis]
     Masters: [ controller-0 ]
...
...
 Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier]
     Started: [ controller-0 ]
 Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
     Stopped: [ controller-0 ]
 Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api]
     Started: [ controller-0 ]
 Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent]
     Started: [ controller-0 ]
...
...
     Started: [ controller-0 ]
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
     Started: [ controller-0 ]
 openstack-cinder-volume (systemd:openstack-cinder-volume): Started controller-0
 Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
     Started: [ controller-0 ]

Failed Actions:
* openstack-heat-engine_start_0 on controller-0 'not running' (7): call=181, status=complete, exitreason='none', last-rc-change='Fri Apr 13 09:33:45 2018', queued=0ms, exec=2115ms

heat-engine does not have a log file, although the package is installed.
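(With no heat-engine.log to go on, the systemd journal for the unit and a foreground start attempt are probably the next places to look — generic suggestions, not output from this CI run:

journalctl -u openstack-heat-engine --no-pager | tail -50
pcs resource debug-start openstack-heat-engine --full
)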
Since there is already a BZ tracking this documentation issue, I'm closing this one as a duplicate in the meantime.

*** This bug has been marked as a duplicate of bug 1568037 ***