Bug 1387408

Summary: Only the CephOSD service is deployed on the compute nodes when using hyperconverged-ceph.yaml
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Target Release: 10.0 (Newton)
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Keywords: TestOnly, Triaged, ZStream
Hardware: Unspecified
OS: Unspecified
Reporter: John Fulton <johfulto>
Assignee: Giulio Fidente <gfidente>
QA Contact: Yogev Rabl <yrabl>
Docs Contact: Derek <dcadzow>
CC: dmatthew, gcharot, gfidente, hbrock, jjoyce, jomurphy, jschluet, jslagle, mburns, mcornea, ramishra, rhel-osp-director-maint, sbaker, sclewis, scohen, shardy, srevivo, therve
Fixed In Version: openstack-tripleo-heat-templates-5.0.0-1.5.el7ost openstack-heat-7.0.0-5.el7ost
Clones: 1410170
Bug Blocks: 1410170
Last Closed: 2017-05-18 17:57:47 UTC
Type: Bug

Description John Fulton 2016-10-20 19:55:39 UTC
* Description of problem:

When using environments/hyperconverged-ceph.yaml [1] to deploy an HCI (Nova and Ceph on one node) overcloud, the compute nodes are deployed with working OSDs, but the Nova compute services are not configured. For example, nova.conf is empty, no nova processes are running on the compute nodes, and nova's systemd unit files show the nova services as disabled. The deployment completes successfully, but Nova does not work. Other services, e.g. Glance and Cinder (backed by Ceph), do work.

It is as if the merge strategy set in parameter_merge_strategies [1] were behaving like overwrite instead [2].

[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/environments/hyperconverged-ceph.yaml#L10-L11
[2] http://docs.openstack.org/developer/heat/template_guide/environment.html#environment-merging 
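To make the suspicion concrete, here is a minimal Python sketch of what the two strategies should do to a list-valued parameter such as ComputeServices when two environments each supply a value. It is a simplified model of the documented semantics [2], not Heat's implementation; the function name combine() is hypothetical and the service names are an illustrative subset of the Compute role, not the real default list.

```python
from typing import List, Sequence

# Illustrative subset of the default Compute role services (not the full list).
BASE_SERVICES = [
    "OS::TripleO::Services::NovaCompute",
    "OS::TripleO::Services::NovaLibvirt",
]
# What hyperconverged-ceph.yaml adds for the Compute role.
HCI_SERVICES = ["OS::TripleO::Services::CephOSD"]


def combine(values: Sequence[List[str]], strategy: str) -> List[str]:
    """Combine one list parameter's values from several environments, in order.

    Hypothetical simplified model of the documented strategies, not Heat code.
    """
    result: List[str] = []
    for value in values:
        if strategy == "overwrite":
            # The latest environment replaces whatever came before.
            result = list(value)
        elif strategy == "merge":
            # Comma-delimited lists are appended.
            result = result + list(value)
        else:
            raise ValueError(f"unsupported strategy: {strategy}")
    return result


if __name__ == "__main__":
    print(combine([BASE_SERVICES, HCI_SERVICES], "merge"))
    print(combine([BASE_SERVICES, HCI_SERVICES], "overwrite"))
```

With merge, the Nova services are kept and CephOSD is appended; with overwrite, only CephOSD survives, which matches the observed symptom.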

* Version-Release number of selected component (if applicable):
python-tripleoclient-5.2.0-2.el7ost.noarch
python-heatclient-1.5.0-1.el7ost.noarch

* How reproducible:
Appears deterministic. Reproduced three times using puddles from 18 and 19 October.

* Steps to Reproduce:
Do an HCI deployment with environments/hyperconverged-ceph.yaml 

* Actual results:
Overcloud without Nova compute services

* Expected results:
Overcloud with Nova compute services

* Additional info:

Following "Scenario 2 - hyperconverged ceph deployment" from http://hardysteven.blogspot.com/2016/08/tripleo-composable-services-101.html produces the expected result.

Comment 1 John Fulton 2016-10-20 19:57:31 UTC
[stack@hci-director ~]$ swift download overcloud user-environment.yaml
user-environment.yaml [auth 0.369s, headers 0.666s, total 0.666s, 0.014 MB/s]
[stack@hci-director ~]$ more user-environment.yaml 
parameter_merge_strategies: {ComputeServices: merge}
resource_registry: {'OS::TripleO::BlockStorage::Ports::ExternalPort': network/ports/noop.yaml
  'OS::TripleO::BlockStorage::Ports::InternalApiPort': network/ports/internal_api.yaml,
  'OS::TripleO::BlockStorage::Ports::StorageMgmtPort': network/ports/storage_mgmt.yaml,
  'OS::TripleO::BlockStorage::Ports::StoragePort': network/ports/storage.yaml, 'OS::TripleO::
  'OS::TripleO::CephStorage::Ports::ExternalPort': network/ports/noop.yaml, 'OS::TripleO::Cep
  'OS::TripleO::CephStorage::Ports::StorageMgmtPort': network/ports/storage_mgmt.yaml,
  'OS::TripleO::CephStorage::Ports::StoragePort': network/ports/storage.yaml, 'OS::TripleO::C
  'OS::TripleO::Compute::Net::SoftwareConfig': user-files/2ad0c78891e7891f89696d44ff44b9c5-co
  'OS::TripleO::Compute::Ports::ExternalPort': network/ports/noop.yaml, 'OS::TripleO::Compute
  'OS::TripleO::Compute::Ports::StorageMgmtPort': network/ports/storage_mgmt.yaml,
  'OS::TripleO::Compute::Ports::StoragePort': network/ports/storage.yaml, 'OS::TripleO::Compu
  'OS::TripleO::Controller::Net::SoftwareConfig': user-files/c538381d5c820ef0f4caf2385fd0548d
  'OS::TripleO::Controller::Ports::ExternalPort': network/ports/external.yaml, 'OS::TripleO::
  'OS::TripleO::Controller::Ports::RedisVipPort': network/ports/vip.yaml, 'OS::TripleO::Contr
  'OS::TripleO::Controller::Ports::StoragePort': network/ports/storage.yaml, 'OS::TripleO::Co
  'OS::TripleO::ControllerConfig': puppet/controller-config-pacemaker.yaml, 'OS::TripleO::Net
  'OS::TripleO::Network::InternalApi': network/internal_api.yaml, 'OS::TripleO::Network::Port
  'OS::TripleO::Network::Ports::InternalApiVipPort': network/ports/internal_api.yaml,
  'OS::TripleO::Network::Ports::RedisVipPort': network/ports/vip.yaml, 'OS::TripleO::Network:
  'OS::TripleO::Network::Ports::StorageVipPort': network/ports/storage.yaml, 'OS::TripleO::Ne
  'OS::TripleO::Network::StorageMgmt': network/storage_mgmt.yaml, 'OS::TripleO::Network::Tena
  'OS::TripleO::NodeExtraConfigPost': user-files/e541a433b6110f5b9ef4218a1de4443a-post-deploy
  'OS::TripleO::NodeUserData': user-files/03e8fd271d16960f8955efecb6447694-first-boot-templat
  'OS::TripleO::Services::CephClient': puppet/services/ceph-client.yaml, 'OS::TripleO::Servic
  'OS::TripleO::Services::CephOSD': puppet/services/ceph-osd.yaml, 'OS::TripleO::Services::Ci
  'OS::TripleO::Services::HAproxy': puppet/services/pacemaker/haproxy.yaml, 'OS::TripleO::Ser
  'OS::TripleO::Services::Pacemaker': puppet/services/pacemaker.yaml, 'OS::TripleO::Services:
  'OS::TripleO::Services::Redis': puppet/services/pacemaker/database/redis.yaml, 'OS::TripleO
  'OS::TripleO::SwiftStorage::Ports::InternalApiPort': network/ports/internal_api.yaml,
  'OS::TripleO::SwiftStorage::Ports::StorageMgmtPort': network/ports/storage_mgmt.yaml,
  'OS::TripleO::SwiftStorage::Ports::StoragePort': network/ports/storage.yaml, 'OS::TripleO::
  'OS::TripleO::Tasks::ControllerPostPuppet': extraconfig/tasks/post_puppet_pacemaker.yaml,
  'OS::TripleO::Tasks::ControllerPostPuppetRestart': extraconfig/tasks/post_puppet_pacemaker_
  'OS::TripleO::Tasks::ControllerPrePuppet': extraconfig/tasks/pre_puppet_pacemaker.yaml}
[stack@hci-director ~]$

Comment 2 Gregory Charot 2016-10-20 20:07:34 UTC
Hi,

parameter_merge_strategies:
  ComputeServices: deep_merge

This solves the issue, as if something had changed at the template depth level:

deep_merge
Json values are deep merged. Not useful for other types like comma delimited lists and strings. If specified for them, it falls back to merge.

openstack hypervisor list
+----+---------------------------------+
| ID | Hypervisor Hostname             |
+----+---------------------------------+
|  1 | overcloud-compute-2.localdomain |
|  2 | overcloud-compute-1.localdomain |
|  3 | overcloud-compute-0.localdomain |
+----+---------------------------------+
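The quoted deep_merge description can be sketched as follows. This is a hypothetical, simplified model of the documented behaviour (the names deep_merge() and combine_value() are illustrative, not Heat's code): deep_merge only differs from merge for JSON (dict) values, and for lists it falls back to merge, so for a list parameter like ComputeServices both strategies should give the same result.

```python
from typing import Any, Dict


def deep_merge(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `new` into a copy of `old`; nested dicts are combined."""
    out = dict(old)
    for key, value in new.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out


def combine_value(old: Any, new: Any, strategy: str) -> Any:
    """Simplified model of the quoted strategies (not Heat's actual code)."""
    if strategy == "overwrite":
        return new
    if strategy not in ("merge", "deep_merge"):
        raise ValueError(f"unsupported strategy: {strategy}")
    if isinstance(old, dict) and isinstance(new, dict):
        if strategy == "deep_merge":
            return deep_merge(old, new)
        # 'merge' on JSON values: only top-level keys are combined.
        merged = dict(old)
        merged.update(new)
        return merged
    if isinstance(old, list) and isinstance(new, list):
        # deep_merge has no extra meaning for lists and falls back to merge.
        return old + new
    raise TypeError("this sketch only models dict and list values")
```

Under this model, switching ComputeServices from merge to deep_merge should not change the outcome, which is why a difference here would point at something other than the strategy itself.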

Cheers,
Greg

Comment 3 Giulio Fidente 2016-10-20 22:48:04 UTC
(In reply to Gregory Charot from comment #2)
> Hi,
> 
> parameter_merge_strategies:
>   ComputeServices: deep_merge
> 
> Solves the issue as if something changed in template depth level :

It does not solve the issue for me; also, the parameters we are merging are still lists, so 'merge' should work as it did before.

I am still investigating what the root cause could be

Comment 4 Giulio Fidente 2016-10-20 22:51:16 UTC
Dougal, can you help us with this?

As per comment #1, user-environment.yaml seems to preserve parameter_merge_strategies.

Is there any chance we are not passing it at deployment time?

Comment 5 Dougal Matthews 2016-10-21 10:49:47 UTC
Discussed with Giulio, he is tracking down a potential issue in tripleo-common.

Comment 6 Giulio Fidente 2016-10-21 13:01:51 UTC
I am debugging the POST request, and it looks like tripleoclient is passing parameter_merge_strategies via heatclient as intended:

2016-10-21 08:43:27.954 10409 DEBUG heatclient.common.http [-] curl -g -i -X POST -H 'X-Auth-User: admin' -H 'X-Auth-Token: {SHA1}5b4a7dc7215ce4ce0f124f0f454a3d1672c9fc9b' -H 'X-Region-Name: regionOne' -H 'Accept: application/json' -H 'User-Agent: python-heatclient' -H 'Content-Type: application/json' -d '{"stack_name": "overcloud", "environment": {"parameter_defaults": {"MysqlMaxConnections": 8192, "ControllerCount": 3, ... }, "parameter_merge_strategies": {"ComputeServices": "merge"}, "resource_registry": {"OS::TripleO::Services::Timezone": "http://192.168.1.1:8080/v1/AUTH_b0ffd5e578ee44ebb9bfcc9a5425426a/overcloud/puppet/services/time/timezone.yaml", ...}}

The parameter_defaults and resource_registry lists are longer (truncated above), but parameter_merge_strategies is present as intended.

Comment 8 Giulio Fidente 2016-10-28 00:36:45 UTC
Given that we can already merge parameter_defaults from different environment files, an alternative to changing Heat is to pass the services list via the TripleO resource registry, which is used as an environment file at deployment.

Comment 11 Rabi Mishra 2016-11-03 13:40:30 UTC
We agreed to go with the THT fix [1], i.e. move the template defaults to the base environment file, rather than making any changes in Heat. There is a related upstream fix in Heat [2] to avoid duplicates if merge strategies are specified in the base environment file. However, we probably will not need to backport it to Newton.

[1] https://review.openstack.org/#/c/391064/
[2] https://review.openstack.org/#/c/390064/
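One way to read why moving the defaults fixes this: merge strategies combine values coming from environment files, while the template's own parameter default acts only as a fallback when no environment sets the parameter. The sketch below models that reading; it is an assumption drawn from the fix described above, not Heat's code, and resolve() plus the service lists are hypothetical and illustrative.

```python
from typing import List, Optional, Sequence

# Illustrative subset of the Compute role's default services.
DEFAULT_COMPUTE_SERVICES = [
    "OS::TripleO::Services::NovaCompute",
    "OS::TripleO::Services::NovaLibvirt",
]
CEPH_OSD = ["OS::TripleO::Services::CephOSD"]


def resolve(template_default: List[str],
            env_values: Sequence[List[str]],
            strategy: str = "merge") -> List[str]:
    """Hypothetical model: env-file parameter_defaults are merged with each
    other; the template's own default is used only if no env sets the value."""
    merged: Optional[List[str]] = None
    for value in env_values:
        if merged is None or strategy == "overwrite":
            merged = list(value)
        else:
            merged = merged + list(value)
    return list(template_default) if merged is None else merged


# Before the fix: the default list lives only in the template, so the merge
# has nothing to append to and CephOSD replaces the whole list.
before = resolve(DEFAULT_COMPUTE_SERVICES, [CEPH_OSD])

# After the fix: the base environment file also carries the default list,
# so the value from hyperconverged-ceph.yaml is appended to it.
after = resolve(DEFAULT_COMPUTE_SERVICES, [DEFAULT_COMPUTE_SERVICES, CEPH_OSD])
```

Under this model, before contains only CephOSD (the reported symptom) and after contains the Nova services plus CephOSD.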

Comment 12 John Fulton 2016-11-04 12:24:47 UTC
This fix is blocked by the broken upstream CI job gate-tripleo-ci-centos-7-ovb-ha.

Comment 13 John Fulton 2016-11-07 18:37:17 UTC
Status: THT fix [1] merged upstream in master (Ocata). Waiting for the same fix to be +2'd in the Newton backport [2].

[1] https://review.openstack.org/#/c/391064/
[2] https://review.openstack.org/#/c/394442/

Comment 16 John Fulton 2016-11-10 18:44:25 UTC
I am still having the reported issue with my first test. The nova.conf on my compute nodes is empty after the deploy [0]; the Ceph OSD services are running, however.

I have checked that I have the versions that should contain the fix [1]. I am double-checking that I lined up my deploy environment overrides correctly [2]. I am also sharing my user-environment.yaml from Swift [3].

  John

[0] 

[stack@hci-director ~]$ ansible osds -b -m shell -a "wc -l /etc/nova/nova.conf"
192.168.1.29 | SUCCESS | rc=0 >>
0 /etc/nova/nova.conf

192.168.1.34 | SUCCESS | rc=0 >>
0 /etc/nova/nova.conf

192.168.1.36 | SUCCESS | rc=0 >>
0 /etc/nova/nova.conf

192.168.1.32 | SUCCESS | rc=0 >>
0 /etc/nova/nova.conf


[1] Desired fixes are in: 

[stack@hci-director ~]$ sudo rpm -qa | grep openstack-tripleo-heat
openstack-tripleo-heat-templates-5.0.0-1.5.el7ost.noarch
[stack@hci-director ~]$ 

[root@hci-director ~]# rpm -q python-heatclient
python-heatclient-1.5.0-1.el7ost.noarch
[root@hci-director ~]# 

[stack@hci-director ~]$ sudo rpm -qa | grep openstack-heat
openstack-heat-templates-0-0.6.1e6015dgit.el7ost.noarch
openstack-heat-engine-7.0.0-5.el7ost.noarch
openstack-heat-api-cfn-7.0.0-5.el7ost.noarch
openstack-heat-api-7.0.0-5.el7ost.noarch
openstack-heat-common-7.0.0-5.el7ost.noarch
[stack@hci-director ~]$ 

openstack-heat-7.0.0-5.el7ost is the NVR and openstack-heat-{engine,api-cfn,api,common}-7.0.0-5.el7ost.noarch are the RPMs. 

[2] My deploy command:

time openstack overcloud deploy --templates \
-e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/hyperconverged-ceph.yaml \
-e ~/custom-templates/network.yaml \
-e ~/custom-templates/ceph.yaml \
--control-flavor control \
--control-scale 3 \
--compute-flavor compute \
--compute-scale 4

Where network.yaml and ceph.yaml are at https://github.com/RHsyseng/hci/tree/master/custom-templates. 

[3] swift download overcloud user-environment.yaml

parameter_merge_strategies: {ComputeServices: merge}
resource_registry: {'OS::TripleO::BlockStorage::Ports::ExternalPort': network/ports/noop.yaml,
  'OS::TripleO::BlockStorage::Ports::InternalApiPort': network/ports/internal_api.yaml,
  'OS::TripleO::BlockStorage::Ports::StorageMgmtPort': network/ports/storage_mgmt.yaml,
  'OS::TripleO::BlockStorage::Ports::StoragePort': network/ports/storage.yaml, 'OS::TripleO::BlockStorage::Ports::TenantPort': network/ports/noop.yaml,
  'OS::TripleO::CephStorage::Ports::ExternalPort': network/ports/noop.yaml, 'OS::TripleO::CephStorage::Ports::InternalApiPort': network/ports/noop.yaml,
  'OS::TripleO::CephStorage::Ports::StorageMgmtPort': network/ports/storage_mgmt.yaml,
  'OS::TripleO::CephStorage::Ports::StoragePort': network/ports/storage.yaml, 'OS::TripleO::CephStorage::Ports::TenantPort': network/ports/noop.yaml,
  'OS::TripleO::Compute::Net::SoftwareConfig': user-files/home/stack/custom-templates/nic-configs/compute-nics.yaml,
  'OS::TripleO::Compute::Ports::ExternalPort': network/ports/noop.yaml, 'OS::TripleO::Compute::Ports::InternalApiPort': network/ports/internal_api.yaml,
  'OS::TripleO::Compute::Ports::StorageMgmtPort': network/ports/storage_mgmt.yaml,
  'OS::TripleO::Compute::Ports::StoragePort': network/ports/storage.yaml, 'OS::TripleO::Compute::Ports::TenantPort': network/ports/tenant.yaml,
  'OS::TripleO::Controller::Net::SoftwareConfig': user-files/home/stack/custom-templates/nic-configs/controller-nics.yaml,
  'OS::TripleO::Controller::Ports::ExternalPort': network/ports/external.yaml, 'OS::TripleO::Controller::Ports::InternalApiPort': network/ports/internal_api.yaml,
  'OS::TripleO::Controller::Ports::RedisVipPort': network/ports/vip.yaml, 'OS::TripleO::Controller::Ports::StorageMgmtPort': network/ports/storage_mgmt.yaml,
  'OS::TripleO::Controller::Ports::StoragePort': network/ports/storage.yaml, 'OS::TripleO::Controller::Ports::TenantPort': network/ports/tenant.yaml,
  'OS::TripleO::ControllerConfig': puppet/controller-config-pacemaker.yaml, 'OS::TripleO::Network::External': network/external.yaml,
  'OS::TripleO::Network::InternalApi': network/internal_api.yaml, 'OS::TripleO::Network::Ports::ExternalVipPort': network/ports/external.yaml,
  'OS::TripleO::Network::Ports::InternalApiVipPort': network/ports/internal_api.yaml,
  'OS::TripleO::Network::Ports::RedisVipPort': network/ports/vip.yaml, 'OS::TripleO::Network::Ports::StorageMgmtVipPort': network/ports/storage_mgmt.yaml,
  'OS::TripleO::Network::Ports::StorageVipPort': network/ports/storage.yaml, 'OS::TripleO::Network::Storage': network/storage.yaml,
  'OS::TripleO::Network::StorageMgmt': network/storage_mgmt.yaml, 'OS::TripleO::Network::Tenant': network/tenant.yaml,
  'OS::TripleO::NodeExtraConfigPost': user-files/home/stack/custom-templates/post-deploy-template.yaml,
  'OS::TripleO::NodeUserData': user-files/home/stack/custom-templates/first-boot-template.yaml,
  'OS::TripleO::Services::CephClient': puppet/services/ceph-client.yaml, 'OS::TripleO::Services::CephMon': puppet/services/ceph-mon.yaml,
  'OS::TripleO::Services::CephOSD': puppet/services/ceph-osd.yaml, 'OS::TripleO::Services::CinderVolume': puppet/services/pacemaker/cinder-volume.yaml,
  'OS::TripleO::Services::HAproxy': puppet/services/pacemaker/haproxy.yaml, 'OS::TripleO::Services::MySQL': puppet/services/pacemaker/database/mysql.yaml,
  'OS::TripleO::Services::Pacemaker': puppet/services/pacemaker.yaml, 'OS::TripleO::Services::RabbitMQ': puppet/services/pacemaker/rabbitmq.yaml,
  'OS::TripleO::Services::Redis': puppet/services/pacemaker/database/redis.yaml, 'OS::TripleO::SwiftStorage::Ports::ExternalPort': network/ports/noop.yaml,
  'OS::TripleO::SwiftStorage::Ports::InternalApiPort': network/ports/internal_api.yaml,
  'OS::TripleO::SwiftStorage::Ports::StorageMgmtPort': network/ports/storage_mgmt.yaml,
  'OS::TripleO::SwiftStorage::Ports::StoragePort': network/ports/storage.yaml, 'OS::TripleO::SwiftStorage::Ports::TenantPort': network/ports/noop.yaml,
  'OS::TripleO::Tasks::ControllerPostPuppet': extraconfig/tasks/post_puppet_pacemaker.yaml,
  'OS::TripleO::Tasks::ControllerPostPuppetRestart': extraconfig/tasks/post_puppet_pacemaker_restart.yaml,
  'OS::TripleO::Tasks::ControllerPrePuppet': extraconfig/tasks/pre_puppet_pacemaker.yaml}

Comment 19 Giulio Fidente 2016-11-25 10:30:22 UTC
The actual bug in tripleoclient/tripleo-common is tracked upstream via LP bug 1635409.

Until that is fixed, we can put the entire list of services needed on the Compute role in the environment file. This way the user experience does not change (people just need to pass the additional environment file at deploy time), and once the upstream fix is finished we will go back to using merge strategies.
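A sketch of what such a workaround environment file could look like, generated from a default list plus the extra service. Because it carries the complete list and sets no parameter_merge_strategies, the default overwrite behaviour gives the right result. The helper render_compute_services_env() is hypothetical, and the service list is an illustrative subset; the real Compute role list is much longer.

```python
from typing import List


def render_compute_services_env(default_services: List[str],
                                extra_services: List[str]) -> str:
    """Render a hypothetical environment file that sets the complete
    ComputeServices list, so no parameter_merge_strategies entry is needed."""
    services = list(default_services)
    for service in extra_services:
        if service not in services:  # avoid listing a service twice
            services.append(service)
    lines = ["parameter_defaults:", "  ComputeServices:"]
    lines += [f"    - {service}" for service in services]
    return "\n".join(lines) + "\n"


# Illustrative subset only; the real Compute role list is much longer.
DEFAULTS = [
    "OS::TripleO::Services::NovaCompute",
    "OS::TripleO::Services::NovaLibvirt",
]
print(render_compute_services_env(DEFAULTS, ["OS::TripleO::Services::CephOSD"]), end="")
```

The trade-off is that the full list must be kept in sync with the role's defaults by hand, which is why merge strategies remain the preferred long-term approach.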

Comment 22 Jon Schlueter 2017-01-05 06:22:01 UTC
According to our records, this should be resolved by openstack-tripleo-heat-templates-5.1.0-7.el7ost.  This build is available now.

Comment 23 Jon Schlueter 2017-01-05 06:22:10 UTC
According to our records, this should be resolved by openstack-heat-7.0.0-7.el7ost.  This build is available now.

Comment 26 Yogev Rabl 2017-04-04 19:08:39 UTC
Verified on openstack-tripleo-heat-templates-5.2.0-9.el7ost.noarch