Bug 1269005

Summary: rhel-osp-director: HA overcloud deployment with 5 controllers fails.
Product: Red Hat OpenStack
Reporter: Alexander Chuzhoy <sasha>
Component: rhosp-director
Assignee: James Slagle <jslagle>
Status: CLOSED CURRENTRELEASE
QA Contact: Alexander Chuzhoy <sasha>
Severity: high
Docs Contact:
Priority: urgent
Version: 7.0 (Kilo)
CC: dhill, dmacpher, jcoufal, jliberma, jtaleric, mburns, mcornea, mfuruta, michele, mlopes, morazi, ohochman, racedoro, rhel-osp-director-maint, sasha, sclewis, tvvcox, vcojot
Target Milestone: ---
Keywords: Triaged
Target Release: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
In this release, RHEL OpenStack Platform director only supports a High Availability (HA) overcloud deployment using three controller nodes.
Last Closed: 2016-10-04 19:03:40 UTC
Type: Bug
Attachments:
/var/log/messages from controller (flags: none)

Description Alexander Chuzhoy 2015-10-05 22:23:37 UTC
rhel-osp-director: HA overcloud deployment with 5 controllers fails.


Environment:
instack-undercloud-2.1.2-29.el7ost.noarch


Steps to reproduce:
Attempt to deploy overcloud with 5 controllers:

openstack overcloud deploy --templates --control-scale 5 --compute-scale 1 --ceph-storage-scale 1 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /home/stack/network-environment.yaml --ntp-server x.x.x.x  --timeout 90


Result:
Stack failed with status: Resource CREATE failed: Error: resources.ControllerNodesPostDeployment.resources.ControllerServicesBaseDeployment_Step2.resources[1]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
ERROR: openstack Heat Stack create failed.

The following message repeats on the controllers:
Oct  5 16:10:31 localhost galera(galera)[38739]: ERROR: MySQL is not running
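
To gauge how often the error above recurs on a given controller, the log can be scanned for it. The following is a minimal Python sketch, not part of the original report; the function name count_galera_errors is hypothetical, and the pattern assumes the message keeps the exact wording quoted above.

```python
import re

# Pattern for the galera resource agent error quoted in this report.
# The PID in square brackets varies from line to line.
GALERA_ERROR = re.compile(
    r"galera\(galera\)\[\d+\]: ERROR: MySQL is not running"
)


def count_galera_errors(lines):
    """Return how many of the given log lines report that MySQL is not running."""
    return sum(1 for line in lines if GALERA_ERROR.search(line))
```

Any iterable of lines works as input, so an open file handle for /var/log/messages on the controller can be passed directly.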

Comment 2 Alexander Chuzhoy 2015-10-05 22:27:23 UTC
Created attachment 1080084 [details]
/var/log/messages from controller

Comment 5 Jaromir Coufal 2016-01-07 09:37:05 UTC
The doc_text is wrong. We have to support 5 controllers; it is just not recommended due to performance issues.

Comment 6 jliberma@redhat.com 2016-01-15 03:56:58 UTC
Jarda, where can I learn more about the performance issues? Is this the Galera database replication overhead issue, where 3 controllers seem to be the sweet spot?

Comment 7 Jaromir Coufal 2016-01-27 11:36:07 UTC
Hey Jacob, sorry for the late answer. I would reach out to the performance team. I am sure there are multiple constraints; the DB would be one of them.

Comment 9 Mike Burns 2016-04-07 20:54:03 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 10 Joe Talerico 2016-04-14 13:51:49 UTC
Jarda / Jacob - The performance issues we saw were with >= 7 controllers; however, this was when we were still deploying with OFI. We have yet to do a deployment with > 3 controllers with director. The DB became problematic because we do not dynamically configure max_connections in MariaDB based on the number of controllers/services.

Comment 13 David Hill 2016-04-17 22:53:12 UTC
This can be configured through Pacemaker by modifying:
/usr/share/openstack-tripleo-heat-templates/puppet/manifests/overcloud_controller_pacemaker.pp
and replacing:
meta_params     => "master-max=3 ordered=true",
by:
meta_params     => "master-max=5 ordered=true",

I'm trying this right now and I'll keep this case updated.

Comment 15 David Hill 2016-04-18 00:13:11 UTC
This can be configured through Pacemaker by modifying:
/usr/share/openstack-tripleo-heat-templates/puppet/manifests/overcloud_controller_pacemaker.pp
and replacing:
meta_params     => "master-max=3 ordered=true",
by:
meta_params     => "master-max=5 ordered=true",

I tested it and everything is working as expected. In my test environment, I don't see a major performance hit.
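
The manual edit described above could be scripted. The following is a minimal Python sketch under stated assumptions: the helper name set_master_max is hypothetical, and it rewrites every meta_params line in the given text that starts with master-max=, so the resulting diff should be reviewed before the manifest is used.

```python
import re

# Matches the master-max value in a meta_params line such as:
#   meta_params     => "master-max=3 ordered=true",
MASTER_MAX = re.compile(r'(meta_params\s*=>\s*"master-max=)\d+')


def set_master_max(manifest_text, controller_count):
    """Return manifest_text with master-max set to controller_count.

    Text that contains no matching meta_params line is returned unchanged.
    """
    return MASTER_MAX.sub(
        lambda match: f"{match.group(1)}{controller_count}",
        manifest_text,
    )
```

The function only transforms a string; reading overcloud_controller_pacemaker.pp and writing the result back is left to the caller, which makes it easy to keep a backup of the original file.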

Comment 16 Michele Baldessari 2016-05-24 12:23:38 UTC
Note that this is fixed upstream where we have:
puppet/manifests/overcloud_controller_pacemaker.pp:      meta_params     => "master-max=${galera_nodes_count} ordered=true"

I *think* this was post-kilo, but I can't find old git history in the tht repo.
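
The idea behind the upstream change, as quoted above, is to derive master-max from the number of Galera nodes instead of hardcoding 3. A minimal Python sketch of that derivation follows; the function name galera_meta_params and the list-of-hostnames input are assumptions for illustration, not the actual Puppet code.

```python
def galera_meta_params(galera_nodes):
    """Build the meta_params string from a list of Galera node names.

    master-max follows the node count, so the value is correct for
    3, 5, or any other number of controllers.
    """
    galera_nodes_count = len(galera_nodes)
    return f"master-max={galera_nodes_count} ordered=true"
```

With five controller hostnames as input, this yields the same string as the manual edit to master-max=5 described in the earlier comments.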

Comment 17 Jaromir Coufal 2016-10-04 19:03:40 UTC
Thanks Michele!