Bug 1368214

Summary: Overcloud deploy fails with Mysql/Galera errors
Product: Red Hat OpenStack
Reporter: Sai Sindhur Malleni <smalleni>
Component: rhosp-director
Assignee: Michele Baldessari <michele>
Status: CLOSED CURRENTRELEASE
QA Contact: Omri Hochman <ohochman>
Severity: high
Docs Contact:
Priority: high
Version: 10.0 (Newton)
CC: dbecker, dciabrin, fdinitto, jcoufal, jschluet, mburns, morazi, rhel-osp-director-maint, royoung, smalleni
Target Milestone: ---
Keywords: AutomationBlocker, Triaged
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-09-27 15:21:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachment description: os-collect-config on controller-0 in HA and non-HA
Attachment flags: none

Description Sai Sindhur Malleni 2016-08-18 17:19:27 UTC
Created attachment 1191947
os-collect-config on controller-0 in HA and non-HA

Description of problem: Overcloud deployment fails with MySQL errors on the controller in both HA and non-HA deployments. With HA, the error is seen only on controller-0 and not on the other controllers. After multiple attempts with HA and non-HA, I got one successful deployment with the overcloud endpoints created, although I did not change anything in the way I deployed. Even in that case, the same MySQL error is visible on controller-0 in the output of sudo journalctl -u os-collect-config.
Also, in some cases the overcloud stack create succeeded but the endpoints weren't created.

ERROR message in os-collect-config:

Aug 18 15:30:00 overcloud-controller-0.localdomain os-collect-config[4372]: Error: Could not find dependency Exec[galera-ready] for Class[Aodh::Db::Mysql] at /var/lib/heat-config/heat-config-puppet/f9ae5663-6c41-4edd-84a6-6223f233bd03.pp:41
Aug 18 15:30:00 overcloud-controller-0.localdomain os-collect-config[4372]: [2016-08-18 15:30:00,806] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-puppet/f9ae5663-6c41-4edd-84a6-6223f233bd03.pp. [1]
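
For triage, a small script along these lines could pull the missing-dependency errors out of saved os-collect-config journal output, which helps when comparing controllers or deployment attempts. This is only a sketch: the names find_missing_dependencies and DEP_ERROR are hypothetical, and the regular expression assumes the message layout seen in the excerpt above (resource titles without spaces).

```python
import re

# Assumed layout, taken from the journal excerpt in this report:
#   Error: Could not find dependency <Type[title]> for <Type[title]> at <manifest>:<line>
DEP_ERROR = re.compile(
    r"Error: Could not find dependency (?P<dependency>\S+\[[^\]]+\]) "
    r"for (?P<dependent>\S+\[[^\]]+\]) "
    r"at (?P<manifest>\S+):(?P<line>\d+)"
)


def find_missing_dependencies(journal_text):
    """Return one dict per missing-dependency error found in the journal text."""
    results = []
    for raw in journal_text.splitlines():
        match = DEP_ERROR.search(raw)
        if match:
            entry = match.groupdict()
            entry["line"] = int(entry["line"])  # manifest line number as an int
            results.append(entry)
    return results
```

The function takes the journal text as a single string, for example the saved output of sudo journalctl -u os-collect-config, and returns the missing resource, the class that requires it, and the manifest location for each hit.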

Version-Release number of selected component (if applicable):

How reproducible:
I'd say close to 90%

Steps to Reproduce:
1. Deploy undercloud
2. Deploy overcloud

Actual results:
Overcloud deploy fails: in some cases the stack create fails, and in others the stack is created but the endpoints are not.

Expected results:
Overcloud should deploy successfully

Additional info:
Attaching logs of controller-0 for both HA and non-HA deployments 
Deploy command for HA:
time openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --control-scale 3 --compute-scale 1 --control-flavor control --compute-flavor compute --ntp-server --neutron-network-type vxlan --neutron-tunnel-types vxlan -t 60

Comment 2 Rob Young 2016-09-27 13:09:50 UTC
Looking at the error, it appears that Galera is attempting to start before MariaDB is up and available.

Comment 3 Fabio Massimo Di Nitto 2016-09-27 14:37:13 UTC
Please provide sosreports from the nodes.

Comment 4 Sai Sindhur Malleni 2016-09-27 14:50:34 UTC
Sorry, this was BZed more than a month ago. I do not have access to the environment anymore.

Comment 5 Fabio Massimo Di Nitto 2016-09-27 14:54:00 UTC
(In reply to Sindhur from comment #4)
> Sorry, this was BZed more than a month ago. I do not have access to the
> environment anymore.

Can you still reproduce the problem? Somehow this bug was assigned to the wrong team and went unnoticed till today.

If not, I think we can close it. Otherwise, could you try to reproduce it and collect sosreports?

Comment 6 Sai Sindhur Malleni 2016-09-27 14:57:33 UTC
As the BZ mentions, this is not 100% reproducible and I haven't seen it lately. It would be good to ask if QE can reproduce this.

Comment 7 Michele Baldessari 2016-09-27 15:21:30 UTC
This is most likely just due to old versions of puppet-tripleo and tht (tripleo-heat-templates). Around August, when this BZ was filed, a lot of changes were being merged in that area. As a matter of fact, no service depends on galera-ready anymore.

I am closing this one. Feel free to reopen if there is any recent data, and I will happily take a look at it.