Bug 1368214 - Overcloud deploy fails with Mysql/Galera errors
Summary: Overcloud deploy fails with Mysql/Galera errors
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: Michele Baldessari
QA Contact: Omri Hochman
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-08-18 17:19 UTC by Sai Sindhur Malleni
Modified: 2016-09-27 15:21 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-27 15:21:30 UTC
Target Upstream Version:
Embargoed:


Attachments
os-collect-config on controller-0 in HA and non-HA (1.89 MB, application/x-tar)
2016-08-18 17:19 UTC, Sai Sindhur Malleni

Description Sai Sindhur Malleni 2016-08-18 17:19:27 UTC
Created attachment 1191947
os-collect-config on controller-0 in HA and non-HA

Description of problem: Overcloud deployment fails with MySQL errors on the controller in both HA and non-HA deployments. With HA, the error is seen only on controller-0 and not on the other controllers. After multiple attempts with HA and non-HA, I got one deployment to succeed with the overcloud endpoints created, without changing anything in the way I deployed. Even in that case, although the deployment succeeded, the same MySQL error appears on controller-0 in the output of sudo journalctl -u os-collect-config.
Also, in some cases the overcloud stack create succeeded but the endpoints weren't created.

ERROR message in os-collect-config:

Aug 18 15:30:00 overcloud-controller-0.localdomain os-collect-config[4372]: Error: Could not find dependency Exec[galera-ready] for Class[Aodh::Db::Mysql] at /var/lib/heat-config/heat-config-puppet/f9ae5663-6c41-4edd-84a6-6223f233bd03.pp:41
Aug 18 15:30:00 overcloud-controller-0.localdomain os-collect-config[4372]: [2016-08-18 15:30:00,806] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-puppet/f9ae5663-6c41-4edd-84a6-6223f233bd03.pp. [1]
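
When triaging a larger journal dump, lines like the "Could not find dependency" error above can be picked out with a short script. The following is only a minimal sketch: the regular expression is modelled on the single log line quoted in this report, and the names DEP_ERROR, find_missing_dependencies and SAMPLE are hypothetical, not part of any TripleO or Puppet tooling.

```python
import re

# Assumption: the pattern mirrors the one error line quoted in this report;
# other Puppet versions may word the message differently.
DEP_ERROR = re.compile(
    r"Error: Could not find dependency (?P<dependency>\S+\[[^\]]+\]) "
    r"for (?P<resource>\S+\[[^\]]+\]) at (?P<manifest>\S+):(?P<line>\d+)"
)


def find_missing_dependencies(lines):
    """Return one dict per log line that reports a missing Puppet dependency."""
    hits = []
    for text in lines:
        match = DEP_ERROR.search(text)
        if match:
            hit = match.groupdict()
            hit["line"] = int(hit["line"])  # manifest line number as an int
            hits.append(hit)
    return hits


# The error line from this report, used as sample input.
SAMPLE = (
    "Aug 18 15:30:00 overcloud-controller-0.localdomain os-collect-config[4372]: "
    "Error: Could not find dependency Exec[galera-ready] for "
    "Class[Aodh::Db::Mysql] at /var/lib/heat-config/heat-config-puppet/"
    "f9ae5663-6c41-4edd-84a6-6223f233bd03.pp:41"
)

for hit in find_missing_dependencies([SAMPLE]):
    print(hit["dependency"], "required by", hit["resource"],
          "in", hit["manifest"], "line", hit["line"])
```

The function accepts any iterable of strings, so the output of sudo journalctl -u os-collect-config saved to a file can be passed in as an open file object.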


Version-Release number of selected component (if applicable):
OSP-10

How reproducible:
I'd say close to 90%

Steps to Reproduce:
1. Deploy undercloud
2. Deploy overcloud

Actual results:
Overcloud deploy fails: in some cases the stack create fails; in others the stack is created but the endpoints are not.

Expected results:
Overcloud should deploy successfully

Additional info:
Attaching logs of controller-0 for both HA and non-HA deployments.
Deploy command for HA:
time openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --control-scale 3 --compute-scale 1 --control-flavor control --compute-flavor compute --ntp-server 10.5.26.10 --neutron-network-type vxlan --neutron-tunnel-types vxlan -t 60

Comment 2 Rob Young 2016-09-27 13:09:50 UTC
Looking at the error, it appears that Galera is attempting to start before MariaDB is up and available.

Comment 3 Fabio Massimo Di Nitto 2016-09-27 14:37:13 UTC
Please provide sosreports from the nodes.

Comment 4 Sai Sindhur Malleni 2016-09-27 14:50:34 UTC
Sorry, this was BZed more than a month ago. I do not have access to the environment anymore.

Comment 5 Fabio Massimo Di Nitto 2016-09-27 14:54:00 UTC
(In reply to Sindhur from comment #4)
> Sorry, this was BZed more than a month ago. I do not have access to the
> environment anymore.

Can you still reproduce the problem? Somehow this bug was assigned to the wrong team and went unnoticed until today.

If not, I think we can close it; otherwise, could you try to reproduce it and collect sosreports?

Comment 6 Sai Sindhur Malleni 2016-09-27 14:57:33 UTC
As the BZ mentions, this is not 100% reproducible and I haven't seen it lately. It would be good to ask whether QE can reproduce this.

Comment 7 Michele Baldessari 2016-09-27 15:21:30 UTC
This is likely just due to old puppet-tripleo and tht (tripleo-heat-templates) versions. Around August, when this BZ was filed, a lot of changes were being merged in that area. As a matter of fact, no service depends on galera-ready anymore.

I am closing this one. Feel free to reopen if there is any recent data and I will happily take a look at it.

