Created attachment 1191947
os-collect-config on controller-0 in HA and non-HA
Description of problem: Overcloud deployment fails with MySQL errors on the controller in both HA and non-HA deployments. With HA, the error appears only on controller-0 and not on the other controllers. After multiple attempts with HA and non-HA, I got one successful deployment with the overcloud endpoints created, although I did not change anything in the way I deployed. Even in that case, the same MySQL error is visible on controller-0 in the output of sudo journalctl -u os-collect-config.
Also, in some cases the overcloud stack create succeeded but the endpoints weren't created.
ERROR message in os-collect-config:
Aug 18 15:30:00 overcloud-controller-0.localdomain os-collect-config: Error: Could not find dependency Exec[galera-ready] for Class[Aodh::Db::Mysql] at /var/lib/heat-config/heat-config-puppet/f9ae5663-6c41-4edd-84a6-6223f233bd03.pp:41
Aug 18 15:30:00 overcloud-controller-0.localdomain os-collect-config: [2016-08-18 15:30:00,806] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-puppet/f9ae5663-6c41-4edd-84a6-6223f233bd03.pp.
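For anyone triaging a similar journal, here is a minimal Python sketch that pulls the missing-dependency errors out of os-collect-config output. The function name and the regular expression are illustrative assumptions based only on the log line quoted above; they are not part of any TripleO or Puppet tooling.

```python
import re

# Matches Puppet's "Could not find dependency X for Y at FILE:LINE" message.
# The pattern is an assumption derived from the single log line in this report.
DEP_ERROR = re.compile(
    r"Could not find dependency (?P<dependency>\S+) "
    r"for (?P<resource>\S+) at (?P<manifest>\S+):(?P<line>\d+)"
)


def find_missing_dependencies(journal_text):
    """Return one dict per missing-dependency error found in the text."""
    results = []
    for log_line in journal_text.splitlines():
        match = DEP_ERROR.search(log_line)
        if match:
            entry = match.groupdict()
            entry["line"] = int(entry["line"])  # manifest line number
            results.append(entry)
    return results


# Sample input: the error line from this report.
sample = (
    "Aug 18 15:30:00 overcloud-controller-0.localdomain os-collect-config: "
    "Error: Could not find dependency Exec[galera-ready] for "
    "Class[Aodh::Db::Mysql] at /var/lib/heat-config/heat-config-puppet/"
    "f9ae5663-6c41-4edd-84a6-6223f233bd03.pp:41"
)

for error in find_missing_dependencies(sample):
    print(error["dependency"], "required by", error["resource"])
```

Run against the line above, this reports Exec[galera-ready] as the dependency required by Class[Aodh::Db::Mysql]. In practice the text would come from the saved output of sudo journalctl -u os-collect-config.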
Version-Release number of selected component (if applicable):
How reproducible:
I'd say close to 90%
Steps to Reproduce:
1. Deploy undercloud
2. Deploy overcloud
Actual results:
Overcloud deploy fails: in some cases the stack create fails, and in other cases the stack is created but the endpoints are not.
Expected results:
Overcloud should deploy successfully.
Additional info:
Attaching logs of controller-0 for both HA and non-HA deployments.
Deploy command for HA:
time openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --control-scale 3 --compute-scale 1 --control-flavor control --compute-flavor compute --ntp-server 10.5.26.10 --neutron-network-type vxlan --neutron-tunnel-types vxlan -t 60
Looking at the error it appears that Galera is attempting to start before MariaDB is up and available.
Please provide sosreports from the nodes.
Sorry, this was BZed more than a month ago. I do not have access to the environment anymore.
(In reply to Sindhur from comment #4)
> Sorry, this was BZed more than a month ago. I do not have access to the
> environment anymore.
Can you still reproduce the problem? Somehow this bug was assigned to the wrong team and went unnoticed until today.
If not, I think we can close it; otherwise, could you try to reproduce it and collect sosreports?
As the BZ mentions, this is not 100% reproducible and I haven't seen it lately. It would be good to ask whether QE can reproduce this.
This is likely due to an old puppet-tripleo and tht. Around August, when this BZ was filed, a lot of changes were being merged in that area. As a matter of fact, no service depends on galera-ready anymore.
I am closing this one. Feel free to reopen if there is any recent data and I will happily take a look at it.