Created attachment 1191947 [details] os-collect-config on controller-0 in HA and non-HA Description of problem: Overcloud deployment fails with MySQL errors on the controller in the case of HA as well as non-HA. In case of HA this error is only seen on controller-0, and not on other controllers. After multiple attempts with HA and non-HA I was able to get one deployment successfully with the overcloud endpoints created but I did not change anything in the way I deployed. In this case, although the deployment succeeded I can see the same MySQL error on the controller-0 using sudo journalctl -u os-collect-config. Also, in some cases overcloud stack create succeeded but the endpoints weren't created. ERROR message in os-collect-config: Aug 18 15:30:00 overcloud-controller-0.localdomain os-collect-config[4372]: Error: Could not find dependency Exec[galera-ready] for Class[Aodh::Db::Mysql] at /var/lib/heat-config/heat-config-puppet/f9ae5663-6c41-4edd-84a6-6223f233bd03.pp:41 Aug 18 15:30:00 overcloud-controller-0.localdomain os-collect-config[4372]: [2016-08-18 15:30:00,806] (heat-config) [ERRORROError running /var/lib/heat-config/heat-config-puppet/f9ae5663-6c41-4edd-84a6-6223f233bd03.pp. [1] Version-Release number of selected component (if applicable): OSP-10 How reproducible: I'd say close to 90% Steps to Reproduce: 1. Deploy undercloud 2. Deploy overcloud 3. Actual results: Overcloud deploy fails in some cases with stack create failing, some cases stack being created but endpoints not being created. Expected results: Overcloud should deploy successfully Additional info: Attaching logs of controller-0 for both HA and non-HA deployments Deploy command:for HA time openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --control-scale 3 --compute-scale 1 --control-flavor control --compute-flavor compute --ntp-server 10.5.26.10 --neutron-network-type vxlan --neutron-tunnel-types vxlan -t 60
Looking at the error it appears that Galera is attempting to start before MariaDB is up and available.
Please provide sosreports from the nodes.
Sorry, this was BZed more than a month ago. I do not have access to the environment anymore.
(In reply to Sindhur from comment #4) > Sorry, this was BZed more than a month ago. I do not have access to the > environment anymore. Can you still reproduce the problem? somehow this bug was assigned to the wrong team and went unnoticed till today. If not, I think we can close it, otherwise try to reproduce it and collect sosreports?
As the BZ mentions this is not 100% reproducible and I haven't seen it lately. Would be good to ask if QE can reproduce this.
This is likely due to just old puppet-tripleo and tht. Around August when this BZ was filed, a lot of stuff was being merged around that. As a matter of fact now no service depends on galera-ready. I am closing this one. Feel free to reopen if there is any recent data and I will happily take a look at it