Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1425207

Summary: Deployments sometimes fail and sometimes succeed.
Product: Red Hat OpenStack
Reporter: Jeremy <jmelvin>
Component: puppet-tripleo
Assignee: Michele Baldessari <michele>
Status: CLOSED ERRATA
QA Contact: nlevinki <nlevinki>
Severity: high
Priority: high
Version: 10.0 (Newton)
CC: aschultz, chhudson, chjones, dbecker, dhill, djuran, jjoyce, jschluet, mburns, mcornea, michele, morazi, pkomarov, rhel-osp-director-maint, slinaber, sreichar, tvignaud
Target Milestone: z7
Keywords: Triaged, ZStream
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: puppet-tripleo-5.6.4-3.el7ost
Doc Type: Bug Fix
Doc Text:
Prior to this update, a missing Puppet ordering relationship could cause Puppet to create a MySQL user before the database was fully up. Consequently, the deployment failed with the following error: `Error: Could not prefetch mysql_user provider 'mysql'`. This update adds the missing relationship (`Exec['galera-ready'] -> Mysql_user<||>`) so that users are created only after the database is fully up.
Last Closed: 2018-02-27 16:50:40 UTC
Type: Bug

Description Jeremy 2017-02-20 20:29:39 UTC
Description of problem: Deploying hyper-converged Compute and Ceph services as documented at https://access.redhat.com/documentation/en/red-hat-openstack-platform/10/single/advanced-overcloud-customization/#creating_hyper_converged_compute_and_ceph_services

The hyper-converged deployment fails about one time in three. When it fails, the controllers log the following Galera errors:


Feb 17 01:59:52 neutonoc-controller-0.localdomain os-collect-config[9648]: Warning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.
Error: Could not prefetch mysql_user provider 'mysql': Execution of '/usr/bin/mysql -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
", "deploy_status_code": 0}

### It seems Galera has problems; however, I noticed that Galera ultimately starts up later in the deployment.
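The failure looks like an ordering race: the `mysql_user` provider queries the server through `/var/lib/mysql/mysql.sock` before Galera has finished starting. The actual fix is a Puppet ordering relationship, but the same "wait until ready, then run the dependent step" gate can be sketched in shell. The function name `wait_until_ready` and its arguments are hypothetical; it is a retry loop in the spirit of a Puppet `exec` with `tries` and `try_sleep`, and it does not reproduce what the `galera-ready` exec actually runs.

```shell
#!/bin/bash
# Hypothetical helper: run a readiness check up to TRIES times, sleeping
# SLEEP_SECS seconds between failed attempts. Returns 0 as soon as the
# check succeeds, or 1 if every attempt fails.
# Usage: wait_until_ready TRIES SLEEP_SECS COMMAND [ARGS...]
wait_until_ready() {
  local tries=$1 sleep_secs=$2
  shift 2
  local attempt
  for ((attempt = 1; attempt <= tries; attempt++)); do
    # Discard the check's own output; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if ((attempt < tries)); then
      sleep "$sleep_secs"
    fi
  done
  return 1
}
```

For example, `wait_until_ready 180 10 mysql -NBe 'SELECT 1'` would keep probing the local server for roughly 30 minutes before a dependent step, such as the `mysql.user` query from the error above, is allowed to run; both numbers are arbitrary assumptions.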

Version-Release number of selected component (if applicable):
openstack-tripleo-0.0.8-0.2.4de13b3git.el7ost.noarch  


How reproducible:
30%

Steps to Reproduce:
1. Deploy with the templates attached to the case.

Actual results:
Deployment fails intermittently.

Expected results:
Deployment succeeds every time.

Additional info:

[stack@ucsb-monster ~]$ cat hyperconverged.sh 
#!/bin/bash
source ~/stackrc
time openstack overcloud deploy --templates \
-r ~/custom-templates/custom-roles.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e ~/custom-templates/network.yaml \
-e ~/custom-templates/ceph.yaml \
-e ~/custom-templates/compute.yaml \
-e ~/custom-templates/layout.yaml \
--stack neutonoc \
--verbose --debug \
--log-file overcloudDeploy_$(date +%m_%d_%y__%H_%M_%S).log
[stack@ucsb-monster ~]$

Comment 1 Red Hat Bugzilla Rules Engine 2017-02-20 20:29:49 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 4 Chris Jones 2017-05-24 11:49:00 UTC
*** Bug 1446797 has been marked as a duplicate of this bug. ***

Comment 5 David Juran 2017-05-31 15:18:25 UTC
Since bug 1446797 was closed as a duplicate of this one, note that the issue does not require a hyper-converged deployment, or even Ceph, to occur.

Comment 6 Steve Reichard 2017-08-16 16:16:49 UTC
I also see this issue; an sosreport is captured at:

http://refarch.cloud.lab.eng.bos.redhat.com/pub/tmp/bz1446797/

I am unsure whether this matters, but this configuration is VLAN-based rather than VXLAN-based.

Comment 7 Steve Reichard 2017-09-21 17:25:50 UTC
FYI, using the same equipment and equivalent configuration files, I have not seen this issue with OSP 11 on RHEL 7.4.

Comment 13 pkomarov 2018-02-25 11:07:43 UTC
Verified.

#puppet-tripleo version check:

$ ansible overcloud -m shell -b -a 'rpm -qa|grep puppet-tripleo'

overcloud-compute-0 | SUCCESS | rc=0 >>
puppet-tripleo-5.6.4-3.el7ost.noarch

overcloud-compute-1 | SUCCESS | rc=0 >>
puppet-tripleo-5.6.4-3.el7ost.noarch

overcloud-controller-0 | SUCCESS | rc=0 >>
puppet-tripleo-5.6.4-3.el7ost.noarch

overcloud-controller-1 | SUCCESS | rc=0 >>
puppet-tripleo-5.6.4-3.el7ost.noarch

overcloud-controller-2 | SUCCESS | rc=0 >>
puppet-tripleo-5.6.4-3.el7ost.noarch
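The version check above is confirmed by eye. A minimal sketch that makes the comparison against the Fixed In Version (puppet-tripleo-5.6.4-3.el7ost) explicit follows; the function names are hypothetical, and GNU `sort -V` is used as an approximation of RPM version ordering, not as rpm's own comparison algorithm.

```shell
#!/bin/bash
# Hypothetical helper: succeed if INSTALLED is at least MINIMUM.
# GNU `sort -V` only approximates RPM's version comparison (it ignores
# epochs and RPM-specific rules such as tilde handling), which is
# adequate for simple VERSION-RELEASE strings like these.
version_at_least() {
  local installed=$1 minimum=$2
  local lowest
  # The lower of the two strings in version order.
  lowest=$(printf '%s\n%s\n' "$minimum" "$installed" | sort -V | head -n 1)
  [ "$lowest" = "$minimum" ]
}

# Hypothetical helper: turn an `rpm -q` line such as
# "puppet-tripleo-5.6.4-3.el7ost.noarch" into "5.6.4-3.el7ost".
rpm_version_release() {
  local pkg=$1
  pkg=${pkg#puppet-tripleo-}
  printf '%s\n' "${pkg%.noarch}"
}
```

On a node, `version_at_least "$(rpm_version_release "$(rpm -q puppet-tripleo)")" 5.6.4-3.el7ost` would succeed only when the installed package is at or above the fixed build.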


#Code check:

$ ansible overcloud -m shell -b -a 'grep Mysql_user /usr/share/openstack-puppet/modules/tripleo/manifests/profile/pacemaker/database/mysql.pp'


overcloud-compute-1 | SUCCESS | rc=0 >>
      Exec['galera-ready'] -> Mysql_user<||>

overcloud-controller-1 | SUCCESS | rc=0 >>
      Exec['galera-ready'] -> Mysql_user<||>

overcloud-controller-0 | SUCCESS | rc=0 >>
      Exec['galera-ready'] -> Mysql_user<||>

overcloud-compute-0 | SUCCESS | rc=0 >>
      Exec['galera-ready'] -> Mysql_user<||>

overcloud-controller-2 | SUCCESS | rc=0 >>
      Exec['galera-ready'] -> Mysql_user<||>


#Redeployed 3 times, alternating between the following configurations:
- 1 x default roles for 3 controllers and 2 computes
- 2 x default roles for 3 controllers, with the hyperconverged-ceph role for the computes

Results:
- 100% success rate across all three deployments:

#From overcloud_install.log:

2018-02-22 13:19:10 |
2018-02-22 13:19:10 |  Stack overcloud CREATE_COMPLETE
2018-02-22 13:19:10 |
2018-02-22 13:19:10 | Overcloud Endpoint: http://10.0.0.12:5000/v2.0
2018-02-22 13:19:10 | Overcloud Deployed
2018-02-22 13:19:10 | + status_code=0
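Both the 30% failure rate in the original report and the 100% success rate above are measured by repeating a deployment. A minimal sketch of such a measurement loop follows; the `success_rate` name is hypothetical, and it assumes each run of the wrapped command (for example the `hyperconverged.sh` script from the description) starts from a clean state, such as after deleting the previous stack, which is not shown.

```shell
#!/bin/bash
# Hypothetical helper: run COMMAND RUNS times and print the success rate
# as an integer percentage (rounded down).
# Usage: success_rate RUNS COMMAND [ARGS...]
success_rate() {
  local runs=$1
  shift
  if ((runs < 1)); then
    echo "success_rate: RUNS must be at least 1" >&2
    return 1
  fi
  local ok=0 i
  for ((i = 0; i < runs; i++)); do
    # Send the command's own output to stderr so that stdout carries
    # only the final percentage.
    if "$@" >&2; then
      ok=$((ok + 1))
    fi
  done
  echo $((ok * 100 / runs))
}
```

For example, `success_rate 3 ./hyperconverged.sh` would print 100 only if all three deployments exit with status 0, since the loop counts success purely by exit status.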

Comment 16 errata-xmlrpc 2018-02-27 16:50:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0364