Bug 1419911

Summary: OSP8 Overcloud installation sometimes fails.

Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Version: 8.0 (Liberty)
Reporter: Sofer Athlan-Guyot <sathlang>
Assignee: Sofer Athlan-Guyot <sathlang>
QA Contact: Arik Chernetsky <achernet>
CC: fdinitto, mburns, rhel-osp-director-maint
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CANTFIX
Type: Bug
Last Closed: 2017-02-22 11:57:08 UTC

Description Sofer Athlan-Guyot 2017-02-07 11:51:41 UTC
Description of problem: Installing OSP8 with 3 controllers on a slow architecture (VMs) sometimes fails; excerpt from the deployment output below:

    Notice: Pacemaker has reported quorum achieved
    Notice: /Stage[main]/Pacemaker::Corosync/Notify[pacemaker settled]/message: defined 'message' as 'Pacemaker has reported quorum achieved'
    Notice: /Stage[main]/Tripleo::Loadbalancer/Haproxy::Listen[redis]/Concat::Fragment[haproxy-redis_listen_block]/File[/var/lib/puppet/concat/_etc_haproxy_haproxy.cfg/fragments/20-redis-00_haproxy-redis_listen_block]/ensure: defined content as '{md5}90ded811b382d2253efe5e4d2169e5f6'
    Notice: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/Exec[concat_/etc/haproxy/haproxy.cfg]/returns: executed successfully
    Notice: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/Exec[concat_/etc/haproxy/haproxy.cfg]: Triggered 'refresh' from 41 events
    Notice: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/File[/etc/haproxy/haproxy.cfg]/content: content changed '{md5}1f337186b0e1ba5ee82760cb437fb810' to '{md5}c0e7fc0c9740dba37ec9b287f4542821'
    Notice: /File[/etc/haproxy/haproxy.cfg]/seluser: seluser changed 'unconfined_u' to 'system_u'
    Notice: Finished catalog run in 58.30 seconds
    ", "deploy_stderr": "Could not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass
    Could not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass
    Warning: Scope(Class[Mongodb::Server]): Replset specified, but no replset_members or replset_config provided.
    Warning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.
    Error: Could not prefetch mysql_user provider 'mysql': Execution of '/usr/bin/mysql -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
    Error: Could not prefetch mysql_database provider 'mysql': Execution of '/usr/bin/mysql -NBe show databases' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
    ", "deploy_status_code": 0}


How reproducible: Nearly all the time, given the VMs are slow enough.

There is a race condition between Puppet, which wants to query MySQL
information, and the Galera cluster becoming ready behind HAProxy.

The Galera cluster (managed by Pacemaker) comes up too late, which makes
the Puppet run at step 2 fail.
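
A possible mitigation, sketched below and not verified against the shipped
templates, would be to make step 2 wait until MySQL answers before any
mysql_user/mysql_database resource is prefetched. The resource name
'wait-for-galera' and the retry values are hypothetical; the check assumes
the mysql client reaches Galera through the local socket or the HAProxy VIP
already configured for the root user.

    # Hypothetical guard, not the actual tripleo-heat-templates code:
    # retry a trivial query until the Pacemaker-managed Galera cluster
    # answers, and only then let the mysql provider resources run.
    exec { 'wait-for-galera':
      command   => '/usr/bin/mysql -e "SELECT 1"',
      unless    => '/usr/bin/mysql -e "SELECT 1"',
      tries     => 30,     # ~5 minutes on slow VMs: 30 tries x 10 s
      try_sleep => 10,
    }

    # Order every mysql_user / mysql_database resource after the probe so
    # the provider prefetch no longer races with cluster startup.
    Exec['wait-for-galera'] -> Mysql_user <| |>
    Exec['wait-for-galera'] -> Mysql_database <| |>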

Comment 1 Fabio Massimo Di Nitto 2017-02-22 11:57:08 UTC
Installations need to follow the minimal hardware configuration. If the VMs meet or exceed the minimal hardware recommendation, then this is a bug; otherwise there is not much we can do.