Bug 1379177

Summary: When deploying a new overcloud, db_sync will fail to run properly
Product: Red Hat OpenStack Reporter: David Hill <dhill>
Component: rhosp-director Assignee: Angus Thomas <athomas>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Omri Hochman <ohochman>
Severity: high Docs Contact:
Priority: high    
Version: 9.0 (Mitaka) CC: dbecker, dhill, dmaley, jslagle, mburns, morazi, rhel-osp-director-maint
Target Milestone: --- Keywords: Reopened, ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2017-03-01 17:03:56 UTC Type: Bug

Description David Hill 2016-09-25 19:14:53 UTC
Description of problem:
When deploying a new overcloud, db_sync fails to run properly because MySQL does not appear to be available, as shown by this error message:

Sep 25 18:25:47 overcloud-controller-0.localdomain os-collect-config[8404]: in 71.99 seconds\u001b[0m\n", "deploy_stderr": "\u001b[1;31mWarning: Scope(Class[Mongodb::Server]): Replset specified, but no replset_members or replset_config provided.\u001b[0m\n\u001b[1;31mWarning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.\u001b[0m\n\u001b[1;31mError: Could not prefetch mysql_user provider 'mysql': Execution of '/usr/bin/mysql -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)\u001b[0m\n\u001b[1;31mError: Could not prefetch mysql_database provider 'mysql': Execution of '/usr/bin/mysql -NBe show databases' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)\u001b[0m\n", "deploy_status_code": 0}
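
The trailing "(2)" in the ERROR 2002 lines is the operating system errno, and errno 2 is ENOENT: the socket file did not exist when puppet ran the mysql client. A minimal sketch for checking this by hand on the controller follows; the helper name probe_unix_socket is hypothetical, and the default path is simply the one from the log above.

```python
import errno
import socket


def probe_unix_socket(path="/var/lib/mysql/mysql.sock", timeout=2.0):
    """Try to connect to a Unix domain socket.

    Returns "ok" if something accepts the connection, otherwise the
    symbolic errno name (for example "ENOENT" or "ECONNREFUSED").
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "ok"
    except OSError as exc:
        # Timeouts carry no errno, so fall back to the exception text.
        return errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()


if __name__ == "__main__":
    print(probe_unix_socket())
```

"ENOENT" would match the log above (no socket file yet), while "ECONNREFUSED" would point to a stale socket file with no server listening behind it.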


Version-Release number of selected component (if applicable):


How reproducible:
Almost every time

Steps to Reproduce:
1. Deploy a new overcloud

Actual results:
Fails at step 2 of the controller deployment

Expected results:
Succeeds

Additional info:

Comment 1 David Hill 2016-09-27 17:50:07 UTC
This issue starts with RHOSP 8.x. As soon as the undercloud VM is under a bit of stress, step4 of overcloudcontrollerdeployment almost always fails.

Comment 2 James Slagle 2017-03-01 16:05:32 UTC
Can't reproduce based on the steps:
1. Deploy a new overcloud

Please reopen if this is reproducible.

Comment 3 David Hill 2017-03-01 17:03:27 UTC
2. Make sure that at least one compute node fails to deploy and that nova/ironic puts it into an error state, then delete the compute node and retry creating one.

I found a workaround for this: delete the compute/controller entries that have "deleted_at" != null in the "instances" table of the "nova" database.
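
A rough sketch of that cleanup, shown against an in-memory SQLite table rather than the real undercloud database. The table here is a hypothetical, heavily simplified stand-in for nova's "instances" table, and the hostnames and timestamp are made up. The real schema has many more columns and other tables that reference instances, which this sketch does not handle. Note that in SQL the filter has to be written as IS NOT NULL, since a literal "!= NULL" comparison never matches any row.

```python
import sqlite3

# Hypothetical, simplified stand-in for nova's "instances" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances ("
    "id INTEGER PRIMARY KEY, hostname TEXT, vm_state TEXT, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO instances (hostname, vm_state, deleted_at) VALUES (?, ?, ?)",
    [
        ("overcloud-controller-0", "active", None),
        ("overcloud-compute-0", "error", "2016-09-25 18:00:00"),  # made-up row
        ("overcloud-compute-1", "active", None),
    ],
)


def soft_deleted(conn):
    """List hostnames of rows that carry a deletion timestamp."""
    query = "SELECT hostname FROM instances WHERE deleted_at IS NOT NULL ORDER BY id"
    return [row[0] for row in conn.execute(query)]


def purge_soft_deleted(conn):
    """Delete rows that carry a deletion timestamp; return how many were removed."""
    cur = conn.execute("DELETE FROM instances WHERE deleted_at IS NOT NULL")
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    print(soft_deleted(conn))        # inspect before deleting anything
    print(purge_soft_deleted(conn))  # number of rows removed
```

Running the SELECT first makes it possible to confirm which entries would be removed before issuing the DELETE.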

You might have to do this on an older version of RHOSP (such as 8), because the original environment was RHOSP 8, which was updated to RHOSP 9 and then RHOSP 10.