Description of problem:
An installation of OSP12 with three Controller, three Ceph, and two Compute nodes is failing. The failure appears to be caused by cinder's db_sync timing out (full failure attached to this bug):
"Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Failed to call refresh: Command exceeded timeout",
"Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Command exceeded timeout",
However, upon further investigation, cinder-manage.log shows that it cannot connect to an IP address that one would expect to be a VIP resource managed by pacemaker/corosync:
2018-04-05 18:11:48.722 83824 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -23 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.16.2.14' ([Errno 113] No route to host)")
No resource for this address (172.16.2.14) is managed in the cluster (crm_mon output attached to this bug).
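The missing-VIP check above can be sketched as a grep over saved `crm_mon -1` output. This is a minimal, self-contained illustration: the inline sample output and resource names are stand-ins for the attached crm_mon capture, not taken from this environment.

```shell
# Hedged sketch: does a pacemaker IPaddr2 resource exist for a given VIP?
# crm_out here is illustrative sample data standing in for `crm_mon -1` output.
vip="172.16.2.14"
crm_out='ip-172.16.1.10 (ocf::heartbeat:IPaddr2): Started controller-0
ip-10.0.0.4 (ocf::heartbeat:IPaddr2): Started controller-1'

# Pacemaker VIP resources in TripleO deployments are conventionally named
# "ip-<address>", so the absence of an "ip-172.16.2.14" line means no such
# resource is managed by the cluster.
if printf '%s\n' "$crm_out" | grep -q "ip-${vip}"; then
  status="managed"
else
  status="missing"
fi
echo "VIP ${vip}: ${status}"
```

On a live controller the same check would be run against `sudo crm_mon -1` (or `sudo pcs status resources`) instead of the sample text.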
This deployment was initiated from the Director UI, but I'm not certain that influences the failure.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a deployment in Director UI
2. Assign three Ceph, three Controller, and two Compute nodes
3. Include the following elements: Base resources configuration, Ceph Storage Backend, Containerized Deployment, environments/ceph-ansible/ceph-ansible.yaml, environments/containers-default-parameters.yaml, environments/docker-ha.yaml, environments/ssl/inject-trust-anchor.yaml, Multiple NICs, Network Isolation, SSL on OpenStack Public Endpoints
4. Populate inject-trust-anchor.yaml, "Multiple NICs", "Network Isolation", and "SSL on OpenStack Public Endpoints" parameters (plan-environment.yaml, attached to this bug, shows the configuration)
5. Attempt deployment
The configuration for this environment is specified in the RHOSP QE Deployment Matrix, Test RHELOSP-30275
Created attachment 1417891 [details]
output from crm_mon showing that the expected vip resource does not exist
Created attachment 1417892 [details]
openstack stack failures list
Created attachment 1417893 [details]
Created attachment 1418226 [details]
After some additional discussion, it looks as if the templates used to import the plan in the UI did not include network_virtual_ips data. This data is necessary to assign an External (Provider) IP address to the controller resource that represents the active controller in a deployment with HA controllers.
It was suspected that the Director UI failed to export the jinja2-based files as part of the deployment plan, but my testing showed that was not the case.
This appears to be the result of a template tarball that was misconfigured when it was modified after export from the Director UI.
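Given that conclusion, a sanity check on a hand-modified tarball before re-importing it would be to list its contents and confirm the expected plan files are still present. The sketch below builds a stand-in tarball so it is self-contained; in a real case the tarball exported from the Director UI would be used, and the grep would also cover the jinja2 network templates carrying the network_virtual_ips data.

```shell
# Hedged sketch: verify a modified plan tarball still contains required files.
# A stand-in tarball is created here purely so the example runs on its own.
tmp=$(mktemp -d)
mkdir -p "$tmp/plan"
printf 'name: overcloud\n' > "$tmp/plan/plan-environment.yaml"
tar -czf "$tmp/plan.tar.gz" -C "$tmp" plan

# The actual check: list the archive and look for the file(s) the plan needs.
if tar -tzf "$tmp/plan.tar.gz" | grep -q 'plan-environment.yaml'; then
  result="present"
else
  result="absent"
fi
echo "plan-environment.yaml: $result"
rm -rf "$tmp"
```

The same `tar -tzf ... | grep` pattern can be extended to check for any file that the post-export modification might have dropped.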