1564266 – Cinder manage_db_sync fails

Bug 1564266 - Cinder manage_db_sync fails

Summary: Cinder manage_db_sync fails

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo
Sub Component:
Version:	13.0 (Queens)
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	beta
Target Release:	---
Assignee:	Dan Trainor
QA Contact:	Arik Chernetsky
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-04-05 20:51 UTC by Dan Trainor
Modified:	2018-04-11 03:51 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-04-11 03:51:06 UTC
Target Upstream Version:
Embargoed:

Attachments
output from crm_mon showing that the expected vip resource does not exist (12.98 KB, text/plain) 2018-04-05 20:52 UTC, Dan Trainor
openstack stack failures list (821.72 KB, text/plain) 2018-04-05 20:53 UTC, Dan Trainor
plan-environment.yaml (17.01 KB, text/plain) 2018-04-05 20:54 UTC, Dan Trainor
ceph-install-workflow.log (571.55 KB, text/plain) 2018-04-06 15:45 UTC, Dan Trainor

Description Dan Trainor 2018-04-05 20:51:58 UTC

Description of problem:

An installation of OSP12 with three Controller, three Ceph, and two Compute nodes is failing. The failure appears to be caused by Cinder's db_sync step (the full failure output is attached to this bug):

"Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Failed to call refresh: Command exceeded timeout", 
"Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Command exceeded timeout", 

However, on further investigation, cinder-manage.log shows that it cannot connect to an IP address that should be managed as a resource by Pacemaker/Corosync:

2018-04-05 18:11:48.722 83824 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -23 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.16.2.14' ([Errno 113] No route to host)")
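The negative "-23 attempts left" count may explain why Puppet reports a timeout rather than a connection error. The sketch below is a simplified model, assuming the oslo.db retry loop of this era logs `max_retries - attempt` and treats `max_retries = -1` as "retry indefinitely" (verify against the installed oslo.db). Under that assumption a negative count means db_sync never gives up on its own, so the Exec timeout fires first. The function name `reported_attempts_left` is hypothetical.

```python
import itertools


def reported_attempts_left(max_retries, failures):
    """Return the 'attempts left' values a retry loop would log.

    Hypothetical model (assumption): max_retries == -1 means retry
    indefinitely, and each failed attempt logs max_retries - attempt.
    """
    if max_retries == -1:
        attempts = itertools.count()  # unbounded: never stops on its own
    else:
        attempts = range(max_retries)
    logged = []
    for attempt in attempts:
        if attempt >= failures:
            break  # end the simulation after the requested number of failures
        logged.append(max_retries - attempt)
    return logged


# With a finite limit the counter runs down to 1 and the loop stops.
print(reported_attempts_left(3, 10))       # [3, 2, 1]
# With -1 the counter goes negative; the 23rd failure logs -23.
print(reported_attempts_left(-1, 23)[-1])  # -23
```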

This resource (172.16.2.14) is not managed in the cluster (crm_mon output attached to this bug).
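To check a cluster for the missing VIP without reading crm_mon output by hand, a small filter over the text of `crm_mon -1` can help. This is a minimal sketch that assumes the TripleO convention of naming IPaddr2 VIP resources `ip-<address>`; the helper name is hypothetical, and the sample line is illustrative, not taken from the attached output.

```python
import re


def has_vip_resource(crm_mon_text, address):
    """Return True if the crm_mon output lists a resource named ip-<address>.

    Assumes the TripleO convention of naming IPaddr2 VIPs "ip-<address>";
    the optional "* " prefix covers list-style crm_mon formatting.
    """
    # Require whitespace after the address so 172.16.2.14 does not match .140.
    pattern = r"(?m)^\s*(?:\*\s*)?ip-" + re.escape(address) + r"\s"
    return re.search(pattern, crm_mon_text) is not None


# Illustrative crm_mon line, not taken from the attached output.
sample = " ip-192.168.24.10\t(ocf::heartbeat:IPaddr2):\tStarted controller-0\n"
print(has_vip_resource(sample, "192.168.24.10"))  # True
print(has_vip_resource(sample, "172.16.2.14"))    # False
```

On a controller, the input could come from `subprocess.run(["crm_mon", "-1"], capture_output=True, text=True).stdout`.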

This deployment was initiated from the Director UI, but I am not certain whether that contributes to the failure.


Version-Release number of selected component (if applicable):
RHOSP12


How reproducible:
Always


Steps to Reproduce:
1.  Create a deployment in the Director UI
2.  Assign three Ceph, three Controller, and two Compute nodes
3.  Include the following elements: Base resources configuration, Ceph Storage Backend, Containerized Deployment, environments/ceph-ansible/ceph-ansible.yaml, environments/containers-default-parameters.yaml, environments/docker-ha.yaml, environments/ssl/inject-trust-anchor.yaml, Multiple NICs, Network Isolation, SSL on OpenStack Public Endpoints
4.  Populate the inject-trust-anchor.yaml, "Multiple NICs", "Network Isolation", and "SSL on OpenStack Public Endpoints" parameters (plan-environment.yaml, attached to this bug, shows the configuration)
5.  Attempt deployment
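Before attempting the deployment, one way to confirm that the plan actually enables the environment files named in step 3 is to compare them against the plan's environment list. A minimal sketch, assuming plan-environment.yaml has already been parsed into a dict and lists enabled files as `environments: [{path: ...}]`; the helper is hypothetical, and the UI-labelled features (Multiple NICs, Network Isolation, SSL) are left out because their file names are not given here.

```python
def missing_environments(plan, required):
    """Return the entries of `required` that the plan does not enable.

    Assumed plan shape (parsed from YAML): {"environments": [{"path": ...}, ...]}.
    """
    enabled = {entry.get("path") for entry in plan.get("environments", [])}
    # Preserve the order of `required` so the report matches the checklist.
    return [path for path in required if path not in enabled]


required = [
    "environments/ceph-ansible/ceph-ansible.yaml",
    "environments/containers-default-parameters.yaml",
    "environments/docker-ha.yaml",
    "environments/ssl/inject-trust-anchor.yaml",
]
# Illustrative plan fragment, not the attached plan-environment.yaml.
plan = {"environments": [{"path": "environments/docker-ha.yaml"}]}
print(missing_environments(plan, required))
```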

Actual results:
Deployment fails


Expected results:
Deployment succeeds


Additional info:
The configuration for this environment is specified in the RHOSP QE Deployment Matrix, Test RHELOSP-30275

Comment 1 Dan Trainor 2018-04-05 20:52:41 UTC

Created attachment 1417891 [details]
output from crm_mon showing that the expected vip resource does not exist

Comment 2 Dan Trainor 2018-04-05 20:53:08 UTC

Created attachment 1417892 [details]
openstack stack failures list

Comment 3 Dan Trainor 2018-04-05 20:54:21 UTC

Created attachment 1417893 [details]
plan-environment.yaml

Comment 10 Dan Trainor 2018-04-06 15:45:52 UTC

Created attachment 1418226 [details]
ceph-install-workflow.log

Comment 16 Dan Trainor 2018-04-11 03:51:06 UTC

After further discussion, it appears that the templates used to import the plan into the UI did not include network_virtual_ips data. This data is needed to assign an External (or Provider) IP address to the controller resource that represents the active controller in a deployment with HA controllers.
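A plan can be checked for this kind of gap before deploying. The sketch below uses a hypothetical shape for the network_virtual_ips data (a mapping from network name to a VIP description); the real layout depends on the templates in use, so treat the key names as assumptions. Network names other than External are examples only.

```python
def networks_missing_vips(network_virtual_ips, expected_networks):
    """Return the expected networks that have no usable VIP entry.

    Hypothetical data shape: {"External": {"ip_address": "..."}, ...}.
    A missing mapping (None) or an empty entry both count as missing.
    """
    vips = network_virtual_ips or {}
    return sorted(name for name in expected_networks if not vips.get(name))


# Templates that omit the data entirely: every expected network is reported.
print(networks_missing_vips(None, ["External", "InternalApi"]))
# ['External', 'InternalApi']
```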

It was suspected that the Director UI did not properly export the plan files containing Jinja2 data, but my testing showed that this is not the case.

This appears to be the result of a misconfigured template tarball that was modified after being exported from the Director UI.
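When a plan tarball has been edited after export, diffing its file list against the original export is a quick way to spot dropped files. A minimal sketch using Python's standard tarfile module; `make_tar` exists only to build an in-memory example, and the file names are illustrative.

```python
import io
import tarfile


def tar_members(fileobj):
    """Return the names of regular files in a (possibly compressed) tarball."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as tf:
        return {member.name for member in tf.getmembers() if member.isfile()}


def dropped_files(exported, modified):
    """List files present in the exported plan but missing after modification."""
    return sorted(tar_members(exported) - tar_members(modified))


def make_tar(files):
    """Build a gzipped tarball in memory from {name: bytes} (example helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


exported = make_tar({"plan-environment.yaml": b"...", "example-vips.yaml": b"..."})
modified = make_tar({"plan-environment.yaml": b"..."})
print(dropped_files(exported, modified))  # ['example-vips.yaml']
```

For files on disk, pass binary handles, e.g. `open("exported.tar.gz", "rb")` and `open("modified.tar.gz", "rb")` (hypothetical file names).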
