Same situation as BZ#1434279 occurs with Neutron too. Need to increase timeout in Puppet module to account for lower spec'd systems.
Workaround in director is the following environment file:
+++ This bug was initially created as a clone of Bug #1434279 +++
During a deployment on lower spec'd systems, the "nova-manage db sync" can take longer than five minutes. However, when deploying via the director, the Nova Puppet module has a db_sync_timeout of 300 seconds. This can cause director-based deployments failures. For example, here's the Puppet log during the nova-manage db sync of my test:
Error: /Stage[main]/Nova::Db::Sync/Exec[nova-db-sync]: Failed to call refresh: Command exceeded timeout
Error: /Stage[main]/Nova::Db::Sync/Exec[nova-db-sync]: Command exceeded timeout
And the nova schema changes in the future, it might be a good idea to bump the timeout to something higher.
As a workaround, you can set the timeout to something larger via an environment file using the nova::db::sync::db_sync_timeout hieradata. For example:
--- Additional comment from Dan Macpherson on 2017-03-21 17:35:56 EST ---
"And the nova schema changes in the future"
Meant to say:
"As the nova schema changes in the future"
--- Additional comment from Dan Macpherson on 2017-03-21 18:27:57 EST ---
Just to note I'm testing this on a set of 3 VMs for Controller nodes, each with 2 vCPUs and 10Gb of memory.
Here's a head and tail of nova-manage.log:
[root@overcloud-controller-0 nova]# head -n4 nova-manage.log
2017-03-21 08:11:01.472 45523 INFO migrate.versioning.api [-] 0 -> 1...
2017-03-21 08:11:02.346 45523 INFO migrate.versioning.api [-] done
2017-03-21 08:11:02.346 45523 INFO migrate.versioning.api [-] 1 -> 2...
2017-03-21 08:11:03.501 45523 INFO migrate.versioning.api [-] done
[root@overcloud-controller-0 nova]# tail -n4 nova-manage.log
2017-03-21 08:19:48.633 49098 INFO migrate.versioning.api [req-9f48372f-ab93-4286-9f21-7dd10662282c - - - - -] 345 -> 346...
2017-03-21 08:19:51.867 49098 INFO migrate.versioning.api [req-9f48372f-ab93-4286-9f21-7dd10662282c - - - - -] done
2017-03-21 08:19:51.868 49098 INFO migrate.versioning.api [req-9f48372f-ab93-4286-9f21-7dd10662282c - - - - -] 346 -> 347...
2017-03-21 08:19:52.477 49098 INFO migrate.versioning.api [req-9f48372f-ab93-4286-9f21-7dd10662282c - - - - -] done
Total time for db sync is 8 minutes and 51 seconds.
Granted, enterprise environments will have higher specs and mean faster db sync, but I can see a lot of people testing out with lower spec PoCs that will encounter this issue.
Assigned to Or Idgar, who doesn't have a Bugzilla account yet. I suggest that we bump the default timeout significantly.
To which value should I update the timeout - 600 or 900?
I think 900 would cover all future db sync scenarios significantly and would cover a variety of hardware types, including low spec test environments.
People in upstream argue that this change is not reasonable and that 300 seconds should be more than enough.
I'm not that familiar with the low spec environments which encountered that issue and reached timeout.
1. Could you provide the specs that exhibit this issue, particular the storage specs?
2. Where can I find more information about tripleo db sync actions?
I have a similar issue with the Neutron database sync failing due to a timeout when trying to use tripleo-quickstart to deploy OpenStack using nested KVM virtual machines.
The storage on the hypervisor VM is a QCOW2 image with preallocated metadata, no drive cache, and VirtIO drivers. This nested hypervisor VM has 20 cores and 30GB of RAM.
All of the timeouts seem to be defined in the puppet-<OPENSTACK_SERVICE> GitHub repositories for the upstream TripleO. For example, Neutron's default $db_sync_timeout is defined here:
All you need to do is to take low spec environment (with emphasis on storage low performance) and run overcloud deployment with director.
for the command you will need to add an environment file as a parameter.
for example: "openstack overcloud deploy -e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml"
without this environment file, on low spec environment the deployment should fail.
Let me know if you need additional help
Or thanks for the details.
Since low spec environment without this yaml will fail, is it documented to run it with the low-memory-usage.yaml?
There isn't any documentation about it. we use it mainly in non production environments (devenvs, CI, etc.).
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.