Bug 1434520

Summary: Neutron database sync times out during deployment
Product: Red Hat OpenStack
Reporter: Dan Macpherson <dmacpher>
Component: openstack-tripleo-heat-templates
Assignee: Or Idgar <oidgar>
Status: CLOSED ERRATA
QA Contact: Toni Freger <tfreger>
Severity: high
Docs Contact:
Priority: low
Version: 11.0 (Ocata)
CC: amuller, dmacpher, eglynn, ekultails, jjoyce, jschluet, mburns, nlevinki, oidgar, rhel-osp-director-maint, rhos-maint, slinaber, tvignaud
Target Milestone: ga
Keywords: Triaged
Target Release: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-7.0.0-0.20170628002128.el7ost.noarch
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1434279
Environment:
Last Closed: 2017-12-13 21:18:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1433535, 1434279
Bug Blocks:

Description Dan Macpherson 2017-03-21 16:08:54 UTC
The same situation as BZ#1434279 occurs with Neutron too. The db sync timeout in the Puppet module needs to be increased to account for lower spec'd systems.

Workaround in director is the following environment file:

parameter_defaults:
  ExtraConfig:
    neutron::db::sync::db_sync_timeout: 900


+++ This bug was initially created as a clone of Bug #1434279 +++

During a deployment on lower spec'd systems, "nova-manage db sync" can take longer than five minutes. However, when deploying via the director, the Nova Puppet module has a db_sync_timeout of 300 seconds. This can cause director-based deployment failures. For example, here's the Puppet log during the nova-manage db sync of my test:

Error: /Stage[main]/Nova::Db::Sync/Exec[nova-db-sync]: Failed to call refresh: Command exceeded timeout
Error: /Stage[main]/Nova::Db::Sync/Exec[nova-db-sync]: Command exceeded timeout

And the nova schema changes in the future, it might be a good idea to bump the timeout to something higher.

As a workaround, you can set the timeout to something larger via an environment file using the nova::db::sync::db_sync_timeout hieradata. For example:

parameter_defaults:
  ExtraConfig:
    nova::db::sync::db_sync_timeout: 600
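
If several services need a longer timeout, the two workaround files can be combined into one. The following is only a sketch: render_timeout_environment is a hypothetical helper, and it assumes every service follows the <service>::db::sync::db_sync_timeout hieradata pattern, which this report only confirms for nova and neutron.

```python
def render_timeout_environment(timeouts):
    """Render environment-file text that sets a db sync timeout per service.

    `timeouts` maps a service name to a timeout in seconds. The key pattern
    <service>::db::sync::db_sync_timeout is an assumption for any service
    other than nova and neutron.
    """
    lines = ["parameter_defaults:", "  ExtraConfig:"]
    for service, seconds in sorted(timeouts.items()):
        if seconds <= 0:
            raise ValueError(f"timeout for {service} must be positive")
        lines.append(f"    {service}::db::sync::db_sync_timeout: {seconds}")
    return "\n".join(lines) + "\n"


# Values taken from the two workarounds in this report.
env_text = render_timeout_environment({"nova": 600, "neutron": 900})
print(env_text, end="")
```

The printed text can be saved to a file and passed to the deployment as an additional environment file.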

--- Additional comment from Dan Macpherson on 2017-03-21 17:35:56 EST ---

"And the nova schema changes in the future"

Meant to say:

"As the nova schema changes in the future"

--- Additional comment from Dan Macpherson on 2017-03-21 18:27:57 EST ---

Just to note, I'm testing this on a set of 3 VMs for Controller nodes, each with 2 vCPUs and 10 GB of memory.

Here's a head and tail of nova-manage.log:

[root@overcloud-controller-0 nova]# head -n4 nova-manage.log 
2017-03-21 08:11:01.472 45523 INFO migrate.versioning.api [-] 0 -> 1... 
2017-03-21 08:11:02.346 45523 INFO migrate.versioning.api [-] done
2017-03-21 08:11:02.346 45523 INFO migrate.versioning.api [-] 1 -> 2... 
2017-03-21 08:11:03.501 45523 INFO migrate.versioning.api [-] done
[root@overcloud-controller-0 nova]# tail -n4 nova-manage.log 
2017-03-21 08:19:48.633 49098 INFO migrate.versioning.api [req-9f48372f-ab93-4286-9f21-7dd10662282c - - - - -] 345 -> 346... 
2017-03-21 08:19:51.867 49098 INFO migrate.versioning.api [req-9f48372f-ab93-4286-9f21-7dd10662282c - - - - -] done
2017-03-21 08:19:51.868 49098 INFO migrate.versioning.api [req-9f48372f-ab93-4286-9f21-7dd10662282c - - - - -] 346 -> 347... 
2017-03-21 08:19:52.477 49098 INFO migrate.versioning.api [req-9f48372f-ab93-4286-9f21-7dd10662282c - - - - -] done

Total time for db sync is 8 minutes and 51 seconds.
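
The figure above can be re-derived from the first and last timestamps in the log excerpt. This is a minimal sketch, assuming the first "head" line and the last "tail" line mark the start and end of the sync; the helper names are illustrative only.

```python
from datetime import datetime

# First and last lines copied from the nova-manage.log excerpt.
FIRST_LINE = "2017-03-21 08:11:01.472 45523 INFO migrate.versioning.api [-] 0 -> 1..."
LAST_LINE = (
    "2017-03-21 08:19:52.477 49098 INFO migrate.versioning.api "
    "[req-9f48372f-ab93-4286-9f21-7dd10662282c - - - - -] done"
)


def log_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm' timestamp of a log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")


def sync_duration_seconds(first_line, last_line):
    """Return the elapsed time between two log lines, in seconds."""
    return (log_timestamp(last_line) - log_timestamp(first_line)).total_seconds()


elapsed = sync_duration_seconds(FIRST_LINE, LAST_LINE)
minutes, seconds = divmod(int(elapsed), 60)
print(f"db sync took {minutes} min {seconds} s")

# Compare against the default timeout and the two values proposed in this bug.
for timeout in (300, 600, 900):
    verdict = "enough" if elapsed < timeout else "too short"
    print(f"db_sync_timeout={timeout}: {verdict}")
```

At roughly 531 seconds, this run exceeds the 300-second default but fits within the 600-second workaround.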

Granted, enterprise environments will have higher specs and therefore a faster db sync, but I can see a lot of people testing with lower spec PoCs who will encounter this issue.

Comment 1 Assaf Muller 2017-04-28 19:46:31 UTC
Assigned to Or Idgar, who doesn't have a Bugzilla account yet. I suggest that we bump the default timeout significantly.

Comment 2 Or Idgar 2017-05-04 14:49:11 UTC
To which value should I update the timeout - 600 or 900?

Comment 3 Dan Macpherson 2017-05-04 15:44:43 UTC
I think 900 would comfortably cover future db sync scenarios across a variety of hardware types, including low spec test environments.

Comment 4 Or Idgar 2017-05-11 08:01:31 UTC
Hi Dan,
People in upstream argue that this change is not reasonable and that 300 seconds should be more than enough.
I'm not that familiar with the low spec environments which encountered that issue and reached timeout.

1. Could you provide the specs that exhibit this issue, particularly the storage specs?
2. Where can I find more information about tripleo db sync actions?

Comment 6 ekultails 2017-09-07 04:11:40 UTC
I have a similar issue with the Neutron database sync failing due to a timeout when trying to use tripleo-quickstart to deploy OpenStack using nested KVM virtual machines.

https://bugs.launchpad.net/tripleo/+bug/1712901

The storage on the hypervisor VM is a QCOW2 image with preallocated metadata, no drive cache, and VirtIO drivers. This nested hypervisor VM has 20 cores and 30GB of RAM.

All of the timeouts seem to be defined in the puppet-<OPENSTACK_SERVICE> GitHub repositories for the upstream TripleO. For example, Neutron's default $db_sync_timeout is defined here:

https://github.com/openstack/puppet-neutron/blob/master/manifests/db/sync.pp

Comment 8 Or Idgar 2017-11-22 06:53:08 UTC
Hi Toni,
All you need to do is take a low spec environment (with emphasis on low storage performance) and run an overcloud deployment with director.
For the command, you will need to add an environment file as a parameter.
For example: "openstack overcloud deploy -e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml"

Without this environment file, the deployment should fail on a low spec environment.

Let me know if you need additional help.

Comment 9 Toni Freger 2017-11-22 08:23:45 UTC
Or, thanks for the details.

Since a low spec environment will fail without this YAML file, is it documented that the deployment should be run with low-memory-usage.yaml?

Comment 10 Or Idgar 2017-11-22 13:08:22 UTC

There isn't any documentation about it. We use it mainly in non-production environments (devenvs, CI, etc.).

Comment 15 errata-xmlrpc 2017-12-13 21:18:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462