Bug 1422671 - Seeding timeout when creating region in external database
Summary: Seeding timeout when creating region in external database
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Appliance
Version: 5.7.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: GA
: 5.9.0
Assignee: Keenan Brock
QA Contact: luke couzens
URL:
Whiteboard: black
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-02-15 20:05 UTC by luke couzens
Modified: 2018-03-01 13:09 UTC (History)
10 users (show)

Fixed In Version: 5.9.0.1
Doc Type: Enhancement
Doc Text:
Feature: Expose environment variable SEEDING_LOCK_TIMEOUT to increase time allowed for seeding the database. Reason: High latency db connections were causing the initial seeding process to take longer than the previously allowed 10 minute window. Result: Customer can set the timeout window when seeding the database and populate the database over a WAN
Clone Of:
Environment:
Last Closed: 2018-03-01 13:09:52 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:0380 normal SHIPPED_LIVE Moderate: Red Hat CloudForms security, bug fix, and enhancement update 2018-03-01 18:37:12 UTC

Description luke couzens 2017-02-15 20:05:14 UTC
Description of problem:Creating a region in external database fails to start evm workers because of seeding timeout issues with appliances on different ip ranges.


Version-Release number of selected component (if applicable):5.7.1.1


How reproducible:100%


Steps to Reproduce:
1.Provision two appliances within different subnets
2.setup dedicated db on appliance 1
3.create external region using appliance 2 and pointing it at appliance 1
4.run on appliance 2 watch 'systemctl status evmserverd' 

Actual results:evm workers do not start


Expected results:evm workers start up correctly


Additional info:
Looking at the logs we can see that the seeding process times out after exactly 10 min's which in these cases does not seem to be long enough. Perhaps we should increase this limit?

[----] I, [2017-02-15T11:38:42.895136 #33343:421134]  INFO -- : MIQ(EvmDatabase.seed) Seeding...

[----] I, [2017-02-15T11:48:42.861183 #33343:421134]  INFO -- : MIQ(MiqAeYamlImportFs#add_instance) Importing instance:  <event_enforce_policy>

MIQ(Vmdb::Initializer.init) - Program Name: /opt/rh/cfme-gemset/bin/rake, PID: 33718, ENV['MIQ_GUID']: , ENV['EVMSERVER']: 
[----] I, [2017-02-15T11:48:53.806148 #33718:3e9130]  INFO -- : MIQ(EvmApplication.stop) EVM Shutdown initiated

Comment 2 Joe Rafaniello 2017-02-20 19:37:25 UTC
Hi Luke,

The 10 minutes is specified here:
https://github.com/ManageIQ/manageiq/blob/8f883cd5809a8146c5a45139d7187abd8b55deb2/lib/evm_database.rb#L68

Would you be able to measure the latency using db ping?

ruby tools/db_ping_remote.rb <host> <port> <username> [database] [adapter]

10 minutes is a long time.  Do you have logs from this run?  It would be helpful to see if there is one or two seedings that is taking 80% of the 10 minutes.

Comment 3 Dave Johnson 2017-02-22 19:23:21 UTC
Alex, have you seen any issues like this in your testing?

Comment 6 Joe Rafaniello 2017-02-27 20:55:29 UTC
Based on Luke's findings, it looks like the 42 ms db ping is causing seeding to timeout, with the following 3 class seeding as the worst:

MiqAeDatastore: times out after 6+ minutes
MiqEventDefinition:  90 seconds
Classification: 50 seconds

Below is a grep of just the seeding with some context:
grep -B 1 -E "\.seed\) Seeding|Timeout" log/evm.log



[----] I, [2017-02-27T15:29:48.385807 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding...
[----] I, [2017-02-27T15:29:48.477654 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding RssFeed
--
[----] I, [2017-02-27T15:29:54.817295 #41996:78f140]  INFO -- : MIQ(RssFeed.sync_from_yml_file) [vmware_vms] file has been added to disk, adding to model
[----] I, [2017-02-27T15:29:55.194695 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding MiqWidget
--
[----] I, [2017-02-27T15:30:07.842308 #41996:78f140]  INFO -- : Widget Set: [Default Dashboard] file has been added to disk, adding to model
[----] I, [2017-02-27T15:30:09.364112 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding MiqAction
--
[----] I, [2017-02-27T15:30:15.635406 #41996:78f140]  INFO -- : MIQ(MiqAction.create_or_update) Creating [custom_automation]
[----] I, [2017-02-27T15:30:15.780666 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding MiqEventDefinition
--
[----] I, [2017-02-27T15:31:44.316531 #41996:78f140]  INFO -- : MIQ(MiqEventDefinition.seed_default_definitions) Creating [disk_usage_50]
[----] I, [2017-02-27T15:31:44.432447 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding MiqPolicySet
[----] I, [2017-02-27T15:31:50.584933 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding ArbitrationSetting
[----] I, [2017-02-27T15:31:50.808425 #41996:78f140]  INFO -- : MIQ(ArbitrationSetting.seed) Created arbitration setting with parameters {:name=>"vm_images_meta_json", :display_name=>"Default Images URL"}
[----] I, [2017-02-27T15:31:50.811176 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding BlacklistedEvent
[----] I, [2017-02-27T15:31:56.236590 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding ChargebackRate
--
[----] I, [2017-02-27T15:32:02.339626 #41996:78f140]  INFO -- : MIQ(ChargebackRate.seed_chargeback_rate) Creating [Default Container Image Rate] with guid=[b47a0ef0-4335-11df-aba8-001d09066d99]
[----] I, [2017-02-27T15:32:03.333906 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding Classification
--
[----] I, [2017-02-27T15:32:53.593958 #41996:78f140]  INFO -- : MIQ(Classification.seed) Creating lifecycle
[----] I, [2017-02-27T15:32:54.293848 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding CustomizationTemplate
--
[----] I, [2017-02-27T15:32:54.828319 #41996:78f140]  INFO -- : MIQ(CustomizationTemplate.seed) Creating {:name=>"SSH key addition template", :type=>"CustomizationTemplateCloudInit", :description=>"This template enables placing ssh public key in authorized keys"}
[----] I, [2017-02-27T15:32:54.883039 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding Dialog
[----] I, [2017-02-27T15:32:58.705665 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding MiqAlert
--
[----] I, [2017-02-27T15:33:07.542731 #41996:78f140]  INFO -- : MIQ(MiqAlert.seed) Added sample Alert: VM Environment Tag <> Host Environment Tag
[----] I, [2017-02-27T15:33:07.678053 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding MiqDialog
--
[----] I, [2017-02-27T15:33:10.528254 #41996:78f140]  INFO -- : MIQ(MiqDialog.sync_from_file) [vm_migrate_dialogs] file has been added to disk, adding to model
[----] I, [2017-02-27T15:33:10.636381 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding MiqEventDefinitionSet
[----] I, [2017-02-27T15:33:11.153350 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding MiqPolicy
[----] I, [2017-02-27T15:33:11.196692 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding MiqSearch
--
[----] I, [2017-02-27T15:33:26.727626 #41996:78f140]  INFO -- : MIQ(MiqSearch.seed) Creating [default_Analysis Successful]
[----] I, [2017-02-27T15:33:26.824767 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding MiqShortcut
--
[----] I, [2017-02-27T15:33:31.785642 #41996:78f140]  INFO -- : MIQ(MiqShortcut.seed) Creating {:name=>"redhat_access_insights", :description=>"Red Hat Access Insights", :url=>"/dashboard/iframe?sid=accessInsights", :rbac_feature_name=>"redhat_access_insights", :startup=>true, :sequence=>52}
[----] I, [2017-02-27T15:33:31.916665 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding MiqWidgetSet
[----] I, [2017-02-27T15:33:32.006523 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding NotificationType
[----] I, [2017-02-27T15:33:33.794009 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding OrchestrationTemplate
[----] I, [2017-02-27T15:33:34.147497 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding PxeImageType
--
[----] I, [2017-02-27T15:33:34.550854 #41996:78f140]  INFO -- : MIQ(PxeImageType.seed) Creating {:name=>"Windows", :provision_type=>"vm"}
[----] I, [2017-02-27T15:33:34.649884 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding ScanItem
--
[----] I, [2017-02-27T15:33:35.399252 #41996:78f140]  INFO -- : Scan Item: [sample_registry] file has been added to disk, adding to model
[----] I, [2017-02-27T15:33:37.953427 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding TimeProfile
[----] I, [2017-02-27T15:33:38.136550 #41996:78f140]  INFO -- : MIQ(TimeProfile.seed) Creating global time profile
[----] I, [2017-02-27T15:33:38.340116 #41996:78f140]  INFO -- : MIQ(MiqQueue.put) Message id: [1000000000047],  id: [], Zone: [default], Role: [], Server: [], Ident: [generic], Target id: [], Instance id: [1000000000046], Task id: [], Command: [TimeProfile.rebuild_daily_metrics], Timeout: [3600], Priority: [100], State: [ready], Deliver On: [], Data: [], Args: []
[----] I, [2017-02-27T15:33:38.345261 #41996:78f140]  INFO -- : MIQ(EvmDatabase.seed) Seeding MiqAeDatastore
[----] I, [2017-02-27T15:33:38.454821 #41996:78f140]  INFO -- : MIQ(MiqAeDatastore.seed) Seeding ManageIQ domain...
--
[----] I, [2017-02-27T15:39:47.937642 #41996:78f140]  INFO -- : MIQ(MiqAeYamlImportFs#add_method) Importing method:    <validate_quota>
/opt/rh/cfme-gemset/gems/activerecord-5.0.0.1/lib/active_record/connection_adapters/postgresql_adapter.rb:598:in `async_exec': execution expired (Timeout::Error)

Comment 7 Joe Rafaniello 2017-02-27 21:12:25 UTC
Note, this bug is for INITIAL creation of a region on an external DB, so it's the initial seeding that is really slow on a somewhat slower connection.  It's probably also worth investigating how long a seeding of an already seeded external DB takes with the same 42 ms latency since this happens much more frequently.

Comment 8 Keenan Brock 2017-02-28 04:10:31 UTC
the initial run is 10,425 queries and 1,355 rows
the second run is 1,853 queries and 5,201 rows

there is an environment variable SKIP_PRIMORDIAL_SEED which when run
the run is 1 query and 0 rows (counts don't show up)

Comment 9 Keenan Brock 2017-03-03 01:34:00 UTC
Looking into what changes will be necessary if we move the seeding out of the startup process.

Comment 12 CFME Bot 2017-08-01 14:11:35 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/d082c47618bdf6f6436a4ce30a29c8228cbacde0

commit d082c47618bdf6f6436a4ce30a29c8228cbacde0
Author:     Keenan Brock <kbrock@redhat.com>
AuthorDate: Mon Jul 17 14:40:13 2017 -0400
Commit:     Keenan Brock <kbrock@redhat.com>
CommitDate: Tue Aug 1 09:33:09 2017 -0400

    Add $SEEDING_LOCK_TIMEOUT for high latency seeding
    
    Seeding over a WAN can take longer than 10 minutes.
    This allows the timeout to be increased temporarily.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1422671

 lib/evm_database.rb | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comment 13 Keenan Brock 2017-08-03 13:39:59 UTC
An environment variable SEEDING_LOCK_TIMEOUT was added to allow seeding to take longer.

In development I use it the following way:

    vmdb
    SEEDING_LOCK_TIMEOUT=1800 bundle exec rake db:seed

This allows me to populate the server with a timeout of 30 minutes

All subsequent boots will faster and the 10 minute timeout should work fine.


see also: https://github.com/ManageIQ/manageiq/pull/15595


Just to reiterate: this product was designed for the appliances to run near the database. Running with a high latency db connection may cause all sorts of issues, some of which will take not be very obvious and difficult to debug.

Comment 14 luke couzens 2017-10-12 12:38:48 UTC
Verified in 5.9.0.2

Comment 17 errata-xmlrpc 2018-03-01 13:09:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0380


Note You need to log in before you can comment on or make changes to this bug.