Description of problem:Creating a region in external database fails to start evm workers because of seeding timeout issues with appliances on different ip ranges. Version-Release number of selected component (if applicable):5.7.1.1 How reproducible:100% Steps to Reproduce: 1.Provision two appliances within different subnets 2.setup dedicated db on appliance 1 3.create external region using appliance 2 and pointing it at appliance 1 4.run on appliance 2 watch 'systemctl status evmserverd' Actual results:evm workers do not start Expected results:evm workers start up correctly Additional info: Looking at the logs we can see that the seeding process times out after exactly 10 min's which in these cases does not seem to be long enough. Perhaps we should increase this limit? [----] I, [2017-02-15T11:38:42.895136 #33343:421134] INFO -- : MIQ(EvmDatabase.seed) Seeding... [----] I, [2017-02-15T11:48:42.861183 #33343:421134] INFO -- : MIQ(MiqAeYamlImportFs#add_instance) Importing instance: <event_enforce_policy> MIQ(Vmdb::Initializer.init) - Program Name: /opt/rh/cfme-gemset/bin/rake, PID: 33718, ENV['MIQ_GUID']: , ENV['EVMSERVER']: [----] I, [2017-02-15T11:48:53.806148 #33718:3e9130] INFO -- : MIQ(EvmApplication.stop) EVM Shutdown initiated
Hi Luke, The 10 minutes is specified here: https://github.com/ManageIQ/manageiq/blob/8f883cd5809a8146c5a45139d7187abd8b55deb2/lib/evm_database.rb#L68 Would you be able to measure the latency using db ping? ruby tools/db_ping_remote.rb <host> <port> <username> [database] [adapter] 10 minutes is a long time. Do you have logs from this run? It would be helpful to see if there is one or two seedings that is taking 80% of the 10 minutes.
Alex, have you seen any issues like this in your testing?
Based on Luke's findings, it looks like the 42 ms db ping is causing seeding to timeout, with the following 3 class seeding as the worst: MiqAeDatastore: times out after 6+ minutes MiqEventDefinition: 90 seconds Classification: 50 seconds Below is a grep of just the seeding with some context: grep -B 1 -E "\.seed\) Seeding|Timeout" log/evm.log [----] I, [2017-02-27T15:29:48.385807 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding... [----] I, [2017-02-27T15:29:48.477654 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding RssFeed -- [----] I, [2017-02-27T15:29:54.817295 #41996:78f140] INFO -- : MIQ(RssFeed.sync_from_yml_file) [vmware_vms] file has been added to disk, adding to model [----] I, [2017-02-27T15:29:55.194695 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding MiqWidget -- [----] I, [2017-02-27T15:30:07.842308 #41996:78f140] INFO -- : Widget Set: [Default Dashboard] file has been added to disk, adding to model [----] I, [2017-02-27T15:30:09.364112 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding MiqAction -- [----] I, [2017-02-27T15:30:15.635406 #41996:78f140] INFO -- : MIQ(MiqAction.create_or_update) Creating [custom_automation] [----] I, [2017-02-27T15:30:15.780666 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding MiqEventDefinition -- [----] I, [2017-02-27T15:31:44.316531 #41996:78f140] INFO -- : MIQ(MiqEventDefinition.seed_default_definitions) Creating [disk_usage_50] [----] I, [2017-02-27T15:31:44.432447 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding MiqPolicySet [----] I, [2017-02-27T15:31:50.584933 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding ArbitrationSetting [----] I, [2017-02-27T15:31:50.808425 #41996:78f140] INFO -- : MIQ(ArbitrationSetting.seed) Created arbitration setting with parameters {:name=>"vm_images_meta_json", :display_name=>"Default Images URL"} [----] I, [2017-02-27T15:31:50.811176 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding BlacklistedEvent [----] I, [2017-02-27T15:31:56.236590 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding ChargebackRate -- [----] I, [2017-02-27T15:32:02.339626 #41996:78f140] INFO -- : MIQ(ChargebackRate.seed_chargeback_rate) Creating [Default Container Image Rate] with guid=[b47a0ef0-4335-11df-aba8-001d09066d99] [----] I, [2017-02-27T15:32:03.333906 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding Classification -- [----] I, [2017-02-27T15:32:53.593958 #41996:78f140] INFO -- : MIQ(Classification.seed) Creating lifecycle [----] I, [2017-02-27T15:32:54.293848 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding CustomizationTemplate -- [----] I, [2017-02-27T15:32:54.828319 #41996:78f140] INFO -- : MIQ(CustomizationTemplate.seed) Creating {:name=>"SSH key addition template", :type=>"CustomizationTemplateCloudInit", :description=>"This template enables placing ssh public key in authorized keys"} [----] I, [2017-02-27T15:32:54.883039 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding Dialog [----] I, [2017-02-27T15:32:58.705665 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding MiqAlert -- [----] I, [2017-02-27T15:33:07.542731 #41996:78f140] INFO -- : MIQ(MiqAlert.seed) Added sample Alert: VM Environment Tag <> Host Environment Tag [----] I, [2017-02-27T15:33:07.678053 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding MiqDialog -- [----] I, [2017-02-27T15:33:10.528254 #41996:78f140] INFO -- : MIQ(MiqDialog.sync_from_file) [vm_migrate_dialogs] file has been added to disk, adding to model [----] I, [2017-02-27T15:33:10.636381 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding MiqEventDefinitionSet [----] I, [2017-02-27T15:33:11.153350 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding MiqPolicy [----] I, [2017-02-27T15:33:11.196692 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding MiqSearch -- [----] I, [2017-02-27T15:33:26.727626 #41996:78f140] INFO -- : MIQ(MiqSearch.seed) Creating [default_Analysis Successful] [----] I, [2017-02-27T15:33:26.824767 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding MiqShortcut -- [----] I, [2017-02-27T15:33:31.785642 #41996:78f140] INFO -- : MIQ(MiqShortcut.seed) Creating {:name=>"redhat_access_insights", :description=>"Red Hat Access Insights", :url=>"/dashboard/iframe?sid=accessInsights", :rbac_feature_name=>"redhat_access_insights", :startup=>true, :sequence=>52} [----] I, [2017-02-27T15:33:31.916665 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding MiqWidgetSet [----] I, [2017-02-27T15:33:32.006523 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding NotificationType [----] I, [2017-02-27T15:33:33.794009 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding OrchestrationTemplate [----] I, [2017-02-27T15:33:34.147497 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding PxeImageType -- [----] I, [2017-02-27T15:33:34.550854 #41996:78f140] INFO -- : MIQ(PxeImageType.seed) Creating {:name=>"Windows", :provision_type=>"vm"} [----] I, [2017-02-27T15:33:34.649884 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding ScanItem -- [----] I, [2017-02-27T15:33:35.399252 #41996:78f140] INFO -- : Scan Item: [sample_registry] file has been added to disk, adding to model [----] I, [2017-02-27T15:33:37.953427 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding TimeProfile [----] I, [2017-02-27T15:33:38.136550 #41996:78f140] INFO -- : MIQ(TimeProfile.seed) Creating global time profile [----] I, [2017-02-27T15:33:38.340116 #41996:78f140] INFO -- : MIQ(MiqQueue.put) Message id: [1000000000047], id: [], Zone: [default], Role: [], Server: [], Ident: [generic], Target id: [], Instance id: [1000000000046], Task id: [], Command: [TimeProfile.rebuild_daily_metrics], Timeout: [3600], Priority: [100], State: [ready], Deliver On: [], Data: [], Args: [] [----] I, [2017-02-27T15:33:38.345261 #41996:78f140] INFO -- : MIQ(EvmDatabase.seed) Seeding MiqAeDatastore [----] I, [2017-02-27T15:33:38.454821 #41996:78f140] INFO -- : MIQ(MiqAeDatastore.seed) Seeding ManageIQ domain... -- [----] I, [2017-02-27T15:39:47.937642 #41996:78f140] INFO -- : MIQ(MiqAeYamlImportFs#add_method) Importing method: <validate_quota> /opt/rh/cfme-gemset/gems/activerecord-5.0.0.1/lib/active_record/connection_adapters/postgresql_adapter.rb:598:in `async_exec': execution expired (Timeout::Error)
Note, this bug is for INITIAL creation of a region on an external DB, so it's the initial seeding that is really slow on a somewhat slower connection. It's probably also worth investigating how long a seeding of an already seeded external DB takes with the same 42 ms latency since this happens much more frequently.
the initial run is 10,425 queries and 1,355 rows the second run is 1,853 queries and 5,201 rows there is an environment variable SKIP_PRIMORDIAL_SEED which when run the run is 1 query and 0 rows (counts don't show up)
Looking into what changes will be necessary if we move the seeding out of the startup process.
https://github.com/ManageIQ/manageiq/pull/15582
https://github.com/ManageIQ/manageiq/pull/15595
New commit detected on ManageIQ/manageiq/master: https://github.com/ManageIQ/manageiq/commit/d082c47618bdf6f6436a4ce30a29c8228cbacde0 commit d082c47618bdf6f6436a4ce30a29c8228cbacde0 Author: Keenan Brock <kbrock> AuthorDate: Mon Jul 17 14:40:13 2017 -0400 Commit: Keenan Brock <kbrock> CommitDate: Tue Aug 1 09:33:09 2017 -0400 Add $SEEDING_LOCK_TIMEOUT for high latency seeding Seeding over a WAN can take longer than 10 minutes. This allows the timeout to be increased temporarily. https://bugzilla.redhat.com/show_bug.cgi?id=1422671 lib/evm_database.rb | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
An environment variable SEEDING_LOCK_TIMEOUT was added to allow seeding to take longer. In development I use it the following way: vmdb SEEDING_LOCK_TIMEOUT=1800 bundle exec rake db:seed This allows me to populate the server with a timeout of 30 minutes All subsequent boots will faster and the 10 minute timeout should work fine. see also: https://github.com/ManageIQ/manageiq/pull/15595 Just to reiterate: this product was designed for the appliances to run near the database. Running with a high latency db connection may cause all sorts of issues, some of which will take not be very obvious and difficult to debug.
Verified in 5.9.0.2
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:0380