Description of problem: replication failing in large multi-region environment when using rubyrep replication at the 5.5.3.4 code level Version-Release number of selected component (if applicable):5.5.3.4 How reproducible: not yet determined Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
Created attachment 1170295 [details] Line graph showing count of rubyrep child processes by parent process id from CFME archive log set
Created attachment 1170298 [details] standard top reports for replication applaince showing growth of processes counts, and swap invasion with very low CPU utilization
Created attachment 1170301 [details] pid 30741 replication process logs
Created attachment 1170302 [details] process 31738 ruby replication log
Created attachment 1170305 [details] process 4425 log from ruby replication process
Output when running replication on the command line via rake task: root@vc4fh1lpdb01:/var/www/miq/vmdb # rake evm:dbsync:replicate Replicating Region (20) to remote database... /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U unknown OID 0: failed to recognize type of 'version'. It will be treated as String. /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U Verifying RubyRep tables Checking for and removing rubyrep triggers from unconfigured tables /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U Exception caught: PG::ReadOnlySqlTransaction: ERROR: cannot execute DELETE in a read-only transaction : delete from rr20_sync_state where table_name = 'schema_migrations' /opt/rh/cfme-gemset/gems/activerecord- 4.2.6/lib/active_record/connection_adapters/postgresql/database_stateme nts.rb:155:in `async_exec' /opt/rh/cfme-gemset/gems/activerecord- 4.2.6/lib/active_record/connection_adapters/postgresql/database_stateme nts.rb:155:in `block in execute' /opt/rh/cfme-gemset/gems/activerecord- 4.2.6/lib/active_record/connection_adapters/abstract_adapter.rb:472:in `block in log' /opt/rh/cfme-gemset/gems/activesupport- 4.2.6/lib/active_support/notifications/instrumenter.rb:20:in `instrument' /opt/rh/cfme-gemset/gems/activerecord- 4.2.6/lib/active_record/connection_adapters/abstract_adapter.rb:466:in `log' /opt/rh/cfme-gemset/gems/activerecord- 4.2.6/lib/active_record/connection_adapters/postgresql/database_stateme nts.rb:154:in `execute' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/replication_initializer.rb:327:in `block in restore_specified_tables' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/replication_initializer.rb:319:in `each' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/replication_initializer.rb:319:in `restore_specified_tables' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/replication_initializer.rb:304:in `block in restore_unconfigured_tables' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/replication_initializer.rb:301:in `each' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/replication_initializer.rb:301:in `restore_unconfigured_tables' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/replication_initializer.rb:352:in `prepare_replication' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/replication_runner.rb:118:in `prepare_replication' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/replication_runner.rb:139:in `execute' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/replication_runner.rb:165:in `run' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/command_runner.rb:102:in `run' /var/www/miq/vmdb/lib/tasks/evm_dbsync.rake:192:in `run_rr_command' /var/www/miq/vmdb/lib/tasks/evm_dbsync.rake:66:in `block (3 levels) in <top (required)>' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/task.rb:240:in `call' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/task.rb:240:in `block in execute' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/task.rb:235:in `each' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/task.rb:235:in `execute' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/task.rb:179:in `block in invoke_with_call_chain' /opt/rh/rh-ruby22/root/usr/share/ruby/monitor.rb:211:in `mon_synchronize' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/task.rb:172:in `invoke_with_call_chain' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/task.rb:165:in `invoke' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:150:in `invoke_task' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:106:in `block (2 levels) in top_level' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:106:in `each' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:106:in `block in top_level' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:115:in `run_with_threads' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:100:in `top_level' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:78:in `block in run' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:176:in `standard_exception_handling' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:75:in `run' /opt/rh/cfme-gemset/gems/rake-10.5.0/bin/rake:33:in `<top (required)>' /opt/rh/cfme-gemset/bin/rake:23:in `load' /opt/rh/cfme-gemset/bin/rake:23:in `<main>'
link to postgresql logs from the vmdb source database from the beginning of June 2016 can be obtained from http://file.rdu.redhat.com/~thenness/Verizon%20BZ%201348576%20compressed%20postgresql%20logs/ June 1 - June 20 2016
From Chris Adams: 3:32 PM (1 hour ago) to me, Jared, Nick, Gregg, Bill Tom, Here is the most recent error I see. The initial db:reset works but when I try a db:replicate, it fails. Replicating Region (10) to remote database... /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U unknown OID 0: failed to recognize type of 'version'. It will be treated as String. /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U Verifying RubyRep tables Checking for and removing rubyrep triggers from unconfigured tables /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U Verifying rubyrep triggers of configured tables /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U Starting replication -./opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U unknown OID 0: failed to recognize type of 'id'. It will be treated as String. 2016-06-28T15:26:42-04:00 Exception caught: replication run timed out /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/replication_runner.rb:129:in `execute_once' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/replication_runner.rb:143:in `execute' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/replication_runner.rb:165:in `run' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/command_runner.rb:102:in `run' /var/www/miq/vmdb/lib/tasks/evm_dbsync.rake:1 92:in `run_rr_command' /var/www/miq/vmdb/lib/tasks/evm_dbsync.rake:6 6:in `block (3 levels) in <top (required)>' /opt/rh/cfme-gemset/gems/rake- 10.5.0/lib/rake/task.rb:240:in `call' /opt/rh/cfme-gemset/gems/rake- 10.5.0/lib/rake/task.rb:240:in `block in execute' /opt/rh/cfme-gemset/gems/rake- 10.5.0/lib/rake/task.rb:235:in `each' /opt/rh/cfme-gemset/gems/rake- 10.5.0/lib/rake/task.rb:235:in `execute' /opt/rh/cfme-gemset/gems/rake- 10.5.0/lib/rake/task.rb:179:in `block in invoke_with_call_chain' /opt/rh/rh- ruby22/root/usr/share/ruby/monitor.rb:211:in `mon_synchronize' /opt/rh/cfme-gemset/gems/rake- 10.5.0/lib/rake/task.rb:172:in `invoke_with_call_chain' /opt/rh/cfme-gemset/gems/rake- 10.5.0/lib/rake/task.rb:165:in `invoke' /opt/rh/cfme-gemset/gems/rake- 10.5.0/lib/rake/application.rb:150:in `invoke_task' /opt/rh/cfme-gemset/gems/rake- 10.5.0/lib/rake/application.rb:106:in `block (2 levels) in top_level' /opt/rh/cfme-gemset/gems/rake- 10.5.0/lib/rake/application.rb:106:in `each' /opt/rh/cfme-gemset/gems/rake- 10.5.0/lib/rake/application.rb:106:in `block in top_level' /opt/rh/cfme-gemset/gems/rake- 10.5.0/lib/rake/application.rb:115:in `run_with_threads' /opt/rh/cfme-gemset/gems/rake- 10.5.0/lib/rake/application.rb:100:in `top_level' /opt/rh/cfme-gemset/gems/rake- 10.5.0/lib/rake/application.rb:78:in `block in run' /opt/rh/cfme-gemset/gems/rake- 10.5.0/lib/rake/application.rb:176:in `standard_exception_handling' /opt/rh/cfme-gemset/gems/rake- 10.5.0/lib/rake/application.rb:75:in `run' /opt/rh/cfme-gemset/gems/rake- 10.5.0/bin/rake:33:in `<top (required)>' /opt/rh/cfme-gemset/bin/rake:23:in `load' /opt/rh/cfme-gemset/bin/rake:23:in `<main>' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U Exception caught: table 'binary_blob_parts' does not exist /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/connection_extenders/postgresql_extender.rb:12 7:in `primary_key_names' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/proxy_connection.rb:151:in `primary_key_names' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/session.rb:50:in `block in manual_primary_keys' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/session.rb:46:in `each' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/session.rb:46:in `manual_primary_keys' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/session.rb:215:in `connect_database' /opt/rh/cfme-gemset/bundler/gems/rubyrep- 07f85029dcb2/lib/rubyrep/session.rb:179:in `block in refresh_database_connection' Tom Hennessy <thenness> 3:54 PM (48 minutes ago) to cadams, Jared, Nick, Gregg, Bill Chris, Are the load-balancers in play here and if so which load balancers? both local and rollup VMDB? local only? VMDB Only? Please advise. Thomas Hennessy Red Hat, Inc email: thenness irc: thennessy cadams 3:57 PM (44 minutes ago) to me, Jared, Nick, Gregg, Bill Hi Tom, The load balancers are totally removed from the configs. We have also disabled HA on the regional postgresql DB. I currently have the VMware team here providing some more resources (CPU and RAM) for the regional DB's. Once the additional resources are available, I'll try the replication job again. Thanks, Chris cadams 4:21 PM (20 minutes ago) to me, Jared, Nick, Gregg, Bill Hi Tom, We had the CPU count bumped up on the regional DB but still see the error. root@vc4fa1lpui01:/var/www/miq/vmdb # rake evm:dbsync:replicate Replicating Region (10) to remote database... /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U unknown OID 0: failed to recognize type of 'version'. It will be treated as String. /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U Verifying RubyRep tables Checking for and removing rubyrep triggers from unconfigured tables /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U Verifying rubyrep triggers of configured tables /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U Starting replication -./opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U unknown OID 0: failed to recognize type of 'id'. It will be treated as String. 2016-06-28T16:18:25-04:00 Exception caught: replication run timed out /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/replication_runner.rb:129:in `execute_once' /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/replication_runner.rb:143:in `execute' /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/replication_runner.rb:165:in `run' /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/command_runner.rb:102:in `run' /var/www/miq/vmdb/lib/tasks/evm_dbsync.rake:192:in `run_rr_command' /var/www/miq/vmdb/lib/tasks/evm_dbsync.rake:66:in `block (3 levels) in <top (required)>' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/task.rb:240:in `call' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/task.rb:240:in `block in execute' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/task.rb:235:in `each' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/task.rb:235:in `execute' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/task.rb:179:in `block in invoke_with_call_chain' /opt/rh/rh-ruby22/root/usr/share/ruby/monitor.rb:211:in `mon_synchronize' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/task.rb:172:in `invoke_with_call_chain' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/task.rb:165:in `invoke' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:150:in `invoke_task' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:106:in `block (2 levels) in top_level' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:106:in `each' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:106:in `block in top_level' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:115:in `run_with_threads' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:100:in `top_level' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:78:in `block in run' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:176:in `standard_exception_handling' /opt/rh/cfme-gemset/gems/rake-10.5.0/lib/rake/application.rb:75:in `run' /opt/rh/cfme-gemset/gems/rake-10.5.0/bin/rake:33:in `<top (required)>' /opt/rh/cfme-gemset/bin/rake:23:in `load' /opt/rh/cfme-gemset/bin/rake:23:in `<main>' /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/table_spec_resolver.rb:91: warning: encoding option is ignored - U Exception caught: table 'conditions_miq_policies' does not exist /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/connection_extenders/postgresql_extender.rb:127:in `primary_key_names' /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/proxy_connection.rb:151:in `primary_key_names' /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/session.rb:50:in `block in manual_primary_keys' /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/session.rb:46:in `each' /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/session.rb:46:in `manual_primary_keys' /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/session.rb:215:in `connect_database' /opt/rh/cfme-gemset/bundler/gems/rubyrep-07f85029dcb2/lib/rubyrep/session.rb:179:in `block in refresh_database_connection' Click here to Reply, Reply to all, or Forward
Provided a patch and it seems to fix the issue, opened the following PR on rubyrep with the same changes. https://github.com/ManageIQ/rubyrep/pull/9
https://github.com/ManageIQ/manageiq/pull/9629
New commit detected on ManageIQ/manageiq/master: https://github.com/ManageIQ/manageiq/commit/f1aade5dde31a16a43fd1c8901514dcd9023cc56 commit f1aade5dde31a16a43fd1c8901514dcd9023cc56 Author: Nick Carboni <ncarboni> AuthorDate: Wed Jul 6 11:14:23 2016 -0400 Commit: Nick Carboni <ncarboni> CommitDate: Wed Jul 6 11:14:23 2016 -0400 Update rubyrep to include nested transaction fix This was required because rubyrep would open a transaction per 1000 rows and also a transaction for each change it replicated. If the inner transaction failed for any reason, the outer transaction would also be aborted because postgresql does not really support nested transactions. This would cause the replicate process to timeout because no sql could then be processed in the aborted transaction. https://github.com/ManageIQ/rubyrep/pull/9 https://bugzilla.redhat.com/show_bug.cgi?id=1348576 Gemfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Does this need a backport to 5.5.z and if so, will it actually get in?