Description of problem: We keep seeing user logins every few seconds for one user. When we check the logs, we see they're API logins from the Global DB UI appliance. Version-Release number of selected component (if applicable): 5.9.2.4 How reproducible: Repeatedly Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: I think we found the fix. I went into the global db and did: SELECT * FROM public.miq_queue where method_name = 'invoke_tasks_remote' It returned 5 tasks that just seemed to keep recreating themselves. I then stopped EVM in the global region and ran: DELETE * FROM public.miq_queue where method_name = 'invoke_tasks_remote' I then started EVM back up. I am no longer seeing the logins on the Danbury region. Appears some MIQ code that seems to just recreate tasks when they fail. In this case, it appears that if it fails due to a resource not being found, it should just fail, not keep trying. Or a least fail after a certain number of tries. This has been retrying since at least 6/17 it appears.
https://github.com/ManageIQ/manageiq/pull/17745
New commit detected on ManageIQ/manageiq/master: https://github.com/ManageIQ/manageiq/commit/a6343cfb71f22b050156bedff73eed55cd33190e commit a6343cfb71f22b050156bedff73eed55cd33190e Author: Brandon Dunne <brandondunne> AuthorDate: Mon Jul 23 14:38:46 2018 -0400 Commit: Brandon Dunne <brandondunne> CommitDate: Mon Jul 23 14:38:46 2018 -0400 Avoid raising and re-queueing when the remote resource is not found Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1599754 Gemfile | 2 +- app/models/mixins/process_tasks_mixin.rb | 11 +- spec/models/mixins/process_tasks_mixin_spec.rb | 13 + 3 files changed, 22 insertions(+), 4 deletions(-)
There is still a window that allows the original issue to occur, between (1) and (2) in the code below: ./app/models/mixins/process_tasks_mixin.rb: **** if resource_ids.present? resource_ids.each do |id| begin (1) --> obj = collection.find(id) rescue ManageIQ::API::Client::ResourceNotFound => err _log.error(err.message) else _log.info("Invoking task #{action} on collection #{collection_name}, object #{obj.id}, with args #{post_args}") begin (2) --> obj.send(action, post_args) rescue NoMethodError => err _log.error(err.message) end end end **** I can simulate the issue by adding a short sleep() call between (1) and (2) on the global appliance, then: 1.) Submit a VM start/stop action from the global appliance 2.) While the global appliance's invoke_api_tasks sleeps, remove the VM from the VMDB in the remote appliance's web UI. An exception gets logged, and the task gets requeued indefinitely. The first exception looks like this: **** [----] I, [2018-10-05T11:51:07.705759 #2157:e28f74] INFO -- : MIQ(Vm.invoke_api_tasks) Invoking task start on collection vms, object 117, with args {} [----] E, [2018-10-05T11:51:07.739843 #2157:e28f74] ERROR -- : An error occurred while invoking remote tasks...Requeueing for 1 minute from now. [----] E, [2018-10-05T11:51:07.740151 #2157:e28f74] ERROR -- : [RuntimeError]: Couldn't find Vm with 'id'=117 [WHERE "vms"."type" IN ('Vm', 'ManageIQ::Provide rs::CloudManager::Vm', 'ManageIQ::Providers::InfraManager::Vm', 'VmServer', 'ManageIQ::Providers::Amazon::CloudManager::Vm', 'ManageIQ::Providers::Azure::Clou dManager::Vm', 'ManageIQ::Providers::Google::CloudManager::Vm', 'ManageIQ::Providers::Openstack::CloudManager::Vm', 'ManageIQ::Providers::Vmware::CloudManager ::Vm', 'VmXen', 'ManageIQ::Providers::Kubevirt::InfraManager::Vm', 'ManageIQ::Providers::Redhat::InfraManager::Vm', 'ManageIQ::Providers::Microsoft::InfraMana ger::Vm', 'ManageIQ::Providers::Vmware::InfraManager::Vm') AND "vms"."template" = $1] Method:[rescue in block in invoke_tasks_remote] [----] E, [2018-10-05T11:51:07.740310 #2157:e28f74] ERROR -- : /opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.1/lib/manageiq/api/client/connection.rb:108:in `check_response' /opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.1/lib/manageiq/api/client/connection.rb:102:in `send_request' /opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.1/lib/manageiq/api/client/connection.rb:34:in `post' /opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.1/lib/manageiq/api/client/client.rb:65:in `post' /opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.1/lib/manageiq/api/client/resource.rb:60:in `exec_action' /opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.1/lib/manageiq/api/client/resource.rb:46:in `method_missing' /var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:131:in `block in invoke_api_tasks' /var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:121:in `each' /var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:121:in `invoke_api_tasks' /var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:70:in `block in invoke_tasks_remote' [...] [----] I, [2018-10-05T11:51:07.750884 #2157:e28f74] INFO -- : MIQ(MiqQueue.put) Message id: [99000000000350], id: [], Zone: [default], Role: [], Server: [], MiqTask id: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [Vm.invoke_tasks_remote], Timeout: [600], Priority: [100], State: [ready], Deliver On: [2018-10-05 15:52:07 UTC], Data: [], Args: [{:ids=>[117], :task=>"start", :userid=>"admin", :miq_task_id=>nil}] **** All later exceptions look like: **** [----] E, [2018-10-05T13:30:16.497205 #2165:e28f74] ERROR -- : An error occurred while invoking remote tasks...Requeueing for 1 minute from now. [----] E, [2018-10-05T13:30:16.498234 #2165:e28f74] ERROR -- : [NoMethodError]: undefined method `id' for nil:NilClass Method:[rescue in block in invoke_tasks_remote] [----] E, [2018-10-05T13:30:16.498374 #2165:e28f74] ERROR -- : /var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:129:in `block in invoke_api_tasks' /var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:121:in `each' /var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:121:in `invoke_api_tasks' /var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:70:in `block in invoke_tasks_remote' /var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:65:in `each' /var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:65:in `invoke_tasks_remote' /var/www/miq/vmdb/app/models/miq_queue.rb:455:in `block in dispatch_method' [...] /var/www/miq/vmdb/lib/workers/evm_server.rb:27:in `start' /var/www/miq/vmdb/lib/workers/evm_server.rb:48:in `start' /var/www/miq/vmdb/lib/workers/bin/evm_server.rb:4:in `<main>' [----] I, [2018-10-05T13:30:16.510014 #2165:e28f74] INFO -- : MIQ(MiqQueue.put) Message id: [99000000000713], id: [], Zone: [default], Role: [], Server: [], MiqTask id: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [Vm.invoke_tasks_remote], Timeout: [600], Priority: [100], State: [ready], Deliver On: [2018-10-05 17:31:16 UTC], Data: [], Args: [{:ids=>[117], :task=>"start", :userid=>"admin", :miq_task_id=>nil}] ****
Hi Tasos, In real life, there is only a fraction of a second between the time that we find the resource using the api client and send the action to it. In the edge case that the resource is found then disappears before we can send the action, yes it would raise an exception that is unhandled and get requeued once. However, the second time around the remote resource will not be found, no exception should be raised, and it should not get requeued again. Can you please verify this? As for the undefined method `id' for nil:NilClass, I'd need access to the appliance to see what's going on there.
New commit detected on ManageIQ/manageiq/master: https://github.com/ManageIQ/manageiq/commit/30c46272c33ddb3432a85cb99fd3898a1c5b5114 commit 30c46272c33ddb3432a85cb99fd3898a1c5b5114 Author: Brandon Dunne <brandondunne> AuthorDate: Thu Nov 8 10:44:36 2018 -0500 Commit: Brandon Dunne <brandondunne> CommitDate: Thu Nov 8 10:44:36 2018 -0500 Upgrade to the latest manageiq-api-client Catch other 404 errors and raise them as ResourceNotFound Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1599754 Gemfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Fixed by https://github.com/ManageIQ/manageiq/pull/18491
New commit detected on ManageIQ/manageiq/master: https://github.com/ManageIQ/manageiq/commit/1f074d7d020842b4239f3d2f85134e3bd32280c0 commit 1f074d7d020842b4239f3d2f85134e3bd32280c0 Author: Brandon Dunne <brandondunne> AuthorDate: Tue Feb 26 11:48:52 2019 -0500 Commit: Brandon Dunne <brandondunne> CommitDate: Tue Feb 26 11:48:52 2019 -0500 Update to the latest manageiq-api-client Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1599754 Gemfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Verified on 5.11.0.2.