1689330 – Errors when submitting VM action from global region

Bug 1689330 - Errors when submitting VM action from global region

Summary: Errors when submitting VM action from global region

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat CloudForms Management Engine
Classification:	Red Hat
Component:	Appliance
Sub Component:
Version:	5.10.2
Hardware:	All
OS:	All
Priority:	medium
Severity:	medium
Target Milestone:	GA
Target Release:	5.11.0
Assignee:	Yuri Rudman
QA Contact:	Tasos Papaioannou
Docs Contact:	Red Hat CloudForms Documentation
URL:
Whiteboard:
Depends On:
Blocks:	1693817
TreeView+	depends on / blocked

Reported:	2019-03-15 16:15 UTC by Tasos Papaioannou
Modified:	2019-12-13 15:10 UTC (History)
CC List:	5 users (show)
Fixed In Version:	5.11.0.1
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1693817 (view as bug list)
Environment:
Last Closed:	2019-12-13 15:10:00 UTC
Category:	Bug
Cloudforms Team:	CFME Core
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Description Tasos Papaioannou 2019-03-15 16:15:08 UTC

Description of problem:

The following BZ fixed an issue in which a VM task invoked from a global region could fail and get repeatedly requeued if the VM was removed from the remote region's database before the task is processed by the remote region.

Bug 1686035 - Multiple user logins logged every second from Global DB UI to RegionDB
https://bugzilla.redhat.com/show_bug.cgi?id=1686035

The repeated requeueing is now fixed, but there is still an issue:

1.) The first failed attempt logs a traceback.
2.) The task is requeued once.
3.) The requeued task correctly logs an error, but an AuditSuccess message gets logged as well.


Version-Release number of selected component (if applicable):

5.10.2.0

How reproducible:

100%

Steps to Reproduce:
1.) Set up a global appliance and one remote appliance, with an infrastructure provider on the remote appliance.
2.) On the global appliance, edit /var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb, and insert a sleep between the lookup of the VM object and the task invocation, e.g.,

****

      if resource_ids.present?
        resource_ids.each do |id|
          begin
(1) -->     obj = collection.find(id)
          rescue ManageIQ::API::Client::ResourceNotFound => err
            _log.error(err.message)
          else
            _log.info("Invoking task #{action} on collection #{collection_name}, object #{obj.id}, with args #{post_args}")
            begin
(2) -->       sleep 20
(3) -->       obj.send(action, post_args)
            rescue NoMethodError => err
              _log.error(err.message)
            end
          end
        end
****

3.) Restart evmserverd on the global appliance.
4.) Submit a VM power on/off action from the global appliance UI.
5.) A couple seconds later, in the remote appliance UI, remove the VM from the database.
6.) Check the evm.log on the global appliance.

Actual results:

A traceback is logged from the first attempt, then the task is requeued, then the task logs an error message but an AuditSuccess message is also logged:

****
[----] I, [2019-03-14T10:49:57.361978 #38723:97cf5c]  INFO -- : MIQ(Vm.invoke_api_tasks) Invoking task start on collection vms, object 18, with args {}
[----] E, [2019-03-14T10:50:17.439667 #38723:97cf5c] ERROR -- : An error occurred while invoking remote tasks...Requeueing for 1 minute from now.
[----] E, [2019-03-14T10:50:17.440128 #38723:97cf5c] ERROR -- : [ManageIQ::API::Client::ResourceNotFound]: Couldn't find Vm with 'id'=18 [WHERE "vms"."type" IN ('Vm', 'ManageIQ::Providers::CloudManager::Vm', 'ManageIQ::Providers::InfraManager::Vm', 'VmServer', 'ManageIQ::Providers::Amazon::CloudManager::Vm', 'ManageIQ::Providers::Azure::CloudManager::Vm', 'ManageIQ::Providers::Google::CloudManager::Vm', 'ManageIQ::Providers::Openstack::CloudManager::Vm', 'ManageIQ::Providers::Vmware::CloudManager::Vm', 'VmXen', 'ManageIQ::Providers::Kubevirt::InfraManager::Vm', 'ManageIQ::Providers::Redhat::InfraManager::Vm', 'ManageIQ::Providers::Microsoft::InfraManager::Vm', 'ManageIQ::Providers::Vmware::InfraManager::Vm') AND "vms"."template" = $1]  Method:[rescue in block in invoke_tasks_remote]
[----] E, [2019-03-14T10:50:17.440467 #38723:97cf5c] ERROR -- : /opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.3/lib/manageiq/api/client/connection.rb:108:in `check_response'
/opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.3/lib/manageiq/api/client/connection.rb:102:in `send_request'
/opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.3/lib/manageiq/api/client/connection.rb:34:in `post'
/opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.3/lib/manageiq/api/client/client.rb:65:in `post'
/opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.3/lib/manageiq/api/client/resource.rb:60:in `exec_action'
/opt/rh/cfme-gemset/gems/manageiq-api-client-0.3.3/lib/manageiq/api/client/resource.rb:46:in `method_missing'
/var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:130:in `block in invoke_api_tasks'
/var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:121:in `each'
/var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:121:in `invoke_api_tasks'
/var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:70:in `block in invoke_tasks_remote'
/var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:65:in `each'
/var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:65:in `invoke_tasks_remote'
/var/www/miq/vmdb/app/models/mixins/process_tasks_mixin.rb:38:in `invoke_tasks'
/var/www/miq/vmdb/app/models/miq_queue.rb:455:in `block in dispatch_method'
/usr/share/ruby/timeout.rb:93:in `block in timeout'
/usr/share/ruby/timeout.rb:33:in `block in catch'
/usr/share/ruby/timeout.rb:33:in `catch'
/usr/share/ruby/timeout.rb:33:in `catch'
/usr/share/ruby/timeout.rb:108:in `timeout'
/var/www/miq/vmdb/app/models/miq_queue.rb:453:in `dispatch_method'
/var/www/miq/vmdb/app/models/miq_queue.rb:430:in `block in deliver'
/var/www/miq/vmdb/app/models/user.rb:268:in `with_user'
/var/www/miq/vmdb/app/models/user.rb:283:in `with_user_group'
/var/www/miq/vmdb/app/models/miq_queue.rb:430:in `deliver'

[...]

[----] I, [2019-03-14T10:50:17.454548 #38723:97cf5c]  INFO -- : MIQ(MiqQueue.put) Message id: [99000000000127],  id: [], Zone: [default], Role: [], Server: [], MiqTask id: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [Vm.invoke_tasks_remote], Timeout: [600], Priority: [100], State: [ready], Deliver On: [2019-03-14 14:51:17 UTC], Data: [], Args: [{:ids=>[18], :task=>"start", :userid=>"admin", :miq_task_id=>nil}]
[----] I, [2019-03-14T10:50:17.454983 #38723:97cf5c]  INFO -- : MIQ(MiqQueue#delivered) Message id: [99000000000120], State: [ok], Delivered in [22.701902075] seconds

[...]

[----] I, [2019-03-14T10:51:17.573208 #38723:97cf5c]  INFO -- : MIQ(MiqGenericWorker::Runner#get_message_via_drb) Message id: [99000000000127], MiqWorker id: [99000000000013], Zone: [default], Role: [], Server: [], MiqTask id: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [Vm.invoke_tasks_remote], Timeout: [600], Priority: [100], State: [dequeue], Deliver On: [2019-03-14 14:51:17 UTC], Data: [], Args: [{:ids=>[18], :task=>"start", :userid=>"admin", :miq_task_id=>nil}], Dequeued in: [60.125051915] seconds
[----] I, [2019-03-14T10:51:17.573327 #38723:97cf5c]  INFO -- : MIQ(MiqQueue#deliver) Message id: [99000000000127], Delivering...
[----] E, [2019-03-14T10:51:17.964066 #38723:97cf5c] ERROR -- : MIQ(Vm.invoke_api_tasks) Couldn't find resource with 'id' [18]
[----] I, [2019-03-14T10:51:17.983566 #38723:97cf5c]  INFO -- : <AuditSuccess> MIQ(ProcessTasksMixin.task_audit_event) userid: [admin] - 'start' successfully initiated for remote VMs: [18]
[----] I, [2019-03-14T10:51:17.983841 #38723:97cf5c]  INFO -- : MIQ(MiqQueue#delivered) Message id: [99000000000127], State: [ok], Delivered in [0.410428068] seconds
****

Expected results:

The first failed attempt logs a clean error message, and there is no requeueing of the task or spurious AuditSuccess message.

Additional info:

Comment 2 CFME Bot 2019-03-18 19:45:57 UTC

https://github.com/ManageIQ/manageiq/pull/18565

Comment 3 CFME Bot 2019-03-28 15:26:32 UTC

New commit detected on ManageIQ/manageiq/master:

https://github.com/ManageIQ/manageiq/commit/b3532af44c579e4733ae9c2933dd997cf6f821a8
commit b3532af44c579e4733ae9c2933dd997cf6f821a8
Author:     Yuri Rudman <yrudman>
AuthorDate: Mon Mar 18 15:10:15 2019 -0400
Commit:     Yuri Rudman <yrudman>
CommitDate: Mon Mar 18 15:10:15 2019 -0400

    fix for hypothetical situation when remote object exists and reference to it found on global region, after that object got deleted on remote region but global region attempting to use previously found reference for Centralized Administration. What happened is: attempt to execute queue item raising 'resource not found' and message requeud. Desired behavior:  if resource not found then do not requeue message
    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1689330

 app/models/mixins/process_tasks_mixin.rb | 11 +-
 1 file changed, 3 insertions(+), 8 deletions(-)

Comment 5 Tasos Papaioannou 2019-05-02 19:13:30 UTC

Verified on 5.11.0.2.

Note You need to log in before you can comment on or make changes to this bug.