Bug 1678564

Summary: smartstate analysis with AWS EC2 instance is timing out
Product: Red Hat CloudForms Management Engine
Reporter: Rahul Chincholkar <rchincho>
Component: SmartState Analysis
Assignee: Hui Song <hsong>
Status: CLOSED DUPLICATE
QA Contact: Satyajit Bulage <sbulage>
Severity: high
Docs Contact: Red Hat CloudForms Documentation <cloudforms-docs>
Priority: high
Version: 5.9.4
CC: dmetzger, mshriver, obarenbo, rchincho
Target Milestone: GA
Target Release: 5.10.2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-02-22 07:21:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: Bug
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: CFME Core
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1679295

Description Rahul Chincholkar 2019-02-19 06:02:30 UTC
Description of problem:
SmartState analysis of an AWS EC2 instance times out.

In my reproducer environment, I observed the following log trace during the SmartState analysis:
~~~
[----] I, [2019-02-19T00:35:13.734389 #43896:ee1114]  INFO -- : MIQ(MiqGenericWorker::Runner#get_message_via_drb) Message id: [1000000023799], MiqWorker id: [1000000000038], Zone: [default], Role: [smartstate], Server: [], Ident: [generic], Target id: [], Instance id: [], Task id: [job_message_1550554511], Command: [Job.update_message], Timeout: [600], Priority: [100], State: [dequeue], Deliver On: [], Data: [], Args: ["5bf409ac-46b6-4871-8d74-15be26d10524", "Scanning in AWS region ap-south-1 in progress"], Dequeued in: [2.703450702] seconds
[----] I, [2019-02-19T00:35:13.735275 #43896:ee1114]  INFO -- : Q-task_id([job_dispatcher]) MIQ(MiqQueue#deliver) Message id: [1000000023799], Delivering...
[----] I, [2019-02-19T00:35:13.741327 #43896:ee1114]  INFO -- : Q-task_id([job_dispatcher]) JOB([5bf409ac-46b6-4871-8d74-15be26d10524] Message update: [Scanning in AWS region ap-south-1 in progress]
[----] I, [2019-02-19T00:35:13.780959 #43896:ee1114]  INFO -- : Q-task_id([job_dispatcher]) MIQ(MiqQueue#delivered) Message id: [1000000023799], State: [ok], Delivered in [0.045773102] seconds
[----] I, [2019-02-19T00:35:18.404120 #10490:ee1114]  INFO -- : MIQ(ManageIQ::Providers::Amazon::AgentCoordinatorWorker::Runner#do_work) Alive agents in EMS(guid=1dc2c437-45db-47aa-9770-3e73c1d651e4): [].
[----] I, [2019-02-19T00:35:18.511459 #10490:ee1114]  INFO -- : MIQ(ManageIQ::Providers::Amazon::AgentCoordinator#deploy_agent) Deploying agent ...
[----] I, [2019-02-19T00:35:18.525629 #10490:ee1114]  INFO -- : MIQ(ManageIQ::Providers::Amazon::AgentCoordinator#find_or_create_keypair) KeyPair smartstate-1dc2c437-45db-47aa-9770-3e73c1d651e4 will be created!
[----] E, [2019-02-19T00:35:19.815090 #10490:ee1114] ERROR -- : MIQ(ManageIQ::Providers::Amazon::AgentCoordinator#startup_agent) No agent is set up to process requests: undefined method `gateway_id' for nil:NilClass
[----] I, [2019-02-19T00:35:21.622408 #10490:ee1114]  INFO -- : MIQ(MiqQueue.put) Message id: [1000000023801],  id: [], Zone: [default], Role: [automate], Server: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [MiqAeEngine.deliver], Timeout: [3600], Priority: [20], State: [ready], Deliver On: [], Data: [], Args: [{:object_type=>"ManageIQ::Providers::Amazon::CloudManager::Vm", :object_id=>1000000000097, :attrs=>{:event_type=>"vm_scan_abort", "VmOrTemplate::vm"=>1000000000097, :vm_id=>1000000000097, :host=>nil, "MiqEvent::miq_event"=>1000000001218, :miq_event_id=>1000000001218, "EventStream::event_stream"=>1000000001218, :event_stream_id=>1000000001218}, :instance_name=>"Event", :user_id=>1000000000001, :miq_group_id=>1000000000001, :tenant_id=>1000000000001, :automate_message=>nil}]
[----] E, [2019-02-19T00:35:21.622619 #10490:ee1114] ERROR -- : MIQ(VmScan#process_abort) job aborting, undefined method `gateway_id' for nil:NilClass
~~~
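The `undefined method `gateway_id' for nil:NilClass` message in the trace above is Ruby's NoMethodError, raised when a method is called on `nil`. The log does not show which object was nil, so the following is only a minimal sketch of that class of failure and of a guarded alternative. `Route`, `find_route`, `unguarded_gateway`, `guarded_gateway`, and the gateway IDs are hypothetical names made up for illustration; they are not taken from the ManageIQ code.

```ruby
# Hypothetical stand-in for whatever object should have carried gateway_id.
Route = Struct.new(:gateway_id)

# Simulates a lookup that returns nil when nothing matches.
def find_route(routes, wanted_gateway)
  routes.find { |route| route.gateway_id == wanted_gateway }
end

# Unguarded access: raises NoMethodError when route is nil.
def unguarded_gateway(route)
  route.gateway_id
end

# Guarded access: safe navigation returns nil instead of raising.
def guarded_gateway(route)
  route&.gateway_id
end

routes = [Route.new("igw-aaaa")]           # placeholder ID, not from the bug
missing = find_route(routes, "igw-bbbb")   # no match, so this is nil

begin
  unguarded_gateway(missing)
rescue NoMethodError => e
  puts "unguarded: #{e.class}"
end

puts "guarded: #{guarded_gateway(missing).inspect}"
```

A guard like this only avoids the exception; the caller still has to decide what a missing value means for the agent deployment.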

The SmartState analysis then times out with the following error:
~~~
ERROR -- : MIQ(VmScan#process_abort) job aborting, job timed out after 3059.463626564 seconds of inactivity.  Inactivity threshold [3000 seconds]
~~~
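When checking a log for other jobs that hit the same limit, the elapsed time and the threshold can be pulled out of the abort message. The sketch below assumes the message always has the shape of the single sample line above; `TIMEOUT_PATTERN` and `parse_inactivity_timeout` are illustrative names, not part of CFME.

```ruby
# Pattern inferred from the one abort message quoted in this bug (an assumption).
TIMEOUT_PATTERN = /timed out after (?<elapsed>\d+(?:\.\d+)?) seconds of inactivity\.\s+Inactivity threshold \[(?<threshold>\d+) seconds\]/

# Returns elapsed time, threshold, and overrun in seconds,
# or nil when the line is not an inactivity-timeout message.
def parse_inactivity_timeout(line)
  match = TIMEOUT_PATTERN.match(line)
  return nil unless match

  elapsed = match[:elapsed].to_f
  threshold = match[:threshold].to_i
  { elapsed: elapsed, threshold: threshold, overrun: (elapsed - threshold).round(3) }
end

line = "ERROR -- : MIQ(VmScan#process_abort) job aborting, job timed out after " \
       "3059.463626564 seconds of inactivity.  Inactivity threshold [3000 seconds]"
p parse_inactivity_timeout(line)
```

For the line in this report, the job was inactive for roughly 59 seconds beyond the 3000-second threshold before it was aborted.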

Version-Release number of selected component (if applicable):
CFME 5.9.4.7

How reproducible:
Always

Steps to Reproduce:
1. Add AWS provider
2. Configure SmartState Docker to perform SmartState Analysis on AWS
(I entered my Red Hat account credentials here)
3. Perform SmartState analysis on one of the AWS EC2 instances

Actual results:
Smartstate analysis fails

Expected results:
Smartstate analysis should succeed

Comment 2 Dave Johnson 2019-02-19 09:01:12 UTC
Please assess the impact of this issue and update the severity accordingly.  Please refer to https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity for a reminder on each severity's definition.

If it's something like a tracker bug where it doesn't matter, please set the severity to Low.

Comment 3 Dave Johnson 2019-02-19 11:01:10 UTC
Please assess the impact of this issue and update the severity accordingly.  Please refer to https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity for a reminder on each severity's definition.

If it's something like a tracker bug where it doesn't matter, please set the severity to Low.

Comment 6 Rahul Chincholkar 2019-02-22 07:21:09 UTC

*** This bug has been marked as a duplicate of bug 1677268 ***