Bug 1554049

Summary: Not able to scan instances in AWS
Product: Red Hat CloudForms Management Engine
Reporter: ldomb
Component: SmartState Analysis
Assignee: Hui Song <hsong>
Status: CLOSED CURRENTRELEASE
QA Contact: Satyajit Bulage <sbulage>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 5.9.0
CC: akarol, cpelland, kmorey, obarenbo, roliveri
Target Milestone: GA
Keywords: TestOnly, ZStream
Target Release: 5.10.0
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 5.10.0.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1558040
Environment:
Last Closed: 2019-02-11 14:01:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: CFME Core
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1558040    

Description ldomb 2018-03-11 00:48:03 UTC
Description of problem:
When running SmartState Analysis in AWS, there seems to be a race condition with the IAM policy: the Atomic Host agent instance is started before the IAM role has been applied.

Version-Release number of selected component (if applicable):

5.9.0.22.20180221205805_f93a675
How reproducible:

Steps to Reproduce:
1. Upload the VHD to S3
2. Create the AMI
3. Launch the instance as a t2.xlarge
4. Configure the appliance
5. Enter credentials for the provider (admin role)
6. Run SmartState Analysis on an instance

Actual results:

[----] I, [2018-03-10T19:33:50.632916 #4885:12fb13c]  INFO -- : MIQ(ManageIQ::Providers::Amazon::AgentCoordinator#find_or_create_keypair) KeyPair smartstate-4f39d9b3-147d-45a7-8e3f-618468e8b9eb will be created!
[----] I, [2018-03-10T19:33:52.953688 #2874:12fb13c]  INFO -- : MIQ(MiqGenericWorker::Runner#get_message_via_drb) Message id: [99000000000462], MiqWorker id: [99000000000001], Zone: [default], Role: [smartstate], Server: [], Ident: [generic], Target id: [], Instance id: [], Task id: [job_message_1520728429], Command: [Job.update_message], Timeout: [600], Priority: [100], State: [dequeue], Deliver On: [], Data: [], Args: ["a7f0f3b2-7909-4c53-846f-527269efc468", "Scanning in AWS region us-east-1 in progress"], Dequeued in: [3.034605964] seconds
[----] I, [2018-03-10T19:33:55.683549 #4885:12fb13c]  INFO -- : MIQ(ManageIQ::Providers::Amazon::AgentCoordinator#get_agent_image_id) AMI Image: RHEL-Atomic_7.4_HVM_GA-20180104-x86_64-1-Access2-GP2 [ami-d97120a3] is used to launch smartstate agent.
[----] E, [2018-03-10T19:33:55.967812 #4885:12fb13c] ERROR -- : MIQ(ManageIQ::Providers::Amazon::AgentCoordinator#startup_agent) No agent is set up to process requests: Value (smartstate) for parameter iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name




Expected results:
SmartState Analysis of the instance completes without the "Invalid IAM Instance Profile name" error.

Additional info:

Comment 5 Hui Song 2018-03-12 21:45:26 UTC
Based on the Amazon documentation (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#launch-instance-with-role-console), I'll add retry logic to solve this issue.

--------
After you've created an IAM role, you can launch an instance, and associate that role with the instance during launch.

Important

After you create an IAM role, it may take several seconds for the permissions to propagate. If your first attempt to launch an instance with a role fails, wait a few seconds before trying again. For more information, see Troubleshooting Working with Roles in the IAM User Guide.
-------
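
For illustration only, the "wait a few seconds before trying again" advice quoted above could be sketched roughly as follows. This is not the code from the actual fix: the error class, the method name with_iam_retry, and the attempt count and delays are all hypothetical placeholders, and the real EC2 launch call is replaced by a block so that the sketch runs on its own.

```ruby
# Hypothetical stand-in for the EC2 client error seen in the log
# ("Invalid IAM Instance Profile name"); not a real SDK class.
class InvalidInstanceProfileError < StandardError; end

# Runs the given block and retries it while the IAM instance profile
# is not yet visible to EC2. max_attempts and base_delay are illustrative
# defaults, not values taken from the fix. The sleeper is injectable so
# the behaviour can be exercised without real waiting.
def with_iam_retry(max_attempts: 5, base_delay: 2, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue InvalidInstanceProfileError
    # Give up and re-raise the original error once the budget is used up.
    raise if attempt >= max_attempts
    # Linear back-off: wait base_delay, then 2 * base_delay, and so on.
    sleeper.call(base_delay * attempt)
    retry
  end
end

# Usage: simulate a profile that becomes visible on the third attempt.
delays = []
result = with_iam_retry(sleeper: ->(seconds) { delays << seconds }) do |attempt|
  raise InvalidInstanceProfileError, "Invalid IAM Instance Profile name" if attempt < 3
  "agent launched on attempt #{attempt}"
end
puts result
puts "waited: #{delays.inspect}"
```

Only the one error that indicates a not-yet-propagated profile is rescued here, so unrelated launch failures would still surface immediately, and the attempt limit keeps a genuinely wrong profile name from retrying forever.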

Comment 7 CFME Bot 2018-03-16 19:41:53 UTC
New commit detected on ManageIQ/manageiq-providers-amazon/master:

https://github.com/ManageIQ/manageiq-providers-amazon/commit/b6e1528a4c78bdeda0034500b747d5f0b65eed7b
commit b6e1528a4c78bdeda0034500b747d5f0b65eed7b
Author:     hsong-rh <hsong>
AuthorDate: Thu Mar 15 10:27:20 2018 -0400
Commit:     hsong-rh <hsong>
CommitDate: Thu Mar 15 10:27:20 2018 -0400

    Fixes to add a retry logic in instance creation when IAM role is not available immediately

    https://bugzilla.redhat.com/show_bug.cgi?id=1554049

 app/models/manageiq/providers/amazon/agent_coordinator.rb | 57 +-
 1 file changed, 40 insertions(+), 17 deletions(-)

Comment 9 Satyajit Bulage 2018-10-23 15:02:08 UTC
Successfully performed SSA on EC2 instances 3-4 times, removing the docker instance during each scan.

Verified Version: 5.10.0.20.20181016163900_fe677b4