Bug 1342649

Summary: Azure request remains Active even after instance is fully provisioned
Product: Red Hat CloudForms Management Engine Reporter: David La Motta <dlamotta>
Component: Provisioning Assignee: Daniel Berger <dberger>
Status: CLOSED CURRENTRELEASE QA Contact: Jeff Teehan <jteehan>
Severity: medium Docs Contact:
Priority: high    
Version: 5.6.0 CC: abellott, cpelland, dajohnso, dberger, jhardy, obarenbo, simaishi
Target Milestone: GA Keywords: TestOnly, ZStream
Target Release: 5.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: provider:azure:provision
Fixed In Version: 5.7.0.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1350448 Environment:
Last Closed: 2017-01-11 20:04:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: Azure Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1350448    

Description David La Motta 2016-06-03 18:45:27 UTC
Description of problem:
When using service provisioning from template to create a self-service catalog for creating Azure instances, the request remains Active even after the Azure instance is created successfully.


Version-Release number of selected component (if applicable):
5.6.0.9-rc2.20160531154046_b4e2f6d


How reproducible:
Always


Steps to Reproduce:
1. Create a new catalog item with an Azure type and use the ManageIQ/Service/Provisioning/StateMachines/ServiceProvision_Template/CatalogItemInitialization state machine for provisioning
2. Use a service dialog whose only field is vm_name (I don't think the dialog matters much, but this is what I used to test)
3. Fill all pertinent data in the Request Info tab, specifying changeme for the instance name.

Actual results:
The instance is provisioned correctly but the request remains in Active state.

Expected results:
The instance is provisioned correctly and the request has a Finished state.


Additional info:

Comment 2 Shveta 2016-06-08 18:57:11 UTC
I was not able to reproduce this bug because Azure provisioning is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1344103

Comment 4 Jeff Teehan 2016-06-22 21:05:17 UTC
So, if I had to guess, I would say it is all subsequent calls to that user image which leave it in the active state.

I think the state changes from Creating to Succeeded to Running, and we're querying at too large an interval and missing the Succeeded state.  Dan Berger and I talked about this a while back.  I think that once the provision has started, we need to redirect the REST query to the VM itself.  Bill Wei is against this idea IIRC, but the issue is not going away.
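To illustrate the hypothesis above only (it is not the confirmed cause), here is a minimal Python sketch of how a coarse polling interval can skip a short-lived state. The timings, the `TIMELINE` values, and the `observed_states` helper are all hypothetical and are not taken from the product or from Azure.

```python
def observed_states(timeline, poll_interval, duration):
    """Return the distinct states a poller sees when sampling every poll_interval seconds.

    timeline is a list of (start_second, state) pairs sorted by start time;
    each state lasts until the next entry begins.
    """
    seen = []
    for t in range(0, duration + 1, poll_interval):
        # Pick the last state whose start time is at or before t.
        current = [state for start, state in timeline if start <= t][-1]
        if not seen or seen[-1] != current:
            seen.append(current)
    return seen


# Hypothetical timings: "Succeeded" is visible for only 5 seconds.
TIMELINE = [(0, "Creating"), (40, "Succeeded"), (45, "Running")]

coarse = observed_states(TIMELINE, poll_interval=30, duration=90)
fine = observed_states(TIMELINE, poll_interval=2, duration=90)
```

With these assumed numbers, the 30-second poller samples at 0, 30, 60 and 90 seconds and never lands inside the Succeeded window, while the 2-second poller sees all three states.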

Anyway, here are my last two entries, both using the same image.  They completed a long time ago and work just fine.  I'll test my first-time hypothesis after 5.6 ships.

Ok	Active	622,000,000,000,002	Administrator	VM Provision		Provision from [Microsoft.Compute/Images/templates/tmpl-osDisk] to [RetireFrom12]	Approved	Administrator	06/22/16 20:39:39 UTC	06/22/16 20:39:20 UTC	06/22/16 20:46:44 UTC	Auto-Approved	Validating New Vm	Region 622
Ok	Active	622,000,000,000,001	Administrator	VM Provision		Provision from [Microsoft.Compute/Images/templates/tmpl-osDisk] to [Delete12a]	Approved	Administrator	06/22/16 19:37:20 UTC	06/22/16 19:37:03 UTC	06/22/16 19:49:45 UTC	Auto-Approved	Validating New Vm	Region 622

Comment 7 Bill Wei 2016-07-11 14:12:26 UTC
The problem was due to a refresh failure after the VM was provisioned. Automate checks whether the VM has been added to the VMDB through the refresh process. If the refresh fails, Automate is not aware of it and keeps querying until it eventually times out, which makes the request appear to hang.

Because Automate cannot know for sure who issued the current refresh request, it cannot rely on the refresh error. There is no plan to enhance this part yet.
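The behavior described above can be sketched roughly as follows. This is a simplified Python illustration, not the actual Automate state machine (which is not shown in this report): `check_provisioned`, `vm_in_vmdb` and `RetriesExceeded` are hypothetical names, the wait between retries is omitted, and the error text only mimics the message quoted in Comment 9.

```python
class RetriesExceeded(Exception):
    """Raised when the polling loop uses up its retry budget."""


def check_provisioned(vm_name, vm_in_vmdb, max_retries=100):
    """Poll until vm_in_vmdb(vm_name) is true; return the attempt number that succeeded.

    The loop only asks "is the VM in the database yet?". It has no way to
    learn that the refresh which should add the VM has failed.
    """
    for attempt in range(1, max_retries + 1):
        if vm_in_vmdb(vm_name):
            return attempt
        # A real state machine would wait for a retry interval here.
    raise RetriesExceeded(
        f"number of retries <{max_retries + 1}> exceeded maximum of <{max_retries}>"
    )


def demo():
    vmdb = set()  # stands in for the VMDB; a failed refresh never adds the VM
    try:
        check_provisioned("changeme", lambda name: name in vmdb)
    except RetriesExceeded as err:
        failed = str(err)
    vmdb.add("changeme")  # a successful refresh adds the VM
    attempts = check_provisioned("changeme", lambda name: name in vmdb)
    return failed, attempts
```

In this sketch the request looks Active for the whole retry budget when the refresh fails, and finishes on the first check once the VM is present.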

The Azure provider refresh failed while listing private images, which requires a lot of API calls. Azure denies requests once the number of API calls exceeds a certain limit. This explains why the problem is not always reproducible: it depends on how heavily the testing environment is being used.

There is an effort to reduce the number of API calls needed to list private images, which should greatly reduce the chance of the hang reported here. Reassigning to Dan Berger, who will fix the refresh issue.
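As a rough illustration of why fewer API calls per refresh helps, here is a small Python sketch of a refresh running against a call quota. The quota, the per-resource call counts, and the names `ThrottledApi` and `refresh_succeeds` are all assumptions made for the example; they do not reflect real Azure limits or the provider code.

```python
class QuotaExceeded(Exception):
    """Raised by the fake API once the call budget is used up."""


class ThrottledApi:
    """Hypothetical stand-in for a cloud API that rejects calls past a fixed limit."""

    def __init__(self, limit):
        self.limit = limit
        self.calls = 0

    def call(self):
        self.calls += 1
        if self.calls > self.limit:
            raise QuotaExceeded(f"call {self.calls} exceeds limit {self.limit}")


def refresh_succeeds(limit, resources, calls_per_resource):
    """Return True if listing images for every resource stays within the quota."""
    api = ThrottledApi(limit)
    try:
        for _ in range(resources):
            for _ in range(calls_per_resource):
                api.call()
    except QuotaExceeded:
        return False
    return True


# Assumed numbers: 30 resources to scan against a budget of 100 calls.
before = refresh_succeeds(limit=100, resources=30, calls_per_resource=5)
after = refresh_succeeds(limit=100, resources=30, calls_per_resource=2)
```

Under these assumed numbers the refresh needs 150 calls before the reduction and fails, and needs 60 calls afterwards and completes. On a shared environment the remaining budget varies, which matches the intermittent behavior described above.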

Comment 8 Daniel Berger 2016-07-22 18:24:20 UTC
https://github.com/ManageIQ/manageiq/pull/10003

Comment 9 Jeff Teehan 2016-09-09 22:03:14 UTC
Now I see this new error in 5.7.0.0. Find me next week and we'll check the appliance: 10.16.7.232

The VM did show up working just fine.

Error	Finished	908,000,000,000,001	Administrator	VM Provision	09/09/16 22:01:28 UTC	Provision from [Microsoft.Compute/Images/templates/tmpl-osDisk] to [QuickTest5700]	Approved	Administrator	09/08/16 16:55:49 UTC	09/08/16 16:55:33 UTC	09/09/16 22:01:28 UTC	Auto-Approved	[EVM] VM [QuickTest5700] Step [CheckProvisioned] Status [State=<CheckProvisioned> running raised exception: <number of retries <101> exceeded maximum of <100>>] Message [Validating New Vm]

Comment 10 Jeff Teehan 2016-10-11 19:55:55 UTC
There may have been an Azure issue at the time.  I've run at least 50 more attempts without issue.  I'm going to clear the needinfo flag and move this on to verified.

Comment 11 David La Motta 2016-10-19 13:45:06 UTC
Jeff, thanks for verifying. For the record, I just tried this out and the task goes to Finished as expected.