Bug 1326436 - Azure - Default Credentials valid - Refresh status failed
Summary: Azure - Default Credentials valid - Refresh status failed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: 5.5.0
Hardware: Unspecified
OS: Unspecified
urgent
urgent
Target Milestone: GA
: 5.6.0
Assignee: Daniel Berger
QA Contact: Jeff Teehan
URL:
Whiteboard: azure
Depends On:
Blocks: 1327723
TreeView+ depends on / blocked
 
Reported: 2016-04-12 16:42 UTC by ldomb
Modified: 2019-11-14 07:46 UTC (History)
12 users (show)

Fixed In Version: 5.6.0.4
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1327723 (view as bug list)
Environment:
Last Closed: 2016-06-29 15:48:40 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1348 0 normal SHIPPED_LIVE CFME 5.6.0 bug fixes and enhancement update 2016-06-29 18:50:04 UTC

Description ldomb 2016-04-12 16:42:12 UTC
Description of problem:
When adding the azure provider I see the following error in the logs

Deliver On: [], Data: [], Args: [[["ManageIQ::Providers::Azure::CloudManager", 2]]]
[----] E, [2016-04-12T08:02:53.817362 #26686:749988] ERROR -- : MIQ(ManageIQ::Providers::Azure::CloudManager::Refresher#refresh) EMS: [AzureUS-EST], id: [2] Refresh failed
[----] E, [2016-04-12T08:02:53.817659 #26686:749988] ERROR -- : /var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb:468:in `parse_image'
/var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb:124:in `block in get_images'
/var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb:133:in `block in process_collection'
/var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb:132:in `each'
/var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb:132:in `process_collection'
/var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb:124:in `get_images'
/var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb:40:in `ems_inv_to_hashes'
/var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb:10:in `ems_inv_to_hashes'
/var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresher.rb:6:in `parse_inventory'
[----] E, [2016-04-12T08:02:53.817702 #26686:749988] ERROR -- : MIQ(ManageIQ::Providers::Azure::CloudManager::Refresher#refresh) EMS: [AzureUS-EST], id: [2] Unable to perform refresh for the following targets:
[----] E, [2016-04-12T08:02:53.817854 #26686:749988] ERROR -- : MIQ(ManageIQ::Providers::Azure::CloudManager::Refresher#refresh)  --- ManageIQ::Providers::Azure::CloudManager [AzureUS-EST] id [2]

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Add an azure provider
2.
3.

Actual results:
fails

Expected results:
should be able to parse images

Additional info:

Comment 2 ldomb 2016-04-12 16:42:57 UTC
cfme version is 
5.5.2.4.20160127105142_395c086

Comment 5 Daniel Berger 2016-04-13 21:40:07 UTC
I can't tell for certain from the backtrace, but we suspect it's a url parsing issue. Line 468 is this:

    :name => uid.split(%r{Microsoft.Compute\/}i)[1].split('.').first

However, if this image is not coming from the Microsoft.Compute provider then it will fail. I think a more robust solution is this:

    File.basename(image.name).split('.').first

However, that will remove the container name, and I'm not sure if that matters. I want to check with Bronagh first.

Comment 6 Daniel Berger 2016-04-14 15:09:38 UTC
In a call with the client today were able to confirm that my fix does work, though we refined it a bit. Further refinement after the fact shows that this is a better solution:

    File.join(File.dirname(image.name), File.basename(File.basename(image.name, '.'), '.')

This will retain the container path information, and doesn't accidentally chop off parts of the image name if it includes OS version information. It only chops off the .vhd and the Azure GUID.

Comment 8 Daniel Berger 2016-04-14 15:21:10 UTC
Line 468 of vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb.

So this line:

    :name => uid.split(%r{Microsoft.Compute\/}i)[1].split('.').first,

Should be:

    :name => File.join(File.dirname(image.name), File.basename(File.basename(name, '.*'), '.*'),

Comment 10 Daniel Berger 2016-04-14 15:40:26 UTC
The customer used a slightly different version for a hotfix, but it works. That solution does not include the container path. The one I just posted does.

If you work with him today, feel free to use the newer version. Also, just realized there's a parens missing. So, it should be:

    :name => File.join(File.dirname(image.name), File.basename(File.basename(name, '.*'), '.*')),

Comment 11 Daniel Berger 2016-04-14 16:30:02 UTC
One more time:

    :name => File.join(File.dirname(image.name), File.basename(File.basename(image.name, '.*'), '.*')),

Comment 14 Daniel Berger 2016-04-18 13:37:55 UTC
Colin, I'm afraid I don't understand what "The Microsoft.compute/virtualMachines resource does not seem to get the correct return code so that the method exists." means.

Can you clarify?

Comment 16 Daniel Berger 2016-04-18 15:22:38 UTC
Ok, the current issue is something for the automate team then, and Colin will create a separate BZ. This was fixed with:

https://github.com/ManageIQ/manageiq/pull/7967

Moving to POST...

Comment 17 Bronagh Sorota 2016-04-18 16:12:53 UTC
commit 9b25a474d1def449ef5193f43fc503c2f40d8cbd
Author: Daniel Berger <dberger>
Date:   Thu Apr 14 10:46:59 2016 -0600

    Modify the build_image_name so that it no longer uses a regex to determine the name.

diff --git a/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb b/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb
index a98b478..81f6238 100644

Comment 19 Jeff Teehan 2016-04-18 18:54:14 UTC
I agree with Bill that it seems to come and go, but unfortunately it happens more than half the time and now with 5.6B2 as well.  Both of these VMs deployed just fine.  One still pending and one error.

Ok	Pending	415,000,000,000,004	Administrator	Service Provision		Provisioning Service [AddVmExistingRG] from [AddVmExistingRG]	Approved	Administrator	04/18/16 16:28:28 UTC	04/18/16 16:28:10 UTC	04/18/16 16:31:25 UTC	Auto-Approved	stack has not been deployed	Region 415

and 

Error	Finished	415,000,000,000,003	Administrator	VM Provision	04/18/16 18:49:01 UTC	Provision from [Images/templates/tmpl-osDisk] to [WS12B2418a]	Approved	Administrator	04/18/16 16:17:13 UTC	04/18/16 16:16:54 UTC	04/18/16 18:49:01 UTC	Auto-Approved	Validating New Vm	Region 415

I see the following provisioningStates when creating the Instance

    "provisioningState": "Creating"
    "provisioningState": "Succeeded"

At no time does it ever say "Running"

Are we using the latest api-version?  Does it make a difference?
https://management.azure.com/subscriptions/xxxx/resourceGroups/cfmeqe/providers/Microsoft.Compute/virtualMachines/WS12ImageMaster?api-version=2016-03-30

Anyway, 5.5.3.4 worked every time and 5.6.0.1, not so much.

Here is the json response to compare against.
http://pastebin.test.redhat.com/366662

Comment 20 Jeff Teehan 2016-04-18 19:00:42 UTC
Here's the instance state for that VM
http://pastebin.test.redhat.com/366665

I don't think we should be using that since the statuses returns an array of statuses, including running or not, but I don't really know.  I did have the same issue with SCVMM automation where I intermittently received the "wrong" status.

Comment 21 Bill Wei 2016-04-18 19:23:20 UTC
Jeff,

We don't rely on the array of resource statuses to determine the stack status. The provider has a dedicate status field for the stack. It cannot change to succeeded if any of the underling resource does not complete. It is also possible that the stack status never changes to succeeded due to a bug even if all the resources complete. Anyway we are purely relying on the status reported by Azure to complete our service provisioning.

Also note the wording for the status of the VM could be different when you view it from directly browsing VM vs from deployment operations (how we parse the resource status) of a stack provisioning.

Comment 22 Jeff Teehan 2016-04-18 19:41:12 UTC
That's fine. I'm just pointing out that the provisioning status on both failures is clearly set to succeeded, which would rule out "an intermittent problem coming from Azure provider".  There does not appear to be any evidence that this is an Azure issue.  If I can help, please let me know.  Thanks.

Comment 23 Bill Wei 2016-04-18 20:37:06 UTC
Jeff,
On the contrary I think it is an Azure issue. When you see the provisioning seems hanging in our requests table, you should verify the live provisioningState (and timestamp) of the template deployment, not the VM, in Azure portal.

Comment 24 Jeff Teehan 2016-04-26 18:42:53 UTC
This current shows a state of finished and the orchestration stack now shows a status of succeeded, and the last provider refresh was successful.  I'm not sure what to make of the comments as it implies that this is not a bug but it also has a reference to a fix for the issue.

For now, the parsing stuff is fixed.  If the "pending" issue returns, we'll have to redesign how we poll for success.

Comment 26 errata-xmlrpc 2016-06-29 15:48:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1348


Note You need to log in before you can comment on or make changes to this bug.