Description of problem: When adding the azure provider I see the following error in the logs Deliver On: [], Data: [], Args: [[["ManageIQ::Providers::Azure::CloudManager", 2]]] [----] E, [2016-04-12T08:02:53.817362 #26686:749988] ERROR -- : MIQ(ManageIQ::Providers::Azure::CloudManager::Refresher#refresh) EMS: [AzureUS-EST], id: [2] Refresh failed [----] E, [2016-04-12T08:02:53.817659 #26686:749988] ERROR -- : /var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb:468:in `parse_image' /var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb:124:in `block in get_images' /var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb:133:in `block in process_collection' /var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb:132:in `each' /var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb:132:in `process_collection' /var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb:124:in `get_images' /var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb:40:in `ems_inv_to_hashes' /var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb:10:in `ems_inv_to_hashes' /var/www/miq/vmdb/app/models/manageiq/providers/azure/cloud_manager/refresher.rb:6:in `parse_inventory' [----] E, [2016-04-12T08:02:53.817702 #26686:749988] ERROR -- : MIQ(ManageIQ::Providers::Azure::CloudManager::Refresher#refresh) EMS: [AzureUS-EST], id: [2] Unable to perform refresh for the following targets: [----] E, [2016-04-12T08:02:53.817854 #26686:749988] ERROR -- : MIQ(ManageIQ::Providers::Azure::CloudManager::Refresher#refresh) --- ManageIQ::Providers::Azure::CloudManager [AzureUS-EST] id [2] Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. Add an azure provider 2. 3. Actual results: fails Expected results: should be able to parse images Additional info:
cfme version is 5.5.2.4.20160127105142_395c086
I can't tell for certain from the backtrace, but we suspect it's a url parsing issue. Line 468 is this: :name => uid.split(%r{Microsoft.Compute\/}i)[1].split('.').first However, if this image is not coming from the Microsoft.Compute provider then it will fail. I think a more robust solution is this: File.basename(image.name).split('.').first However, that will remove the container name, and I'm not sure if that matters. I want to check with Bronagh first.
In a call with the client today were able to confirm that my fix does work, though we refined it a bit. Further refinement after the fact shows that this is a better solution: File.join(File.dirname(image.name), File.basename(File.basename(image.name, '.'), '.') This will retain the container path information, and doesn't accidentally chop off parts of the image name if it includes OS version information. It only chops off the .vhd and the Azure GUID.
Line 468 of vmdb/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb. So this line: :name => uid.split(%r{Microsoft.Compute\/}i)[1].split('.').first, Should be: :name => File.join(File.dirname(image.name), File.basename(File.basename(name, '.*'), '.*'),
The customer used a slightly different version for a hotfix, but it works. That solution does not include the container path. The one I just posted does. If you work with him today, feel free to use the newer version. Also, just realized there's a parens missing. So, it should be: :name => File.join(File.dirname(image.name), File.basename(File.basename(name, '.*'), '.*')),
One more time: :name => File.join(File.dirname(image.name), File.basename(File.basename(image.name, '.*'), '.*')),
Colin, I'm afraid I don't understand what "The Microsoft.compute/virtualMachines resource does not seem to get the correct return code so that the method exists." means. Can you clarify?
Ok, the current issue is something for the automate team then, and Colin will create a separate BZ. This was fixed with: https://github.com/ManageIQ/manageiq/pull/7967 Moving to POST...
commit 9b25a474d1def449ef5193f43fc503c2f40d8cbd Author: Daniel Berger <dberger> Date: Thu Apr 14 10:46:59 2016 -0600 Modify the build_image_name so that it no longer uses a regex to determine the name. diff --git a/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb b/app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb index a98b478..81f6238 100644
I agree with Bill that it seems to come and go, but unfortunately it happens more than half the time and now with 5.6B2 as well. Both of these VMs deployed just fine. One still pending and one error. Ok Pending 415,000,000,000,004 Administrator Service Provision Provisioning Service [AddVmExistingRG] from [AddVmExistingRG] Approved Administrator 04/18/16 16:28:28 UTC 04/18/16 16:28:10 UTC 04/18/16 16:31:25 UTC Auto-Approved stack has not been deployed Region 415 and Error Finished 415,000,000,000,003 Administrator VM Provision 04/18/16 18:49:01 UTC Provision from [Images/templates/tmpl-osDisk] to [WS12B2418a] Approved Administrator 04/18/16 16:17:13 UTC 04/18/16 16:16:54 UTC 04/18/16 18:49:01 UTC Auto-Approved Validating New Vm Region 415 I see the following provisioningStates when creating the Instance "provisioningState": "Creating" "provisioningState": "Succeeded" At no time does it ever say "Running" Are we using the latest api-version? Does it make a difference? https://management.azure.com/subscriptions/xxxx/resourceGroups/cfmeqe/providers/Microsoft.Compute/virtualMachines/WS12ImageMaster?api-version=2016-03-30 Anyway, 5.5.3.4 worked every time and 5.6.0.1, not so much. Here is the json response to compare against. http://pastebin.test.redhat.com/366662
Here's the instance state for that VM http://pastebin.test.redhat.com/366665 I don't think we should be using that since the statuses returns an array of statuses, including running or not, but I don't really know. I did have the same issue with SCVMM automation where I intermittently received the "wrong" status.
Jeff, We don't rely on the array of resource statuses to determine the stack status. The provider has a dedicate status field for the stack. It cannot change to succeeded if any of the underling resource does not complete. It is also possible that the stack status never changes to succeeded due to a bug even if all the resources complete. Anyway we are purely relying on the status reported by Azure to complete our service provisioning. Also note the wording for the status of the VM could be different when you view it from directly browsing VM vs from deployment operations (how we parse the resource status) of a stack provisioning.
That's fine. I'm just pointing out that the provisioning status on both failures is clearly set to succeeded, which would rule out "an intermittent problem coming from Azure provider". There does not appear to be any evidence that this is an Azure issue. If I can help, please let me know. Thanks.
Jeff, On the contrary I think it is an Azure issue. When you see the provisioning seems hanging in our requests table, you should verify the live provisioningState (and timestamp) of the template deployment, not the VM, in Azure portal.
This current shows a state of finished and the orchestration stack now shows a status of succeeded, and the last provider refresh was successful. I'm not sure what to make of the comments as it implies that this is not a bug but it also has a reference to a fix for the issue. For now, the parsing stuff is fixed. If the "pending" issue returns, we'll have to redesign how we poll for success.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1348