1235384 – [RFE] SCVMM post provisioning ems refresh takes too long

Summary: [RFE] SCVMM post provisioning ems refresh takes too long

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat CloudForms Management Engine
Classification:	Red Hat
Component:	Providers
Sub Component:
Version:	5.4.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	GA
Target Release:	5.5.0
Assignee:	Daniel Berger
QA Contact:	Jeff Teehan
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1235053
Depends On:
Blocks:	1245680 1245687

Reported:	2015-06-24 16:35 UTC by ncatling
Modified:	2019-11-14 06:46 UTC
CC List:	10 users
Fixed In Version:	5.5.0.1
Doc Type:	Enhancement
Doc Text:
Clone Of:
Clones:	1245687
Environment:
Last Closed:	2015-12-08 13:18:48 UTC
Category:	---
Cloudforms Team:	---
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2015:2551	0	normal	SHIPPED_LIVE	Moderate: CFME 5.5.0 bug fixes and enhancement update	2015-12-08 17:58:09 UTC

Description ncatling 2015-06-24 16:35:06 UTC

Description of problem:
The SCVMM post-provisioning EMS refresh needs to be targeted. The check provision state waits until the post-provision EMS refresh has completed, which can take a long time in large environments. For example, with an SCVMM provider managing 800+ VMs, this takes 30 minutes.

Version-Release number of selected component (if applicable):


How reproducible:
Provision a VM from a template. The check provision state retries until the EMS refresh completes and the VM appears in the inventory.
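
The retry behaviour described above can be sketched as a simple polling loop. This is a minimal illustration only, not the actual CloudForms state machine: `wait_for_vm`, the `lookup` callable, and the retry and interval defaults are all hypothetical names and values chosen for the example.

```ruby
# Hypothetical sketch: poll an inventory lookup until the new VM shows up.
# `lookup` stands in for whatever checks the inventory; it is NOT a ManageIQ API.
# `sleeper` is injectable so the loop can be exercised without real delays.
def wait_for_vm(name, lookup:, max_retries: 10, interval: 60,
                sleeper: ->(seconds) { sleep(seconds) })
  max_retries.times do |attempt|
    vm = lookup.call(name)
    # Stop as soon as the refresh has made the VM visible in the inventory.
    return { vm: vm, attempts: attempt + 1 } if vm

    sleeper.call(interval)
  end
  nil # the VM never appeared within the allowed number of retries
end
```

With a full refresh, every retry waits on an inventory pass over the whole provider, which is why the delay grows with the size of the environment; a targeted refresh would shorten the time before `lookup` first succeeds.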

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 5 Greg Blomquist 2015-07-28 19:16:36 UTC

*** Bug 1235053 has been marked as a duplicate of this bug. ***

Comment 6 Daniel Berger 2015-07-28 21:52:04 UTC

I think the bottleneck here is the Get-Content call in the PowerShell script, which is notoriously slow for large files. Adding "-Raw" or "-ReadCount 0" to Get-Content should speed it up significantly, but we need to ensure the output is still parsed correctly with those options.

Comment 7 Daniel Berger 2015-07-29 20:26:02 UTC

https://github.com/ManageIQ/manageiq/pull/3653

Comment 8 Milan Falešník 2015-09-30 14:50:45 UTC

Verification blocked by BZ#1267642

Comment 9 Milan Falešník 2015-10-30 14:35:48 UTC

Hello, I checked this in 5.5.0.8. Our SCVMM R2 (72 VMs, 12 templates, 6 datastores, 3 hosts) refreshes in about 90s when a refresh is requested. Is that a good time, or do I need to check something else as well? Or is there a bigger environment I could point it at?

Comment 10 Daniel Berger 2015-10-30 15:36:54 UTC

Milan, it should at least be faster than it was before. In my own testing it was about 30-40% faster, if I recall. Assuming linear scaling, 800 VMs would take 15-16 minutes, down from 30 minutes.

Someone else will have to chime in as to whether or not there's a bigger environment, because I don't know.
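
For reference, the figures in Comments 9 and 10 can be put side by side with a quick linear extrapolation. This is a rough sketch under stated assumptions: it takes the roughly 90s refresh for 72 VMs reported in Comment 9 as the baseline, assumes refresh time scales with VM count alone (templates, hosts, and datastores are ignored), and uses a function name made up for illustration.

```ruby
# Linear extrapolation of refresh time from a measured environment.
# Assumption: refresh time grows proportionally with the number of VMs.
def extrapolate_refresh_seconds(measured_seconds, measured_vms, target_vms)
  measured_seconds.to_f / measured_vms * target_vms
end

seconds = extrapolate_refresh_seconds(90, 72, 800)
minutes = (seconds / 60).round(1)
puts "#{seconds.round} s, about #{minutes} min"
```

Under those assumptions an 800-VM provider comes out at about 1000 seconds, or roughly 17 minutes, which is in the same range as the estimate above. Real environments are unlikely to scale this cleanly.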

Comment 13 Daniel Berger 2015-10-30 18:49:32 UTC

Don't use that fix. That was an earlier version that caused problems.

Use this:

https://github.com/djberg96/manageiq/commit/8f7ef63d03a6b85adea8301d9f31ce1903bf4e12

Comment 16 Daniel Berger 2015-11-04 18:07:56 UTC

I'm confused by their results. Using -ReadCount 0 is basically analogous to using IO.read instead of IO.foreach in Ruby. They should see a performance improvement, or at worst the same speed. That, combined with the underlying changes to use Nokogiri instead of REXML, is about the best I can do for now.

I suppose you could try different values for -ReadCount (the value being the number of lines sent through the pipeline at a time) to see if you can find a sweet spot.

Ultimately I would like to switch to Export-CLIXML to avoid temple generation completely, but the generated XML is different, and would require changing the parser.
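
The IO.read versus IO.foreach analogy in this comment can be sketched in Ruby as follows. It is only a loose illustration of the three reading strategies, not the code from the fix; the helper names and the sample XML are invented for the example, and the mapping to the Get-Content options is approximate.

```ruby
require 'tempfile'

# Line by line: one string per iteration, loosely comparable to
# Get-Content's default behaviour (-ReadCount 1).
def read_by_line(path)
  lines = []
  File.foreach(path) { |line| lines << line }
  lines.join
end

# Whole file in a single call, loosely comparable to -ReadCount 0 or -Raw.
def read_at_once(path)
  File.read(path)
end

# Batches of `count` lines, loosely comparable to -ReadCount <count>.
def read_in_batches(path, count)
  File.foreach(path).each_slice(count).map(&:join).join
end

# All three strategies return the same content; only the number of
# read iterations differs.
Tempfile.create('inventory') do |file|
  file.write("<vm>a</vm>\n<vm>b</vm>\n<vm>c</vm>\n")
  file.flush
  whole = read_at_once(file.path)
  puts whole == read_by_line(file.path) && whole == read_in_batches(file.path, 2)
end
```

Because the content is identical in every case, a change of read strategy should affect speed only, which matches the expectation above that the result is faster or at worst the same.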

Comment 17 Jeff Teehan 2015-11-30 18:38:50 UTC

I don't have an environment even remotely close in size that could provide meaningful results. The current environment takes just over a minute, but the population is too small for extrapolation in my view. Alex and I are working on a larger long-term performance environment, but nothing that will be ready in 5.5.

Based on the comments, the customer is not disputing the fix, so I'm relying on their input for this verification. Without knowing the network traffic, I/O load, CPU utilization, etc., I can't explain the customer's results.

Moving to Verified.

Comment 19 errata-xmlrpc 2015-12-08 13:18:48 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2551

Comment 20 Daniel Berger 2015-12-15 22:38:23 UTC

See also: https://github.com/ManageIQ/manageiq/pull/5776

Comment 22 Daniel Berger 2016-03-01 14:33:56 UTC

It looks like the resulting output somehow broke the parser. I did not see this locally, but the parser is brittle, and the customer's environment is different from ours.

I'm afraid I'll have to recommend rolling back the latest change.
