Bug 1265368 - perf_capture_timer benchmark takes more than 3x as long when running against RHEVM vs VMware
Summary: perf_capture_timer benchmark takes more than 3x as long when running against RHEVM vs VMware
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Performance
Version: 5.5.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: GA
Target Release: 5.6.0
Assignee: Keenan Brock
QA Contact: Alex Krzos
URL:
Whiteboard: perf:c&u:rhev
Depends On:
Blocks:
 
Reported: 2015-09-22 19:06 UTC by Alex Krzos
Modified: 2019-08-06 20:05 UTC
CC: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-16 16:09:21 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
evm.log files for respective providers (43.89 KB, application/zip)
2015-09-23 17:32 UTC, Alex Krzos

Description Alex Krzos 2015-09-22 19:06:12 UTC
Description of problem:
When running Capacity and Utilization benchmarks and comparing RHEVM vs VMware performance capture scheduling with identically sized environments, scheduling all realtime captures for RHEVM takes more than 3x as long as for an environment consisting of VMware managed objects.

Environment sizes:
vmware - 3k vms, 1.5k online, 100 hosts
rhevm - 3k vms, 1.5k online, 100 hosts

Version-Release number of selected component (if applicable):
5.5.0.1-alpha1 - I will run the same scenario against a 5.4 appliance to see if the issue is specific to the alpha.

How reproducible:
on this build

Steps to Reproduce:
1.
2.
3.

Actual results:
provider, min, avg, median, max, stddev, 90th, 99th
vmware, 36.4, 37.9, 37.7, 39.7, 1.3, 39.4, 39.7
rhevm, 131.4, 134.6, 134.1, 138.7, 2.8, 137.8, 138.7

*Compare 99th percentile timings for the benchmark.

Expected results:
My understanding is that there is no interaction with the provider during the mere scheduling of captures, thus I would expect scheduling captures for 1,500 VMs and 100 hosts to take the same amount of time for each provider.

Additional info:
I can/will also profile the benchmark to see if anything sticks out like a sore thumb consuming the additional time during scheduling.

Comment 3 Alex Krzos 2015-09-23 17:31:42 UTC
Re-ran the benchmarks and captured evm.log while the benchmark was running.  RHEVM appears to take far more time queuing Storage.perf_capture.  Also noted is a slight difference in the number of "clusters" between the two providers; however, that does not appear to be the source of the large discrepancy between the two nearly identically sized providers.

Attached are the evm.log files for respective providers.

Comment 4 Alex Krzos 2015-09-23 17:32:24 UTC
Created attachment 1076270 [details]
evm.log files for respective providers

Comment 5 Alex Krzos 2015-09-23 18:50:25 UTC
Measuring the Memory and GC.count on the node handling RHEVM:

irb(main):002:0> mem_rss_start = MiqProcess.processInfo()[:memory_usage]
=> 122482688
irb(main):003:0> the_zone = Zone.find_by_name('default')
=> #<Zone id: 1, name: "default", description: "Default Zone", created_on: "2015-09-22 20:01:50", updated_on: "2015-09-22 20:01:50", settings: {}, log_file_depot_id: nil>
irb(main):004:0> Benchmark.realtime{Metric::Capture.perf_capture_timer(the_zone)}
=> 99.270481548
irb(main):005:0> mem_rss_end = MiqProcess.processInfo()[:memory_usage]
=> 227196928
irb(main):006:0> mem_rss_change = mem_rss_end - mem_rss_start
=> 104714240
irb(main):007:0> GC.stat
=> {:count=>493, :heap_used=>2912, :heap_length=>4177, :heap_increment=>1265, :heap_live_num=>890940, :heap_free_num=>426151, :heap_final_num=>0, :total_allocated_object=>84540091, :total_freed_object=>83649151}

VMware:

irb(main):002:0> mem_rss_start = MiqProcess.processInfo()[:memory_usage]
=> 122499072
irb(main):003:0> the_zone = Zone.find_by_name('default')
=> #<Zone id: 1, name: "default", description: "Default Zone", created_on: "2015-09-22 20:02:01", updated_on: "2015-09-22 20:02:01", settings: {}, log_file_depot_id: nil>
irb(main):004:0> Benchmark.realtime{Metric::Capture.perf_capture_timer(the_zone)}
=> 34.97297929
irb(main):005:0> mem_rss_end = MiqProcess.processInfo()[:memory_usage]
=> 209465344
irb(main):006:0> mem_rss_change = mem_rss_end - mem_rss_start
=> 86966272
irb(main):007:0> GC.stat
=> {:count=>188, :heap_used=>2396, :heap_length=>4311, :heap_increment=>1915, :heap_live_num=>899762, :heap_free_num=>116896, :heap_final_num=>0, :total_allocated_object=>17272673, :total_freed_object=>16372911}

For VMware I see far fewer GC calls and slightly less memory consumed (final RSS of 216 MiB for RHEVM vs 199 MiB for VMware).
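
For reference, a minimal sketch that wraps these measurements up for repeat runs (the helper name measure_capture_timer is hypothetical; it assumes a rails console on the appliance, mirroring the irb sessions above):

def measure_capture_timer(zone_name = 'default')
  zone      = Zone.find_by_name(zone_name)
  rss_start = MiqProcess.processInfo()[:memory_usage]  # RSS in bytes before the run
  gc_start  = GC.stat[:count]                          # cumulative GC runs before the run

  elapsed = Benchmark.realtime { Metric::Capture.perf_capture_timer(zone) }

  # Wall-clock time plus the GC and RSS deltas for this single run
  {
    :elapsed_s => elapsed,
    :gc_runs   => GC.stat[:count] - gc_start,
    :rss_delta => MiqProcess.processInfo()[:memory_usage] - rss_start,
  }
end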

Comment 6 Alex Krzos 2015-09-24 16:47:50 UTC
FYI Initial BZ opened against 5.5.0.1-alpha

Comment #3 and Comment #5 are on CFME 5.4.2.0, so this issue exists on both versions.

I attempted to get perftools.rb running on 5.4.2.0 appliances in order to profile what is consuming a large percentage of the time, but ran into multiple issues getting it to run.  I did have an older 5.4.0.2 appliance that already had perftools.rb installed and running, so I reset/cleaned the database and re-ran the benchmarks on that appliance.  I found several issues.

First, let's look at the profiling:

Methodology (code executed in the rails console) was:

require 'perftools'
the_zone = Zone.find_by_name('default')
PerfTools::CpuProfiler.start("/tmp/profile-perf_capture_timer-rhevm-1") do
  Metric::Capture.perf_capture_timer(the_zone)
end

RHEVM profile (top 20):

[root@dhcp23-113 tmp]# /opt/rh/cfme-gemset/bin/pprof.rb --text profile-perf_capture_timer-rhevm-1 | head -n 21
Using local file /opt/rh/ruby200/root/usr/bin/ruby.
Using local file profile-perf_capture_timer-rhevm-1.
Total: 10415 samples
    5798  55.7%  55.7%     5798  55.7% garbage_collector
     683   6.6%  62.2%      685   6.6% Hash.[]
     371   3.6%  65.8%      371   3.6% PGresult#values
     361   3.5%  69.3%      361   3.5% Array#zip
     183   1.8%  71.0%      183   1.8% Regexp#match
     173   1.7%  72.7%     1962  18.8% ActiveRecord::Associations::Preloader::Association#associated_records_by_owner
     170   1.6%  74.3%      170   1.6% PGconn#async_exec
     137   1.3%  75.6%      193   1.9% Array#map!
      93   0.9%  76.5%       93   0.9% Marshal.dump
      89   0.9%  77.4%       89   0.9% Module#===
      83   0.8%  78.2%       83   0.8% Marshal.load_without_autoload_missing_constants
      80   0.8%  78.9%      151   1.4% #<Module:0x0000000507c218>#__temp__
      60   0.6%  79.5%     1274  12.2% Array#map
      48   0.5%  80.0%      151   1.4% Class#value_to_integer
      46   0.4%  80.4%       46   0.4% Class#logger
      42   0.4%  80.8%      109   1.0% #<Module:0x00000001b78aa8>#__temp__
      42   0.4%  81.2%      132   1.3% Arel::Visitors::Visitor#visit
      39   0.4%  81.6%      208   2.0% ActiveRecord::Base#==
      35   0.3%  81.9%       35   0.3% String#intern
      34   0.3%  82.3%       34   0.3% String#to_i

I performed the same profiling twice more, deleting all perf_captures from the queue first to confirm that scheduling would occur:

MiqQueue.where('method_name = \'perf_capture\'').delete_all was used to delete all queued perf_captures.
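
Putting those two steps together, a minimal sketch of the repeated profiling loop (the output file names are illustrative; assumes a rails console with perftools.rb loadable, as above):

require 'perftools'

the_zone = Zone.find_by_name('default')

3.times do |i|
  # Clear any queued captures so the timer has real scheduling work to do
  MiqQueue.where('method_name = \'perf_capture\'').delete_all

  PerfTools::CpuProfiler.start("/tmp/profile-perf_capture_timer-rhevm-#{i + 1}") do
    Metric::Capture.perf_capture_timer(the_zone)
  end
end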

Across the 3 profiles, 53-55% of the time was spent in the garbage collector and 6.6-7.1% in Hash.[].


VMware Profiling (Top 20):
[root@dhcp23-113 tmp]# /opt/rh/cfme-gemset/bin/pprof.rb --text profile-perf_capture_timer-vmware-2 | head -n 21
Using local file /opt/rh/ruby200/root/usr/bin/ruby.
Using local file profile-perf_capture_timer-vmware-2.
Total: 2824 samples
    1084  38.4%  38.4%     1084  38.4% garbage_collector
     183   6.5%  44.9%      183   6.5% Regexp#match
      90   3.2%  48.1%       92   3.3% Marshal.dump
      66   2.3%  50.4%       66   2.3% Marshal.load_without_autoload_missing_constants
      40   1.4%  51.8%       41   1.5% Hash.[]
      37   1.3%  53.1%       37   1.3% PGconn#async_exec
      27   1.0%  54.1%       27   1.0% Module#===
      25   0.9%  55.0%       25   0.9% Class#logger
      23   0.8%  55.8%       23   0.8% PGresult#values
      22   0.8%  56.6%      194   6.9% Array#map
      21   0.7%  57.3%       77   2.7% Arel::Visitors::Visitor#visit
      20   0.7%  58.0%       21   0.7% Kernel#respond_to?
      19   0.7%  58.7%       19   0.7% Regexp#===
      19   0.7%  59.3%       19   0.7% Symbol#to_s
      18   0.6%  60.0%      104   3.7% ActiveRecord::AttributeMethods#respond_to?
      18   0.6%  60.6%       43   1.5% ActiveRecord::Relation#initialize
      18   0.6%  61.3%       18   0.6% String#=~
      17   0.6%  61.9%       34   1.2% ActiveSupport::Callbacks::ClassMethods#__callback_runner_name
      17   0.6%  62.5%       17   0.6% Module#module_eval
      16   0.6%  63.0%       16   0.6% Hash#[]=


I also performed the same profiling for VMware 2 more times, deleting all perf_captures from the queue before testing.

As with RHEVM, scheduling perf_captures against VMware also spent the most time in the garbage collector, at between 38.4% and 40.4% of the time.  Besides the smaller percentage of time spent in the garbage collector, the order of the subsequent methods is also different between the two providers.


While getting the profiler to work I had added the large RHEVM provider first, performed the profiling, and then created a second zone on the appliance, since C&U scheduling is zone-based.  I moved the RHEVM provider to the "Unmanaged Zone", a zone without any appliances.  I then added the large VMware provider into the default zone and attempted to reproduce the faster C&U scheduling.  Although I can clearly see that perf_capture_timer is only scheduling C&U captures for the default zone while running, the VMware provider now suffers the same slow scheduling timings.

This means that a provider merely being in the same region can adversely affect the scheduling of C&U in another zone.  This could possibly be considered a separate BZ.  Since the above evm.log output suggested something slow with Storage.perf_capture, I investigated datastores on the appliance and noted that even rendering the datastores UI page is slow for any of the RHEVM datastores.  That is when I determined that each RHEVM datastore has a relationship to every single host, meaning 100 hosts x 120 datastores = 12,000 entries in hosts_storages, whereas the VMware provider's relationships are laid out differently, with only 300 entries in the hosts_storages table.

Lastly I cleaned out the RHEVM provider and was able to perform the profiling of the VMware provider scheduling C&U captures.

In summary the issues currently are:
1. The scale and performance of perf_capture_timer vary with the number of host-to-datastore relationships (this might be the actual BZ, rather than RHEVM being slower than VMware).
2. Scheduling of C&U captures can be affected by providers that are not being captured on (zone issues).
3. The Datastores web UI page, and likely other components (frontend/backend), are subject to performance/scale issues as the number of host relationships in the VMDB grows.

Comment 7 Alex Krzos 2015-09-24 17:13:41 UTC
Breakdown of why the host-to-datastore relationships differ between the two providers:

RHEVM (NFS datastores)

Each host has all 120 datastores: 100(hosts) x 120(datastores) = 12000

VMware (VMFS datastores)

Each host has a single datastore: 100(hosts) x 1(datastores) = 100
Each cluster (20 total, ea. 5 hosts) has a datastore: 20(clusters) x 5(hosts) x 1(datastores) = 100
One global datastore that every host has: 100(hosts) x 1(datastores) = 100

Sum those up and you have only 300 host-to-datastore relationships for the VMware provider.


Hence 300 relationships vs 12,000 relationships, although each provider has an identical number of hosts/VMs and almost the same number of datastores (120 vs 121).  *Technically RHEVM has 121, but the ISODOMAIN isn't activated.
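
A minimal sketch for checking these counts per provider from a rails console (assumes the standard CFME ems.hosts and host.storages associations):

ExtManagementSystem.all.each do |ems|
  hosts     = ems.hosts.includes(:storages).to_a
  storages  = hosts.flat_map(&:storages).uniq
  rel_count = hosts.map { |h| h.storages.size }.inject(0, :+)
  puts "#{ems.name}: #{hosts.size} hosts, #{storages.size} datastores, #{rel_count} rows in hosts_storages"
end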

Comment 8 Keenan Brock 2016-08-16 16:09:21 UTC
Storages specifically had an inefficient way of looking up the EMS and were brought back in large numbers.

I have fixed the EMS lookup issue in particular, and with the reworking of the capture timer code this should have reduced the host-to-storages data brought back.

Since our timer is much quicker now, I am marking this as closed, but if this comes up again, please reopen this BZ or open another.

