Bug 1244905

Summary: Scheduling of C&U Perf Collections exceeds 10 minutes causing indeterminate number of elements to not have C&U collections performed
Product: Red Hat CloudForms Management Engine Reporter: Thomas Hennessy <thenness>
Component: PerformanceAssignee: Keenan Brock <kbrock>
Status: CLOSED CURRENTRELEASE QA Contact: Pradeep Kumar Surisetty <psuriset>
Severity: high Docs Contact:
Priority: high    
Version: 5.3.0CC: arcsharm, cpelland, dmetzger, jdeubel, jhardy, jocarter, jprause, kbrock, mfeifer, obarenbo, simaishi
Target Milestone: GAKeywords: TestOnly, ZStream
Target Release: 5.7.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: c&u:perf
Fixed In Version: 5.7.0.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1290080 1346988 (view as bug list) Environment:
Last Closed: 2016-12-12 13:32:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1290080, 1346988    
Attachments:
Description Flags
zip of evm.log for pid 30952 in which the C&U collection scheduling times out
none
one page pdf charting number of vms in since VC environment for two weeks before problem reported none

Description Thomas Hennessy 2015-07-20 18:07:39 UTC
Created attachment 1054009 [details]
zip of evm.log for pid 30952 in which the C&U collection scheduling times out

Description of problem: the message for the command 'Metric::Capture.perf_capture_timer' exceeds the scheduled 10 minute timeout period preventing C&U collections from being scheduled into miq_queue table


Version-Release number of selected component (if applicable):
Version: 5.2.5.3
Build:   20140811145418_a290c21

How reproducible: 


Steps to Reproduce:
1. create a vmware provider with 5400 vm instances
2. turn on C&U
3. monitor the time that the specified command takes to complete.

Actual results: the message encounters a timeout at about 600 seconds (10 minutes)


Expected results: all elements included in the message ought to be scheduled for C&U collections and the message should complete with a status of 'ok'


Additional info:
a sample of the process executing the scheduling of collections is provided (pid 30952) .  In this example about 1900 C&U Perf Collections were scheduled before the timeout value as reached.   There are about 5500 VMs in the vSphere in this customers environment.

Comment 2 Thomas Hennessy 2015-07-21 13:38:58 UTC
Created attachment 1054403 [details]
one page pdf charting number of vms in since VC environment for two weeks before problem reported

Comment 5 Keenan Brock 2016-09-06 13:58:11 UTC
There are ~10 PRs for this. Not sure I could track them down

2 of interest:

https://github.com/ManageIQ/manageiq/pull/9766
https://github.com/ManageIQ/manageiq/pull/9611
https://github.com/ManageIQ/manageiq/pull/9477

I have been given feedback by Alex Krzos that this has been fixed on 5.6.x (and master)

Comment 6 Archit Sharma 2016-12-12 13:15:53 UTC
single db appliance, 4 worker appliances, 0 recycled workers. 5k online vms from Vmware provider.
=======================================================================

showing output from log file: ~/perf_capture_timer_last_avg.log:

This reading was recorded on: 2016-12-12 08:12:08,268
...Reading logs from /var/www/miq/vmdb/log/evm.log
      and extracting last 100 events.
      Begin time: 2016-12-12T07:34
      End time: 2016-12-12T08:04
10 IDs found..

#	 | Timestamp			 | IDs | Metric::Capture.perf_capture_timer
----------------------------------------------------------------------
0	 | Mon Dec 12 07:34:52 EST 2016	 | 8667 | 20.486461867
1	 | Mon Dec 12 07:38:00 EST 2016	 | 10880 | 27.291034255
2	 | Mon Dec 12 07:41:09 EST 2016	 | 29407 | 32.840495136
3	 | Mon Dec 12 07:44:04 EST 2016	 | 37936 | 31.357679297
4	 | Mon Dec 12 07:47:00 EST 2016	 | 45778 | 27.009139387
5	 | Mon Dec 12 07:49:56 EST 2016	 | 53909 | 23.166915821
6	 | Mon Dec 12 07:52:54 EST 2016	 | 55273 | 19.152661771
7	 | Mon Dec 12 07:55:53 EST 2016	 | 56148 | 21.572165016
8	 | Mon Dec 12 07:58:56 EST 2016	 | 57016 | 21.522156136
9	 | Mon Dec 12 08:01:54 EST 2016	 | 57898 | 22.586555143
----------------------------------------------------------------------
▁▅█▇▅▃▁▂▂▂
Average: 24.69850000000000000000
----------------------------------------------------------------------
10 events were queued, out of which 10 events were delivered with [ok] status...