Bug 1244905

Summary:

Scheduling of C&U Perf Collections exceeds 10 minutes causing indeterminate number of elements to not have C&U collections performed

Product:

Red Hat CloudForms Management Engine

Reporter:

Thomas Hennessy <thenness>

Component:

Performance

Assignee:

Keenan Brock <kbrock>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Pradeep Kumar Surisetty <psuriset>

Severity:

high

Docs Contact:

Priority:

high

Version:

5.3.0

CC:

arcsharm, cpelland, dmetzger, jdeubel, jhardy, jocarter, jprause, kbrock, mfeifer, obarenbo, simaishi

Target Milestone:

Keywords:

TestOnly, ZStream

Target Release:

5.7.0

Hardware:

x86_64

OS:

Linux

Whiteboard:

c&u:perf

Fixed In Version:

5.7.0.0

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Clones:

1290080 1346988 (view as bug list)

Environment:

Last Closed:

2016-12-12 13:32:26 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1290080, 1346988

Attachments:

Description	Flags
zip of evm.log for pid 30952 in which the C&U collection scheduling times out	none
one page pdf charting number of vms in since VC environment for two weeks before problem reported	none

Description Thomas Hennessy 2015-07-20 18:07:39 UTC

Created attachment 1054009 [details]
zip of evm.log for pid 30952 in which the C&U collection scheduling times out

Description of problem: the message for the command 'Metric::Capture.perf_capture_timer' exceeds the scheduled 10 minute timeout period preventing C&U collections from being scheduled into miq_queue table


Version-Release number of selected component (if applicable):
Version: 5.2.5.3
Build:   20140811145418_a290c21

How reproducible: 


Steps to Reproduce:
1. create a vmware provider with 5400 vm instances
2. turn on C&U
3. monitor the time that the specified command takes to complete.

Actual results: the message encounters a timeout at about 600 seconds (10 minutes)


Expected results: all elements included in the message ought to be scheduled for C&U collections and the message should complete with a status of 'ok'


Additional info:
a sample of the process executing the scheduling of collections is provided (pid 30952) .  In this example about 1900 C&U Perf Collections were scheduled before the timeout value as reached.   There are about 5500 VMs in the vSphere in this customers environment.

Comment 2 Thomas Hennessy 2015-07-21 13:38:58 UTC

Created attachment 1054403 [details]
one page pdf charting number of vms in since VC environment for two weeks before problem reported

Comment 5 Keenan Brock 2016-09-06 13:58:11 UTC

There are ~10 PRs for this. Not sure I could track them down

2 of interest:

https://github.com/ManageIQ/manageiq/pull/9766
https://github.com/ManageIQ/manageiq/pull/9611
https://github.com/ManageIQ/manageiq/pull/9477

I have been given feedback by Alex Krzos that this has been fixed on 5.6.x (and master)

Comment 6 Archit Sharma 2016-12-12 13:15:53 UTC

single db appliance, 4 worker appliances, 0 recycled workers. 5k online vms from Vmware provider.
=======================================================================

showing output from log file: ~/perf_capture_timer_last_avg.log:

This reading was recorded on: 2016-12-12 08:12:08,268
...Reading logs from /var/www/miq/vmdb/log/evm.log
      and extracting last 100 events.
      Begin time: 2016-12-12T07:34
      End time: 2016-12-12T08:04
10 IDs found..

#	 | Timestamp			 | IDs | Metric::Capture.perf_capture_timer
----------------------------------------------------------------------
0	 | Mon Dec 12 07:34:52 EST 2016	 | 8667 | 20.486461867
1	 | Mon Dec 12 07:38:00 EST 2016	 | 10880 | 27.291034255
2	 | Mon Dec 12 07:41:09 EST 2016	 | 29407 | 32.840495136
3	 | Mon Dec 12 07:44:04 EST 2016	 | 37936 | 31.357679297
4	 | Mon Dec 12 07:47:00 EST 2016	 | 45778 | 27.009139387
5	 | Mon Dec 12 07:49:56 EST 2016	 | 53909 | 23.166915821
6	 | Mon Dec 12 07:52:54 EST 2016	 | 55273 | 19.152661771
7	 | Mon Dec 12 07:55:53 EST 2016	 | 56148 | 21.572165016
8	 | Mon Dec 12 07:58:56 EST 2016	 | 57016 | 21.522156136
9	 | Mon Dec 12 08:01:54 EST 2016	 | 57898 | 22.586555143
----------------------------------------------------------------------
▁▅█▇▅▃▁▂▂▂
Average: 24.69850000000000000000
----------------------------------------------------------------------
10 events were queued, out of which 10 events were delivered with [ok] status...