Created attachment 1054009 [details] zip of evm.log for pid 30952 in which the C&U collection scheduling times out Description of problem: the message for the command 'Metric::Capture.perf_capture_timer' exceeds the scheduled 10 minute timeout period preventing C&U collections from being scheduled into miq_queue table Version-Release number of selected component (if applicable): Version: 5.2.5.3 Build: 20140811145418_a290c21 How reproducible: Steps to Reproduce: 1. create a vmware provider with 5400 vm instances 2. turn on C&U 3. monitor the time that the specified command takes to complete. Actual results: the message encounters a timeout at about 600 seconds (10 minutes) Expected results: all elements included in the message ought to be scheduled for C&U collections and the message should complete with a status of 'ok' Additional info: a sample of the process executing the scheduling of collections is provided (pid 30952) . In this example about 1900 C&U Perf Collections were scheduled before the timeout value as reached. There are about 5500 VMs in the vSphere in this customers environment.
Created attachment 1054403 [details] one page pdf charting number of vms in since VC environment for two weeks before problem reported
There are ~10 PRs for this. Not sure I could track them down 2 of interest: https://github.com/ManageIQ/manageiq/pull/9766 https://github.com/ManageIQ/manageiq/pull/9611 https://github.com/ManageIQ/manageiq/pull/9477 I have been given feedback by Alex Krzos that this has been fixed on 5.6.x (and master)
single db appliance, 4 worker appliances, 0 recycled workers. 5k online vms from Vmware provider. ======================================================================= showing output from log file: ~/perf_capture_timer_last_avg.log: This reading was recorded on: 2016-12-12 08:12:08,268 ...Reading logs from /var/www/miq/vmdb/log/evm.log and extracting last 100 events. Begin time: 2016-12-12T07:34 End time: 2016-12-12T08:04 10 IDs found.. # | Timestamp | IDs | Metric::Capture.perf_capture_timer ---------------------------------------------------------------------- 0 | Mon Dec 12 07:34:52 EST 2016 | 8667 | 20.486461867 1 | Mon Dec 12 07:38:00 EST 2016 | 10880 | 27.291034255 2 | Mon Dec 12 07:41:09 EST 2016 | 29407 | 32.840495136 3 | Mon Dec 12 07:44:04 EST 2016 | 37936 | 31.357679297 4 | Mon Dec 12 07:47:00 EST 2016 | 45778 | 27.009139387 5 | Mon Dec 12 07:49:56 EST 2016 | 53909 | 23.166915821 6 | Mon Dec 12 07:52:54 EST 2016 | 55273 | 19.152661771 7 | Mon Dec 12 07:55:53 EST 2016 | 56148 | 21.572165016 8 | Mon Dec 12 07:58:56 EST 2016 | 57016 | 21.522156136 9 | Mon Dec 12 08:01:54 EST 2016 | 57898 | 22.586555143 ---------------------------------------------------------------------- ▁▅█▇▅▃▁▂▂▂ Average: 24.69850000000000000000 ---------------------------------------------------------------------- 10 events were queued, out of which 10 events were delivered with [ok] status...