Bug 751231 - sometimes, initial metric collection is missed, causing first metrics to be delayed
sometimes, initial metric collection is missed, causing first metrics to be d...
Product: RHQ Project
Classification: Other
Component: Plugin Container (Show other bugs)
Unspecified Unspecified
high Severity high (vote)
: ---
: JON 3.0.0,RHQ 4.3.0
Assigned To: John Mazzitelli
Mike Foley
: RHQ-559 (view as bug list)
Depends On:
Blocks: jon30-sprint8
  Show dependency treegraph
Reported: 2011-11-03 19:58 EDT by John Mazzitelli
Modified: 2012-02-07 14:22 EST (History)
1 user (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2012-02-07 14:22:48 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)

  None (edit)
Description John Mazzitelli 2011-11-03 19:58:35 EDT
there is a timing issue in the plugin container's measurement manager. when you first start the agent/plugin container, sometimes you will notice that the plugin container misses the first metric collection (you'll see a debug message like "Measurement collection is falling behind... Missed requested time by [30040ms]")

Sometimes this doesn't happen, other times it does. Its just a random timing issue.

I saw this when I started with a fresh DB, nothing in inventory. I then imported the platform and I then went to the platform view once I saw the first availability go green. But I noticed that I did not collect any traits data, nor did I get any dynamic measurement data. This should have been collected within about 60s after the agent got all its plugins and the plugin container started.
Comment 1 John Mazzitelli 2011-11-03 19:59:51 EDT
(05:02:04 PM) mazz: ips: check this out. PC's MeasurementManager.scheduleCollection:
(05:02:04 PM) mazz:         long firstCollection = System.currentTimeMillis();
(05:02:04 PM) mazz:         for (MeasurementScheduleRequest request : requests) {
(05:02:04 PM) mazz:             ScheduledMeasurementInfo info = new ScheduledMeasurementInfo(request, resourceId);
(05:02:04 PM) mazz:             info.setNextCollection(firstCollection);
(05:02:04 PM) mazz:         ....
(05:02:24 PM) mazz: notice that it (from my reading of this) sets the next collection time to NOW (System.currentTimeMillis)
(05:03:03 PM) mazz: so I'm not sure why the collection of metrics is delayed at startup sometimes. like I said, earlier today, I was seeing traits come up and other data pretty quick. but then the last two times, I did not see this.
(05:04:25 PM) ips: hmm
(05:06:31 PM) ips: mazz - myabe bc of the: if ((System.currentTimeMillis() - 30000L) > requests.iterator().next().getNextCollection())   block in MeasurementCollectorRunner
(05:06:56 PM) ips: maybe by the time it actually gets scheduled 30s has passed
(05:07:17 PM) ips: well, before it actually runs i mean
(05:08:39 PM) mazz: yes.. that is exactly what I'm looking at now
(05:08:45 PM) mazz: I do see those messags in my logs 
(05:08:49 PM) mazz: about falling behind
(05:09:18 PM) mazz: that might be why I see it sometimes but not others? just a timing issue
(05:10:16 PM) ips: ah, check out MeasurementManager.initialize()
(05:10:26 PM) ips: the initial delay for the collector threadpool is 30s
(05:10:38 PM) ips: also its max size is 5
(05:11:09 PM) ips: between those 2 things, i'm not suprised that if check in MeasurementCollectorRunner is often true
(05:37:39 PM) ips: mazz - assuming that initial delay is what's causing issues, here's one idea:
(05:38:05 PM) mazz: I think that is it
(05:38:09 PM) ips: in MeasurementManager.initialize():
(05:38:10 PM) mazz: I'm sitting at a breatpoint right now
(05:38:25 PM) ips: right after the line:  this.collectorThreadPool.schedule(new MeasurementCollectionRequester(), collectionInitialDelaySecs,
(05:38:25 PM) ips:                 TimeUnit.SECONDS);
(05:38:27 PM) mazz: it hit that part where it thinks it is falling behind - and its all the platform metrics
(05:39:25 PM) ips: add: this.initialCollectionStartTime = System.currentTimeMillis() + (collectionInitialDelaySecs * 1000);
(05:40:27 PM) ips: and then in scheduleCollection() change: long firstCollection = System.currentTimeMillis();
(05:40:52 PM) ips: to: long firstCollection = Math.max(System.currentTimeMillis(), this.initialCollectionStartTime);
(05:43:00 PM) mazz: yeah, that would delete the very first collecton out in the future by the initial delay amount
(05:43:08 PM) mazz: I'll see if that works
Comment 2 John Mazzitelli 2011-11-03 20:15:12 EDT
that proposed solution doesn't work. metrics are still delayed, but I don't see the falling behind message. not sure what the problem is yet
Comment 3 John Mazzitelli 2011-11-03 20:48:31 EDT
fix was to always delay the first collection.

git commit:
   master: 98ea586e555d3f07ba568b19b75f144872a8471a
   release_jon3.x: 61f2d882d8c614c5635f875fbe6c37e58032bb0b
Comment 4 John Mazzitelli 2011-11-03 20:50:14 EDT
note: do not expect metrics to come in immediately upon starting the agent or immediately upon importing a resource. it will take about 60 seconds for the data to come in - this includes dynamic metrics as well as traits.
Comment 5 John Mazzitelli 2011-11-16 13:19:40 EST
*** Bug 536181 has been marked as a duplicate of this bug. ***
Comment 6 Mike Foley 2012-02-07 14:22:48 EST
changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE

Note You need to log in before you can comment on or make changes to this bug.