| Summary: | sometimes, initial metric collection is missed, causing first metrics to be delayed | ||
|---|---|---|---|
| Product: | [Other] RHQ Project | Reporter: | John Mazzitelli <mazz> |
| Component: | Plugin Container | Assignee: | John Mazzitelli <mazz> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.2 | CC: | hrupp |
| Target Milestone: | --- | ||
| Target Release: | JON 3.0.0, RHQ 4.3.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2012-02-07 19:22:48 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | |||
| Bug Blocks: | 745494 | ||
|
Description
John Mazzitelli
2011-11-03 23:58:35 UTC
(05:02:04 PM) mazz: ips: check this out. PC's MeasurementManager.scheduleCollection:
(05:02:04 PM) mazz: long firstCollection = System.currentTimeMillis();
(05:02:04 PM) mazz: for (MeasurementScheduleRequest request : requests) {
(05:02:04 PM) mazz: ScheduledMeasurementInfo info = new ScheduledMeasurementInfo(request, resourceId);
(05:02:04 PM) mazz: info.setNextCollection(firstCollection);
(05:02:04 PM) mazz: ....
(05:02:24 PM) mazz: notice that it (from my reading of this) sets the next collection time to NOW (System.currentTimeMillis)
(05:03:03 PM) mazz: so I'm not sure why the collection of metrics is delayed at startup sometimes. like I said, earlier today, I was seeing traits come up and other data pretty quick. but then the last two times, I did not see this.
(05:04:25 PM) ips: hmm
(05:06:31 PM) ips: mazz - myabe bc of the: if ((System.currentTimeMillis() - 30000L) > requests.iterator().next().getNextCollection()) block in MeasurementCollectorRunner
(05:06:56 PM) ips: maybe by the time it actually gets scheduled 30s has passed
(05:07:17 PM) ips: well, before it actually runs i mean
(05:08:39 PM) mazz: yes.. that is exactly what I'm looking at now
(05:08:45 PM) mazz: I do see those messags in my logs
(05:08:49 PM) mazz: about falling behind
(05:09:18 PM) mazz: that might be why I see it sometimes but not others? just a timing issue
(05:10:16 PM) ips: ah, check out MeasurementManager.initialize()
(05:10:26 PM) ips: the initial delay for the collector threadpool is 30s
(05:10:38 PM) ips: also its max size is 5
(05:11:09 PM) ips: between those 2 things, i'm not suprised that if check in MeasurementCollectorRunner is often true
(05:37:39 PM) ips: mazz - assuming that initial delay is what's causing issues, here's one idea:
(05:38:05 PM) mazz: I think that is it
(05:38:09 PM) ips: in MeasurementManager.initialize():
(05:38:10 PM) mazz: I'm sitting at a breatpoint right now
(05:38:25 PM) ips: right after the line: this.collectorThreadPool.schedule(new MeasurementCollectionRequester(), collectionInitialDelaySecs,
(05:38:25 PM) ips: TimeUnit.SECONDS);
(05:38:27 PM) mazz: it hit that part where it thinks it is falling behind - and its all the platform metrics
(05:39:25 PM) ips: add: this.initialCollectionStartTime = System.currentTimeMillis() + (collectionInitialDelaySecs * 1000);
(05:40:27 PM) ips: and then in scheduleCollection() change: long firstCollection = System.currentTimeMillis();
(05:40:52 PM) ips: to: long firstCollection = Math.max(System.currentTimeMillis(), this.initialCollectionStartTime);
(05:43:00 PM) mazz: yeah, that would delete the very first collecton out in the future by the initial delay amount
(05:43:08 PM) mazz: I'll see if that works
that proposed solution doesn't work. metrics are still delayed, but I don't see the falling behind message. not sure what the problem is yet fix was to always delay the first collection. git commit: master: 98ea586e555d3f07ba568b19b75f144872a8471a release_jon3.x: 61f2d882d8c614c5635f875fbe6c37e58032bb0b note: do not expect metrics to come in immediately upon starting the agent or immediately upon importing a resource. it will take about 60 seconds for the data to come in - this includes dynamic metrics as well as traits. *** Bug 536181 has been marked as a duplicate of this bug. *** changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE |