Bug 751231
Summary: | sometimes, initial metric collection is missed, causing first metrics to be delayed | ||
---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | John Mazzitelli <mazz> |
Component: | Plugin Container | Assignee: | John Mazzitelli <mazz> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 4.2 | CC: | hrupp |
Target Milestone: | --- | ||
Target Release: | JON 3.0.0, RHQ 4.3.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2012-02-07 19:22:48 UTC | Type: | --- |
Bug Blocks: | 745494 |
Description
John Mazzitelli
2011-11-03 23:58:35 UTC
(05:02:04 PM) mazz: ips: check this out. PC's MeasurementManager.scheduleCollection:
(05:02:04 PM) mazz: long firstCollection = System.currentTimeMillis();
(05:02:04 PM) mazz: for (MeasurementScheduleRequest request : requests) {
(05:02:04 PM) mazz: ScheduledMeasurementInfo info = new ScheduledMeasurementInfo(request, resourceId);
(05:02:04 PM) mazz: info.setNextCollection(firstCollection);
(05:02:04 PM) mazz: ....
(05:02:24 PM) mazz: notice that it (from my reading of this) sets the next collection time to NOW (System.currentTimeMillis)
(05:03:03 PM) mazz: so I'm not sure why the collection of metrics is delayed at startup sometimes. like I said, earlier today, I was seeing traits come up and other data pretty quick. but then the last two times, I did not see this.
(05:04:25 PM) ips: hmm
(05:06:31 PM) ips: mazz - maybe because of the: if ((System.currentTimeMillis() - 30000L) > requests.iterator().next().getNextCollection()) block in MeasurementCollectorRunner
(05:06:56 PM) ips: maybe by the time it actually gets scheduled 30s has passed
(05:07:17 PM) ips: well, before it actually runs I mean
(05:08:39 PM) mazz: yes.. that is exactly what I'm looking at now
(05:08:45 PM) mazz: I do see those messages in my logs
(05:08:49 PM) mazz: about falling behind
(05:09:18 PM) mazz: that might be why I see it sometimes but not others? just a timing issue
(05:10:16 PM) ips: ah, check out MeasurementManager.initialize()
(05:10:26 PM) ips: the initial delay for the collector threadpool is 30s
(05:10:38 PM) ips: also its max size is 5
(05:11:09 PM) ips: between those 2 things, I'm not surprised that the if check in MeasurementCollectorRunner is often true
(05:37:39 PM) ips: mazz - assuming that initial delay is what's causing issues, here's one idea:
(05:38:05 PM) mazz: I think that is it
(05:38:09 PM) ips: in MeasurementManager.initialize():
(05:38:10 PM) mazz: I'm sitting at a breakpoint right now
(05:38:25 PM) ips: right after the line: this.collectorThreadPool.schedule(new MeasurementCollectionRequester(), collectionInitialDelaySecs, TimeUnit.SECONDS);
(05:38:27 PM) mazz: it hit that part where it thinks it is falling behind - and it's all the platform metrics
(05:39:25 PM) ips: add: this.initialCollectionStartTime = System.currentTimeMillis() + (collectionInitialDelaySecs * 1000);
(05:40:27 PM) ips: and then in scheduleCollection() change: long firstCollection = System.currentTimeMillis();
(05:40:52 PM) ips: to: long firstCollection = Math.max(System.currentTimeMillis(), this.initialCollectionStartTime);
(05:43:00 PM) mazz: yeah, that would delay the very first collection out in the future by the initial delay amount
(05:43:08 PM) mazz: I'll see if that works

That proposed solution doesn't work. Metrics are still delayed, but I don't see the falling-behind message. Not sure what the problem is yet.

The fix was to always delay the first collection.

git commit:
master: 98ea586e555d3f07ba568b19b75f144872a8471a
release_jon3.x: 61f2d882d8c614c5635f875fbe6c37e58032bb0b

Note: do not expect metrics to come in immediately upon starting the agent or immediately upon importing a resource. It will take about 60 seconds for the data to come in. This includes dynamic metrics as well as traits.

*** Bug 536181 has been marked as a duplicate of this bug. ***

Changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE.
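
For anyone reading this later, the timing argument from the chat can be sketched in isolation. This is not the RHQ source: the class `FirstCollectionTiming`, its method names, and the sample timestamps are hypothetical, and only the 30000L threshold, the 30s initial delay, and the `Math.max` idea come from the discussion above. It shows why a first collection stamped "now" can already look 30s late when the collector first runs, and how the proposed change avoids that flag. As noted in the comments, that proposal alone did not remove the delay, so treat this as an illustration of the reasoning only.

```java
class FirstCollectionTiming {
    // Threshold taken from the check quoted in the chat (30000L ms).
    static final long LATE_THRESHOLD_MS = 30_000L;

    // Mirrors the quoted condition: (now - 30000L) > nextCollection.
    static boolean isConsideredLate(long nowMs, long nextCollectionMs) {
        return (nowMs - LATE_THRESHOLD_MS) > nextCollectionMs;
    }

    // Original behaviour: the first collection is due at schedule time.
    static long firstCollectionOriginal(long scheduleTimeMs) {
        return scheduleTimeMs;
    }

    // Proposed behaviour: never due before the collector's first run.
    static long firstCollectionProposed(long scheduleTimeMs, long initialCollectionStartMs) {
        return Math.max(scheduleTimeMs, initialCollectionStartMs);
    }

    public static void main(String[] args) {
        // All timestamps below are made-up values, relative to initialize().
        long initializedAtMs = 0L;
        long initialDelayMs = 30L * 1000L;      // 30s initial delay from the chat
        long initialCollectionStartMs = initializedAtMs + initialDelayMs;
        long scheduledAtMs = 1_000L;            // schedules arrive 1s after initialize()
        long collectorRunsAtMs = 32_000L;       // collector gets a thread 2s after the delay

        long original = firstCollectionOriginal(scheduledAtMs);
        long proposed = firstCollectionProposed(scheduledAtMs, initialCollectionStartMs);

        System.out.println("original flagged late: " + isConsideredLate(collectorRunsAtMs, original));
        System.out.println("proposed flagged late: " + isConsideredLate(collectorRunsAtMs, proposed));
    }
}
```

With these sample numbers the original schedule trips the "falling behind" check and the proposed one does not, which matches the observation that the message went away even though metrics were still delayed.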