Bug 725429
Summary: | Measurement Cache Element Count is incorrect sometimes | |
---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | Simeon Pinder <spinder>
Component: | Alerts | Assignee: | Simeon Pinder <spinder>
Status: | CLOSED WORKSFORME | QA Contact: | Mike Foley <mfoley>
Severity: | medium | Docs Contact: |
Priority: | high | |
Version: | 4.0.1 | CC: | hrupp
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2011-07-27 21:37:08 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 678340, 725459 | |
Attachments: | | |

Description
Simeon Pinder
2011-07-25 13:20:26 UTC
Created attachment 515056
Measurement Cache Element should never go above 15, but hit 20 here.
The Measurement Cache Element count was at 15 during correct alert generation, and no new conditions were added, but the alert cache count rose to 20. The logs show cache recalculations in which the additional cache elements are still being found.
Extra cache elements could cause extra or missing alert notifications.
(4:10:22 PM) jshaughn: ccrouch: the spinder alert bzs are basically done. Today we closed one (725320) as a duplicate of yesterday's resolved issue. Also, we created 726202 for another behavior he was tracking and I've got that fixed (you can review).
(4:11:17 PM) jshaughn: it leaves only 725429 and whatever issues may still get reported from the customer.
(4:11:21 PM) ccrouch: good stuff jshaughn spinder
(4:12:54 PM) ccrouch: nice catch on https://bugzilla.redhat.com/show_bug.cgi?id=726202
(4:13:10 PM) jshaughn: as for 725429, I'm not sure about that one. I'd suggest we defer work there as I have not been able to recreate it, nor is the observed behavior, I think, obviously linked to an alerting issue.
(4:14:25 PM) ccrouch: jshaughn: the customer has seen it though right? Every 24hrs? Or was that a graph of something else?
(4:14:44 PM) jshaughn: as for the customer's issue, I never really saw that exact behavior. So, between 725429 and their report there may be something lurking. On the other hand, both could be innocuous.
(4:15:07 PM) jshaughn: or, it could have been related to yesterday's issue.
(4:15:15 PM) jshaughn: resolved by the slowed restart
(4:15:30 PM) ccrouch: are you saying that its perfectly ok to the cache element count vary?
(4:15:35 PM) jshaughn: I'm not sure. I think we need them to come back to us
(4:15:54 PM) ccrouch: ...for the cache element count to vary?
(4:15:55 PM) jshaughn: after the whole perf issue is resolved
(4:16:32 PM) ccrouch: they were certainly in a pretty bad state
(4:17:02 PM) jshaughn: the cache element count maybe should not vary but I'm not sure. As the agent caches are reloaded at db maintenance time, perhaps the dip is related to the reload.
(4:17:34 PM) ccrouch: jshaughn: but you've not been able to trigger it?
(4:17:46 PM) jshaughn: I've not seen it yet
(4:18:21 PM) jshaughn: but my reloads are very fast because I'm not built out like they are, and I don't generate millions of alerts and crush my db
(4:18:38 PM) ccrouch: right, but then neither was spinder
(4:18:49 PM) ccrouch: so i guess its a 1-1 draw so far
(4:18:50 PM) jshaughn: spinder did not report that issue in his bzs
(4:19:14 PM) jshaughn: if he has seen it then we should definitely pursue it further
(4:19:38 PM) ccrouch: i'm sorry i'm talking about the cache element count changes
(4:19:43 PM) jshaughn: or, if they come back with that issue again, after the server restart tweak and resolved perf
(4:20:11 PM) jshaughn: the customer complained about a dip, and of missing alerts
(4:20:36 PM) jshaughn: simeon claimed to see a higher than expected cache size.
(4:21:13 PM) spinder: yep. I'm not sure how much of that was related to my agent<->server mismatch though.
(4:21:25 PM) ccrouch: what agent/server mismatch?
(4:21:41 PM) jshaughn: the customer dip, maybe was a product of agents that had been lost due to a server restart. I don't know. Or, it may certainly be another, real, issue.
(4:22:16 PM) ccrouch: jshaughn: i see, you are differentiating between going up and going down. I was merging them together into "count changed"
(4:22:17 PM) ccrouch: i see your point
(4:22:52 PM) ccrouch: jshaughn: but regardless counts were steady for you?
(4:22:57 PM) spinder: ccrouch: 725445. Basically if you ever find your agent count not correct ... it could affect your agent cache count numbers I believe.
(4:24:38 PM) jshaughn: ccrouch: for me, I have yet to see an unexpected cache size, other than due to the problems we've resolved.
(4:24:48 PM) vhalber_afk is now known as vhalbert
(4:24:50 PM) ccrouch: great ok
(4:25:21 PM) ccrouch: so do you want to close as cannot repro ?
(4:26:21 PM) ccrouch: 725429 i mean
(4:27:07 PM) ccrouch: also https://bugzilla.redhat.com/show_bug.cgi?id=725445 can presumably be closed as wontfix? and we'll pick everything up in https://bugzilla.redhat.com/show_bug.cgi?id=725881 ?
(4:28:11 PM) brandon_hm is now known as brandon_hm_afk jsanda jshaughn
(4:30:55 PM) ccrouch: jshaughn: ^ ?
(4:31:37 PM) jsanda is now known as jsanda_bbl
(4:32:09 PM) jshaughn: we can leave 725429, I think. It's basically unexplained. Or, have simeon try to reproduce it again.
(4:32:40 PM) jshaughn: but I recommend we don't work on it now as I'm not sure it's related to any actual customer or alerting issue.
(4:33:07 PM) ccrouch: right, thats why i'm hesitant to keep it open, if it cant be reproduced
(4:33:24 PM) jshaughn: ask spinder what he wants to do
(4:33:48 PM) jshaughn: 725445 can be closed as wontfix but yeah, the one I added should be done for jon3
(4:34:09 PM) spinder: close it. I've reinstalled three times already. I'm sure it happened. Mazz witnessed it. I'm just not sure what the repro steps are.