Bug 829962
| Summary: | platform "goes down" alert doesn't fire the first time | | |
|---|---|---|---|
| Product: | [Other] RHQ Project | Reporter: | John Mazzitelli <mazz> |
| Component: | Alerts | Assignee: | RHQ Project Maintainer <rhq-maint> |
| Status: | CLOSED NOTABUG | QA Contact: | Mike Foley <mfoley> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.4 | CC: | hrupp |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 830299 | Environment: | |
| Last Closed: | 2012-06-08 19:12:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 830299 | | |
Description
John Mazzitelli
2012-06-07 21:45:19 UTC
Documenting that this is OK in JON 3.1:

<mfoley_> trying this now
<mfoley_> ok ... it worked for me 1st time in JON 3.1
<mfoley_> but i can retest
<mfoley_> this is working for me in JON 3.1
<viet> it worked for me too first time in CR3

I am seeing this, but not 100% of the time.

I just tried again, starting with a fresh DB and a newly imported platform. I start the server; when it's up, I start the agent. I import the RHQ Agent and the platform. On the platform, I create a Going Down alert. I shut down the agent. In the server logs, I see this:

13:45:34,901 INFO [CoreServerServiceImpl] Agent [mazztower][4.5.0-SNAPSHOT(c96fb05)] would like to connect to this server
13:45:35,018 INFO [CoreServerServiceImpl] Agent [mazztower] has connected to this server at Fri Jun 08 13:45:35 EDT 2012
13:45:52,170 INFO [CoreServerServiceImpl] Got agent registration request for existing agent: mazztower[192.168.1.2:16163][4.5.0-SNAPSHOT(c96fb05)] - Will not regenerate a new token
13:46:30,143 INFO [CacheConsistencyManagerBean] localhost took [49]ms to reload cache for 1 agents
13:46:41,767 INFO [AgentManagerBean] Agent with name [mazztower] just went down
13:47:00,200 INFO [CacheConsistencyManagerBean] localhost took [43]ms to reload global cache
13:47:00,258 INFO [CacheConsistencyManagerBean] localhost took [43]ms to reload cache for 1 agents

I think it might have something to do with the reloading of the caches.

I just tried again - clean DB, new agent. This time, the alert fired. But here's something different: I did not see the alert caches get reloaded after the agent went down:

14:11:54,500 INFO [CoreServerServiceImpl] Got agent registration request for existing agent: mazztower[192.168.1.2:16163][4.5.0-SNAPSHOT(c96fb05)] - Will not regenerate a new token
14:12:38,094 INFO [CacheConsistencyManagerBean] localhost took [51]ms to reload global cache
14:12:38,158 INFO [CacheConsistencyManagerBean] localhost took [49]ms to reload cache for 1 agents
14:12:56,487 INFO [AgentManagerBean] Agent with name [mazztower] just went down

Notice that in comment #2, when the alert didn't fire, the two alert caches reloaded after the agent went down.

This is to be expected. See the new FAQ I added so I don't forget this again 3 years from now :)

http://rhq-project.org/display/JOPR2/FAQ#FAQ-IcreatedanalertdefinitionandIknowimmediatelythereaftermyagentshouldhavereporteddatathatshouldhavetriggeredthealertbutmyalertdidnotfire.Wheredidmyalertgo%3F
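To compare runs like the two above without reading timestamps by eye, here is a rough sketch that scans server log lines and reports whether a global cache reload was logged before the agent went down. It is not part of RHQ. The function name `global_reload_before_down` and the regular expression are made up for illustration, and it assumes that the "reload global cache" message is the one that matters for a Going Down alert. That assumption is consistent with the two logs in this bug but is not confirmed here.

```python
import re

# Matches lines such as:
# 13:46:41,767 INFO [AgentManagerBean] Agent with name [mazztower] just went down
LINE_RE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s+INFO\s+\[(\w+)\]\s+(.*)$")

# Log excerpts copied from the two runs described in this bug.
FAILED_RUN = [
    "13:46:30,143 INFO [CacheConsistencyManagerBean] localhost took [49]ms to reload cache for 1 agents",
    "13:46:41,767 INFO [AgentManagerBean] Agent with name [mazztower] just went down",
    "13:47:00,200 INFO [CacheConsistencyManagerBean] localhost took [43]ms to reload global cache",
    "13:47:00,258 INFO [CacheConsistencyManagerBean] localhost took [43]ms to reload cache for 1 agents",
]
FIRED_RUN = [
    "14:12:38,094 INFO [CacheConsistencyManagerBean] localhost took [51]ms to reload global cache",
    "14:12:38,158 INFO [CacheConsistencyManagerBean] localhost took [49]ms to reload cache for 1 agents",
    "14:12:56,487 INFO [AgentManagerBean] Agent with name [mazztower] just went down",
]


def global_reload_before_down(lines):
    """Return True if a global cache reload is logged before the first
    'just went down' line, False if not, and None if no agent went down.

    Relies on the lines being in chronological order, as in a server log.
    """
    seen_global_reload = False
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not an INFO log line
        component, message = match.groups()
        if component == "CacheConsistencyManagerBean" and "reload global cache" in message:
            seen_global_reload = True
        elif component == "AgentManagerBean" and "just went down" in message:
            return seen_global_reload
    return None


if __name__ == "__main__":
    print("run where alert did not fire:", global_reload_before_down(FAILED_RUN))
    print("run where alert fired:", global_reload_before_down(FIRED_RUN))
```

On these excerpts, the run where the alert did not fire gives False and the run where it fired gives True, which matches the explanation that the new alert definition was not yet in the cache when the agent went down.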