Bug 829962 - platform "goes down" alert doesn't fire the first time
Summary: platform "goes down" alert doesn't fire the first time
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: RHQ Project
Classification: Other
Component: Alerts
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
: ---
Assignee: RHQ Project Maintainer
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 830299
 
Reported: 2012-06-07 21:45 UTC by John Mazzitelli
Modified: 2012-06-08 19:15 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 830299
Environment:
Last Closed: 2012-06-08 19:12:16 UTC
Embargoed:



Description John Mazzitelli 2012-06-07 21:45:19 UTC
1) Start a server and a new agent.
2) Import the new platform.
3) Create an alert on the platform resource: a "Goes Down" availability alert.
4) At the agent prompt, invoke "shutdown" (or just kill the agent).
5) Notice that no alert is fired. This is the bug.
6) Restart the agent (or type "start" if you are still at the agent prompt).
7) Repeat step 4 (shut down the agent).
8) Notice that an alert IS fired.

Why does the alert fire the second time, but not the first?

Comment 1 Mike Foley 2012-06-07 22:40:20 UTC
Documenting that this works OK in JON 3.1:

<mfoley_> trying this now
<mfoley_> ok ... it worked for me 1st time in JON 3.1
<mfoley_> but i can retest
<mfoley_> this is working for me in JON 3.1
<viet> it worked for me too first time in CR3

Comment 2 John Mazzitelli 2012-06-08 17:50:45 UTC
I am seeing this, but not 100% of the time. I just tried again, starting with a fresh DB and a newly imported platform. I start the server; when it's up, I start the agent. I import the RHQ Agent and the platform. On the platform, I create a Going Down alert. I shut down the agent. In the server logs, I see this:

13:45:34,901 INFO  [CoreServerServiceImpl] Agent [mazztower][4.5.0-SNAPSHOT(c96fb05)] would like to connect to this server
13:45:35,018 INFO  [CoreServerServiceImpl] Agent [mazztower] has connected to this server at Fri Jun 08 13:45:35 EDT 2012
13:45:52,170 INFO  [CoreServerServiceImpl] Got agent registration request for existing agent: mazztower[192.168.1.2:16163][4.5.0-SNAPSHOT(c96fb05)] - Will not regenerate a new token
13:46:30,143 INFO  [CacheConsistencyManagerBean] localhost took [49]ms to reload cache for 1 agents
13:46:41,767 INFO  [AgentManagerBean] Agent with name [mazztower] just went down
13:47:00,200 INFO  [CacheConsistencyManagerBean] localhost took [43]ms to reload global cache
13:47:00,258 INFO  [CacheConsistencyManagerBean] localhost took [43]ms to reload cache for 1 agents

I think it might have something to do with the reloading of the caches.

Comment 3 John Mazzitelli 2012-06-08 18:15:57 UTC
I just tried again: clean DB, new agent. This time, the alert fired. But here's something different: I did not see the alert caches get reloaded after the agent went down:

14:11:54,500 INFO  [CoreServerServiceImpl] Got agent registration request for existing agent: mazztower[192.168.1.2:16163][4.5.0-SNAPSHOT(c96fb05)] - Will not regenerate a new token
14:12:38,094 INFO  [CacheConsistencyManagerBean] localhost took [51]ms to reload global cache
14:12:38,158 INFO  [CacheConsistencyManagerBean] localhost took [49]ms to reload cache for 1 agents
14:12:56,487 INFO  [AgentManagerBean] Agent with name [mazztower] just went down

Notice that in comment #2, when the alert didn't fire, the two alert caches reloaded after the agent went down.

Comment 4 John Mazzitelli 2012-06-08 19:12:16 UTC
This is to be expected. See the new FAQ entry I added so I don't forget this again 3 years from now :)

http://rhq-project.org/display/JOPR2/FAQ#FAQ-IcreatedanalertdefinitionandIknowimmediatelythereaftermyagentshouldhavereporteddatathatshouldhavetriggeredthealertbutmyalertdidnotfire.Wheredidmyalertgo%3F

