829962 – platform "goes down" alert doesn't fire the first time

Summary: platform "goes down" alert doesn't fire the first time

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	RHQ Project
Classification:	Other
Component:	Alerts
Sub Component:
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	---
Assignee:	RHQ Project Maintainer
QA Contact:	Mike Foley
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	830299

Reported:	2012-06-07 21:45 UTC by John Mazzitelli
Modified:	2012-06-08 19:15 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	830299
Environment:
Last Closed:	2012-06-08 19:12:16 UTC
Embargoed:

Description John Mazzitelli 2012-06-07 21:45:19 UTC

1) Start a server and a new agent.
2) Import the new platform.
3) Create an alert on the platform resource: a "Goes Down" availability alert.
4) At the agent prompt, invoke "shutdown" (or just kill the agent).
5) Notice that no alert is fired. This is the bug.
6) Restart the agent (or type "start" if you are still at the agent prompt).
7) Repeat step 4 (shut down the agent).
8) Notice that an alert IS fired.

Why does the alert fire the second time, but not the first?

Comment 1 Mike Foley 2012-06-07 22:40:20 UTC

Documenting that this works in JON 3.1:

<mfoley_> trying this now
<mfoley_> ok ... it worked for me 1st time in JON 3.1
<mfoley_> but i can retest
<mfoley_> this is working for me in JON 3.1
<viet> it worked for me too first time in CR3

Comment 2 John Mazzitelli 2012-06-08 17:50:45 UTC

I am seeing this, but not 100% of the time. I just tried again, starting with a fresh DB and a newly imported platform. I start the server and, once it's up, I start the agent. I import the RHQ Agent and the platform. On the platform, I create a "Goes Down" alert. I shut down the agent. In the server logs, I see this:

13:45:34,901 INFO  [CoreServerServiceImpl] Agent [mazztower][4.5.0-SNAPSHOT(c96fb05)] would like to connect to this server
13:45:35,018 INFO  [CoreServerServiceImpl] Agent [mazztower] has connected to this server at Fri Jun 08 13:45:35 EDT 2012
13:45:52,170 INFO  [CoreServerServiceImpl] Got agent registration request for existing agent: mazztower[192.168.1.2:16163][4.5.0-SNAPSHOT(c96fb05)] - Will not regenerate a new token
13:46:30,143 INFO  [CacheConsistencyManagerBean] localhost took [49]ms to reload cache for 1 agents
13:46:41,767 INFO  [AgentManagerBean] Agent with name [mazztower] just went down
13:47:00,200 INFO  [CacheConsistencyManagerBean] localhost took [43]ms to reload global cache
13:47:00,258 INFO  [CacheConsistencyManagerBean] localhost took [43]ms to reload cache for 1 agents

I think it might have something to do with the reloading of the caches.

Comment 3 John Mazzitelli 2012-06-08 18:15:57 UTC

I just tried again with a clean DB and a new agent. This time, the alert fired. But something was different: the alert caches reloaded before the agent went down, not after.

14:11:54,500 INFO  [CoreServerServiceImpl] Got agent registration request for existing agent: mazztower[192.168.1.2:16163][4.5.0-SNAPSHOT(c96fb05)] - Will not regenerate a new token
14:12:38,094 INFO  [CacheConsistencyManagerBean] localhost took [51]ms to reload global cache
14:12:38,158 INFO  [CacheConsistencyManagerBean] localhost took [49]ms to reload cache for 1 agents
14:12:56,487 INFO  [AgentManagerBean] Agent with name [mazztower] just went down

Notice that in comment #2, when the alert didn't fire, the two alert caches reloaded after the agent went down.
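
The ordering difference between the two runs can be checked mechanically. The following is a minimal Python sketch, not part of RHQ, that counts CacheConsistencyManagerBean reload lines before and after the "just went down" line. The log text is copied from comments #2 and #3 (trimmed to the relevant lines); the function names and the regular expression are illustrative assumptions about this log format only.

```python
import re
from datetime import datetime

# Log excerpts copied from comments #2 and #3, trimmed to the relevant lines.
COMMENT_2_LOG = """\
13:46:30,143 INFO  [CacheConsistencyManagerBean] localhost took [49]ms to reload cache for 1 agents
13:46:41,767 INFO  [AgentManagerBean] Agent with name [mazztower] just went down
13:47:00,200 INFO  [CacheConsistencyManagerBean] localhost took [43]ms to reload global cache
13:47:00,258 INFO  [CacheConsistencyManagerBean] localhost took [43]ms to reload cache for 1 agents
"""

COMMENT_3_LOG = """\
14:12:38,094 INFO  [CacheConsistencyManagerBean] localhost took [51]ms to reload global cache
14:12:38,158 INFO  [CacheConsistencyManagerBean] localhost took [49]ms to reload cache for 1 agents
14:12:56,487 INFO  [AgentManagerBean] Agent with name [mazztower] just went down
"""

# Assumed line layout: "HH:MM:SS,mmm LEVEL  [Source] message"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2},\d{3})\s+\w+\s+\[(\w+)\]\s+(.*)$")


def parse(log):
    """Return a list of (timestamp, source, message) tuples."""
    events = []
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%H:%M:%S,%f")
            events.append((ts, match.group(2), match.group(3)))
    return events


def reloads_around_shutdown(log):
    """Count cache reload lines (before, after) the agent-down line."""
    events = parse(log)
    down = next(
        ts for ts, source, message in events
        if source == "AgentManagerBean" and "just went down" in message
    )
    reloads = [ts for ts, source, _ in events
               if source == "CacheConsistencyManagerBean"]
    before = sum(1 for ts in reloads if ts < down)
    after = sum(1 for ts in reloads if ts > down)
    return before, after


if __name__ == "__main__":
    print("comment #2 (no alert):", reloads_around_shutdown(COMMENT_2_LOG))
    print("comment #3 (alert fired):", reloads_around_shutdown(COMMENT_3_LOG))
```

In the run where the alert did not fire, reloads appear after the agent went down; in the run where it fired, every reload precedes the shutdown. The logs do not show when the alert definition was created, so this only confirms the ordering, not the cause.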

Comment 4 John Mazzitelli 2012-06-08 19:12:16 UTC

This is to be expected. See the new FAQ I added so I don't forget this again 3 years from now :)

http://rhq-project.org/display/JOPR2/FAQ#FAQ-IcreatedanalertdefinitionandIknowimmediatelythereaftermyagentshouldhavereporteddatathatshouldhavetriggeredthealertbutmyalertdidnotfire.Wheredidmyalertgo%3F
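
For readers who cannot reach the FAQ, here is a toy Python model of the behavior the logs above suggest: a newly created alert definition only takes effect once the server's alert cache has been reloaded. This is a sketch under stated assumptions, not RHQ code. The class ToyAlertServer, the simulate helper, the "goes-down" name, and the 30-second reload interval are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAlertServer:
    """Hypothetical model: events are matched only against a cached copy
    of the alert definitions, and the cache is refreshed periodically."""
    reload_interval: int = 30                        # seconds; illustrative assumption
    definitions: set = field(default_factory=set)    # stands in for the database
    cache: set = field(default_factory=set)          # in-memory copy used for matching
    fired: list = field(default_factory=list)        # times at which an alert fired

    def create_definition(self, name):
        # Saved immediately, but invisible to matching until the next reload.
        self.definitions.add(name)

    def tick(self, now):
        # Periodic cache reload.
        if now % self.reload_interval == 0:
            self.cache = set(self.definitions)

    def agent_went_down(self, now):
        # Only cached definitions can fire.
        if "goes-down" in self.cache:
            self.fired.append(now)


def simulate(create_at, shutdowns, until=120):
    """Run the toy server one second at a time; return the alert fire times."""
    server = ToyAlertServer()
    for now in range(until + 1):
        server.tick(now)
        if now == create_at:
            server.create_definition("goes-down")
        if now in shutdowns:
            server.agent_went_down(now)
    return server.fired


if __name__ == "__main__":
    # Definition created at t=35; agent shut down at t=45 and again at t=75.
    print(simulate(create_at=35, shutdowns={45, 75}))
```

In this model the first shutdown at t=45 happens before the reload at t=60, so nothing fires; the second shutdown at t=75 fires. That matches the pattern in the description, where the alert fires the second time but not the first.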
