Bug 829962 - platform "goes down" alert doesn't fire the first time
Status: CLOSED NOTABUG
Product: RHQ Project
Classification: Other
Component: Alerts
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assigned To: RHQ Project Maintainer
QA Contact: Mike Foley
Depends On:
Blocks: 830299
Reported: 2012-06-07 17:45 EDT by John Mazzitelli
Modified: 2012-06-08 15:15 EDT
1 user

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 830299
Environment:
Last Closed: 2012-06-08 15:12:16 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description John Mazzitelli 2012-06-07 17:45:19 EDT
1) Start a server and a new agent.
2) Import the new platform.
3) Create an alert definition on the platform resource - a "Goes Down" availability alert.
4) At the agent prompt, invoke "shutdown" (or just kill the agent).
5) Notice that no alert is fired - this is the bug.
6) Restart the agent (or type "start" if you are still at the agent prompt).
7) Repeat step 4 (shut down the agent).
8) Notice that an alert IS fired.

Why does the alert fire the second time, but not the first?
Comment 1 Mike Foley 2012-06-07 18:40:20 EDT
Documenting that this is OK in JON 3.1:

<mfoley_> trying this now
<mfoley_> ok ... it worked for me 1st time in JON 3.1
<mfoley_> but i can retest
<mfoley_> this is working for me in JON 3.1
<viet> it worked for me too first time in CR3
Comment 2 John Mazzitelli 2012-06-08 13:50:45 EDT
I am seeing this, but not 100% of the time. I just tried again: started with a fresh DB and a newly imported platform. I start the server and, when it's up, I start the agent. I import the RHQ Agent and the platform. On the platform, I create a Going Down alert. I shut down the agent. In the server logs, I see this:

13:45:34,901 INFO  [CoreServerServiceImpl] Agent [mazztower][4.5.0-SNAPSHOT(c96fb05)] would like to connect to this server
13:45:35,018 INFO  [CoreServerServiceImpl] Agent [mazztower] has connected to this server at Fri Jun 08 13:45:35 EDT 2012
13:45:52,170 INFO  [CoreServerServiceImpl] Got agent registration request for existing agent: mazztower[192.168.1.2:16163][4.5.0-SNAPSHOT(c96fb05)] - Will not regenerate a new token
13:46:30,143 INFO  [CacheConsistencyManagerBean] localhost took [49]ms to reload cache for 1 agents
13:46:41,767 INFO  [AgentManagerBean] Agent with name [mazztower] just went down
13:47:00,200 INFO  [CacheConsistencyManagerBean] localhost took [43]ms to reload global cache
13:47:00,258 INFO  [CacheConsistencyManagerBean] localhost took [43]ms to reload cache for 1 agents

I think it might have something to do with the reloading of the caches.
Comment 3 John Mazzitelli 2012-06-08 14:15:57 EDT
I just tried again - clean DB, new agent. This time, the alert fired. But here's something different: I did not see the alert caches get reloaded:

14:11:54,500 INFO  [CoreServerServiceImpl] Got agent registration request for existing agent: mazztower[192.168.1.2:16163][4.5.0-SNAPSHOT(c96fb05)] - Will not regenerate a new token
14:12:38,094 INFO  [CacheConsistencyManagerBean] localhost took [51]ms to reload global cache
14:12:38,158 INFO  [CacheConsistencyManagerBean] localhost took [49]ms to reload cache for 1 agents
14:12:56,487 INFO  [AgentManagerBean] Agent with name [mazztower] just went down

Notice in comment #2 that when the alert didn't fire, the two alert caches reloaded only after the agent went down.
Comment 4 John Mazzitelli 2012-06-08 15:12:16 EDT
This is to be expected. See the new FAQ I added so I don't forget this again 3 years from now :)

http://rhq-project.org/display/JOPR2/FAQ#FAQ-IcreatedanalertdefinitionandIknowimmediatelythereaftermyagentshouldhavereporteddatathatshouldhavetriggeredthealertbutmyalertdidnotfire.Wheredidmyalertgo%3F
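The expected behavior described in the FAQ can be illustrated with a minimal sketch (all class and condition names below are hypothetical, not actual RHQ code): an availability-down event that is processed before the alert-condition cache has been reloaded with the new definition triggers nothing, while the same event after a reload fires the alert.

```python
class AlertConditionCache:
    """Holds the alert conditions the server currently evaluates."""

    def __init__(self):
        self.conditions = set()

    def reload(self, definitions):
        # Periodic job: pull the latest alert definitions from the database.
        self.conditions = set(definitions)

    def process_event(self, event, fired):
        # Only events matching a condition already in the cache fire an alert.
        if event in self.conditions:
            fired.append(event)


db_definitions = ["platform-goes-down"]  # stored in the DB, not yet cached
cache = AlertConditionCache()
fired = []

# First agent shutdown: event arrives before the cache reload -> no alert.
cache.process_event("platform-goes-down", fired)
print(fired)  # []

# Cache reload runs; the second shutdown now fires the alert.
cache.reload(db_definitions)
cache.process_event("platform-goes-down", fired)
print(fired)  # ['platform-goes-down']
```

This matches the logs in comments #2 and #3: when the CacheConsistencyManagerBean reload lines appeared only after the "just went down" line, no alert fired.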
