Bug 534604 (RHQ-1385)

Summary: Server should know to stop talking to db if its existing db instance has been overwritten
Product: [Other] RHQ Project
Reporter: Corey Welton <cwelton>
Component: Core Server
Assignee: RHQ Project Maintainer <rhq-maint>
Status: CLOSED WONTFIX
QA Contact: Corey Welton <cwelton>
Severity: medium
Priority: low
Version: 1.2
Keywords: SubBug
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
URL: http://jira.rhq-project.org/browse/RHQ-1385
Doc Type: Bug Fix
Last Closed: 2010-08-18 14:54:49 UTC
Bug Blocks: 565628    

Description Corey Welton 2009-01-21 21:14:00 UTC
The summary is kind of confusing, so just follow the steps.

1. Configure RHQ as an HA instance: install the server on PlatformA, and install it on PlatformB as well.
2. Ensure both servers are connected and can communicate with the db, etc.
3. Re-install and overwrite RHQ on PlatformA -- basically, ensure it is a fresh install.
4. Observe database connections thereafter.

Expected results:
* PlatformB should 'know' that its instance is no longer valid in the db -- after all, its entity shouldn't be there any more if the db has truly been overwritten.
* I would think we could run a simple query to check whether the server's name still exists in the HA table and, if it does not, have the server shut itself down. It shouldn't blindly continue to communicate with the db.
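
The self-check suggested in the second bullet could be sketched roughly as follows. This is only an illustration under stated assumptions, not RHQ code: it uses Python with SQLite as a stand-in for the real database, the table name RHQ_SERVER is the one mentioned later in this report, and the `name` column and the function `server_still_registered` are hypothetical.

```python
import sqlite3


def server_still_registered(conn, server_name):
    """Return True if a row for server_name exists in the server table.

    Assumption: servers are listed in a table called RHQ_SERVER with a
    'name' column. The real RHQ schema may differ.
    """
    row = conn.execute(
        "SELECT 1 FROM RHQ_SERVER WHERE name = ? LIMIT 1", (server_name,)
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    # Stand-in database with both HA servers registered.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE RHQ_SERVER (name TEXT PRIMARY KEY, mtime INTEGER)")
    conn.execute("INSERT INTO RHQ_SERVER (name, mtime) VALUES ('PlatformA', 0)")
    conn.execute("INSERT INTO RHQ_SERVER (name, mtime) VALUES ('PlatformB', 0)")
    print(server_still_registered(conn, "PlatformB"))

    # Simulate the overwrite: only the freshly installed server remains.
    conn.execute("DELETE FROM RHQ_SERVER")
    conn.execute("INSERT INTO RHQ_SERVER (name, mtime) VALUES ('PlatformA', 0)")
    print(server_still_registered(conn, "PlatformB"))
```

A running server would call something like this from a periodic job and stop talking to the db once the check returns False.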

Current results:
PlatformB maintains its connection to the db, even though said db has already apparently been overwritten during the re-install.


Comment 1 John Mazzitelli 2009-01-21 21:59:21 UTC
The server also fails to work when submerged in water, we might need to Jira that, too :)

I would think this is one of those "Patient: it hurts when I hold my arm like this. Doctor: don't hold your arm like that" issues.
If you completely blow away the database, then all servers previously running against that database will not work, so you shouldn't have your servers running when you blow away the db. This is a rare occurrence in production; normally you would never do this after the very first server is installed. Perhaps we need to document that if you elect to "Overwrite Schema" in the installer, you must only do so when all other servers are down.

We could add a query to the installer that checks the database for servers currently active against it (look at RHQ_SERVER and see if mtime is recent, within the past minute or two). If we see that, we could pop up a message or something to warn the user. I wouldn't want to add another query to our timer jobs to check that our server still exists in the DB. It would be a waste to do this every 30 seconds when 99.99999% of the time the row will be there (in fact, the design is such that it must be there).
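
A rough sketch of that installer-side check, under loud assumptions: Python with SQLite stands in for the real database, RHQ_SERVER and mtime are the names mentioned above, and the `name` column, the millisecond epoch format of mtime, the two-minute default window, and the function `recently_active_servers` are all hypothetical.

```python
import sqlite3


def recently_active_servers(conn, now_ms, window_ms=120_000):
    """Return names of servers whose mtime falls within the last window_ms.

    Assumptions: RHQ_SERVER has 'name' and 'mtime' columns, and mtime is
    stored as epoch milliseconds. The real RHQ schema may differ.
    """
    cutoff = now_ms - window_ms
    rows = conn.execute(
        "SELECT name FROM RHQ_SERVER WHERE mtime >= ? ORDER BY name", (cutoff,)
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    now_ms = 1_000_000
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE RHQ_SERVER (name TEXT PRIMARY KEY, mtime INTEGER)")
    # PlatformA checked in 30 seconds ago; PlatformB has been quiet for 10 minutes.
    conn.execute("INSERT INTO RHQ_SERVER VALUES ('PlatformA', ?)", (now_ms - 30_000,))
    conn.execute("INSERT INTO RHQ_SERVER VALUES ('PlatformB', ?)", (now_ms - 600_000,))

    active = recently_active_servers(conn, now_ms)
    if active:
        print("Warning: servers still active against this database:", active)
```

The installer could run a check like this once, just before honoring "Overwrite Schema", which avoids adding any recurring query to the servers themselves.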

But, again, I really don't think we need to spend a lot of time or do any major refactoring to fix this. This is a rare occurrence, and it should be obvious to the user that if they blow away the schema, any servers currently running will "break".


Comment 2 Red Hat Bugzilla 2009-11-10 20:31:48 UTC
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1385


Comment 3 wes hayutin 2010-02-16 16:57:00 UTC
Temporarily adding the keyword "SubBug" so we can be sure we have accounted for all the bugs.

keyword:
new = Tracking + FutureFeature + SubBug

Comment 4 wes hayutin 2010-02-16 17:02:08 UTC
making sure we're not missing any bugs in rhq_triage

Comment 5 Corey Welton 2010-08-18 14:54:49 UTC
Per 17-Aug-2010 triage, closing this bug.  It can be reopened if considered a critical issue.