Red Hat Bugzilla – Bug 534604
Server should know to stop talking to db if its existing db instance has been overwritten
Last modified: 2010-08-18 10:54:49 EDT
I guess the summary is kind of confusing, so just follow the steps.
1. Configure RHQ as an HA instance: install the server on PlatformA and install it on PlatformB.
2. Ensure both servers are connected and can communicate with the db, etc.
3. Re-install RHQ on PlatformA, overwriting the existing installation; basically, ensure it is a fresh install.
4. Observe database connections thereafter.
* PlatformB should 'know' that its instance is no longer valid in the db; after all, its entity shouldn't be there anymore if the db is truly overwritten.
* I would think we could do a simple query to check whether the server's name still exists in the HA table and, if not, have the server shut itself down. It shouldn't blindly continue to communicate with the db.
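The check proposed above can be sketched roughly as follows. This is a minimal illustration, not RHQ code. It uses Python's built-in sqlite3 as a stand-in for the real database and assumes the HA table is RHQ_SERVER with a NAME column. The function names and the `shutdown` callback are hypothetical.

```python
import sqlite3
from typing import Callable


def server_still_registered(conn: sqlite3.Connection, server_name: str) -> bool:
    """Return True if a row for server_name still exists in the HA table.

    Assumption: the HA table is RHQ_SERVER and servers are keyed by NAME.
    """
    row = conn.execute(
        "SELECT 1 FROM RHQ_SERVER WHERE NAME = ?", (server_name,)
    ).fetchone()
    return row is not None


def check_or_shutdown(
    conn: sqlite3.Connection, server_name: str, shutdown: Callable[[], None]
) -> bool:
    """Hypothetical periodic check: if our own row is gone (e.g. the schema
    was overwritten by a re-install elsewhere), stop instead of continuing
    to talk to the db. Returns True if the server may keep running."""
    if server_still_registered(conn, server_name):
        return True
    shutdown()  # hypothetical hook supplied by the server
    return False
```

A timer job would call `check_or_shutdown` on every tick. That per-tick query is the cost weighed in the reply below against how rarely the row actually disappears.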
PlatformB maintains its connection to the db, even though said db has already apparently been overwritten during the re-install.
The server also fails to work when submerged in water; we might need to Jira that, too :)
I would think this is one of those "Patient: it hurts when I hold my arm like this. Doctor: don't hold your arm like that" issues.
If you completely blow away the database, then all servers previously running against that database will not work, so you shouldn't have your servers running if you blow away the db (which is rare in production: you would normally never do this after the very first server is installed). Perhaps we need to document that if you elect to "Overwrite Schema" in the installer, you must only do so when all other servers are down.
We could add a query to the installer that checks the database for any servers currently active against it (look at RHQ_SERVER and see if mtime is recent, i.e. within the past minute or two). If we find any, we could pop up a message or something to warn the user. I wouldn't want to add another query to our timer jobs to check that our server still exists in the DB; it would be wasteful to do this every 30 seconds when 99.99999% of the time it will be there (in fact, the design requires that it be there).
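The installer-side warning suggested here could look roughly like this. Again, this is only a sketch, and several details are assumptions. sqlite3 stands in for the real database. The MTIME column is assumed to hold epoch milliseconds refreshed by each running server. The two-minute window, the function names, and the message text are illustrative choices, not anything RHQ defines.

```python
import sqlite3
import time
from typing import List, Optional

# "Within the past minute or two"; the exact window is an assumption.
RECENT_WINDOW_MS = 2 * 60 * 1000


def recently_active_servers(
    conn: sqlite3.Connection,
    now_ms: Optional[int] = None,
    window_ms: int = RECENT_WINDOW_MS,
) -> List[str]:
    """Names of servers whose MTIME (assumed epoch millis) falls inside the window."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff = now_ms - window_ms
    rows = conn.execute(
        "SELECT NAME FROM RHQ_SERVER WHERE MTIME >= ? ORDER BY NAME", (cutoff,)
    ).fetchall()
    return [name for (name,) in rows]


def overwrite_schema_warning(
    conn: sqlite3.Connection, now_ms: Optional[int] = None
) -> Optional[str]:
    """Return a warning for the installer to show before "Overwrite Schema",
    or None if no server appears to be running against this database."""
    active = recently_active_servers(conn, now_ms)
    if not active:
        return None
    return (
        "These servers appear to be running against this database: "
        + ", ".join(active)
        + ". Shut them down before choosing Overwrite Schema."
    )
```

Passing `now_ms` explicitly keeps the check deterministic for testing. An installer would normally omit it and use the current clock. The query runs once per install rather than on every timer tick.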
But, again, I really don't think we need to spend a lot of time or do any major refactoring to fix this; this is a rare occurrence, and it should be obvious to users that if they blow away the schema, any servers currently running will "break".
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1385
Temporarily adding the keyword "SubBug" so we can be sure we have accounted for all the bugs.
new = Tracking + FutureFeature + SubBug
making sure we're not missing any bugs in rhq_triage
Per 17-Aug-2010 triage, closing this bug. It can be reopened if considered a critical issue.