Bug 536587 (RHQ-921)
Summary: | add command line option to start server in maintenance mode | ||
---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | Jay Shaughnessy <jshaughn> |
Component: | No Component | Assignee: | John Mazzitelli <mazz> |
Status: | CLOSED NEXTRELEASE | QA Contact: | Corey Welton <cwelton> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | 1.1 | Keywords: | Improvement |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | All | ||
URL: | http://jira.rhq-project.org/browse/RHQ-921 | ||
Whiteboard: | |||
Fixed In Version: | 1.2 | Doc Type: | Enhancement |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | all |
Description
Jay Shaughnessy
2008-10-02 14:08:00 UTC
We could have the rhq-server script look for "-maintenance" and, if it sees it on the command line, add "-Drhq.server.maintenance-mode-at-startup=true" to the VM opts. Otherwise, "-Drhq.server.maintenance-mode-at-startup=false" should be passed. We need to always pass it to make it easier to support launching via Java Service Wrapper. We could alternatively add this setting to rhq-server.properties so you can manually set it to "true" in the file.

Then the code in StartupServlet would look for this setting and, if true, ignore what's in the DB and immediately set the mode to MM.

Without something like this it's hard to recover from certain failure scenarios...

- Assume your server cloud is humming along and your DB goes down.
- You shut the JON servers down since they can't talk to the DB and are just throwing exceptions. [Note you can't put the servers in maintenance mode since you can't write to the DB.]
- You fix the DB and want to bring the Servers back up. Note agents stayed up spooling data during the DB outage.
- You go through and start the Servers.

While you were starting the Servers the agents were polling all the servers every 60 seconds to see who might be up. An agent is going to try to talk to the first server it finds up, and it won't check whether its primary is up for another hour (or until you run the 'Download Latest Failover List' operation on the agent). So if you spend more than 1 minute starting each server, odds are *all* your agents are going to be trying to talk to the first server you started. If you have a large number of agents, that is probably going to exceed the concurrency limits, so it may not be possible for all the agents to get properly connected (since agents being rejected for exceeding concurrency limits will not result in failover).
Without all agents successfully being connected you can't reliably execute the 'Download Latest Failover List' operation, so your best bet is just to wait an hour for the agents to check again if their primary server is alive.

This situation would be somewhat alleviated if you could start the servers in maintenance mode, because you could then switch them all at once to Normal mode.

(9:49:23 AM) mazz: we COULD have them update RHQ_SERVER, turn all of them into Maintenance Mode
(9:49:27 AM) mazz: only THEN start the servers
(9:49:33 AM) mazz: then go to Admin>ListServers
(9:49:41 AM) mazz: check all of them and "switch to normal"
(9:50:08 AM) mazz: hopefully, it creates less stress since all servers will come online at about the same time
(9:57:59 AM) mazz: the ONLY problem I see with this
(9:58:02 AM) ccrouch: and we assume that their DB can take 110 concurrent connections
(9:58:11 AM) mazz: putting the servers in MM is NOT immediate.
(9:58:24 AM) mazz: the servers have a 30s timer
(9:58:32 AM) mazz: wakes up, reads DB - "am I in MM?"
(9:58:46 AM) mazz: so there is a 30s lag between the first server going online and the last
(9:59:06 AM) mazz: in that 30s period, all the agents that happen to poll at that time will not connect to the server(s) not yet in MM

Right now the only way to get downed servers into maintenance mode is to use SQL as mazz describes above; a command line option would be preferable.

Going to add a setting to rhq-server.properties - this way, you can configure the server via the UI config tab (and all the goodness that entails). It is also consistent with all our other server-specific configs.
This involves:

1) updating StartupServlet to look at the config and change the server mode prior to comm module setup
2) updating the container build to add the new setting to the .properties
3) updating installer so you can set this setting at install time
4) updating rhq-server plugin so it can see the new setting in config tab

rhq-server plugin can now manage this new setting: https://jira.jboss.org/jira/browse/JOPR-41

QA Verified.. entered RHQ-1459 re: the location of the UI element, but the issue technically passes.

This bug was previously known as http://jira.rhq-project.org/browse/RHQ-921

This bug is related to RHQ-1082
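
For readers who want to see the idea in code, here is a rough sketch of the two pieces discussed in this report: the launcher always passing the "-Drhq.server.maintenance-mode-at-startup" option, and the startup code letting that setting override the mode stored in the DB. Only the property name and the "-maintenance" flag come from the report. The class name, the Mode enum and both method names are made up for illustration, and this is not the actual StartupServlet or rhq-server script code.

```java
import java.util.Arrays;
import java.util.Properties;

/**
 * Illustrative sketch only. Everything except the property name and the
 * "-maintenance" flag is a hypothetical stand-in, not RHQ code.
 */
class MaintenanceStartupSketch {

    /** Property name as given in the bug description. */
    static final String PROP = "rhq.server.maintenance-mode-at-startup";

    /** Hypothetical stand-in for the two server modes mentioned in the report. */
    enum Mode { NORMAL, MAINTENANCE }

    /**
     * Launcher side: always produce the -D option, so wrappers that need a
     * fixed option list still work. It is "true" only if "-maintenance" was
     * given on the command line.
     */
    static String vmOption(String[] args) {
        boolean maintenance = Arrays.asList(args).contains("-maintenance");
        return "-D" + PROP + "=" + maintenance;
    }

    /**
     * Startup side: if the setting is true, ignore the mode read from the DB
     * and start in maintenance mode; otherwise keep whatever the DB says.
     */
    static Mode startupMode(Properties config, Mode modeFromDb) {
        boolean forced = Boolean.parseBoolean(config.getProperty(PROP, "false"));
        return forced ? Mode.MAINTENANCE : modeFromDb;
    }

    public static void main(String[] args) {
        // Show the option the launcher would pass for the given arguments.
        System.out.println(vmOption(args));

        // Show the override: the DB says NORMAL, the setting forces MM.
        Properties config = new Properties();
        config.setProperty(PROP, "true");
        System.out.println(startupMode(config, Mode.NORMAL));
    }
}
```

A missing or unparsable setting is treated as "false" here, so a server with an older properties file would keep using the DB value. That default is an assumption of this sketch, not something stated in the report.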