Red Hat Bugzilla – Bug 975399
[RHS-C] When glusterd is restarted, Server Status is not changing to "UP" from "Non-Operational" immediately
Last modified: 2015-12-28 01:57:35 EST
Description of problem:
When glusterd is restarted, Server Status is not changing to "UP" from "Non-Operational" immediately
Version-Release number of selected component (if applicable): Red Hat Storage Console Version: 2.1.0-0.bb3.el6rhs
How reproducible: Always
Steps to Reproduce:
1. Create a cluster and add 2 servers (make sure that both are in UP state)
2. On server1, stop glusterd (# /etc/init.d/glusterd stop)
3. In the UI, the status of server1 should now change to "Non-Operational"
4. Now restart glusterd on server1 and check the Server Status in the UI.
Actual results: Server status still shows as "Non-Operational". However, I noticed that after 5 min the server is detected as a new server, with the following Event message: "Detected new Host server1. Host state was set to Up.".
Expected results: Server status should automatically change to UP as soon as the issue is resolved. Having to wait 5 min prevents any successful volume operation from the UI during that time.
GlusterMonitoringStrategy has been implemented to check gluster capabilities.
When the host state is Non-Operational, it is set back to UP by the AutoRecoveryManager. Its interval is configured via ConfigValues.AutoRecoverySchedule, which is currently set to run every 5 mins.
This can be changed to run at a shorter interval.
Please check by changing ConfigValues.AutoRecoverySchedule.
(In reply to Sahina Bose from comment #4)
> Please check by changing the ConfigValues.AutoRecoverySchedule
Is it user configurable? If so, can you tell me where I can find and edit this configuration parameter to verify this bug?
The config value is currently not exposed. I will be posting a patch to do that.
In the meantime, you can change the value with:
psql engine postgres -c "update vdc_options set option_value = '0/5 * * * * ?' where option_name = 'AutoRecoverySchedule';"
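A note on reading the value above: the trailing '?' suggests a Quartz-style cron expression, which has six fields and starts with seconds rather than minutes. Assuming that is how the engine interprets AutoRecoverySchedule, '0/5 * * * * ?' would fire every 5 seconds, whereas a five-minute schedule would be written '0 0/5 * * * ?'. A minimal shell sketch that labels the fields (describe_schedule is a hypothetical helper for illustration, not part of the product):

```shell
#!/bin/sh
# Label the fields of a Quartz-style cron expression.
# Assumption: six whitespace-separated fields, seconds first.
describe_schedule() {
    set -f          # disable globbing so '*' and '?' stay literal
    set -- $1       # split the expression into positional parameters
    set +f
    if [ "$#" -ne 6 ]; then
        echo "expected 6 fields, got $#" >&2
        return 1
    fi
    echo "seconds=$1 minutes=$2 hours=$3 day-of-month=$4 month=$5 day-of-week=$6"
}

describe_schedule '0/5 * * * * ?'   # value used in the update statement above
describe_schedule '0 0/5 * * * ?'   # illustrative five-minute form
```

In the first call the 0/5 step lands in the seconds field, in the second it lands in the minutes field.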
Changing a database value is not an end-user solution.
Please set to ON_QA once there is a configuration solution that an end-user would use.
The user can manually activate the host if they know that glusterd has already come up.
This is a framework limitation, and for the time being we cannot enhance this to sync up immediately. Auto-recovery happens every 5 min (default), and the user will not see the update within that 5 min window.
Hence this needs to be moved out of Corbett.
There is a framework limitation around the 5 min auto-recovery operation (a kind of sync-up). This needs to be taken out of Corbett.
Please review the edited Doc Text and sign off.
RHSC 2.1 is EOLed and we don't have any plan to fix this issue.
If you think this is an important bug to be addressed, please re-open it against the latest release.
I will go ahead and "CLOSE" this bug for this release.