Bug 975399 - [RHS-C] When glusterd is restarted, Server Status is not changing to "UP" from "Non-Operational" immediately
Status: CLOSED WONTFIX
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhsc
Version: 2.1
Hardware: x86_64 Linux
Priority: low   Severity: medium
Assigned To: Sahina Bose
QA Contact: RHS-C QE
Depends On:
Blocks: 1035040
Reported: 2013-06-18 07:27 EDT by Prasanth
Modified: 2015-12-28 01:57 EST
CC: 8 users

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
When the gluster daemon (glusterd) service is restarted, the host status does not change from Non-Operational to Up immediately in the Red Hat Storage Console. The auto-recovery operation that detects changes in Non-Operational hosts runs at a 5-minute interval.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-28 01:57:35 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
oVirt gerrit 21452 (status: ABANDONED, last updated: never): tools: [WIP] Expose AutoRecoverySchedule as engine-config parameter

Description Prasanth 2013-06-18 07:27:46 EDT
Description of problem:

When glusterd is restarted, the server status does not change from "Non-Operational" to "UP" immediately.

Version-Release number of selected component (if applicable):  Red Hat Storage Console Version: 2.1.0-0.bb3.el6rhs 


How reproducible: Always


Steps to Reproduce:
1. Create a cluster and add 2 servers (make sure both are in the UP state).
2. On server1, stop glusterd (# /etc/init.d/glusterd stop).
3. In the UI, the status of server1 should now change to "Non-Operational".
4. Restart glusterd on server1 and check the server status in the UI (see the shell sketch after these steps).
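
A minimal shell sketch of steps 2 and 4, using the SysV init script from step 2 (on systemd-based hosts the equivalent would be systemctl stop/start glusterd):

  # Step 2: on server1, stop glusterd; the Console should mark the host Non-Operational
  /etc/init.d/glusterd stop
  # ...wait until the UI shows server1 as "Non-Operational"...
  # Step 4: bring glusterd back up, then watch the host status in the UI
  /etc/init.d/glusterd start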

Actual results: Server status still shows as "Non-Operational". However, I noticed that the server is detected as a new host after 5 minutes, with the following event message: "Detected new Host server1. Host state was set to Up."


Expected results: Server status should automatically change to UP as soon as the issue is resolved. Waiting for 5 minutes effectively prevents any successful volume operation from the UI during that time.


Additional info:
Comment 2 Sahina Bose 2013-10-25 08:05:38 EDT
GlusterMonitoringStrategy has been implemented to check gluster capabilities.
Comment 3 Sahina Bose 2013-10-30 10:25:21 EDT
When the host state is Non-Operational, it is set back to UP via the AutoRecoveryManager. Its interval can be configured via ConfigValues.AutoRecoverySchedule and is currently set to run every 5 minutes.
This can be changed to run at a shorter interval.
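
For context, the schedule value is a Quartz cron expression (fields: seconds, minutes, hours, day-of-month, month, day-of-week). A sketch of plausible values; the default shown first is an assumption:

  # Quartz cron fields: sec min hour day-of-month month day-of-week
  "0 0/5 * * * ?"   # assumed default: at second 0 of every 5th minute
  "0 0/1 * * * ?"   # every minute
  "0/5 * * * * ?"   # every 5 seconds (only sensible for testing)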
Comment 4 Sahina Bose 2013-11-12 00:28:57 EST
Please check by changing the ConfigValues.AutoRecoverySchedule
Comment 5 Prasanth 2013-11-13 04:55:49 EST
(In reply to Sahina Bose from comment #4)
> Please check by changing the ConfigValues.AutoRecoverySchedule

Is it user configurable? If so, can you tell me where I can find and edit this configuration parameter to verify this bug?
Comment 6 Sahina Bose 2013-11-15 03:40:34 EST
The config value is currently not exposed; I will be posting a patch to do that.
In the meantime, you can change the value with:
psql engine postgres -c "update vdc_options set option_value = '0/5 * * * * ?' where option_name = 'AutoRecoverySchedule';"
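
As a follow-up sketch, you can verify the change and restart the engine so it is picked up. The select statement assumes the standard vdc_options columns, and ovirt-engine is assumed to be the service name:

  # Confirm the new value (assumes the standard vdc_options schema)
  psql engine postgres -c "select option_value from vdc_options where option_name = 'AutoRecoverySchedule';"
  # Config values are read at engine startup, so restart to apply the change
  service ovirt-engine restart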
Comment 7 Matt Mahoney 2013-11-15 14:17:03 EST
Changing a database value is not an end-user solution.
Please set to ON_QA once there is a configuration solution that an end-user would use.
Comment 8 Dusmant 2013-11-25 04:36:29 EST
Users can manually activate the host if they know that glusterd has already come up.
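
For illustration, the host can also be activated outside the UI. A sketch using the REST API, assuming the standard oVirt 3.x activate endpoint, placeholder credentials, and a placeholder host ID:

  # Look up the host ID via GET /api/hosts, then activate the host
  curl -k -u admin@internal:password -X POST \
       -H "Content-Type: application/xml" -d "<action/>" \
       https://rhsc.example.com/api/hosts/<host-id>/activate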

This is a framework limitation, and for the time being we cannot enhance it to sync up immediately after glusterd recovers. Auto-recovery happens every 5 minutes (by default), and the user will not see the update within that 5-minute window.

Hence we need to move this out of Corbett.
Comment 9 Dusmant 2013-11-26 11:21:58 EST
There is a framework limitation around the 5-minute auto-recovery operation (a kind of sync-up). This needs to be taken out of Corbett.
Comment 10 Shalaka 2014-01-24 06:28:40 EST
Please review the edited Doc Text and sign off.
Comment 11 Sahina Bose 2014-01-30 02:13:55 EST
Doctext ok
Comment 12 Dusmant 2015-12-28 01:57:35 EST
RHSC 2.1 is EOL and we do not have any plan to fix this issue.
If you think this is an important bug to be addressed, please re-open it against the latest release.
I will go ahead and close this bug for this release.
