Bug 975399 - [RHS-C] When glusterd is restarted, Server Status is not changing to "UP" from "Non-Operational" immediately
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rhsc
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Sahina Bose
QA Contact: RHS-C QE
URL:
Whiteboard:
Depends On:
Blocks: 1035040
Reported: 2013-06-18 11:27 UTC by Prasanth
Modified: 2015-12-28 06:57 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
When the gluster daemon service is restarted, the host status does not change from Non-Operational to UP immediately in the Red Hat Storage Console. The auto-recovery operation, which detects changes in Non-Operational hosts, runs at a 5-minute interval, so the status can take up to 5 minutes to update.
Clone Of:
Environment:
Last Closed: 2015-12-28 06:57:35 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 21452 0 None ABANDONED tools: [WIP] Expose AutoRecoverySchedule as engine-config parameter Never

Description Prasanth 2013-06-18 11:27:46 UTC
Description of problem:

When glusterd is restarted, the Server Status does not change from "Non-Operational" to "UP" immediately.

Version-Release number of selected component (if applicable):  Red Hat Storage Console Version: 2.1.0-0.bb3.el6rhs 


How reproducible: Always


Steps to Reproduce:
1. Create a cluster and add 2 servers (make sure that both are in the UP state).
2. On server1, stop glusterd (# /etc/init.d/glusterd stop).
3. In the UI, the status of server1 should now change to "Non-Operational".
4. Now restart glusterd on server1 and check the Server Status in the UI.

Actual results: The server status still shows "Non-Operational". However, I noticed that after 5 minutes the server is detected as a new host, with the following Event message: "Detected new Host server1. Host state was set to Up.".


Expected results: The server status should change to UP automatically as soon as the issue is resolved. The 5-minute wait prevents any volume operation from succeeding in the UI during that time.


Additional info:

Comment 2 Sahina Bose 2013-10-25 12:05:38 UTC
GlusterMonitoringStrategy has been implemented to check gluster capabilities.

Comment 3 Sahina Bose 2013-10-30 14:25:21 UTC
When the host state is Non-Operational, it is set back to UP by the AutoRecoveryManager. Its interval can be configured via ConfigValues.AutoRecoverySchedule, which is currently set to run every 5 minutes.
This can be changed to run at a shorter interval.
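
To illustrate why the status update can lag by up to the full interval, here is a minimal sketch of the wait between glusterd coming back and the next periodic check. It assumes, purely for illustration, that checks fire at exact multiples of the interval; the function name `seconds_until_next_run` is hypothetical and not part of the engine.

```python
import math


def seconds_until_next_run(recovered_at: float, interval: float = 300.0) -> float:
    """Return how long a recovered host waits for the next periodic check.

    Assumption (illustrative only): checks fire at multiples of `interval`
    seconds, i.e. at 0, 300, 600, ... for the 5-minute schedule.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    # Round the recovery time up to the next scheduled tick.
    next_run = math.ceil(recovered_at / interval) * interval
    return next_run - recovered_at


# glusterd comes back 10 s after a check ran: the host waits 290 s.
print(seconds_until_next_run(10))
# glusterd comes back 299 s after a check ran: the host waits 1 s.
print(seconds_until_next_run(299))
```

The worst case approaches the full interval, which matches the roughly 5-minute delay reported above. A shorter interval reduces that bound at the cost of running the recovery job more often.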

Comment 4 Sahina Bose 2013-11-12 05:28:57 UTC
Please check by changing the ConfigValues.AutoRecoverySchedule

Comment 5 Prasanth 2013-11-13 09:55:49 UTC
(In reply to Sahina Bose from comment #4)
> Please check by changing the ConfigValues.AutoRecoverySchedule

Is it user configurable? If so, can you tell me where I can find and edit this configuration parameter to verify this bug?

Comment 6 Sahina Bose 2013-11-15 08:40:34 UTC
The config value is currently not exposed. I will be posting a patch to do that.
In the meantime, you can change the value with:
psql engine postgres -c "update vdc_options set option_value = '0/5 * * * * ?' where option_name = 'AutoRecoverySchedule';"
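
A note on reading the schedule value: the trailing `?` suggests a Quartz-style cron expression, although the bug does not say so explicitly, so treat that as an assumption. In Quartz syntax the first field is seconds, which would make `0/5 * * * * ?` fire every 5 seconds, whereas an every-5-minutes schedule would be written `0 0/5 * * * ?`. The sketch below only labels the fields by position; the helper `label_quartz_fields` is hypothetical.

```python
# Field order of a Quartz cron expression; the seventh field (year) is optional.
QUARTZ_FIELDS = (
    "seconds",
    "minutes",
    "hours",
    "day-of-month",
    "month",
    "day-of-week",
    "year",
)


def label_quartz_fields(expression: str) -> dict:
    """Map each whitespace-separated field of the expression to its Quartz name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("a Quartz cron expression has 6 or 7 fields")
    # zip stops at the shorter sequence, so "year" is omitted for 6 fields.
    return dict(zip(QUARTZ_FIELDS, parts))


# Value from the command above: the step "0/5" lands in the seconds field.
print(label_quartz_fields("0/5 * * * * ?"))
# An every-5-minutes schedule: the step "0/5" lands in the minutes field.
print(label_quartz_fields("0 0/5 * * * ?"))
```

Under that assumption, the command above shortens the interval to seconds rather than minutes, which may be more aggressive than intended outside a test setup.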

Comment 7 Matt Mahoney 2013-11-15 19:17:03 UTC
Changing a database value is not an end-user solution.
Please set to ON_QA once there is a configuration solution that an end user can use.

Comment 8 Dusmant 2013-11-25 09:36:29 UTC
The user can manually activate the host if they know that glusterd has already come up.

This is a framework limitation, and for the time being we cannot enhance this to sync up immediately. Auto-recovery runs every 5 minutes (default), and the user would not see the update within that 5-minute window.

Hence we need to move it out of Corbett.

Comment 9 Dusmant 2013-11-26 16:21:58 UTC
The framework has a limitation: the auto-recovery operation (a kind of sync-up) runs every 5 minutes. This needs to be taken out of Corbett.

Comment 10 Shalaka 2014-01-24 11:28:40 UTC
Please review the edited Doc Text and sign off.

Comment 11 Sahina Bose 2014-01-30 07:13:55 UTC
Doctext ok

Comment 12 Dusmant 2015-12-28 06:57:35 UTC
RHSC 2.1 has reached EOL and we have no plans to fix this issue.
If you think this is an important bug to address, please reopen it against the latest release.
I will go ahead and "CLOSE" this bug for this release.

