Bug 975399

Summary: [RHS-C] When glusterd is restarted, Server Status is not changing to "UP" from "Non-Operational" immediately
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: rhsc
Version: 2.1
Reporter: Prasanth <pprakash>
Assignee: Sahina Bose <sabose>
QA Contact: RHS-C QE <rhsc-qe-bugs>
Docs Contact:
Status: CLOSED WONTFIX
Severity: medium
Priority: low
CC: asriram, dpati, knarra, mmahoney, mmccune, rhs-bugs, sabose, ssampat
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
When the gluster daemon (glusterd) service is restarted, the host status does not change from Non-Operational to Up immediately in the Red Hat Storage Console. Auto-recovery operations, which detect changes in Non-Operational hosts, run at a 5-minute interval.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-28 06:57:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1035040    

Description Prasanth 2013-06-18 11:27:46 UTC
Description of problem:

When glusterd is restarted, Server Status is not changing to "UP" from "Non-Operational" immediately

Version-Release number of selected component (if applicable):  Red Hat Storage Console Version: 2.1.0-0.bb3.el6rhs 


How reproducible: Always


Steps to Reproduce:
1. Create a cluster and add 2 servers (make sure both are in the UP state)
2. On server1, stop glusterd (# /etc/init.d/glusterd stop)
3. In the UI, the status of server1 should now change to "Non-Operational"
4. Restart glusterd on server1 and check the server status in the UI.

Actual results: Server status still shows as "Non-Operational". However, I noticed that after 5 minutes it is detected as a new server, with the following event message: "Detected new Host server1. Host state was set to Up."


Expected results: Server status should automatically change to UP as soon as the issue is resolved. Waiting for 5 minutes actually prevents any successful volume operation from the UI during that time.


Additional info:

Comment 2 Sahina Bose 2013-10-25 12:05:38 UTC
GlusterMonitoringStrategy has been implemented to check gluster capabilities.

Comment 3 Sahina Bose 2013-10-30 14:25:21 UTC
When the host state is Non-Operational, it is set back to UP by the AutoRecoveryManager. Its interval can be configured via ConfigValues.AutoRecoverySchedule, and is currently set to run every 5 minutes.
This can be changed to run at a shorter interval.

Comment 4 Sahina Bose 2013-11-12 05:28:57 UTC
Please check by changing the ConfigValues.AutoRecoverySchedule

Comment 5 Prasanth 2013-11-13 09:55:49 UTC
(In reply to Sahina Bose from comment #4)
> Please check by changing the ConfigValues.AutoRecoverySchedule

Is it user configurable? If so, can you tell me where I can find and edit this configuration parameter to verify this bug?

Comment 6 Sahina Bose 2013-11-15 08:40:34 UTC
The config value is currently not exposed; I will be posting a patch to do that.
In the meantime, you can change the value with:
psql engine postgres -c "update vdc_options set option_value = '0/5 * * * * ?' where option_name = 'AutoRecoverySchedule';"
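The schedule value appears to be a Quartz-style cron expression, whose first field is seconds (unlike classic 5-field crontab entries), so '0/5 * * * * ?' in the command above would fire every 5 seconds rather than every 5 minutes. A minimal sketch to decode such simple "start/step" patterns; the helper below is illustrative only and not part of RHSC:

```python
# Quartz-style cron field order: note the leading seconds field.
FIELDS = ["second", "minute", "hour", "day", "month", "weekday"]

def describe_step(expr):
    """Describe the first 'start/step' pattern found, e.g. '0/5'."""
    for name, field in zip(FIELDS, expr.split()):
        if "/" in field:
            step = int(field.split("/", 1)[1])
            return "fires every %d %s(s)" % (step, name)
    return "no simple step pattern"

# The value from the psql command above steps the seconds field:
print(describe_step("0/5 * * * * ?"))   # fires every 5 second(s)
# A schedule stepping the minutes field instead would look like:
print(describe_step("0 0/5 * * * ?"))   # fires every 5 minute(s)
```

After changing vdc_options directly, the engine service typically needs a restart for the new schedule to take effect.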

Comment 7 Matt Mahoney 2013-11-15 19:17:03 UTC
Changing a database value is not an end-user solution.
Please set to ON_QA once there is a configuration solution that can be used by an end user.

Comment 8 Dusmant 2013-11-25 09:36:29 UTC
Users can manually activate the host if they know that glusterd has already come up.

This is a framework limitation, and for the time being we cannot enhance this to sync up immediately after the failure is resolved. Auto-recovery runs every 5 minutes (by default), and the user will not see the update within that 5-minute window.

Hence this needs to be moved out of Corbett.

Comment 9 Dusmant 2013-11-26 16:21:58 UTC
There is a framework limitation: the auto-recovery operation (a kind of sync-up) runs every 5 minutes. This needs to be taken out of Corbett.

Comment 10 Shalaka 2014-01-24 11:28:40 UTC
Please review the edited Doc Text and sign off.

Comment 11 Sahina Bose 2014-01-30 07:13:55 UTC
Doctext ok

Comment 12 Dusmant 2015-12-28 06:57:35 UTC
RHSC 2.1 is EOL and we have no plans to fix this issue.
If you think this is an important bug to be addressed, please re-open it against the latest release.
I will go ahead and close this bug for this release.