Bug 975399 - [RHS-C] When glusterd is restarted, Server Status is not changing to "UP" from "Non-Operational" immediately
Status: CLOSED WONTFIX
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhsc
Version: 2.1
Hardware: x86_64 Linux
Priority: low   Severity: medium
Assigned To: Sahina Bose
QA Contact: RHS-C QE
Depends On:
Blocks: 1035040
Reported: 2013-06-18 07:27 EDT by Prasanth
Modified: 2015-12-28 01:57 EST
CC: 8 users

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
When the gluster daemon (glusterd) service is restarted, the host status does not change from Non-Operational to Up immediately in the Red Hat Storage Console. The auto-recovery operation that detects changes in Non-Operational hosts runs at a 5-minute interval.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-28 01:57:35 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
oVirt gerrit 21452 (status: ABANDONED, last updated: never): tools: [WIP] Expose AutoRecoverySchedule as engine-config parameter

Description Prasanth 2013-06-18 07:27:46 EDT
Description of problem:

When glusterd is restarted, the server status does not change from "Non-Operational" to "UP" immediately.

Version-Release number of selected component (if applicable):  Red Hat Storage Console Version: 2.1.0-0.bb3.el6rhs 


How reproducible: Always


Steps to Reproduce:
1. Create a cluster and add 2 servers (make sure both are in the UP state).
2. On server1, stop glusterd (# /etc/init.d/glusterd stop).
3. In the UI, the status of server1 should now change to "Non-Operational".
4. Restart glusterd on server1 and check the server status in the UI (see the shell sketch after these steps).
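
A minimal shell sketch of steps 2 and 4, using the SysV init script from step 2 (on systemd-based hosts the equivalent would be systemctl stop/start glusterd):

  # Step 2: on server1, stop glusterd; the Console should mark the host Non-Operational
  /etc/init.d/glusterd stop
  # ...wait until the UI shows server1 as "Non-Operational"...
  # Step 4: bring glusterd back up, then watch the host status in the UI
  /etc/init.d/glusterd start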

Actual results: Server status still shows as "Non-Operational". However, I noticed that the server is detected as a new host after 5 minutes, with the following event message: "Detected new Host server1. Host state was set to Up."


Expected results: Server status should automatically change to UP as soon as the issue is resolved. Waiting for 5 minutes effectively prevents any successful volume operation from the UI during that time.


Additional info:
Comment 2 Sahina Bose 2013-10-25 08:05:38 EDT
GlusterMonitoringStrategy has been implemented to check gluster capabilities.
Comment 3 Sahina Bose 2013-10-30 10:25:21 EDT
When the host state is Non-Operational, it is set back to UP via the AutoRecoveryManager. Its interval can be configured via ConfigValues.AutoRecoverySchedule and is currently set to run every 5 minutes.
This can be changed to run at a shorter interval.
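
For context, the schedule value is a Quartz cron expression (fields: seconds, minutes, hours, day-of-month, month, day-of-week). A sketch of plausible values; the default shown first is an assumption:

  # Quartz cron fields: sec min hour day-of-month month day-of-week
  "0 0/5 * * * ?"   # assumed default: at second 0 of every 5th minute
  "0 0/1 * * * ?"   # every minute
  "0/5 * * * * ?"   # every 5 seconds (only sensible for testing)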
Comment 4 Sahina Bose 2013-11-12 00:28:57 EST
Please check by changing the ConfigValues.AutoRecoverySchedule
Comment 5 Prasanth 2013-11-13 04:55:49 EST
(In reply to Sahina Bose from comment #4)
> Please check by changing the ConfigValues.AutoRecoverySchedule

Is it user configurable? If so, can you tell me where I can find and edit this configuration parameter to verify this bug?
Comment 6 Sahina Bose 2013-11-15 03:40:34 EST
The config value is currently not exposed; I will be posting a patch to do that.
In the meantime, you can change the value with:
psql engine postgres -c "update vdc_options set option_value = '0/5 * * * * ?' where option_name = 'AutoRecoverySchedule';"
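
As a follow-up sketch, you can verify the change and restart the engine so it is picked up. The select statement assumes the standard vdc_options columns, and ovirt-engine is assumed to be the service name:

  # Confirm the new value (assumes the standard vdc_options schema)
  psql engine postgres -c "select option_value from vdc_options where option_name = 'AutoRecoverySchedule';"
  # Config values are read at engine startup, so restart to apply the change
  service ovirt-engine restart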
Comment 7 Matt Mahoney 2013-11-15 14:17:03 EST
Changing a database value is not an end-user solution.
Please set to ON_QA once there is a configuration solution that an end-user would use.
Comment 8 Dusmant 2013-11-25 04:36:29 EST
Users can manually activate the host if they know that glusterd has already come up.
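
For illustration, the host can also be activated outside the UI. A sketch using the REST API, assuming the standard oVirt 3.x activate endpoint, placeholder credentials, and a placeholder host ID:

  # Look up the host ID via GET /api/hosts, then activate the host
  curl -k -u admin@internal:password -X POST \
       -H "Content-Type: application/xml" -d "<action/>" \
       https://rhsc.example.com/api/hosts/<host-id>/activate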

This is a framework limitation, and for the time being we cannot enhance it to sync up immediately after glusterd recovers. Auto-recovery happens every 5 minutes (by default), and the user will not see the update within that 5-minute window.

Hence we need to move this out of Corbett.
Comment 9 Dusmant 2013-11-26 11:21:58 EST
There is a framework limitation around the 5-minute auto-recovery operation (a kind of sync-up). This needs to be taken out of Corbett.
Comment 10 Shalaka 2014-01-24 06:28:40 EST
Please review the edited Doc Text and sign off.
Comment 11 Sahina Bose 2014-01-30 02:13:55 EST
Doctext ok
Comment 12 Dusmant 2015-12-28 01:57:35 EST
RHSC 2.1 is EOL and we do not have any plan to fix this issue.
If you think this is an important bug to be addressed, please re-open it against the latest release.
I will go ahead and close this bug for this release.
