Bug 975399

Summary: [RHS-C] When glusterd is restarted, Server Status is not changing to "UP" from "Non-Operational" immediately
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: rhsc
Version: 2.1
Reporter: Prasanth <pprakash>
Assignee: Sahina Bose <sabose>
QA Contact: RHS-C QE <rhsc-qe-bugs>
Docs Contact:
Status: CLOSED WONTFIX
Severity: medium
Priority: low
CC: asriram, dpati, knarra, mmahoney, mmccune, rhs-bugs, sabose, ssampat
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
When the gluster daemon (glusterd) service is restarted, the host status does not change from Non-Operational to Up immediately in the Red Hat Storage Console. Auto-recovery operations, which detect changes in Non-Operational hosts, run at a 5-minute interval.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-28 06:57:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1035040    

Description Prasanth 2013-06-18 11:27:46 UTC
Description of problem:

When glusterd is restarted, Server Status is not changing to "UP" from "Non-Operational" immediately

Version-Release number of selected component (if applicable):  Red Hat Storage Console Version: 2.1.0-0.bb3.el6rhs 


How reproducible: Always


Steps to Reproduce:
1. Create a cluster and add 2 servers (make sure both are in the UP state)
2. On server1, stop glusterd (# /etc/init.d/glusterd stop)
3. In the UI, the status of server1 should now change to "Non-Operational"
4. Restart glusterd on server1 and check the server status in the UI.

Actual results: Server status still shows as "Non-Operational". However, I noticed that after 5 minutes it is detected as a new server, with the following event message: "Detected new Host server1. Host state was set to Up."


Expected results: Server status should automatically change to UP as soon as the issue is resolved. Waiting for 5 minutes actually prevents any successful volume operation from the UI during that time.


Additional info:

Comment 2 Sahina Bose 2013-10-25 12:05:38 UTC
GlusterMonitoringStrategy has been implemented to check gluster capabilities.

Comment 3 Sahina Bose 2013-10-30 14:25:21 UTC
When the host state is Non-Operational, it is set back to UP by the AutoRecoveryManager. Its interval can be configured via ConfigValues.AutoRecoverySchedule, and is currently set to run every 5 minutes.
This can be changed to run at a shorter interval.

Comment 4 Sahina Bose 2013-11-12 05:28:57 UTC
Please check by changing the ConfigValues.AutoRecoverySchedule

Comment 5 Prasanth 2013-11-13 09:55:49 UTC
(In reply to Sahina Bose from comment #4)
> Please check by changing the ConfigValues.AutoRecoverySchedule

Is it user configurable? If so, can you tell me where I can find and edit this configuration parameter to verify this bug?

Comment 6 Sahina Bose 2013-11-15 08:40:34 UTC
The config value is currently not exposed; I will be posting a patch to do that.
In the meantime, you can change the value with:
psql engine postgres -c "update vdc_options set option_value = '0/5 * * * * ?' where option_name = 'AutoRecoverySchedule';"
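The schedule value appears to be a Quartz-style cron expression, whose first field is seconds (unlike classic 5-field crontab entries), so '0/5 * * * * ?' in the command above would fire every 5 seconds rather than every 5 minutes. A minimal sketch to decode such simple "start/step" patterns; the helper below is illustrative only and not part of RHSC:

```python
# Quartz-style cron field order: note the leading seconds field.
FIELDS = ["second", "minute", "hour", "day", "month", "weekday"]

def describe_step(expr):
    """Describe the first 'start/step' pattern found, e.g. '0/5'."""
    for name, field in zip(FIELDS, expr.split()):
        if "/" in field:
            step = int(field.split("/", 1)[1])
            return "fires every %d %s(s)" % (step, name)
    return "no simple step pattern"

# The value from the psql command above steps the seconds field:
print(describe_step("0/5 * * * * ?"))   # fires every 5 second(s)
# A schedule stepping the minutes field instead would look like:
print(describe_step("0 0/5 * * * ?"))   # fires every 5 minute(s)
```

After changing vdc_options directly, the engine service typically needs a restart for the new schedule to take effect.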

Comment 7 Matt Mahoney 2013-11-15 19:17:03 UTC
Changing a database value is not an end-user solution.
Please set to ON_QA once there is a configuration solution that can be used by an end user.

Comment 8 Dusmant 2013-11-25 09:36:29 UTC
Users can manually activate the host if they know that glusterd has already come up.

This is a framework limitation, and for the time being we cannot enhance this to sync up immediately after the failure is resolved. Auto-recovery runs every 5 minutes (by default), and the user will not see the update within that 5-minute window.

Hence this needs to be moved out of Corbett.

Comment 9 Dusmant 2013-11-26 16:21:58 UTC
There is a framework limitation: the auto-recovery operation (a kind of sync-up) runs every 5 minutes. This needs to be taken out of Corbett.

Comment 10 Shalaka 2014-01-24 11:28:40 UTC
Please review the edited Doc Text and sign off.

Comment 11 Sahina Bose 2014-01-30 07:13:55 UTC
Doctext ok

Comment 12 Dusmant 2015-12-28 06:57:35 UTC
RHSC 2.1 is EOL and we have no plans to fix this issue.
If you think this is an important bug to be addressed, please re-open it against the latest release.
I will go ahead and close this bug for this release.