1468442 – [GSS] Too frequent gluster related command issued resulting in failed locking

Summary: [GSS] Too frequent gluster related command issued resulting in failed locking

Keywords:
Status:	CLOSED CANTFIX
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	vdsm
Sub Component:
Version:	rhgs-3.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Sahina Bose
QA Contact:	Sweta Anandpara
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-07-07 06:34 UTC by Abhishek Kumar
Modified:	2021-06-10 12:33 UTC
CC List:	10 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-10-25 09:37:34 UTC
Embargoed:
Dependent Products:

Description Abhishek Kumar 2017-07-07 06:34:49 UTC

Description of problem:

Gluster-related commands are issued too frequently, which results in locking failures.

Version-Release number of selected component (if applicable):

RHV manager (v4.0)
vdsm-4.19.15-1.el7ev.x86_64

How reproducible:

Customer environment

Steps to Reproduce:
1. Enable management of the gluster nodes through RHV Manager
2. "Locking failed" errors appear on the nodes
3. Restarting glusterd on the affected node resolves the issue

Actual results:

Vdsm triggers many calls per status check

Expected results:

Vdsm should trigger only one call per status check

Additional info:

Comment 4 Sahina Bose 2017-07-11 05:31:56 UTC

Shubhendu, can you take a look at logs to see what's causing the frequent calls to gluster volume status?

Comment 5 Shubhendu Tripathi 2017-07-11 08:01:44 UTC

Sahina, yes, I will analyze this and update here.

Comment 7 Shubhendu Tripathi 2017-07-12 05:31:05 UTC

Abhishek, can you check the value of `GlusterRefreshRateHeavy` in the `vdc_options` table? Ideally the value should be 300 seconds.

Comment 8 Shubhendu Tripathi 2017-07-12 06:27:29 UTC

Also, please check whether a value is set for `vds_retries` in `vdc_options`.

Comment 9 Shubhendu Tripathi 2017-07-13 04:28:43 UTC

Abhishek,

Kindly check whether task monitoring is critical in this scenario.
If not, you may try setting the value of `GlusterRefreshRateTasks` to 300 or even 600 and see whether the issue still persists.

Comment 12 Shubhendu Tripathi 2017-07-14 08:07:25 UTC

Abhishek,

As discussed over IRC, you can use the commands below to check the existing value and change it:

- sudo -i -u postgres psql engine -c "select * from  vdc_options WHERE option_name = 'GlusterRefreshRateTasks';"

- sudo -i -u postgres psql engine -c "update vdc_options set option_value = '600' where option_name = 'GlusterRefreshRateTasks';"

Comment 13 Shubhendu Tripathi 2017-07-14 08:40:21 UTC

Abhishek,

Also, after discussion with the gluster team, we will need cmd_history from all the storage nodes and the number of volumes.

Kindly share the details.
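
One way to summarise the requested cmd_history files is sketched below. It assumes each line of glusterd's cmd_history log has the shape `[timestamp]  : <command> : SUCCESS` or `[timestamp]  : <command> : FAILED : <reason>`; that format, the function name `summarize_cmd_history`, and the sample lines are assumptions for illustration, not data from this case.

```python
import re
from collections import Counter

# Assumed line shape (not taken from the customer logs):
#   [2017-07-11 05:31:56.100000]  : volume status all tasks : SUCCESS
LINE_RE = re.compile(
    r"^\[(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}(?:\.\d+)?\]\s*:\s*"
    r"(?P<cmd>.*?)\s*:\s*(?P<result>SUCCESS|FAILED)\b"
)


def summarize_cmd_history(lines):
    """Return (commands per minute, failures per command) as two Counters."""
    per_minute = Counter()
    failures = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines that do not fit the assumed format
        per_minute[match.group("minute")] += 1
        if match.group("result") == "FAILED":
            failures[match.group("cmd")] += 1
    return per_minute, failures


# Illustrative input only; real files would be read from each storage node.
sample = [
    "[2017-07-11 05:31:56.100000]  : volume status all tasks : SUCCESS",
    "[2017-07-11 05:31:58.200000]  : volume status all tasks : FAILED : (illustrative reason)",
    "[2017-07-11 05:32:01.300000]  : volume info : SUCCESS",
]
per_minute, failures = summarize_cmd_history(sample)
for minute, count in sorted(per_minute.items()):
    print(minute, count)
print(dict(failures))
```

Comparing the per-minute counts across nodes should show whether the status commands cluster around the RHV polling intervals.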

Comment 16 Shubhendu Tripathi 2017-07-17 07:50:43 UTC

Atin,

The requested cmd_history is available as an attachment.
Please check and comment.

Comment 38 Sahina Bose 2017-11-29 06:21:19 UTC

To fix the locking issue, RHV monitoring of gluster will need to change to use get-state and aggregate information collected from each node as suggested by Atin. This will need to be raised as an RFE in RHV.
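
The aggregation idea can be sketched roughly as follows. This is not the RHV or vdsm implementation. It assumes the state file written by `gluster get-state` on each node is INI-like, with `Volume<N>.name` and `Volume<N>.status` keys in a `[Volumes]` section, and that the per-node file contents have already been collected by some other means. The function names and the sample data are hypothetical.

```python
import configparser
import re


def parse_volume_status(state_text):
    """Return {volume_name: status} from one node's state text (assumed INI-like)."""
    parser = configparser.ConfigParser(
        delimiters=(":",), interpolation=None, strict=False, allow_no_value=True
    )
    parser.read_string(state_text)
    if not parser.has_section("Volumes"):
        return {}
    names, statuses = {}, {}
    for key, value in parser.items("Volumes"):
        # configparser lower-cases option names by default
        match = re.fullmatch(r"volume(\d+)\.(name|status)", key)
        if match:
            target = names if match.group(2) == "name" else statuses
            target[match.group(1)] = value
    return {names[i]: statuses.get(i, "unknown") for i in names}


def aggregate(node_states):
    """Merge per-node views into {volume: {node: status}}."""
    merged = {}
    for node, text in node_states.items():
        for volume, status in parse_volume_status(text).items():
            merged.setdefault(volume, {})[node] = status
    return merged


# Hypothetical state text; real output contains many more keys.
SAMPLE_STATE = """[Global]
MYUUID: 00000000-0000-0000-0000-000000000001

[Volumes]
Volume1.name: vol0
Volume1.status: Started
"""

print(aggregate({"node1": SAMPLE_STATE, "node2": SAMPLE_STATE}))
```

Keeping the per-node view separate makes it possible to flag volumes whose status differs between nodes instead of silently picking one.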

The other option to minimise this is to increase the polling interval (that is, poll less often) for gluster status commands in RHV. The trade-off is that the status reported in RHV can be stale.
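
A back-of-the-envelope illustration of that trade-off: polling less often lowers the number of status polls, and the worst-case age of the reported status grows in step. The 300 and 600 second values are the ones suggested earlier in this bug; treating each poll as a single unit of load is a simplification, and the function name is made up for this sketch.

```python
def polls_per_hour(interval_secs):
    """Number of polls issued in one hour at a fixed polling interval."""
    return 3600 / interval_secs


for interval in (300, 600):
    print(
        f"interval={interval}s "
        f"polls/hour={polls_per_hour(interval):.0f} "
        f"worst-case staleness ~{interval}s"
    )
```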

Comment 43 Sahina Bose 2018-10-25 08:21:39 UTC

Bipin, since we have the RHV bug tracking the request for this customer case, can we close this bug? There are no changes that can be made in vdsm to address this.
