Bug 1468442 - [GSS] Too frequent gluster related command issued resulting in failed locking
Status: NEW
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: vdsm
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assigned To: Sahina Bose
QA Contact: Sweta Anandpara
Docs Contact:
Depends On:
Blocks:
Reported: 2017-07-07 02:34 EDT by Abhishek Kumar
Modified: 2017-11-29 01:21 EST

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Abhishek Kumar 2017-07-07 02:34:49 EDT
Description of problem:

 Gluster-related commands are issued too frequently, resulting in locking failures.

Version-Release number of selected component (if applicable):

RHV manager (v4.0)
vdsm-4.19.15-1.el7ev.x86_64

How reproducible:

Customer environment

Steps to Reproduce:
1. Enable management through RHV manager to the gluster nodes
2. Locking failed issue arises on the nodes
3. Restarting glusterd on the node solves the issue

Actual results:

Vdsm triggers many calls per status check

Expected results:

Vdsm should trigger only one call per status check

Additional info:
Comment 4 Sahina Bose 2017-07-11 01:31:56 EDT
Shubhendu, can you take a look at logs to see what's causing the frequent calls to gluster volume status?
Comment 5 Shubhendu Tripathi 2017-07-11 04:01:44 EDT
Sahina, yes, I will analyze this and update here.
Comment 7 Shubhendu Tripathi 2017-07-12 01:31:05 EDT
Abhishek, can you check the value of `GlusterRefreshRateHeavy` in the `vdc_options` table? Ideally the value should be 300 seconds.
Comment 8 Shubhendu Tripathi 2017-07-12 02:27:29 EDT
Also, please check whether a value is set for `vds_retries` in `vdc_options`.
Comment 9 Shubhendu Tripathi 2017-07-13 00:28:43 EDT
Abhishek,

Kindly check whether task monitoring is critical in this scenario.
If not, you may try setting the value of `GlusterRefreshRateTasks` to 300 or even 600 and see whether the issue still persists.
Comment 12 Shubhendu Tripathi 2017-07-14 04:07:25 EDT
Abhishek,

As discussed over IRC, you can use the commands below to check the existing value and change it:

- sudo -i -u postgres psql engine -c "select * from  vdc_options WHERE option_name = 'GlusterRefreshRateTasks';"

- sudo -i -u postgres psql engine -c "update vdc_options set option_value = '600' where option_name = 'GlusterRefreshRateTasks';"
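
Since this thread asks about three options (`GlusterRefreshRateHeavy`, `GlusterRefreshRateTasks`, `vds_retries`), it may be convenient to read them in one query. The following is a minimal sketch, not a command given in this bug: `build_options_query` is a hypothetical helper that only prints a read-only SQL statement built from the `option_name` and `option_value` columns used in the commands above. It does not connect to the database.

```shell
# Hypothetical helper: print a read-only query for the given vdc_options names.
# Nothing is executed against the engine database here.
build_options_query() {
    names=""
    for name in "$@"; do
        # Append each name as a quoted SQL literal, comma-separated.
        names="${names:+$names, }'$name'"
    done
    printf "select option_name, option_value from vdc_options where option_name in (%s);\n" "$names"
}

# Print the query for the three options discussed in this bug.
build_options_query GlusterRefreshRateHeavy GlusterRefreshRateTasks vds_retries

# To run it on the engine host, the output could be passed to psql in the
# same way as the commands above, for example:
#   sudo -i -u postgres psql engine -c "$(build_options_query GlusterRefreshRateTasks)"
```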
Comment 13 Shubhendu Tripathi 2017-07-14 04:40:21 EDT
Abhishek,

Also, following discussion with the gluster team, we need cmd_history from all the storage nodes and the number of volumes.

Kindly share the details.
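
Once cmd_history is collected, one way to see how often status commands are issued is to count them per minute. This is a rough sketch under assumptions not confirmed in this bug: it assumes each log line starts with a bracketed timestamp of the form `[YYYY-MM-DD HH:MM:SS.ffffff]` and contains the text `volume status` for status commands. `count_status_per_minute` is a hypothetical helper name.

```shell
# Hypothetical helper: read a cmd_history log on stdin and print
# "<date> <HH:MM> <count>" for every minute with "volume status" commands.
# Assumes lines begin with "[YYYY-MM-DD HH:MM:SS...".
count_status_per_minute() {
    grep 'volume status' |
        cut -c2-17 |      # keep "YYYY-MM-DD HH:MM" from the timestamp
        sort |
        uniq -c |         # count lines per minute
        awk '{ print $2 " " $3 " " $1 }'
}
```

On a storage node this could be run as `count_status_per_minute < /var/log/glusterfs/cmd_history.log`, assuming the log is in that location on the node in question.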
Comment 16 Shubhendu Tripathi 2017-07-17 03:50:43 EDT
Atin,

The requested cmd_history is available as attachment.
Please check and comment.
Comment 38 Sahina Bose 2017-11-29 01:21:19 EST
To fix the locking issue, RHV monitoring of gluster will need to change to use get-state and aggregate information collected from each node as suggested by Atin. This will need to be raised as an RFE in RHV.

The other option to minimise this happening is to increase the polling interval (that is, reduce the polling frequency) of gluster status commands in RHV. The trade-off is that the status reported in RHV may be stale.
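
The effect of the refresh interval on polling load can be estimated with simple arithmetic. The sketch below assumes, for illustration only, one poll per refresh interval for a given option; the actual number of gluster commands issued per poll is not stated in this bug. `polls_per_hour` is a hypothetical helper name.

```shell
# Hypothetical helper: polls per hour for a refresh interval given in seconds.
# Assumes exactly one poll per interval (illustrative only).
polls_per_hour() {
    interval_secs=$1
    echo $(( 3600 / interval_secs ))
}

# Compare the two values suggested earlier for GlusterRefreshRateTasks.
for interval in 300 600; do
    echo "interval=${interval}s polls_per_hour=$(polls_per_hour "$interval")"
done
# prints:
#   interval=300s polls_per_hour=12
#   interval=600s polls_per_hour=6
```

Doubling the interval halves the number of polls, and the reported status can be correspondingly older, up to one interval.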
