Description of problem:
To fix the locking issue, RHV monitoring of Gluster will need to change to use get-state and aggregate the information collected from each node.

Version-Release number of selected component (if applicable):
NA

Additional info:
Currently, vdsm triggers many CLI calls per status check, which results in failed locking inside Gluster. This prevents users from running any manual gluster commands, since the locks are already held.
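As a rough illustration of the aggregation approach described above: `gluster get-state` dumps node-local state in an ini-like "[Section]" / "key: value" layout that each host could parse and report back for aggregation. A minimal sketch follows; the sample text and field values are illustrative, not a verbatim get-state capture.

```python
# Sketch: parse ini-like "[Section]" / "key: value" output such as that
# produced by `gluster get-state`, yielding {section: {key: value}}.
def parse_get_state(text):
    state, section = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('[') and line.endswith(']'):
            section = line[1:-1]
            state[section] = {}
        elif ':' in line and section is not None:
            key, _, value = line.partition(':')
            state[section][key.strip()] = value.strip()
    return state

# Illustrative sample, not a real capture.
sample = """\
[Global]
MYUUID: aaaa-bbbb
op-version: 31000
"""

state = parse_get_state(sample)
print(state['Global']['op-version'])  # prints "31000"
```

Each node would parse its own get-state dump like this, and the monitoring side would merge the per-node dictionaries instead of issuing repeated locking CLI queries.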
Gluster eventing has been introduced, so we can change the monitoring interval to a less frequent value.
We have integrated with gluster eventing, so can we ensure that we do not poll as frequently, to avoid the locking issues?
What should the refresh rate be? Currently I have observed it to be 15 sec.
(In reply to kmajumde from comment #7)
> What should be the refresh rate? Currently I have checked it to be 15 sec.

Volume status is queried every 5 mins... it's part of RefreshRateHeavy, I think - can you check? So this can be changed to 15 mins, but also check whether there's any other polling within this same method that needs to continue at 5 mins.
(In reply to Sahina Bose from comment #8)
> (In reply to kmajumde from comment #7)
> > What should be the refresh rate? Currently I have checked it to be 15 sec.
>
> Volume status are every 5 mins...it's part of the RefreshRateHeavy, I think
> - can you check.
> So this can be changed to 15mins. but also check if there's any other
> polling within this same method that needs to continue at 5 mins?

Brick details, volume capacity, volume advanced details, and volume online status are under this polling method. I feel none of them needs to continue at 5 mins.
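The split discussed above (heavy gluster checks moving from a 5-minute to a 15-minute interval, with anything that must stay fresh kept at 5 minutes) can be sketched as two independently scheduled jobs. This is a hypothetical illustration; the function names and the use of Python's sched module are assumptions, not actual engine/vdsm identifiers.

```python
import sched
import time

LIGHT_INTERVAL = 5 * 60    # seconds: polling that stays at 5 minutes
HEAVY_INTERVAL = 15 * 60   # seconds: heavy checks moved to 15 minutes

def poll_light():
    # Placeholder for any check kept at the 5-minute rate.
    pass

def poll_heavy():
    # Placeholder for brick details, volume capacity, advanced
    # details, and online status (the RefreshRateHeavy group).
    pass

def schedule(scheduler, interval, job):
    """Run the job now, then re-arm it at the given interval."""
    job()
    scheduler.enter(interval, 1, schedule, (scheduler, interval, job))

s = sched.scheduler(time.time, time.sleep)
schedule(s, LIGHT_INTERVAL, poll_light)
schedule(s, HEAVY_INTERVAL, poll_heavy)
# s.run()  # would block, firing each job at its own interval
```

The point of the sketch is only that the two rates are decoupled, so slowing the heavy group does not affect any polling that has to remain frequent.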
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
Kaustav, can you backport to 4.3?
Kaustav, in which version is this fix included? I see only one patch attached, and it's on the engine side, merged in ovirt-engine-4.3.3. This bug is filed against vdsm; can you please update this bug to match the current real state?
Done. The bug was wrongly tagged to vdsm.
Tested with RHV 4.3.3. 'gluster volume status detail' is queried every 15 mins:

[2019-07-26 15:00:38.425519] : system:: uuid get : SUCCESS
[2019-07-26 15:00:54.035630] : system:: uuid get : SUCCESS
[2019-07-26 15:01:09.674883] : system:: uuid get : SUCCESS
[2019-07-26 15:01:12.398454] : volume status vmstore : SUCCESS
[2019-07-26 15:01:12.586130] : volume status vmstore detail : SUCCESS
[2019-07-26 15:01:16.200499] : volume status engine : SUCCESS
[2019-07-26 15:01:16.378735] : volume status engine detail : SUCCESS
[2019-07-26 15:01:19.968367] : volume status non_vdo : SUCCESS
[2019-07-26 15:01:20.152647] : volume status non_vdo detail : SUCCESS
[2019-07-26 15:01:23.795427] : volume status data : SUCCESS
[2019-07-26 15:01:23.984186] : volume status data detail : SUCCESS
<lines-snipped>
[2019-07-26 15:16:27.651132] : volume status vmstore : SUCCESS
[2019-07-26 15:16:27.835725] : volume status vmstore detail : SUCCESS
[2019-07-26 15:16:31.528667] : volume status engine : SUCCESS
[2019-07-26 15:16:31.708591] : volume status engine detail : SUCCESS
[2019-07-26 15:16:35.358410] : volume status non_vdo : SUCCESS
[2019-07-26 15:16:35.538591] : volume status non_vdo detail : SUCCESS
[2019-07-26 15:16:36.997835] : system:: uuid get : SUCCESS
[2019-07-26 15:16:39.815920] : volume status data : SUCCESS
[2019-07-26 15:16:40.006440] : volume status data detail : SUCCESS
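As a quick arithmetic check on the verification log above, the gap between two successive "volume status vmstore detail" entries can be computed from their timestamps (copied from the log in this comment, truncated to whole seconds):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
# Successive "volume status vmstore detail" entries from the log above.
t1 = datetime.strptime("2019-07-26 15:01:12", FMT)
t2 = datetime.strptime("2019-07-26 15:16:27", FMT)

delta_minutes = (t2 - t1).total_seconds() / 60
print(f"{delta_minutes:.2f} minutes")  # prints "15.25 minutes"
```

Roughly 15 minutes between detail queries, consistent with the new RefreshRateHeavy interval.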
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:2431