Bug 156872 - lt_high_locks setting
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gulm
Version: 3
Hardware: All
OS: Linux
Priority: medium
Severity: high
Assigned To: michael conrad tadpol tilstra
QA Contact: Cluster QE
Reported: 2005-05-04 16:27 EDT by Wendy Cheng
Modified: 2009-04-16 16:24 EDT (History)
Doc Type: Bug Fix
Last Closed: 2005-05-25 12:41:14 EDT


Attachments
kludge around ccs's lack of find_css_long (3.68 KB, patch)
2005-05-09 11:38 EDT, michael conrad tadpol tilstra
celera files 4-1 (58.59 KB, application/pdf)
2005-05-19 15:42 EDT, Wendy Cheng

Description Wendy Cheng 2005-05-04 16:27:15 EDT
Description of problem:
The problem was reported as the lt_high_locks setting in cluster.ccs being
ignored. However, after a few quick greps/finds, the issue looks to me to be
caused by the compiler casting in the bound_to_ulong() call, since both ccs and
gulm know about this tunable and have code to work with it.

Note that the customer is running GFS-6.0.2.12 on AMD64 and this is affecting
their production environment: when the maximum number of locks is reached,
performance is drastically reduced while the lock server requests that nodes
drop their unnecessary locks. Raising this setting would be a workaround for
that problem, if this bug can be fixed.

Version-Release number of selected component (if applicable): 
GFS-6.0.2.12

How reproducible:
Always

Steps to Reproduce:
1. 
2.
3.
  
Actual results:


Expected results:


Additional info:
Comment 2 Wendy Cheng 2005-05-04 16:34:48 EDT
Symptoms:

1) Numerous messages show up in /var/log/messages on the lock_gulm MASTER:

Apr 28 10:51:51 icla2g lock_gulmd_LT000[11621]: Lock count is at 2127373 which
is more than the max 2097152. Sending Drop all req to clients
..............

2) Performance is drastically reduced since the lock server keeps requesting
that nodes drop their unnecessary locks.
Comment 3 Wendy Cheng 2005-05-04 16:37:20 EDT
The culprit seems to be in this routine, where val should have been cast (set)
to ulong?

unsigned long bound_to_ulong(int val, unsigned long min, unsigned long max)
{
   if( val < min ) return min;
   if( val > max ) return max;
   return val;
}
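
To illustrate the conversion being questioned here: when an int is compared
with an unsigned long in C, the int is first converted to unsigned long, so a
negative value turns into a very large one. The sketch below writes that
conversion out with explicit casts. The function name
clamp_like_bound_to_ulong and the bounds mentioned afterwards are illustrative
assumptions, not values taken from the gulm source.

```c
#include <assert.h>

/*
 * Illustrative stand-in for the quoted routine (hypothetical name).
 * The cast spells out what the usual arithmetic conversions do implicitly
 * when an int is compared with an unsigned long.
 */
static unsigned long clamp_like_bound_to_ulong(int val, unsigned long min,
                                               unsigned long max)
{
    unsigned long converted = (unsigned long)val; /* -1 becomes ULONG_MAX */

    assert(min <= max);
    if (converted < min) return min;
    if (converted > max) return max;
    return converted;
}
```

With assumed bounds of 10000 and 4294967295, passing -1 returns 4294967295
rather than 10000. Separately, because the parameter is an int, a configured
value too large for an int cannot reach the function intact in the first
place.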

Comment 4 michael conrad tadpol tilstra 2005-05-05 09:12:44 EDT
It's not there, actually. ccs doesn't have a function to find a long, only int
or float values, so ccs is not reading the number correctly.

For fixing the code, I think libccs will need to add a find long function.


Another temporary thing the customer can do is decrease the rate at which the
drop lock requests are sent. This is lt_drop_req_rate; set it to the number of
seconds between each drop request.
Comment 5 michael conrad tadpol tilstra 2005-05-05 09:57:13 EDT
ccs in 6.0 can only find int, float, or string values.
So to get a long from ccs, either we need to have a string passed in and parse
it ourselves, or we need to change the ccs libs.
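
The "parse it ourselves" option could look roughly like the sketch below,
which uses the standard strtoul. The helper name parse_bounded_ulong, its
return convention, and the decision to reject a leading minus sign are
assumptions made for illustration; this is not code from ccs or gulm.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal string into an unsigned long and
 * clamp it to [min, max]. Returns 0 on success, -1 if the text is not a
 * plain non-negative decimal number.
 */
static int parse_bounded_ulong(const char *text, unsigned long min,
                               unsigned long max, unsigned long *out)
{
    const char *p;
    char *end = NULL;
    unsigned long val;

    assert(min <= max);
    if (text == NULL || out == NULL) return -1;

    /* strtoul accepts a leading '-' and negates the result, so reject it. */
    for (p = text; isspace((unsigned char)*p); p++)
        ;
    if (*p == '-') return -1;

    errno = 0;
    val = strtoul(text, &end, 10);
    /* Fail on overflow, on no digits at all, or on trailing characters. */
    if (errno == ERANGE || end == text || *end != '\0') return -1;

    if (val < min) val = min;
    if (val > max) val = max;
    *out = val;
    return 0;
}
```

Passing the setting as a string keeps the full range of an unsigned long
available even though the configuration library only returns int, float, or
string values.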
Comment 8 Wendy Cheng 2005-05-06 12:01:50 EDT
Second report, on GFS-6.0.2-25-i686 with the 2.4.21-27.0.2.ELhugemem kernel.
The problem causes failover to occur.
Comment 9 michael conrad tadpol tilstra 2005-05-09 09:26:47 EDT
Also, setting lt_high_locks to -1 will max it out. (Although if you dump the
config with either -C or SIGUSR1, it will show -1 instead of 4294967295. One
more thing to fix. wheeeee.)
Comment 10 michael conrad tadpol tilstra 2005-05-09 11:38:13 EDT
Created attachment 114165 [details]
kludge around ccs's lack of find_css_long

This is a quick patch that can fix this bug.  It kludges around things by
letting users specify unsigned longs as a string, i.e. lt_high_locks =
"4294967295" (which would be the maximum value).

This also changes a bunch of %d to %u in the config dump function.
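
As a rough illustration of why the conversion specifier matters in a dump,
the sketch below formats the same unsigned value both ways. The helper names
dump_with_d and dump_with_u are hypothetical, and the example assumes a 32-bit
unsigned int on a two's-complement platform; it is not the gulm dump code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helpers; they only contrast the two conversion specifiers. */
static void dump_with_d(unsigned int value, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    /* Converting an out-of-range unsigned value to int is
     * implementation-defined; common compilers wrap it to a negative. */
    snprintf(buf, len, "%d", (int)value);
}

static void dump_with_u(unsigned int value, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    /* %u prints the value as the unsigned quantity it actually is. */
    snprintf(buf, len, "%u", value);
}
```

Under those assumptions, formatting 4294967295 through the signed path yields
"-1", while the unsigned path yields "4294967295".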
Comment 11 michael conrad tadpol tilstra 2005-05-11 11:59:54 EDT
Checked this into CVS.
Comment 12 michael conrad tadpol tilstra 2005-05-11 12:01:09 EDT
Oh, you can still use numbers to set lt_high_locks, just in case that wasn't clear.
Comment 13 michael conrad tadpol tilstra 2005-05-16 11:00:05 EDT
Without the patch, the following two settings effectively turn the HighWater
lock drop request off.

 lt_high_locks = -1
 lt_drop_req_rate = -1

This actually sets both values to the maximum of an unsigned integer.
Comment 21 Jay Turner 2005-05-25 12:41:14 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-466.html
