Bug 156872

Summary: lt_high_locks setting
Product: [Retired] Red Hat Cluster Suite Reporter: Wendy Cheng <nobody+wcheng>
Component: gulm Assignee: michael conrad tadpol tilstra <mtilstra>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: medium    
Version: 3 CC: cluster-maint, tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-05-25 16:41:14 UTC Type: ---
Attachments:
kludge around ccs's lack of find_css_long (flags: none)
celera files 4-1 (flags: none)

Description Wendy Cheng 2005-05-04 20:27:15 UTC
Description of problem:
The problem is reported as the lt_high_locks setting in cluster.ccs being
ignored. However, after a few quick greps/finds, the issue (it looks to me) seems
to be caused by compiler casting in the bound_to_ulong() call, since both ccs and
gulm know about this tunable and have code to work with it.

Note that the customer is running GFS-6.0.2.12 on AMD64 and this is affecting
their production environment: when the maximum number of locks is reached,
performance is drastically reduced while the lock server requests that nodes drop
their unnecessary locks.  Adjusting this setting higher would be a work-around for
that problem if this bug can be fixed.

Version-Release number of selected component (if applicable): 
GFS-6.0.2.12

How reproducible:
Always

Steps to Reproduce:
1. 
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 2 Wendy Cheng 2005-05-04 20:34:48 UTC
Symptoms:

1) Numerous messages show up in /var/log/messages on the lock_gulm MASTER:

Apr 28 10:51:51 icla2g lock_gulmd_LT000[11621]: Lock count is at 2127373 which
is more than the max 2097152. Sending Drop all req to clients
..............

2) Performance is drastically reduced since the lock server keeps requesting
nodes to drop their unnecessary locks.

Comment 3 Wendy Cheng 2005-05-04 20:37:20 UTC
The culprit seems to be this routine, where val should have been declared as
(or cast to) unsigned long?

unsigned long bound_to_ulong(int val, unsigned long min, unsigned long max)
{
   if( val < min ) return min;
   if( val > max ) return max;
   return val;
}
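
For illustration only, a minimal sketch of how an int argument behaves in a
routine of this shape. The names bound_int_to_ulong and bound_ulong are
hypothetical, and the bounds in the note below are assumptions, not gulm's
actual limits. The casts just spell out the conversion the compiler already
applies when an int is compared with an unsigned long.

```c
#include <assert.h>
#include <limits.h>

/* Same shape as the routine quoted above: the value arrives as an int. */
unsigned long bound_int_to_ulong(int val, unsigned long min, unsigned long max)
{
    assert(min <= max);
    /* val is converted to unsigned long before each comparison, so a
       negative val compares as a very large positive number. */
    if ((unsigned long)val < min) return min;
    if ((unsigned long)val > max) return max;
    return (unsigned long)val;
}

/* Hypothetical variant: the value is already an unsigned long, so nothing
   is lost or reinterpreted on the way in. */
unsigned long bound_ulong(unsigned long val, unsigned long min, unsigned long max)
{
    assert(min <= max);
    if (val < min) return min;
    if (val > max) return max;
    return val;
}
```

With bounds of 0 and 4294967295, bound_int_to_ulong(-1, ...) comes back as
4294967295, and no int argument can ever exceed INT_MAX, so a larger setting
has to be carried in a wider type before it reaches the bound check.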



Comment 4 michael conrad tadpol tilstra 2005-05-05 13:12:44 UTC
It's not there, actually.  ccs doesn't have a function to find a long, only int
or float values.  So ccs is actually not reading the number correctly.

For fixing the code, I think libccs will need to add a find-long function.


Another temporary thing the customer can do is decrease the rate at which the
drop-lock requests are sent.  This is lt_drop_req_rate; set it to the
number of seconds between each drop request.

Comment 5 michael conrad tadpol tilstra 2005-05-05 13:57:13 UTC
ccs in 6.0 can only find int, float, or string.
So to get a long from ccs, either we need to have a string passed in and parse
it ourselves, or we need to change the ccs libs.
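
A rough sketch of the "pass a string and parse it ourselves" option, using
only standard C. parse_ulong_setting is a hypothetical name; this is not the
code from the eventual patch, and how the string is fetched from ccs is left
out entirely.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal setting such as "3000000" into an
   unsigned long. Returns 0 on success, -1 if the text is not a plain
   non-negative decimal number that fits in an unsigned long. */
int parse_ulong_setting(const char *text, unsigned long *out)
{
    char *end;
    unsigned long v;

    assert(out != NULL);            /* caller bug, not bad input */
    if (text == NULL) return -1;

    while (isspace((unsigned char)*text)) text++;
    /* strtoul() accepts a leading '-' and wraps the result, so reject it. */
    if (*text == '\0' || *text == '-') return -1;

    errno = 0;
    v = strtoul(text, &end, 10);
    if (errno == ERANGE) return -1; /* too large for unsigned long */
    if (end == text) return -1;     /* no digits were consumed */

    while (isspace((unsigned char)*end)) end++;
    if (*end != '\0') return -1;    /* trailing junk such as "12abc" */

    *out = v;
    return 0;
}
```

Rejecting a leading minus sign here is a design choice of the sketch; a real
fix might instead keep -1 as shorthand for "maximum".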


Comment 8 Wendy Cheng 2005-05-06 16:01:50 UTC
Second report on GFS-6.0.2-25-i686 2.4.21-27.0.2.ELhugemem kernel. The problem
causes failover to occur. 

Comment 9 michael conrad tadpol tilstra 2005-05-09 13:26:47 UTC
also, setting lt_high_locks to -1 will max it out.  (although if you dump the
config with either -C or SIGUSR1, it will show -1 instead of 4294967295.  One
more thing to fix. wheeeee.)
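
To show why the dump can print -1 for a maxed-out value, here is a small
sketch that formats the same 32-bit value through a signed and an unsigned
conversion. The helper names are hypothetical, and treating the setting as
32 bits wide is an assumption made for the example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a 32-bit setting the way a "%d"-style dump
   would. Converting a large value to int32_t is implementation-defined;
   on two's-complement machines 4294967295 becomes -1. */
int format_as_signed(char *buf, size_t len, uint32_t value)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%" PRId32, (int32_t)value);
}

/* Hypothetical helper: format the same setting the way a "%u"-style dump
   would, which keeps the full unsigned range. */
int format_as_unsigned(char *buf, size_t len, uint32_t value)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%" PRIu32, value);
}
```

Passing UINT32_MAX to both helpers yields "-1" from the signed one and
"4294967295" from the unsigned one on the usual two's-complement platforms.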

Comment 10 michael conrad tadpol tilstra 2005-05-09 15:38:13 UTC
Created attachment 114165 [details]
kludge around ccs's lack of find_css_long

This is a quick patch that can fix this bug.  It kludges around things by
letting users specify unsigned longs as a string, i.e. lt_high_locks =
"4294967296" (which would be the maximum value).

This also changes a bunch of %d to %u in the config dump function.

Comment 11 michael conrad tadpol tilstra 2005-05-11 15:59:54 UTC
checked this into cvs.

Comment 12 michael conrad tadpol tilstra 2005-05-11 16:01:09 UTC
Oh, you can still use numbers to set lt_high_locks, just in case that wasn't clear.

Comment 13 michael conrad tadpol tilstra 2005-05-16 15:00:05 UTC
Without the patch, the following two settings effectively turn the HighWater
lock drop request off.

 lt_high_locks = -1
 lt_drop_req_rate = -1

This actually sets both values to the maximum of an unsigned integer.


Comment 21 Jay Turner 2005-05-25 16:41:14 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-466.html