Bug 146670 - gulm filesystems deadlock with heavy load
Status: CLOSED NEXTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gulm
Version: 4
Hardware: i686 Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: michael conrad tadpol tilstra
QA Contact: Cluster QE
Depends On:
Blocks: 144795
Reported: 2005-01-31 12:29 EST by Corey Marthaler
Modified: 2009-04-16 16:24 EDT
CC List: 1 user

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-04-06 12:02:31 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Corey Marthaler 2005-01-31 12:29:36 EST
Description of problem:
I believe there is already a bug in the Sistina tracker for the issue of
gulm and POSIX locks; however, I reproduced this hang using only flocks,
running the genesis and accordion tests on each of the machines in the
cluster.

genesis -L flock -n 250 -d 50 -p 2
accordion -L flock -p 2 accrdfile1 accrdfile2 accrdfile3 accrdfile4 accrdfile5
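
For reference, the lock path these tests hammer boils down to plain BSD
flock(2) calls on files in the GFS mount; a minimal sketch (the path
/mnt/gfs0/accrdfile1 is just an example, and the real tests add I/O and
many concurrent processes) looks like this:

/* Minimal sketch of what each test worker is doing: take and drop a
 * BSD flock on a file in the GFS mount. Under GFS/gulm this goes
 * through the glock layer; on a wedged mount the caller sits in D
 * state in the glock wait path (the truncated "glock_" WCHAN in the
 * ps output below). */
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/gfs0/accrdfile1", O_CREAT | O_RDWR, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    if (flock(fd, LOCK_EX) < 0)     /* hangs here when the fs is stuck */
        perror("flock(LOCK_EX)");

    /* ... do some I/O while holding the lock ... */

    flock(fd, LOCK_UN);
    close(fd);
    return 0;
}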

Eventually the filesystems become stuck:

[root@morph-03 tmp]# ps -efFwTl | grep genesis
4 S root      5044  5044  5043  0  81   0 -   365 wait    336   0
Jan28 ?        00:00:00 genesis -L flock -n 250 -d 50 -p 2
5 D root      5049  5049  5044  0  78   0 -   368 glock_  448   0
Jan28 ?        00:06:05 genesis -L flock -n 250 -d 50 -p 2
5 D root      5050  5050  5044  0  78   0 -   368 glock_  448   0
Jan28 ?        00:06:02 genesis -L flock -n 250 -d 50 -p 2
4 S root      5055  5055  5054  0  84   0 -   525 wait    336   1
Jan28 ?        00:00:00 genesis -L flock -n 250 -d 50 -p 2
5 D root      5058  5058  5055  0  78   0 -   528 glock_  448   0
Jan28 ?        00:05:08 genesis -L flock -n 250 -d 50 -p 2
5 D root      5060  5060  5055  0  78   0 -   528 glock_  448   0
Jan28 ?        00:05:09 genesis -L flock -n 250 -d 50 -p 2
4 S root      5075  5075  5074  0  85   0 -   726 wait    332   0
Jan28 ?        00:00:00 genesis -L flock -n 250 -d 50 -p 2
5 D root      5076  5076  5075  0  78   0 -   730 glock_  448   0
Jan28 ?        00:04:56 genesis -L flock -n 250 -d 50 -p 2
5 D root      5078  5078  5075  0  78   0 -   730 glock_  448   0
Jan28 ?        00:04:56 genesis -L flock -n 250 -d 50 -p 2
4 S root      5086  5086  5085  0  85   0 -   843 wait    336   0
Jan28 ?        00:00:00 genesis -L flock -n 250 -d 50 -p 2
5 D root      5088  5088  5086  0  78   0 -   846 glock_  448   0
Jan28 ?        00:05:47 genesis -L flock -n 250 -d 50 -p 2
5 D root      5090  5090  5086  0  78   0 -   846 glock_  448   1
Jan28 ?        00:05:45 genesis -L flock -n 250 -d 50 -p 2


[root@morph-04 root]# strace df -h
.
.
.
statfs64("/mnt/gfs0", 84,

I tried to get more info from /proc/<pid>, but that was stuck as well.
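
(For context, df essentially just calls statfs64() on each mount, so any
caller that needs a glock on the stuck filesystem hangs the same way. A
trivial illustration, assuming the mount point is /mnt/gfs0 as above:)

/* df does roughly this per mount; on the wedged GFS mount the call
 * never returns and the process sits in D state. */
#include <stdio.h>
#include <sys/statfs.h>

int main(void)
{
    struct statfs sbuf;

    if (statfs("/mnt/gfs0", &sbuf) < 0) {
        perror("statfs");
        return 1;
    }
    printf("blocks free: %ld\n", (long)sbuf.f_bfree);
    return 0;
}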

Version-Release number of selected component (if applicable):
Gulm <CVS> (built Jan 28 2005 16:39:38) installed


How reproducible:
Sometimes
Comment 1 michael conrad tadpol tilstra 2005-02-01 09:08:13 EST
Pretty sure this doesn't have anything to do with plocks or flocks. It seems
to be entirely load-based.
Comment 2 Kiersten (Kerri) Anderson 2005-02-01 10:35:20 EST
Adding to release blocker list
Comment 3 michael conrad tadpol tilstra 2005-02-01 13:32:01 EST
All that's required to hit this is two clients, gulm server[s] in either slm
or rlm mode, and load (no clvm).
Load for me is fsstress plus a couple of doios. Lighten the load (stop all
fsstress or doio) and the deadlock isn't hit.
Comment 4 michael conrad tadpol tilstra 2005-02-01 16:10:04 EST
Found it: ltpx is being flooded faster than it can handle, so it locks up.
A `gulm_tool getstats <deadlocked-client>:ltpx` will time out.
Now to fix...
Comment 5 michael conrad tadpol tilstra 2005-02-03 11:14:23 EST
Added outgoing queues to the local connects. The bug seems to have disappeared.
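
(Not having the actual patch in front of me, the general shape of such a fix
is to stop blocking on writes to the local ltpx connections and instead park
outgoing messages on a per-connection queue that is flushed when the socket
becomes writable again. A rough sketch of that pattern follows; all names
here, struct conn, conn_send, conn_flush, are hypothetical and not gulm's.)

/* Queue-and-flush pattern for a non-blocking local socket: if the peer
 * can't keep up, buffer the message instead of blocking the sender. */
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct out_msg {
    struct out_msg *next;
    size_t len, off;
    char data[];
};

struct conn {
    int fd;                       /* non-blocking local socket */
    struct out_msg *head, *tail;  /* pending outgoing messages */
};

/* Try to write immediately; queue whatever doesn't fit. */
int conn_send(struct conn *c, const void *buf, size_t len)
{
    if (!c->head) {
        ssize_t n = write(c->fd, buf, len);
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;
        if (n < 0)
            n = 0;
        if ((size_t)n == len)
            return 0;             /* fully sent, nothing to queue */
        buf = (const char *)buf + n;
        len -= n;
    }

    struct out_msg *m = malloc(sizeof(*m) + len);
    if (!m)
        return -1;
    m->next = NULL;
    m->len = len;
    m->off = 0;
    memcpy(m->data, buf, len);
    if (c->tail)
        c->tail->next = m;
    else
        c->head = m;
    c->tail = m;
    return 0;
}

/* Called from the poll loop when the socket reports writable. */
int conn_flush(struct conn *c)
{
    while (c->head) {
        struct out_msg *m = c->head;
        ssize_t n = write(c->fd, m->data + m->off, m->len - m->off);
        if (n < 0)
            return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
        m->off += (size_t)n;
        if (m->off < m->len)
            return 0;             /* partial write; wait for next POLLOUT */
        c->head = m->next;
        if (!c->head)
            c->tail = NULL;
        free(m);
    }
    return 0;
}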
Comment 6 Corey Marthaler 2005-04-06 12:02:31 EDT
Fix verified.
