Bug 133368

Summary: lvextend timing out on GFS-mounted volume
Product: [Retired] Red Hat Cluster Suite
Reporter: Derek Anderson <danderso>
Component: dlm
Assignee: David Teigland <teigland>
Status: CLOSED ERRATA
QA Contact: GFS Bugs <gfs-bugs>
Severity: medium
Priority: medium
Version: 4
CC: bmarzins, bstevens, cluster-maint, kpreslan, lenny, teigland
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-01-04 19:01:51 UTC
Bug Blocks: 144795
Attachments:
- output from clvmd -d when compiled with -DDEBUG (no flags)
- like before, but no gfs this time (no flags)

Description Derek Anderson 2004-09-23 15:43:30 UTC
Description of problem:
Setup: a 3-node quorate cluster, all nodes running clvmd and with one GFS
filesystem mounted using the lock_dlm protocol.  On the third node
(link-12), run I/O traffic on the GFS (bonnie++).  From the first node,
create a new PV, add it to the VG, and attempt to lvextend.  The
command times out:
[root@link-10 /]# lvextend -L +1G /dev/MINE/lvol0
  Extending logical volume lvol0 to 201.00 GB
  Error locking on node link-12: Command timed out
  Failed to suspend lvol0
[root@link-10 /]#


Version-Release number of selected component (if applicable):
[root@link-10 /]# lvextend --version
  LVM version:     2.00.25-cvs (2004-09-16)
  Library version: 1.00.20-ioctl-cvs (2004-07-03)
  Driver version:  4.1.0


Comment 1 Christine Caulfield 2004-09-23 15:55:48 UTC
My guess is that it's taking a long time to flush all those
outstanding IOs to the GFS volume before it can suspend.

Would you like me to raise the timeout (currently 1 minute)?

Actually, try the operation with clvmd -t600 (or so) to increase the
timeout.
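
A minimal sketch of what that looks like on each node; the exact way clvmd
is stopped and restarted varies by setup, so the stop command here is an
assumption:

# Stop the running clvmd, then restart it with a 10-minute command
# timeout instead of the default 60 seconds.  Repeat on every node.
killall clvmd              # or: service clvmd stop
clvmd -t600                # -t <seconds>: cluster command timeout

# Retry the operation that previously timed out:
lvextend -L +1G /dev/MINE/lvol0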

Comment 2 Christine Caulfield 2004-10-04 13:30:07 UTC
On further examination it seems that this really is a locking deadlock
of some kind. Possibly interesting extract from /proc/cluster/dlm_locks:

Resource c6f185a8 (parent 00000000). Name (len=24) "       1         
     2"  
Master Copy
Granted Queue
000100bf CW Remote:   2 00020164
00010292 CW Remote:   9 00010057
000301b7 CW Remote:  11 00020083
00020304 CW Remote:  10 0001025a
00010166 CW Remote:  12 00010139
000103b8 CW Remote:   4 00010254
Conversion Queue
00010082 NL (CW) Remote:   3 0001004a  LQ: 0
00010071 NL (CW) Remote:   5 00010386  LQ: 0
0001010a NL (CW) Remote:   6 00010342  LQ: 0
000103a2 NL (CW) Remote:   1 000101ca  LQ: 0
00010150 NL (CW) Remote:   7 0001027d  LQ: 0
Waiting Queue
0003012d -- (PR)  LQ: 0


Comment 3 David Teigland 2004-10-14 07:00:05 UTC
From Patrick, an easy way to reproduce (sketched below):
nodeA:  run bonnie++ on the GFS filesystem
nodeB:  dmsetup suspend vg-gfs
nodeA:  dmsetup suspend vg-gfs   -> deadlock
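
A concrete, hedged version of those steps, assuming the GFS is mounted at
/mnt/gfs (a placeholder path) and that the LV's device-mapper name really
is vg-gfs as shown above:

# nodeA: generate sustained I/O on the GFS mount (path is a placeholder)
bonnie++ -d /mnt/gfs -u root &

# nodeB: suspend the underlying device-mapper device
dmsetup suspend vg-gfs

# nodeA: a second suspend from another node deadlocks here
dmsetup suspend vg-gfs

# (dmsetup resume vg-gfs undoes the suspend once things are unstuck)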

It appears the problem may be the way gfs handles the PRIORITY flag.  

On nodeA, the bonnie++ process is blocked in gfs waiting to get the   
transaction lock (1,2) in the normal SHARED (PR) mode.  We see this  
request on the Waiting queue.  This seems normal.

Also on nodeA, the dmsetup process has called gfs_freeze_fs() which
acquires the transaction lock in DEFERRED (CW) mode.  GFS uses the
PRIORITY flag so that this request will bypass the existing PR request
from bonnie.  (PRIORITY translates to the dlm NOORDER flag.)  This CW
request isn't getting to the dlm, though; it's blocking in gfs,
waiting for the PR request from bonnie to complete.

Here's what the transaction lock looks like in gfs (pid 5154 is bonnie++,
pid 5155 is dmsetup):

Glock (1, 2)
  gl_flags = 1 2
  gl_count = 4
  gl_state = 0
  lvb_count = 0
  object = no
  aspace = no
  reclaim = no
  Request
    owner = 5154
    gh_state = 3
    gh_flags =  
    error = 0
    gh_iflags = 1 4
  Waiter3
    owner = 5155
    gh_state = 2
    gh_flags = 4 7 10
    error = 0
    gh_iflags = 1
  Waiter3
    owner = 5154
    gh_state = 3
    gh_flags =
    error = 0
    gh_iflags = 1 4

I don't see anything questionable outside of gfs at this point so
I'm at a bit of a dead end myself.  If I were to speculate, I'd
suppose gfs should either use the TRY flag, cancel the PR request,
or allow both PR and CW to go through to the dlm.


Comment 4 Christine Caulfield 2004-10-28 08:27:28 UTC
Assigning to Ken as this does seem to be in GFS.

I thought it was similar to bug #126952, but tests this morning show the
problem is still there.

Comment 5 Kiersten (Kerri) Anderson 2004-11-16 19:04:29 UTC
Updating version to the right level in the defects.  Sorry for the storm.

Comment 6 Ken Preslan 2004-11-17 19:13:20 UTC
Fixed.


Comment 7 Derek Anderson 2005-02-02 20:00:42 UTC
I am seeing this again.  Situation: a 3-node cluster running
ccsd/cman/dlm/clvmd/gfs.  All nodes mount a GFS on a CLVM volume.  Create
a new PV, vgextend the VG with the new PV, and attempt to lvextend an LV
into the now-larger VG.

***No traffic needs to be running on the GFS to make this happen now.
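
A minimal sketch of that sequence, using the VG1/LV1 names from the debug
output below; the new device name (/dev/sdc1) and the extension size are
placeholders:

# Initialize the new physical volume (device name is a placeholder)
pvcreate /dev/sdc1

# Grow the clustered volume group
vgextend VG1 /dev/sdc1

# Grow the logical volume; with GFS mounted on it, this is what times out
lvextend -L +1G /dev/VG1/LV1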

#cluster_locking.c:412       Locking KDquH96lS7NQdoqXIKkxsvtCWikbTaTDMhQyGt2Hzs6J2tyrJZ6xoo9W67mg66Hs at 0x1c
#cluster_locking.c:360   Error locking on node link-12: Command timed out
#cluster_locking.c:360   Error locking on node link-11: Command timed out
#lvresize.c:469   Failed to suspend LV1
#device/dev-io.c:405         Closed /dev/sda1
#device/dev-io.c:405         Closed /dev/sda2
#cluster_locking.c:412       Locking V_VG1 at 0x6


After the timeout you are hosed: LVM commands no longer work, the GFS
cannot be unmounted (umount hangs), and the nodes have to be rebooted to
recover the cluster.

The lvextend works fine if GFS is not mounted on top of it.

Comment 8 Kiersten (Kerri) Anderson 2005-02-09 16:12:21 UTC
Reassigning to Mike to have a look.

Comment 9 michael conrad tadpol tilstra 2005-02-10 21:17:26 UTC
Created attachment 110940 [details]
output from clvmd -d when compiled with -DDEBUG

Mostly just me trying to figure out what is happening, and making some
sort of record of it.

Comment 10 michael conrad tadpol tilstra 2005-02-10 21:45:21 UTC
Created attachment 110941 [details]
like before, but no gfs this time

Resize of an LV without GFS mounted, just for comparison.

Comment 11 michael conrad tadpol tilstra 2005-02-14 17:21:01 UTC
Well, it looks like one of the nodes that the command wasn't run on is
getting hung up in do_lock_lv() in clvmd.  Just have to figure out why,
and why only with gfs.

Comment 12 michael conrad tadpol tilstra 2005-02-14 17:42:50 UTC
Just for giggles, I loaded up clvmd on cman/dlm, then loaded gfs using
gulm, then tried the lvextend.  And it worked.  This leads me to think
there is something weird going on when you've got both clvm and gfs using
the dlm.  A quick check of how things work with all gulm shows that, lo,
they work too.  So I get the feeling this may actually be a dlm bug now,
but I'm going to poke a bit more first.

Comment 13 michael conrad tadpol tilstra 2005-02-22 20:23:29 UTC
I was playing with some extra prints to trace things.  Adding the prints
seems to make this disappear, so wherever it is, it's a race.  Also, from
the prints, it doesn't seem like it should really matter which lock
manager you use; the difference between the two just shows or hides the
race.


Comment 14 michael conrad tadpol tilstra 2005-02-22 20:59:42 UTC
ignore comment #13


Comment 15 michael conrad tadpol tilstra 2005-02-22 21:59:28 UTC
Running gfs_tool freeze on multiple nodes with the dlm results in a deadlock.

Comment 16 michael conrad tadpol tilstra 2005-02-22 22:00:24 UTC
Forgot to say: with load.

Comment 17 michael conrad tadpol tilstra 2005-02-22 22:07:16 UTC
Taking back comment #16: after mount, run gfs_tool freeze on three nodes
at the same time (using broadcast or such).  So no load is needed; it
seems you just need multiple freezes issued at the same time from
multiple nodes.
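
A minimal way to issue the freezes roughly simultaneously, assuming
passwordless ssh to the three nodes and a GFS mount point of /mnt/gfs
(both the node list and the path are placeholders, not from the report):

#!/bin/bash
# Fire gfs_tool freeze on all three nodes at (roughly) the same time.
NODES="link-10 link-11 link-12"
MNT=/mnt/gfs

for n in $NODES; do
    ssh "$n" "gfs_tool freeze $MNT" &   # run the freezes in parallel
done
wait

# ...and thaw afterwards so the filesystem resumes
for n in $NODES; do
    ssh "$n" "gfs_tool unfreeze $MNT"
done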

Comment 18 David Teigland 2005-02-23 04:27:46 UTC
- running make_panic -f3 on four nodes
- run gfs_tool freeze at once on all
- gfs_tool completes on nodeid 4, stuck on others

- dlm lock dump shows:

Resource c3e1ff50 (parent 00000000). Name (len=24) "       1         
     2"
Master Copy
Granted Queue
00e50333 CW Remote:   4 00c9012a
Conversion Queue
00d202f5 NL (CW) Remote:   6 029202f6  LQ: 0,0x3044
00f80139 NL (CW)                       LQ: 0,0x3044
00d802c1 NL (CW) Remote:   1 00e2035f  LQ: 0,0x3044
Waiting Queue

(0x3044 = DLM_LKF_CONVERT DLM_LKF_CONVDEADLK DLM_LKF_HEADQUE
DLM_LKF_NOORDER)

- gfs_tool unfreeze on nodeid 4 and the other 3 gfs_tools complete

- dlm lock dump shows:

Resource c3e1ff50 (parent 00000000). Name (len=24) "       1         
     2"
Master Copy
Granted Queue
00d802c1 CW Remote:   1 00e2035f
00f80139 CW
00d202f5 CW Remote:   6 029202f6
Conversion Queue
Waiting Queue
010b03a8 -- (PR) Remote:   4 00e90091  LQ: 0,0x0

- after gfs_tool unfreeze on the 3 frozen nodes make_panic resumes on all

Obviously, there's some bug preventing all of the CW requests from being
granted right away.


Comment 19 David Teigland 2005-02-23 07:45:53 UTC
Changes by:     teigland 2005-02-23 07:39:11

Modified files:
        dlm-kernel/src : locking.c

When locks on the convert queue are granted, we need to try again to
grant locks from the beginning of the convert queue.  Locks at the
beginning may not be grantable because of locks at the end.  But,
granting locks at the end (permitted when using the NOORDER flag) may
make earlier locks grantable.  Specifically, we have the following
situation when running "gfs_tool freeze" in parallel on three machines:

   holder lock
   nodeid mode

Granted 1 PR
        2 PR
        3 PR

* nodeid 3 converts PR->CW

Granted 1 PR
        2 PR
Convert 3 PR->CW

* nodeid 2 converts PR->CW
  granted mode is demoted to avoid conversion deadlock

Granted 1 PR
Convert 2 NL->CW
        3 PR->CW

* nodeid 1 converts PR->CW
  granted mode is demoted to avoid conversion deadlock

Granted
Convert 1 NL->CW
        2 NL->CW
        3 PR->CW

* conversions for 1 and 2 are blocked by 3's PR
* conversion for 3 is granted

Granted 3 CW
Convert 1 NL->CW
        2 NL->CW

* other conversions are now grantable, we must try to grant them again


Comment 20 Derek Anderson 2005-03-08 16:33:46 UTC
Tried this again and it is not working for me.  In this test link-11
is doing the extend and link-12 is running the traffic.

[root@link-11 gfs1]# lvextend -l +11576 /dev/VG1/LV1
  Extending logical volume LV1 to 135.66 GB
  Error locking on node link-12: Command timed out
  Failed to suspend LV1
[root@link-11 gfs1]# echo $?
5

Node link-12 got tons of these on the console:
lock_dlm: cancel 1,2 flags 80

Comment 21 David Teigland 2005-03-09 08:57:33 UTC
If the lock being canceled is actually blocked in the dlm, lock_dlm
isn't doing anything about it, which is "the bug".  The fix is quite
simple.

However, fixing this means doing dlm cancelations which we've not
done before and that has uncovered a whole pile of cancel-related
bugs in the dlm.  They are fixed now, but the changes were so
significant to all the unlock code that I'm doing a fair amount of
testing before checking it in.

(Cancels usually happen in the context of recovery and in that case
they are usually resolved within lock_dlm itself.  This explains
why we've not seen these problems very often if at all.  The
lvextend causes a bunch of cancelations with no recovery involved.)


Comment 22 David Teigland 2005-03-10 07:15:59 UTC
Seems to work for me now, and the fix wasn't quite so major.
Requires an update to lock_dlm and dlm.


Comment 23 Henry Harris 2006-05-30 18:27:06 UTC
We are sometimes seeing this same problem with the lvextend command timing out 
while traffic is running to the gfs file system.  We are running dlm-1.0.0-5.  
Is the fix mentioned in comment #22 in this version?  

Comment 24 Lenny Maiorani 2006-06-05 19:49:46 UTC
Update to Henry's report...

The problem we are seeing is that a timeout occurs. clvmd times out after 60
seconds. When this was increased to 600 seconds the timeout no longer occurred.

Comment 25 David Teigland 2006-06-05 20:20:56 UTC
lvextend is such a high level command, there are any number of places
the code could be getting stuck.  The history above shows at least
three different bugs that have led to lvextend hangs, for different
reasons, and been fixed.  This is most likely another one.

To figure this out we should get a backtrace on threads and a dump of
dlm locks from all the nodes (or at least on the node with the hung
command to start with).  We should dump dlm locks from all lockspaces.
To dump locks in a lockspace (see the sketch below):
- echo "lockspace name" >> /proc/cluster/dlm_locks
- cat /proc/cluster/dlm_locks > dlm_locks_lsname.txt
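
A sketch that wraps those two steps in a loop and also grabs a backtrace
of all kernel threads; the lockspace names below are placeholders for
whatever lockspaces actually exist on the node, and using sysrq-t for the
backtraces is my assumption rather than something prescribed above:

#!/bin/bash
# Dump dlm locks for each lockspace of interest on this node.
# Replace the placeholder lockspace names with the real ones.
for ls in clvmd gfs1; do
    echo "$ls" >> /proc/cluster/dlm_locks
    cat /proc/cluster/dlm_locks > "dlm_locks_${ls}_$(hostname).txt"
done

# Backtrace of all kernel threads, written to the kernel log
echo t > /proc/sysrq-trigger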


Comment 26 Lenny Maiorani 2006-06-05 20:26:29 UTC
Actually, I do not think it is hanging anymore.  It is just taking longer
than 1 minute.  After changing to a 10-minute timeout, the timeout no
longer occurred.  I think that when running high traffic rates it simply
takes a long time to write out the outstanding IOs.

This appears fixed to me.

Comment 27 Lon Hohberger 2010-01-04 19:01:51 UTC
This was apparently fixed around RHCS4 Update 4 but this bug was never closed.