Bug 133368
| Summary: | lvmextend timing out on GFS-mounted volume | | |
|---|---|---|---|
| Product: | [Retired] Red Hat Cluster Suite | Reporter: | Derek Anderson <danderso> |
| Component: | dlm | Assignee: | David Teigland <teigland> |
| Status: | CLOSED ERRATA | QA Contact: | GFS Bugs <gfs-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4 | CC: | bmarzins, bstevens, cluster-maint, kpreslan, lenny, teigland |
| Target Milestone: | --- | Target Release: | --- |
| Hardware: | All | OS: | Linux |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-01-04 19:01:51 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | Bug Blocks: | 144795 |
| Attachments: | | | |
Description (Derek Anderson, 2004-09-23 15:43:30 UTC)

My guess is that it's taking a long time to flush all those outstanding IOs to the GFS volume before it can suspend. You'd like me to up the timeout (from 1 minute?) Actually, try the operation with clvmd -t600 (or so) to increase the timeout.

On further examination it seems that this really is a locking deadlock of some kind. Possibly interesting extract from /proc/cluster/dlm_locks:

```
Resource c6f185a8 (parent 00000000). Name (len=24) " 1 2"
Master Copy
Granted Queue
000100bf CW Remote: 2 00020164
00010292 CW Remote: 9 00010057
000301b7 CW Remote: 11 00020083
00020304 CW Remote: 10 0001025a
00010166 CW Remote: 12 00010139
000103b8 CW Remote: 4 00010254
Conversion Queue
00010082 NL (CW) Remote: 3 0001004a LQ: 0
00010071 NL (CW) Remote: 5 00010386 LQ: 0
0001010a NL (CW) Remote: 6 00010342 LQ: 0
000103a2 NL (CW) Remote: 1 000101ca LQ: 0
00010150 NL (CW) Remote: 7 0001027d LQ: 0
Waiting Queue
0003012d -- (PR) LQ: 0
```

From Patrick - Easy way to reproduce:

```
nodeA: run bonnie++ on GFS filesystem
nodeB: dmsetup suspend vg-gfs
nodeA: dmsetup suspend vg-gfs
deadlock
```

It appears the problem may be the way gfs handles the PRIORITY flag. On nodeA, the bonnie++ process is blocked in gfs waiting to get the transaction lock (1,2) in the normal SHARED (PR) mode. We see this request on the Waiting queue. This seems normal. Also on nodeA, the dmsetup process has called gfs_freeze_fs() which acquires the transaction lock in DEFERRED (CW) mode. GFS uses the PRIORITY flag so that this request will bypass the existing PR request from bonnie. (PRIORITY translates to the dlm NOORDER flag.) This CW request isn't getting to the dlm, though; it's blocking in gfs, waiting for the PR request from bonnie to complete. (A small illustrative sketch of this bypass behaviour follows the attachment notes below.)

Here's what the transaction lock looks like in gfs (5154 is bonnie++, 5155 is dmsetup):

```
Glock (1, 2)
  gl_flags = 1 2
  gl_count = 4
  gl_state = 0
  lvb_count = 0
  object = no
  aspace = no
  reclaim = no
  Request
    owner = 5154  gh_state = 3  gh_flags =         error = 0  gh_iflags = 1 4
  Waiter3
    owner = 5155  gh_state = 2  gh_flags = 4 7 10  error = 0  gh_iflags = 1
  Waiter3
    owner = 5154  gh_state = 3  gh_flags =         error = 0  gh_iflags = 1 4
```

I don't see anything questionable outside of gfs at this point so I'm at a bit of a dead end myself. If I were to speculate, I'd suppose gfs should either use the TRY flag, cancel the PR request, or allow both PR and CW to go through to the dlm.

Assigning to Ken as this does seem to be in GFS.

I thought it was similar to #126952, but tests this morning show the problem still there.

Updating version to the right level in the defects. Sorry for the storm.

Fixed.

I am seeing this again. Situation: 3 node cluster: ccsd/cman/dlm/clvmd/gfs. All nodes mount a GFS on a CLVM. Create a new PV, vgextend the VG with the new PV, and attempt to lvextend the now larger VG. ***No traffic needs to be running on the GFS to make this happen now.

```
#cluster_locking.c:412   Locking KDquH96lS7NQdoqXIKkxsvtCWikbTaTDMhQyGt2Hzs6J2tyrJZ6xoo9W67mg66Hs at 0x1c
#cluster_locking.c:360   Error locking on node link-12: Command timed out
#cluster_locking.c:360   Error locking on node link-11: Command timed out
#lvresize.c:469   Failed to suspend LV1
#device/dev-io.c:405   Closed /dev/sda1
#device/dev-io.c:405   Closed /dev/sda2
#cluster_locking.c:412   Locking V_VG1 at 0x6
```

After the timeout you are hosed. Can't run LVM commands or umount the GFS (it hangs). Have to reboot the nodes to recover the cluster. The lvextend works fine if GFS is not mounted on top of it. Reassigning to Mike to have a look.

Created attachment 110940 [details]
output from clvmd -d when compiled with -DDEBUG
Mostly just me trying to figure out what is all happening, and making some sort of record of it.
Created attachment 110941 [details]
like before, but no gfs this time
Resize on an LV without mounting GFS, just for comparison.
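To make the PRIORITY/NOORDER point from the description concrete, here is a minimal, self-contained sketch. It is not GFS or dlm source; the struct, the reduced NL/PR/CW compatibility table, and the queueing policy are simplifications made up for illustration. It models the state in the first dlm dump above: the transaction lock is held in CW on remote nodes, bonnie's PR request is waiting, and the freeze's CW request arrives behind it. Strict FIFO ordering leaves the CW request stuck behind the blocked PR, while NOORDER semantics let it bypass the queue and be granted:

```c
/* Toy illustration (not GFS or dlm code) of the PRIORITY -> NOORDER bypass.
 * The resource is held in CW by remote nodes, a PR request is waiting, and
 * a CW request with NOORDER arrives behind it. */
#include <stdio.h>

enum mode { NL, PR, CW };
static const char *mname[] = { "NL", "PR", "CW" };

/* Reduced compatibility: NL fits anything, PR/PR and CW/CW are compatible,
 * PR/CW is not. */
static int compat(enum mode a, enum mode b)
{
    if (a == NL || b == NL)
        return 1;
    return a == b;
}

struct req {
    const char *who;
    enum mode mode;
    int noorder;   /* may bypass earlier blocked waiters */
};

int main(void)
{
    enum mode granted = CW;   /* transaction lock held CW on other nodes */
    struct req waiting[] = {
        { "bonnie++ shared (PR) request",        PR, 0 },
        { "gfs_freeze_fs deferred (CW) request", CW, 1 },
    };
    int fifo_blocked = 0;

    for (int i = 0; i < 2; i++) {
        struct req *r = &waiting[i];
        int ok = compat(r->mode, granted);
        /* FIFO would stop at the first blocked waiter; a NOORDER request
         * is still considered even when an earlier waiter is blocked. */
        if (ok && (!fifo_blocked || r->noorder)) {
            printf("grant  %s\n", r->who);
        } else {
            printf("block  %s%s\n", r->who,
                   ok ? " (stuck behind an earlier waiter)" : "");
            fifo_blocked = 1;
        }
    }
    return 0;
}
```

The hang described above is that the CW request never even reaches the dlm to take advantage of this: it is queued FIFO inside gfs, behind bonnie's PR request.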
Well, it looks like one of the nodes that the command wasn't run on is getting hung up in do_lock_lv() in clvmd. Just have to figure out why, and why only with gfs.

Just for giggles, I loaded up clvmd on cman/dlm, then loaded gfs using gulm, then tried the lvextend. And it worked. This leads me to think there is something weird going on when you've got both clvm and gfs using dlm. So, a quick check of how things work with all gulm, and lo, they work. So I get the feeling that this may actually be a dlm bug now. But I'm gonna poke a bit more first.

I was playing with some extra prints to trace things about. Adding prints made this maybe disappear, so wherever it is, it's a race. Also, from the prints, it doesn't seem that it really should matter which lock manager you use; the difference between the two just shows/hides the race.

Ignore comment #13.

gfs_tool freeze on multiple nodes with dlm deadlocks.

Forgot to say: with load.

Taking back comment #16. After mount, run gfs_tool freeze on three nodes at the same time (using broadcast or such), so no load. It seems you just need to get multiple freezes at the same time from multiple nodes.

- running make_panic -f3 on four nodes
- run gfs_tool freeze at once on all
- gfs_tool completes on nodeid 4, stuck on others
- dlm lock dump shows:

```
Resource c3e1ff50 (parent 00000000). Name (len=24) " 1 2"
Master Copy
Granted Queue
00e50333 CW Remote: 4 00c9012a
Conversion Queue
00d202f5 NL (CW) Remote: 6 029202f6 LQ: 0,0x3044
00f80139 NL (CW) LQ: 0,0x3044
00d802c1 NL (CW) Remote: 1 00e2035f LQ: 0,0x3044
Waiting Queue
```

(0x3044 = DLM_LKF_CONVERT DLM_LKF_CONVDEADLK DLM_LKF_HEADQUE DLM_LKF_NOORDER)

- gfs_tool unfreeze on nodeid 4 and the other 3 gfs_tools complete
- dlm lock dump shows:

```
Resource c3e1ff50 (parent 00000000). Name (len=24) " 1 2"
Master Copy
Granted Queue
00d802c1 CW Remote: 1 00e2035f
00f80139 CW
00d202f5 CW Remote: 6 029202f6
Conversion Queue
Waiting Queue
010b03a8 -- (PR) Remote: 4 00e90091 LQ: 0,0x0
```

- after gfs_tool unfreeze on the 3 frozen nodes, make_panic resumes on all

Obviously, there's some bug that's preventing all the CW requests from being granted right away.

Changes by: teigland 2005-02-23 07:39:11
Modified files: dlm-kernel/src : locking.c

When locks on the convert queue are granted, we need to try again to grant locks from the beginning of the convert queue. Locks at the beginning may not be grantable because of locks at the end. But granting locks at the end (permitted when using the NOORDER flag) may make earlier locks grantable. Specifically, we have the following situation when running "gfs_tool freeze" in parallel on three machines:

```
          lock holder
          nodeid  mode
Granted      1    PR
             2    PR
             3    PR

* nodeid 3 converts PR->CW

Granted      1    PR
             2    PR
Convert      3    PR->CW

* nodeid 2 converts PR->CW
  (granted mode is demoted to avoid conversion deadlock)

Granted      1    PR
Convert      2    NL->CW
             3    PR->CW

* nodeid 1 converts PR->CW
  (granted mode is demoted to avoid conversion deadlock)

Granted
Convert      1    NL->CW
             2    NL->CW
             3    PR->CW

* conversions for 1 and 2 are blocked by 3's PR
* conversion for 3 is granted

Granted      3    CW
Convert      1    NL->CW
             2    NL->CW

* other conversions are now grantable, we must try to grant them again
```
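A minimal, self-contained sketch of the retry-on-grant rule described in that commit message. This is illustrative only, not the actual dlm-kernel/src/locking.c code; the struct, the reduced NL/PR/CW compatibility table, and the rescan loop are simplifications chosen for the example. It starts from the blocked state above (1 NL->CW, 2 NL->CW, 3 PR->CW on the convert queue): entry 3 is grantable out of order thanks to NOORDER, and only if the queue is rescanned from its head after that grant do entries 1 and 2 get granted as well:

```c
/* Toy model of the convert-queue behaviour described above (not the real
 * dlm code). */
#include <stdio.h>

enum mode { NL, PR, CW };
static const char *mname[] = { "NL", "PR", "CW" };

/* Reduced compatibility: NL fits anything, PR/PR and CW/CW are compatible,
 * PR/CW is not. */
static int compat(enum mode a, enum mode b)
{
    if (a == NL || b == NL)
        return 1;
    return a == b;
}

struct conv {
    int nodeid;
    enum mode granted;    /* mode currently held */
    enum mode requested;  /* mode wanted (all CW here) */
};

/* A conversion is grantable if its requested mode is compatible with the
 * granted mode of every other lock on the resource. */
static int grantable(struct conv *q, int n, int i)
{
    for (int j = 0; j < n; j++)
        if (j != i && !compat(q[i].requested, q[j].granted))
            return 0;
    return 1;
}

int main(void)
{
    /* State from the commit message: 1 NL->CW, 2 NL->CW, 3 PR->CW. */
    struct conv q[] = {
        { 1, NL, CW },
        { 2, NL, CW },
        { 3, PR, CW },
    };
    int n = 3, progress = 1;

    /* Keep rescanning from the head of the convert queue as long as the
     * previous pass granted something.  A single in-order pass that stops
     * at the first blocked entry would grant nothing: entries 1 and 2 are
     * blocked by 3's PR until 3 (grantable out of order thanks to NOORDER)
     * has been converted to CW. */
    while (progress) {
        progress = 0;
        for (int i = 0; i < n; i++) {
            if (q[i].granted == q[i].requested)
                continue;   /* already converted */
            if (grantable(q, n, i)) {
                printf("grant nodeid %d: %s -> %s\n", q[i].nodeid,
                       mname[q[i].granted], mname[q[i].requested]);
                q[i].granted = q[i].requested;
                progress = 1;
            }
        }
    }
    return 0;
}
```

Running it prints the grants in the order 3, 1, 2, mirroring the sequence seen in the dlm dumps above once the fix is applied.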
Tried this again and it is not working for me. In this test link-11 is doing the extend and link-12 is running the traffic.

```
[root@link-11 gfs1]# lvextend -l +11576 /dev/VG1/LV1
  Extending logical volume LV1 to 135.66 GB
  Error locking on node link-12: Command timed out
  Failed to suspend LV1
[root@link-11 gfs1]# echo $?
5
```

Node link-12 got tons of these on the console:

```
lock_dlm: cancel 1,2 flags 80
```

If the lock being canceled is actually blocked in the dlm, lock_dlm isn't doing anything about it, which is "the bug". The fix is quite simple. However, fixing this means doing dlm cancellations, which we've not done before, and that has uncovered a whole pile of cancel-related bugs in the dlm. They are fixed now, but the changes were so significant to all the unlock code that I'm doing a fair amount of testing before checking it in. (Cancels usually happen in the context of recovery, and in that case they are usually resolved within lock_dlm itself. This explains why we've not seen these problems very often, if at all. The lvextend causes a bunch of cancellations with no recovery involved.) A toy sketch of this cancel-forwarding idea appears at the end of this report.

Seems to work for me now, and the fix wasn't quite so major. It requires an update to lock_dlm and dlm.

We are sometimes seeing this same problem with the lvextend command timing out while traffic is running to the gfs file system. We are running dlm-1.0.0-5. Is the fix mentioned in comment #22 in this version?

Update to Henry's report... The problem we are seeing is that a timeout occurs. clvmd times out after 60 seconds. When this was increased to 600 seconds the timeout no longer occurred.

lvextend is such a high-level command, there are any number of places the code could be getting stuck. The history above shows at least three different bugs that have led to lvextend hangs, for different reasons, and been fixed. This is most likely another one. To figure this out we should get a backtrace on threads and a dump of dlm locks from all the nodes (or at least on the node with the hung command to start with). We should dump dlm locks from all lockspaces. To dump locks in a lockspace you:

- echo "lockspace name" >> /proc/cluster/dlm_locks
- cat /proc/cluster/dlm_locks > dlm_locks_lsname.txt

Actually, I do not think it is hanging anymore. It is just taking longer than 1 minute. After changing to a 10-minute timeout, this timeout didn't occur. I think when running high traffic rates it is just taking a long time to write the outstanding IOs. This appears fixed to me.

This was apparently fixed around RHCS4 Update 4, but this bug was never closed.
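As referenced above, here is a minimal, self-contained sketch of the cancel-forwarding idea. This is not the real lock_dlm module; the request structure, the stages, and the forward_cancel_to_dlm() helper are hypothetical stand-ins. The point it illustrates is the one made in the comment about "lock_dlm: cancel 1,2 flags 80": a cancel for a request that has already been handed to the dlm must be forwarded to the dlm, otherwise the caller (here the suspend path triggered by lvextend) stays blocked:

```c
/* Toy model (not the real lock_dlm code) of the cancel gap described above. */
#include <stdio.h>

enum stage { QUEUED_LOCALLY, SUBMITTED_TO_DLM };

struct request {
    const char *name;
    enum stage stage;
};

/* Hypothetical stand-in for "ask the dlm to cancel this blocked request". */
static void forward_cancel_to_dlm(struct request *r)
{
    printf("cancel forwarded to dlm for %s\n", r->name);
}

static void cancel_request(struct request *r, int forward_to_dlm)
{
    if (r->stage == QUEUED_LOCALLY) {
        /* Recovery-time cancels can be resolved locally, as the comment
         * above notes. */
        printf("%s dropped locally\n", r->name);
    } else if (forward_to_dlm) {
        forward_cancel_to_dlm(r);
    } else {
        /* The behaviour before the fix: nothing happens and the caller
         * blocks until the request is eventually granted (if ever). */
        printf("%s: cancel ignored, caller stays blocked\n", r->name);
    }
}

int main(void)
{
    struct request r = { "transaction lock 1,2", SUBMITTED_TO_DLM };

    cancel_request(&r, 0);   /* old behaviour: hang */
    cancel_request(&r, 1);   /* fixed behaviour: dlm cancel issued */
    return 0;
}
```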