Red Hat Bugzilla – Bug 253768
GFS2: deadlock on distributed mmap test case
Last modified: 2007-11-30 17:12:13 EST
Description of problem:
This is the upstream bug that I see when I try to test for bz 248480.
When I run the QA tests, the gfs2 filesystem instantly locks up. Unlike 248480,
where the nodes were livelocked by a lock ping-ponging back and forth, this is a
hard deadlock. In my tests there was one glock that everyone was waiting on, and
the process that held the glock appeared to be stuck in io_schedule().
This is the process that is holding the glock:
id_doio D f377dc88 2596 2914 2910
f377dc9c 00000086 00000002 f377dc88 f377dc80 00000000 f377d000 00000001
00000000 f3106cd0 f3106e7c c2019080 00000001 f322b200 f7fc706c c04d5cb6
f7fc706c c04d6ae8 0002d314 c04d75d8 c043b3e0 ffffffff 00000000 00000000
[<f8c5b837>] gfs2_writepages+0x0/0x38 [gfs2]
[<f8c54cd7>] inode_go_sync+0x44/0xbe [gfs2]
[<f8c53948>] gfs2_glock_xmote_th+0x2a/0x15c [gfs2]
[<f8c54589>] gfs2_glmutex_lock+0x9c/0xa3 [gfs2]
[<f8c53b49>] run_queue+0xcf/0x249 [gfs2]
[<f8c54601>] gfs2_glock_dq+0x71/0x7b [gfs2]
[<f8c54715>] gfs2_glock_dq_uninit+0x8/0x10 [gfs2]
[<f8c60ae6>] gfs2_sharewrite_fault+0x29a/0x2a6 [gfs2]
[<f8c60880>] gfs2_sharewrite_fault+0x34/0x2a6 [gfs2]
Trying to do IO directly to the block device that GFS2 is running on also hangs
on the node with the process stuck in io_schedule(). IO to the block device works
fine from the other nodes in the cluster, which are simply waiting on the glock.
Version-Release number of selected component (if applicable):
The latest code from the gfs2-2.6-nmw tree, as of 2007-08-21 12:00 CDT
Steps to Reproduce:
1. Set up a cluster on three machines with one GFS2 filesystem
2. Create the following dd_io test file:
[root@cypher-07 ~]# cat /usr/tests/sts-rhel5.1/gfs/lib/dd_io/248480.h2.m4
dnl --- Scenario Metadata ---
dnl DESC=Test for 248480
<cmd>d_iogen -b -S RANDSEED -I SESSION_ID -R RESOURCE_FILE -i RUN_TIME
-m sequential -s mmread,mmwrite,readv,writev,read,write,pread,pwrite -t MINTRANS
-T MAXTRANS -F FILESIZE:mmap1 </cmd>
3. Run the QA test. Here is what I run on my setup:
# /usr/tests/sts-rhel5.1/gfs/bin/dd_io -m /mnt/test1 -R /root/hedge-123.xml -S
248480 -l /usr/tests/sts-rhel5.1/ -r /usr/tests/sts-rhel5.1
Actual results: all the test processes lock up.
Expected results: the test runs to completion.
Created attachment 162049 [details]
Attempt to solve the bug
The stack trace paints what I think is a pretty clear picture of what's going on.
run_queue() has tried to demote the lock and push out the pages, but since it's a
writable mapping and a write has occurred, it has to write out the page, so it
tried to lock it; but since we are in a page fault, the page is already locked by
the higher layers.
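The inversion can be sketched in miniature with an ordinary mutex standing in for the page lock (the names and structure here are illustrative, not the real GFS2 code): the fault handler enters with the page already locked, so an inline glock demote that needs the same lock can never take it.

```c
#include <pthread.h>

/* Toy model of the inversion: one mutex stands in for the page lock. */
static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;

/* The demote path must write the dirty page out, which requires the page
 * lock. trylock is used so the sketch reports the deadlock instead of
 * hanging. Returns 0 on success, -1 if the page lock is already held. */
static int demote_glock_sync_page(void)
{
    if (pthread_mutex_trylock(&page_lock) != 0)
        return -1; /* would block forever: caller already owns page_lock */
    /* ... write the page, release the glock ... */
    pthread_mutex_unlock(&page_lock);
    return 0;
}

/* The fault path: the VM has already locked the page before calling into
 * the filesystem. If the glock demote then runs inline (as run_queue()
 * does here), it needs page_lock a second time. */
static int fault_path(void)
{
    pthread_mutex_lock(&page_lock);    /* taken by the VM before the fault */
    int rc = demote_glock_sync_page(); /* inline demote -> needs page lock */
    pthread_mutex_unlock(&page_lock);
    return rc; /* -1 means the real (blocking) code would deadlock */
}
```

With a real, blocking lock the second acquisition never returns, which matches the process parked in io_schedule() above.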
My solution to this is to move the run_queue() call in gfs2_glock_dq() onto a
workqueue. In fact, my eventual aim is to move _all_ run_queue() calls to the
workqueue to avoid issues just like this. We have to be a bit careful with the
delay that we choose in order not to upset the very careful balance we've
previously established to fix the original bug, but again, I think this will
work well in that case.
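The shape of the proposed fix can be sketched with a plain thread standing in for the glock workqueue (again illustrative, not the actual patch): because the demote is deferred, it takes the page lock only after the fault path has dropped it.

```c
#include <pthread.h>

static pthread_mutex_t page_lock2 = PTHREAD_MUTEX_INITIALIZER;
static int demote_done;

/* The deferred demote: by the time this runs, the fault has completed
 * and released the page lock, so the acquisition is uncontended. */
static void *glock_work(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&page_lock2);
    demote_done = 1; /* ... sync pages, drop the glock ... */
    pthread_mutex_unlock(&page_lock2);
    return NULL;
}

static int fault_path_deferred(void)
{
    pthread_t worker;

    pthread_mutex_lock(&page_lock2);  /* VM holds the page lock */
    /* Instead of running the demote inline, queue it (a thread here
     * stands in for the glock workqueue). The worker blocks on
     * page_lock2 until the fault path releases it below. */
    pthread_create(&worker, NULL, glock_work, NULL);
    pthread_mutex_unlock(&page_lock2); /* fault completes */
    pthread_join(worker, NULL);        /* deferred demote now runs */
    return demote_done;
}
```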
If I'm right about the cause, then it's something that will affect RHEL 5.1 as
well, so I think we ought to try and get it fixed now.
Created attachment 164161 [details]
Revised patch, that fixes some bugs in the previous version.
When the glock workqueue finishes its work on the glock, it drops the reference
count. However, gfs2_glock_dq() never grabbed a reference to the glock before it
scheduled the work. This caused the glock's reference count to reach zero while
the glock was still in use, which caused panics on mount. This version of the
patch grabs a reference before it queues the work in gfs2_glock_dq().
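The refcounting bug and its fix can be modelled with a toy reference count (the names are hypothetical, not the real glock structures): the queued work always drops one reference, so unless the queuer takes one first, the caller's last reference dies underneath it.

```c
#include <stdatomic.h>

/* Toy glock with a reference count. */
struct toy_glock { atomic_int refcount; };

static int freed; /* set when the last reference is dropped */

static void glock_put(struct toy_glock *gl)
{
    if (atomic_fetch_sub(&gl->refcount, 1) == 1)
        freed = 1; /* stands in for freeing the glock */
}

/* The workqueue handler always drops a reference when it finishes. */
static void glock_work_handler(struct toy_glock *gl)
{
    /* ... run_queue(gl) ... */
    glock_put(gl);
}

/* Buggy version: queue the work without taking a reference first. */
static int dq_without_ref(void)
{
    struct toy_glock gl = { 1 }; /* only the caller's reference */
    freed = 0;
    glock_work_handler(&gl);     /* work drops the caller's only ref */
    return freed;                /* 1: freed while still in use */
}

/* Fixed version: grab a reference before queueing the work. */
static int dq_with_ref(void)
{
    struct toy_glock gl = { 1 };
    int premature;

    freed = 0;
    atomic_fetch_add(&gl.refcount, 1); /* extra ref for the queued work */
    glock_work_handler(&gl);           /* work drops its own reference */
    premature = freed;                 /* still 0: glock alive here */
    glock_put(&gl);                    /* caller drops the last reference */
    return premature;
}
```

In the buggy version the refcount hits zero while the caller still holds the glock, which is what produced the panics on mount.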
The bug still exists with the patch. It looks like the same run_queue() issue,
but this one is in gfs2_glock_nq(). Here is the call trace of the process that
is holding the glock:
d_doio D f7d52800 2076 2906 2903
f52e7b14 00000082 00000000 f7d52800 00000000 f7d52800 f52e7000 ea7e195a
0000003f f5c787c0 f5c7896c c2010080 00000000 f5c6d040 06000000 c04d5d03
c23d406c c04d6b2c f52e7b48 0001ea25 00000000 c20fdc3c 0006101a c20fdc3c
[<f8c8c923>] gfs2_writepages+0x0/0x38 [gfs2]
[<f8c85dfb>] inode_go_sync+0x44/0xbe [gfs2]
[<f8c849ba>] gfs2_glock_drop_th+0x1c/0x111 [gfs2]
[<f8c84f4a>] run_queue+0xbf/0x249 [gfs2]
[<f8c8541f>] gfs2_glock_nq+0x154/0x19a [gfs2]
[<f8c865b1>] gfs2_glock_nq_atime+0x106/0x2ec [gfs2]
[<f8c8c9ab>] gfs2_prepare_write+0x50/0x23b [gfs2]
This is actually a different bug, although it looks similar. It can only happen
in the upstream code, as it's the page lock/glock bug which we fixed ages ago in
RHEL, but for which the upstream fix is in Nick Piggin's patch set. That patch
set should have been merged by Linus at the last merge window, but it's still
pending since Nick decided not to push it, due to there being lots of other VM
changes at the time.
So I think we are probably safe to push the patch in its current form to
upstream now, as well as RHEL.
I guess we can close this, or mark as a dup of the other bz?
*** This bug has been marked as a duplicate of 248480 ***