Created attachment 120268
The patch to fix this issue - gfs_i_sem.patch.
Description of problem:
When multiple processes read and/or write *the same file* on *the same SMP
node* under direct IO, they can end up deadlocking each other, with the
following thread back traces:
#0 [1002a1edb18] schedule at ffffffff8030332e
#1 [1002a1edbf0] wait_for_completion at ffffffff803034ff
#2 [1002a1edc70] glock_wait_internal at ffffffffa02e34a5
#3 [1002a1edcb0] gfs_glock_nq at ffffffffa02e3cd2
#4 [1002a1edcf0] do_write_direct at ffffffffa02f7310
#5 [1002a1edd80] avc_has_perm at ffffffff801ce330
#6 [1002a1eddb0] write_chan at ffffffff80228b97
#7 [1002a1ede00] walk_vm at ffffffffa02f6cca
#8 [1002a1eded0] gfs_write at ffffffffa02f7deb
#9 [1002a1edf10] vfs_write at ffffffff80177098
#10 [1002a1edf40] sys_pwrite64 at ffffffff80177275
#11 [1002a1edf80] system_call at ffffffff80110052
#0 [1001d8bf988] schedule at ffffffff8030332e
#1 [1001d8bfa60] __sched_text_start at ffffffff80302637
#2 [1001d8bfac0] __down_failed at ffffffff80303c13
#3 [1001d8bfb10] .text.lock.direct_io at ffffffff80198222
#4 [1001d8bfbb0] gfs_direct_IO at ffffffffa02f6093
#5 [1001d8bfc30] generic_file_direct_IO at ffffffff8015a12e
#6 [1001d8bfc70] __generic_file_aio_read at ffffffff8015aa69
#7 [1001d8bfce0] generic_file_read at ffffffff8015acbe
#8 [1001d8bfd60] glock_wait_internal at ffffffffa02e360d
#9 [1001d8bfda0] gfs_glock_nq at ffffffffa02e3cd2
#10 [1001d8bfde0] do_read_direct at ffffffffa02f7092
#11 [1001d8bfe40] walk_vm at ffffffffa02f6cca
#12 [1001d8bff10] vfs_read at ffffffff80176ebb
#13 [1001d8bff40] sys_pread64 at ffffffff801771ff
#14 [1001d8bff80] system_call at ffffffff80110052
Note that my test machine is x86_64, but this is a platform-independent problem.
Version-Release number of selected component (if applicable):
Will upload the test case (originally written by Stephen Tweedie -
firstname.lastname@example.org) with trivial modifications to run on GFS.
Steps to Reproduce:
1. Compile the test program (make)
2. On one GFS node (SMP), run
shell> ./verify-data -w file-name-on-gfs-file-system
3. On another GFS node (SMP), run
shell> ./verify-data -r same-file-as-in-step-2.
4. When the output freezes, use the "crash" command to check what the threads
are doing.
5. From this point on, any access to the same file will hang forever, including
the above two processes. The processes are unkillable and the filesystem can't
be unmounted until reboot.
The deadlock is caused by:
1. The writer has obtained the VFS layer's i_sem(aphore), then tries to get the
exclusive gfs glock.
2. The reader has obtained the shared gfs glock and passes control to the
blockdev layer.
3. The block device layer prepares the direct IO in the reader's context, which
tries to obtain the i_sem(aphore).
4. Deadlock: the writer waits for the exclusive gfs glock while the reader waits
for i_sem.
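The four steps above describe a classic lock-order inversion. The following is a minimal sketch in Python, not GFS code: the `LOCK_ORDER` table and the `inversions()` helper are hypothetical names made up for illustration, and the model deliberately ignores the shared/exclusive glock modes.

```python
from itertools import combinations

# Simplified lock orders taken from the four steps above (an assumption-laden
# model: "glock" stands for the GFS glock, shared vs. exclusive is ignored).
LOCK_ORDER = {
    "writer": ["i_sem", "glock"],  # holds i_sem, then waits for the glock
    "reader": ["glock", "i_sem"],  # holds the glock, then waits for i_sem
}

def inversions(lock_order):
    """Return (task_a, task_b, lock_x, lock_y) tuples where task_a takes
    lock_x before lock_y while task_b takes them in the opposite order."""
    found = []
    for (a, order_a), (b, order_b) in combinations(lock_order.items(), 2):
        for x, y in combinations(order_a, 2):
            if x in order_b and y in order_b and order_b.index(y) < order_b.index(x):
                found.append((a, b, x, y))
    return found

print(inversions(LOCK_ORDER))  # [('writer', 'reader', 'i_sem', 'glock')]
```

Any non-empty result means two tasks can each hold one lock while waiting for the other, which is the situation shown in the two back traces.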
I had been "fixing" this problem from the reader side as follows:
1. Ask GFS's do_read_direct() to grab/release i_sem before glock (GFS-kernel).
2. Tell __blockdev_direct_IO to bypass all the i_sem grabbing/releasing code
(2.6.9-22.ELsmp base kernel).
This brought in unnecessary complications (such as performance hits and/or
packaging issues, since we had to change the base kernel). It just occurred to me
today that this could be easily fixed in the GFS writer code, as in the uploaded
patch. After testing, I'm pretty confident that this can be shipped together with
bz 169154 without any base kernel complication.
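For illustration only, the idea behind the earlier reader-side workaround (take i_sem before the glock on both paths) can be sketched with ordinary Python locks. This is not the uploaded patch, whose contents are not reproduced here; `direct_io()` and `run()` are hypothetical names, and the two `threading.Lock` objects are stand-ins for the kernel primitives with glock modes ignored.

```python
import threading

i_sem = threading.Lock()  # stand-in for the VFS inode semaphore (assumption)
glock = threading.Lock()  # stand-in for the GFS glock, modes ignored (assumption)

def direct_io(name, completed, rounds=1000):
    """Take i_sem first, then glock, on every pass."""
    for _ in range(rounds):
        with i_sem:
            with glock:
                pass  # the direct IO itself would happen here
    completed.append(name)

def run():
    """Run a writer and a reader concurrently; return the tasks that finished."""
    completed = []
    threads = [threading.Thread(target=direct_io, args=(n, completed))
               for n in ("writer", "reader")]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    return sorted(completed)

print(run())
```

Because both tasks acquire the locks in the same order, neither can hold one lock while waiting on the other, so both loops run to completion.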
Code in CVS already.
Note that this is a read-write deadlock, which differs from the write-truncate deadlock.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.