Bug 429092 - GFS: generic_file_write_nolock() called with opened journal transaction
GFS: generic_file_write_nolock() called with opened journal transaction
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: gfs-kmod (Show other bugs)
All Linux
low Severity high
: ---
: 5.5
Assigned To: Robert Peterson
Cluster QE
Depends On:
  Show dependency treegraph
Reported: 2008-01-17 03:10 EST by Monakhov Dmitriy
Modified: 2010-01-11 22:28 EST (History)
6 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2009-09-30 10:33:52 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)

  None (edit)
Description Monakhov Dmitriy 2008-01-17 03:10:40 EST
Description of problem:
Gfs writing codepath calling generic_file_write_nolock() with opened journal
transaction. This is strongly prohibited actions because most probably result in
VM related actions such as balance_dirty_pages_ratelimited_nr which result in 
switching between different fs superblocks while current->journal != NULL.
Sooner or later this result in BUG() triggering or EIO and other bad things.
In my case i've got EIO from ext3_write_inode() and droping DIRTY inode flag
with following calltrace:
 Call Trace:
  [<ffffffff880521b6>] :ext3:ext3_write_inode+0x2b/0x47
  [<ffffffff80030db2>] __writeback_single_inode+0x1e3/0x322
  [<ffffffff80023af4>] mempool_alloc+0x3a/0xf0
  [<ffffffff80021610>] sync_sb_inodes+0x1a9/0x267
  [<ffffffff800514ba>] writeback_inodes+0x82/0xd8
  [<ffffffff800c85dc>] balance_dirty_pages_ratelimited_nr+0x11a/0x1fa
  [<ffffffff8000fa2f>] generic_file_buffered_write+0x56c/0x6a0
  [<ffffffff80016118>] __generic_file_aio_write_nolock+0x36c/0x3b8
  [<ffffffff883f4aae>] :gfs:gfs_dreread+0x87/0xc7
  [<ffffffff883f4b26>] :gfs:gfs_dread+0x38/0x57
  [<ffffffff800c5822>] generic_file_aio_write_nolock+0x20/0x6c

  [<ffffffff800c5bee>] generic_file_write_nolock+0x8f/0xa8
  [<ffffffff8009b42b>] autoremove_wake_function+0x0/0x2e
  [<ffffffff8841c890>] :gfs:gfs_trans_begin_i+0x13c/0x1b2
  [<ffffffff8840f769>] :gfs:do_write_buf+0x455/0x690
  [<ffffffff8840ef68>] :gfs:walk_vm+0x110/0x317
  [<ffffffff8840f314>] :gfs:do_write_buf+0x0/0x690
  [<ffffffff8002fea8>] __clear_user+0x12/0x50
  [<ffffffff8840f21b>] :gfs:__gfs_write+0xac/0xc6
  [<ffffffff800164fb>] vfs_write+0xad/0x153
  [<ffffffff80016b98>] sys_write+0x49/0xbf
  [<ffffffff8005e166>] system_call+0x7e/0x83

testcase: in fact test case is fantastically trivial, i'm wondering why nobody
catch it before. write big file to gfs fs (file_size >= 2 * ram_size )
dd if=/dev/zero of=/mnt_gfs bs=1M count=10240

buggy place:
   if (alloc_required) {
...      gfs_trans_begin(..)
         struct iovec local_iov = { .iov_base = buf, .iov_len = size };

                if (!iocb) {
                        count = generic_file_write_nolock(file, &local_iov, 1,
<<                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
 << generic_file_xxx_write_nolock called with runing transaction(curent->journal
!= NULL). this is absolutaly prohobited.  
            } else {
                        count = generic_file_aio_write_nolock(iocb,
                                &local_iov, 1, offset);
                        if (count == -EIOCBQUEUED)
                                count = wait_on_sync_kiocb(iocb);
Comment 3 Steve Whitehouse 2008-02-06 06:47:34 EST
Is this really RHEL 4? balance_dirty_pages_ratelimited_nr doesn't appear to
exist in RHEL4, but only in RHEL-5. Can you confirm the kernel version?
Comment 4 Monakhov Dmitriy 2008-02-06 06:53:44 EST
yes it was RHEL5
Comment 6 Robert Peterson 2008-02-07 10:19:46 EST
Well, gfs1 isn't part of the base kernel, so there should be no need
to post the change to rhkernel-list.  So no resistance.
Comment 8 Robert Peterson 2008-02-07 14:55:47 EST
Dmitriy, I tried the dd command you recommended to recreate the problem,
but GFS did not fail.  We can see the problem and we're researching the
best way to fix it.  IMHO, there is not a "good" way to fix it because of
the lock ordering required by GFS.  So we are trying to find a viable
alternative that will make it work properly.  By the way, GFS2 does not
have this problem.
Comment 9 Monakhov Dmitriy 2008-02-08 00:57:04 EST
>We can see the problem and we're researching the best way to fix it.
As a temporary solution gfs_writei() may be used for all cases. For example 
following trivial patch has fixed my issue, but with significant performance
drawback, because gfs_writei() optimized for journaled mode.

--- ./fs/gfs/ops_file.c~orig    2008-02-08 08:48:19.000000000 +0300
+++ ./fs/gfs/ops_file.c 2008-02-08 08:49:07.000000000 +0300
@@ -851,7 +851,7 @@ do_do_write_buf(struct file *file, char 
-       if (journaled ||
+       if (1 || journaled || /* always use this branch */
            (gfs_is_stuffed(ip) && !test_bit(GIF_PAGED, &ip->i_flags) &&
             *offset + size <= sdp->sd_sb.sb_bsize - sizeof(struct gfs_dinode))) {
>By the way, GFS2 does no thave this problem
Yes, i do know about this, but unfortunately IMHO currently GFS2 is not ready
for production. 
Comment 13 Robert Peterson 2009-09-30 10:33:52 EDT
No one has complained about this problem and no one has been able
to recreate it for a very long time.  Changing the code to fix it
would only put customers at risk.  If the problem is recreatable,
we can re-address the issue at that time.  I'm closing as WONTFIX.

Note You need to log in before you can comment on or make changes to this bug.