Description of problem:

The GFS write codepath calls generic_file_write_nolock() with an open journal transaction. This is strongly prohibited, because it will most likely trigger VM activity such as balance_dirty_pages_ratelimited_nr(), which can switch writeback to a different fs superblock while current->journal_info != NULL. Sooner or later this triggers a BUG(), an EIO, or other bad things. In my case I got an EIO from ext3_write_inode() and a dropped DIRTY inode flag, with the following call trace:

Call Trace:
 [<ffffffff880521b6>] :ext3:ext3_write_inode+0x2b/0x47
 [<ffffffff80030db2>] __writeback_single_inode+0x1e3/0x322
 [<ffffffff80023af4>] mempool_alloc+0x3a/0xf0
 [<ffffffff80021610>] sync_sb_inodes+0x1a9/0x267
 [<ffffffff800514ba>] writeback_inodes+0x82/0xd8
 [<ffffffff800c85dc>] balance_dirty_pages_ratelimited_nr+0x11a/0x1fa
 [<ffffffff8000fa2f>] generic_file_buffered_write+0x56c/0x6a0
 [<ffffffff80016118>] __generic_file_aio_write_nolock+0x36c/0x3b8
 [<ffffffff883f4aae>] :gfs:gfs_dreread+0x87/0xc7
 [<ffffffff883f4b26>] :gfs:gfs_dread+0x38/0x57
 [<ffffffff800c5822>] generic_file_aio_write_nolock+0x20/0x6c
 [<ffffffff800c5bee>] generic_file_write_nolock+0x8f/0xa8
 [<ffffffff8009b42b>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8841c890>] :gfs:gfs_trans_begin_i+0x13c/0x1b2
 [<ffffffff8840f769>] :gfs:do_write_buf+0x455/0x690
 [<ffffffff8840ef68>] :gfs:walk_vm+0x110/0x317
 [<ffffffff8840f314>] :gfs:do_write_buf+0x0/0x690
 [<ffffffff8002fea8>] __clear_user+0x12/0x50
 [<ffffffff8840f21b>] :gfs:__gfs_write+0xac/0xc6
 [<ffffffff800164fb>] vfs_write+0xad/0x153
 [<ffffffff80016b98>] sys_write+0x49/0xbf
 [<ffffffff8005e166>] system_call+0x7e/0x83

Testcase:

In fact the test case is fantastically trivial; I wonder why nobody caught it before. Write a big file to a GFS filesystem (file_size >= 2 * ram_size):

 dd if=/dev/zero of=/mnt_gfs bs=1M count=10240

Buggy place, in do_do_write_buf():

	...
	if (alloc_required) {
		...
		gfs_trans_begin(..);
	}

	struct iovec local_iov = { .iov_base = buf, .iov_len = size };

	if (!iocb) {
		count = generic_file_write_nolock(file, &local_iov, 1, offset);
		/* ^^^ generic_file_xxx_write_nolock() is called with a
		 * running transaction (current->journal_info != NULL).
		 * This is absolutely prohibited. */
	} else {
		count = generic_file_aio_write_nolock(iocb, &local_iov, 1, offset);
		if (count == -EIOCBQUEUED)
			count = wait_on_sync_kiocb(iocb);
	}
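To make the ordering hazard concrete, here is a condensed sketch of the sequence (illustration only, not the literal GFS source; the gfs_trans_begin()/gfs_trans_end() calls are shown as comments because their arguments are elided here):

	#include <linux/fs.h>
	#include <linux/uio.h>
	#include <linux/sched.h>

	/*
	 * Illustration only -- a condensed view of the sequence described
	 * above, not the literal GFS source.
	 */
	static ssize_t gfs_write_ordering_sketch(struct file *file,
						 const char __user *buf,
						 size_t size, loff_t *offset)
	{
		struct iovec local_iov = { .iov_base = (void __user *)buf,
					   .iov_len  = size };
		ssize_t count;

		/* 1. GFS opens a journal transaction; from here on,
		 *    current->journal_info points at the GFS transaction
		 *    of this task. */
		/* gfs_trans_begin(...); */

		/* 2. The generic path dirties pages; once enough pages are
		 *    dirty it calls balance_dirty_pages_ratelimited_nr(),
		 *    which picks inodes for writeback without regard to
		 *    which superblock they belong to -- ext3 in the trace
		 *    above -- while this task still has the GFS transaction
		 *    open. */
		count = generic_file_write_nolock(file, &local_iov, 1, offset);

		/* 3. The transaction ends only here, after the
		 *    cross-superblock writeback has already run inside it. */
		/* gfs_trans_end(...); */

		return count;
	}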
Is this really RHEL4? balance_dirty_pages_ratelimited_nr() doesn't appear to exist in RHEL4, only in RHEL5. Can you confirm the kernel version?
Yes, it was RHEL5.
Well, gfs1 isn't part of the base kernel, so there should be no need to post the change to rhkernel-list; no resistance there.
Dmitriy, I tried the dd command you recommended to recreate the problem, but GFS did not fail. We can see the problem and we're researching the best way to fix it. IMHO, there is not a "good" way to fix it because of the lock ordering required by GFS. So we are trying to find a viable alternative that will make it work properly. By the way, GFS2 does not have this problem.
> We can see the problem and we're researching the best way to fix it.

As a temporary solution, gfs_writei() may be used for all cases. For example, the following trivial patch fixed my issue, though with a significant performance penalty, because gfs_writei() is optimized for journaled mode:

--- ./fs/gfs/ops_file.c~orig	2008-02-08 08:48:19.000000000 +0300
+++ ./fs/gfs/ops_file.c	2008-02-08 08:49:07.000000000 +0300
@@ -851,7 +851,7 @@ do_do_write_buf(struct file *file, char
 		brelse(dibh);
 	}
 
-	if (journaled ||
+	if (1 || journaled ||	/* always use this branch */
 	    (gfs_is_stuffed(ip) &&
 	     !test_bit(GIF_PAGED, &ip->i_flags) &&
 	     *offset + size <= sdp->sd_sb.sb_bsize - sizeof(struct gfs_dinode))) {

> By the way, GFS2 does not have this problem.

Yes, I do know about this, but unfortunately, IMHO, GFS2 is currently not ready for production.
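Independent of the workaround above, a cheap safety net (hypothetical, not present in the current GFS code) would catch this class of misuse at the call site instead of via a delayed EIO on an unrelated filesystem:

	#include <linux/kernel.h>
	#include <linux/sched.h>

	/* Hypothetical guard, not in GFS: complain immediately if the
	 * generic write path is about to be entered while this task
	 * still has a journal transaction open. */
	static inline void gfs_warn_if_transaction_open(void)
	{
		WARN_ON(current->journal_info != NULL);
	}

Calling this right before generic_file_write_nolock() / generic_file_aio_write_nolock() would have flagged the bug on the first large write instead of corrupting state elsewhere.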
No one has complained about this problem and no one has been able to recreate it for a very long time. Changing the code to fix it would only put customers at risk. If the problem is recreatable, we can re-address the issue at that time. I'm closing as WONTFIX.