Description of problem:

The GFS write codepath calls generic_file_write_nolock() with an open journal transaction. This is strongly prohibited, because it will most likely trigger VM activity such as balance_dirty_pages_ratelimited_nr(), which can switch writeback to a different fs superblock while current->journal_info != NULL. Sooner or later this triggers a BUG(), an EIO, or other bad things. In my case I got an EIO from ext3_write_inode() and a dropped DIRTY inode flag, with the following call trace:

Call Trace:
 [<ffffffff880521b6>] :ext3:ext3_write_inode+0x2b/0x47
 [<ffffffff80030db2>] __writeback_single_inode+0x1e3/0x322
 [<ffffffff80023af4>] mempool_alloc+0x3a/0xf0
 [<ffffffff80021610>] sync_sb_inodes+0x1a9/0x267
 [<ffffffff800514ba>] writeback_inodes+0x82/0xd8
 [<ffffffff800c85dc>] balance_dirty_pages_ratelimited_nr+0x11a/0x1fa
 [<ffffffff8000fa2f>] generic_file_buffered_write+0x56c/0x6a0
 [<ffffffff80016118>] __generic_file_aio_write_nolock+0x36c/0x3b8
 [<ffffffff883f4aae>] :gfs:gfs_dreread+0x87/0xc7
 [<ffffffff883f4b26>] :gfs:gfs_dread+0x38/0x57
 [<ffffffff800c5822>] generic_file_aio_write_nolock+0x20/0x6c
 [<ffffffff800c5bee>] generic_file_write_nolock+0x8f/0xa8
 [<ffffffff8009b42b>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8841c890>] :gfs:gfs_trans_begin_i+0x13c/0x1b2
 [<ffffffff8840f769>] :gfs:do_write_buf+0x455/0x690
 [<ffffffff8840ef68>] :gfs:walk_vm+0x110/0x317
 [<ffffffff8840f314>] :gfs:do_write_buf+0x0/0x690
 [<ffffffff8002fea8>] __clear_user+0x12/0x50
 [<ffffffff8840f21b>] :gfs:__gfs_write+0xac/0xc6
 [<ffffffff800164fb>] vfs_write+0xad/0x153
 [<ffffffff80016b98>] sys_write+0x49/0xbf
 [<ffffffff8005e166>] system_call+0x7e/0x83

Testcase:

In fact the test case is fantastically trivial; I wonder why nobody caught it before. Write a big file to a GFS filesystem (file_size >= 2 * ram_size):

 dd if=/dev/zero of=/mnt_gfs bs=1M count=10240

Buggy place, in do_do_write_buf():

	...
	if (alloc_required) {
		...
		gfs_trans_begin(..);
	}

	struct iovec local_iov = { .iov_base = buf, .iov_len = size };

	if (!iocb) {
		count = generic_file_write_nolock(file, &local_iov, 1, offset);
		/* ^^^ generic_file_xxx_write_nolock() is called with a
		 * running transaction (current->journal_info != NULL).
		 * This is absolutely prohibited. */
	} else {
		count = generic_file_aio_write_nolock(iocb, &local_iov, 1, offset);
		if (count == -EIOCBQUEUED)
			count = wait_on_sync_kiocb(iocb);
	}
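To make the ordering hazard concrete, here is a condensed sketch of the sequence (illustration only, not the literal GFS source; the gfs_trans_begin()/gfs_trans_end() calls are shown as comments because their arguments are elided here):

	#include <linux/fs.h>
	#include <linux/uio.h>
	#include <linux/sched.h>

	/*
	 * Illustration only -- a condensed view of the sequence described
	 * above, not the literal GFS source.
	 */
	static ssize_t gfs_write_ordering_sketch(struct file *file,
						 const char __user *buf,
						 size_t size, loff_t *offset)
	{
		struct iovec local_iov = { .iov_base = (void __user *)buf,
					   .iov_len  = size };
		ssize_t count;

		/* 1. GFS opens a journal transaction; from here on,
		 *    current->journal_info points at the GFS transaction
		 *    of this task. */
		/* gfs_trans_begin(...); */

		/* 2. The generic path dirties pages; once enough pages are
		 *    dirty it calls balance_dirty_pages_ratelimited_nr(),
		 *    which picks inodes for writeback without regard to
		 *    which superblock they belong to -- ext3 in the trace
		 *    above -- while this task still has the GFS transaction
		 *    open. */
		count = generic_file_write_nolock(file, &local_iov, 1, offset);

		/* 3. The transaction ends only here, after the
		 *    cross-superblock writeback has already run inside it. */
		/* gfs_trans_end(...); */

		return count;
	}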
Is this really RHEL4? balance_dirty_pages_ratelimited_nr() doesn't appear to exist in RHEL4, only in RHEL5. Can you confirm the kernel version?
Yes, it was RHEL5.
Well, gfs1 isn't part of the base kernel, so there should be no need to post the change to rhkernel-list; no resistance there.
Dmitriy, I tried the dd command you recommended to recreate the problem, but GFS did not fail. We can see the problem and we're researching the best way to fix it. IMHO, there is not a "good" way to fix it because of the lock ordering required by GFS. So we are trying to find a viable alternative that will make it work properly. By the way, GFS2 does not have this problem.
> We can see the problem and we're researching the best way to fix it.

As a temporary solution, gfs_writei() may be used for all cases. For example, the following trivial patch fixed my issue, though with a significant performance penalty, because gfs_writei() is optimized for journaled mode:

--- ./fs/gfs/ops_file.c~orig	2008-02-08 08:48:19.000000000 +0300
+++ ./fs/gfs/ops_file.c	2008-02-08 08:49:07.000000000 +0300
@@ -851,7 +851,7 @@ do_do_write_buf(struct file *file, char
 		brelse(dibh);
 	}
 
-	if (journaled ||
+	if (1 || journaled ||	/* always use this branch */
 	    (gfs_is_stuffed(ip) &&
 	     !test_bit(GIF_PAGED, &ip->i_flags) &&
 	     *offset + size <= sdp->sd_sb.sb_bsize - sizeof(struct gfs_dinode))) {

> By the way, GFS2 does not have this problem.

Yes, I do know about this, but unfortunately, IMHO, GFS2 is currently not ready for production.
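Independent of the workaround above, a cheap safety net (hypothetical, not present in the current GFS code) would catch this class of misuse at the call site instead of via a delayed EIO on an unrelated filesystem:

	#include <linux/kernel.h>
	#include <linux/sched.h>

	/* Hypothetical guard, not in GFS: complain immediately if the
	 * generic write path is about to be entered while this task
	 * still has a journal transaction open. */
	static inline void gfs_warn_if_transaction_open(void)
	{
		WARN_ON(current->journal_info != NULL);
	}

Calling this right before generic_file_write_nolock() / generic_file_aio_write_nolock() would have flagged the bug on the first large write instead of corrupting state elsewhere.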
No one has complained about this problem and no one has been able to recreate it for a very long time. Changing the code to fix it would only put customers at risk. If the problem is recreatable, we can re-address the issue at that time. I'm closing as WONTFIX.