Description of problem:
This new backtrace showed up on all four of my nodes 4-5 times during a test run. I'm not sure which test case was running at the time. Each node was running an independent load with multiple test cases running at once.

GFS2: fsid=morph-cluster:brawl0.0: warning: assertion "al->al_alloced" failed
GFS2: fsid=morph-cluster:brawl0.0: function = alloc_page_backing, file = /builddir/build/BUILD/gfs2-kmod-1.79/_kmod_build_PAE/ops_vm.c, line = 94
 [<f8d99d62>] gfs2_assert_warn_i+0x7e/0x113 [gfs2]
 [<f8d92814>] gfs2_sharewrite_nopage+0x24c/0x2bb [gfs2]
 [<f8d9260b>] gfs2_sharewrite_nopage+0x43/0x2bb [gfs2]
 [<c045f2de>] __handle_mm_fault+0x1d0/0xb62
 [<f8d85f51>] gfs2_glock_nq+0x16b/0x18b [gfs2]
 [<c042de3a>] lock_timer_base+0x15/0x2f
 [<c04e2822>] prio_tree_insert+0x1b/0x1f2
 [<c0609726>] do_page_fault+0x2a5/0x5d3
 [<c0609481>] do_page_fault+0x0/0x5d3
 [<c0405a71>] error_code+0x39/0x40
 =======================

Version-Release number of selected component (if applicable):
kernel-2.6.18-79.el5
kmod-gfs2-1.79-1.4.el5

How reproducible:
Unknown
Raising the flags since this is a recent regression.
I ran through brawl again and found that the messages only showed up while the tests were running on a file system with a 1k block size.
I ran the test cases from d_io one at a time and it looks like the tag "genesis_reg" is the reproducer.

genesis -i 30s -n 1000 -d 100 -p 10 -L flock -s 1048576 -w /mnt/gfs2
Take a look at gfs2_write_alloc_required() as I suspect that you'll find the answer in the recent changes to that function.
I traced the cause of this assert-warning to a code change to gfs2_write_alloc_required() made as part of the patch for bug 253990.

@@ -1226,8 +1193,13 @@ int gfs2_write_alloc_required(struct gfs
 		do_div(lblock_stop, bsize);
 	} else {
 		unsigned int shift = sdp->sd_sb.sb_bsize_shift;
+		u64 end_of_file = (ip->i_di.di_size + sdp->sd_sb.sb_bsize - 1) >> shift;
 		lblock = offset >> shift;
 		lblock_stop = (offset + len + sdp->sd_sb.sb_bsize - 1) >> shift;
+		if (lblock_stop > end_of_file) {
+			*alloc_required = 1;
+			return 0;
+		}
 	}

 	for (; lblock < lblock_stop; lblock += extlen) {
 		error = gfs2_extent_map(&ip->i_inode, lblock, &new, &dblock, &extlen);
 		if (error)
 			return error;

 		if (!dblock) {
 			*alloc_required = 1;
 			return 0;
 		}
 	}

Here, we check whether the requested write extends beyond the end of the file. If so, we assume allocation is required and set alloc_required = 1. This saves the looping calls to gfs2_extent_map() below that determine whether the underlying disk blocks are allocated.

However, in the case where we trip this assert warning, the disk blocks beyond the end of the file are already allocated, but we still set alloc_required = 1. gfs2 then goes on to alloc_page_backing(), finds that the blocks are already allocated, and trips the warning.

One solution is to remove the assert warning. A little wasteful work is done to determine whether the blocks are already allocated, but it doesn't break anything.

Another way is to amend the patch above to consider the case where blocks beyond the end of the file are allocated and, if so, return alloc_required = 0.

Steve/Bob, your thoughts?
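To make the two options concrete, here is a minimal userspace sketch of the block arithmetic. It is not the gfs2 code: struct toy_inode, blocks_up(), scan_alloc_required() and shortcut_alloc_required() are hypothetical names, and a plain bool array stands in for what gfs2_extent_map() would report. The 1500-byte file size and the 4096-byte (one page) range used in the note below are assumptions for illustration, not values taken from the failing test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the inode state the patch looks at. */
struct toy_inode {
    uint64_t size;            /* role of ip->i_di.di_size, in bytes */
    unsigned int bsize_shift; /* role of sdp->sd_sb.sb_bsize_shift */
    const bool *allocated;    /* allocated[i]: logical block i has a disk block */
    uint64_t nblocks;         /* number of entries in allocated[] */
};

/* Round a byte count up to a whole number of blocks. */
static uint64_t blocks_up(uint64_t bytes, unsigned int shift)
{
    assert(shift < 64);
    return (bytes + ((uint64_t)1 << shift) - 1) >> shift;
}

/* Walk the range block by block, in the spirit of the gfs2_extent_map() loop:
 * allocation is required only if some block in the range has no disk block. */
static bool scan_alloc_required(const struct toy_inode *ip,
                                uint64_t offset, uint64_t len)
{
    uint64_t lblock = offset >> ip->bsize_shift;
    uint64_t lblock_stop = blocks_up(offset + len, ip->bsize_shift);

    for (; lblock < lblock_stop; lblock++) {
        if (lblock >= ip->nblocks || !ip->allocated[lblock])
            return true;
    }
    return false;
}

/* Mirrors the patched check: any range that reaches past the last block of
 * the file is reported as needing allocation without looking at the blocks. */
static bool shortcut_alloc_required(const struct toy_inode *ip,
                                    uint64_t offset, uint64_t len)
{
    uint64_t end_of_file = blocks_up(ip->size, ip->bsize_shift);
    uint64_t lblock_stop = blocks_up(offset + len, ip->bsize_shift);

    if (lblock_stop > end_of_file)
        return true;
    return scan_alloc_required(ip, offset, len);
}
```

With a 1500-byte file on 1k blocks (shift 10), end_of_file is 2 while a 4096-byte range starting at offset 0 gives lblock_stop = 4, so the shortcut reports that allocation is required even when all four blocks are already allocated; the scan reports that it is not. With 4k blocks (shift 12) both values are 1 and the shortcut never fires for that range, which would be consistent with the warning only appearing on the 1k file system.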
I guess the question is why those blocks are beyond the end of the file and apparently already allocated. I wonder if it's a result of truncate not truncating to the correct boundary, perhaps. Provided we are sure that the fact that the blocks already exist is harmless, I'm happy just to comment out the warning.
Created attachment 296422
program to recreate bug

Paths, filenames and numbers are hard-coded and there's no error checking whatsoever. Just make sure you mkfs.gfs2 with blocksize 1024.
Posted patch to comment out the assert warning to rhkernel-list http://post-office.corp.redhat.com/archives/rhkernel-list/2008-March/msg00241.html
in kernel-2.6.18-85.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html