Bug 432824 - GFS2: warning: assertion "al->al_alloced" failed in alloc_page_backing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Don Zickus
QA Contact: GFS Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-02-14 16:40 UTC by Nate Straz
Modified: 2008-05-21 15:09 UTC (History)

Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 15:09:32 UTC
Target Upstream Version:
Embargoed:


Attachments
program to recreate bug (1.23 KB, text/x-csrc)
2008-02-29 21:42 UTC, Abhijith Das


Links
System: Red Hat Product Errata
ID: RHBA-2008:0314
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages for Red Hat Enterprise Linux 5.2
Last Updated: 2008-05-20 18:43:34 UTC

Description Nate Straz 2008-02-14 16:40:59 UTC
Description of problem:

This new backtrace showed up on all four of my nodes 4-5 times during a test
run.  I'm not sure which test case was running at the time.  Each node was
running an independent load with multiple test cases running at once.

GFS2: fsid=morph-cluster:brawl0.0: warning: assertion "al->al_alloced" failed
GFS2: fsid=morph-cluster:brawl0.0:   function = alloc_page_backing, file =
/builddir/build/BUILD/gfs2-kmod-1.79/_kmod_build_PAE/ops_vm.c, line = 94
 [<f8d99d62>] gfs2_assert_warn_i+0x7e/0x113 [gfs2]
 [<f8d92814>] gfs2_sharewrite_nopage+0x24c/0x2bb [gfs2]
 [<f8d9260b>] gfs2_sharewrite_nopage+0x43/0x2bb [gfs2]
 [<c045f2de>] __handle_mm_fault+0x1d0/0xb62
 [<f8d85f51>] gfs2_glock_nq+0x16b/0x18b [gfs2]
 [<c042de3a>] lock_timer_base+0x15/0x2f
 [<c04e2822>] prio_tree_insert+0x1b/0x1f2
 [<c0609726>] do_page_fault+0x2a5/0x5d3
 [<c0609481>] do_page_fault+0x0/0x5d3
 [<c0405a71>] error_code+0x39/0x40
 =======================

Version-Release number of selected component (if applicable):
kernel-2.6.18-79.el5
kmod-gfs2-1.79-1.4.el5

How reproducible:
Unknown

Comment 1 Nate Straz 2008-02-14 18:15:19 UTC
Raising the flags since this is a recent regression. 

Comment 3 Nate Straz 2008-02-19 16:53:36 UTC
I ran through brawl again and found that the messages only showed up while the
tests were running on a file system with a 1k block size.

Comment 4 Nate Straz 2008-02-19 17:45:20 UTC
I ran the test cases from d_io one at a time and it looks like the tag
"genesis_reg" is the reproducer.

genesis -i 30s -n 1000 -d 100 -p 10  -L flock -s 1048576  -w /mnt/gfs2

Comment 5 Steve Whitehouse 2008-02-22 16:07:38 UTC
Take a look at gfs2_write_alloc_required() as I suspect that you'll find the
answer in the recent changes to that function.

Comment 6 Abhijith Das 2008-02-26 20:34:02 UTC
I traced the cause of this assert-warning to a change made to
gfs2_write_alloc_required() as part of the patch for bug 253990.

@@ -1226,8 +1193,13 @@ int gfs2_write_alloc_required(struct gfs
 		do_div(lblock_stop, bsize);
 	} else {
 		unsigned int shift = sdp->sd_sb.sb_bsize_shift;
+		u64 end_of_file = (ip->i_di.di_size + sdp->sd_sb.sb_bsize - 1) >> shift;
 		lblock = offset >> shift;
 		lblock_stop = (offset + len + sdp->sd_sb.sb_bsize - 1) >> shift;
+		if (lblock_stop > end_of_file) {
+			*alloc_required = 1;
+			return 0;
+		}
 	}
 
 	for (; lblock < lblock_stop; lblock += extlen) {
		error = gfs2_extent_map(&ip->i_inode, lblock, &new, &dblock, &extlen);
		if (error)
			return error;

		if (!dblock) {
			*alloc_required = 1;
			return 0;
		}
	}

Here, we check whether the requested write extends beyond the end of the file.
If it does, we assume allocation is required and set alloc_required = 1. This
skips the looping call to gfs2_extent_map below, which would otherwise determine
whether the underlying disk blocks are alloced or not.
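
To make the rounding concrete, here is a small userspace sketch of the same
block arithmetic. It is not the kernel code: struct write_range, block_range()
and shortcut_fires() are hypothetical names introduced only for illustration,
and the byte values in the note below are made-up inputs.

```c
#include <stdint.h>

/* Hypothetical container for the three values the patch computes. */
struct write_range {
	uint64_t lblock;      /* first logical block touched by the write */
	uint64_t lblock_stop; /* one past the last logical block touched */
	uint64_t end_of_file; /* blocks needed to hold di_size bytes */
};

/* Same rounding as the patch: byte counts are rounded up to whole blocks. */
struct write_range block_range(uint64_t di_size, uint64_t offset,
			       uint64_t len, unsigned int shift)
{
	uint64_t bsize = (uint64_t)1 << shift;
	struct write_range r;

	r.end_of_file = (di_size + bsize - 1) >> shift;
	r.lblock = offset >> shift;
	r.lblock_stop = (offset + len + bsize - 1) >> shift;
	return r;
}

/* Nonzero when the new shortcut would set alloc_required = 1. */
int shortcut_fires(const struct write_range *r)
{
	return r->lblock_stop > r->end_of_file;
}
```

With a 1k block size (shift = 10), a 1500-byte file and a 4096-byte write at
offset 0, this gives end_of_file = 2 and lblock_stop = 4, so the shortcut fires
without looking at whether blocks 2 and 3 already exist.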

However, in the case where we trip this assert warning, the disk-blocks are
already alloced beyond the end of file, but we still set alloc_required = 1.
gfs2 then goes on to alloc_page_backing() to find that the blocks are already
alloced and trips the warning.
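
The sequence can be modelled in userspace roughly as follows. This is a toy
model under stated assumptions, not the alloc_page_backing() code: the
allocated[] array stands in for the on-disk block map, the return value of
back_range() plays the role of al->al_alloced, and both function names are
hypothetical.

```c
#include <stddef.h>

/*
 * Toy model, not GFS2 code: allocated[i] is nonzero when logical block i
 * already has a disk block behind it. Fills in any missing blocks in
 * [lblock, lblock_stop) and returns how many had to be newly allocated.
 */
size_t back_range(unsigned char *allocated, size_t lblock, size_t lblock_stop)
{
	size_t newly_allocated = 0;

	for (size_t i = lblock; i < lblock_stop; i++) {
		if (!allocated[i]) {
			allocated[i] = 1;
			newly_allocated++;
		}
	}
	return newly_allocated; /* stands in for al->al_alloced */
}

/* Returns 1 when the modelled assert-warning would trigger. */
int would_warn(int alloc_required, size_t newly_allocated)
{
	return alloc_required && newly_allocated == 0;
}
```

If every block in the range is already marked allocated, back_range() returns
0, and would_warn(1, 0) reports the condition seen in the backtrace: allocation
was requested, but nothing was left to allocate.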

One solution is to remove the assert-warning. There's a little bit of wasteful
work being done to determine if the blocks are already allocated, but it doesn't
break anything.

Another way is to amend the patch above to consider the case where blocks beyond
the end of the file are allocated, and if so, return alloc_required = 0.
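
One possible reading of this second option, sketched as a userspace model. This
is not a proposed patch: alloc_required_amended() and block_is_allocated() are
hypothetical names, and the allocated[] array stands in for what
gfs2_extent_map() would report.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model, not GFS2 code. allocated[i] != 0 means logical block i already
 * has a disk block; indexes at or past nblocks count as unallocated.
 */
static int block_is_allocated(const unsigned char *allocated, size_t nblocks,
			      uint64_t lblock)
{
	return lblock < nblocks && allocated[lblock];
}

int alloc_required_amended(const unsigned char *allocated, size_t nblocks,
			   uint64_t di_size, uint64_t offset, uint64_t len,
			   unsigned int shift)
{
	uint64_t bsize = (uint64_t)1 << shift;
	uint64_t end_of_file = (di_size + bsize - 1) >> shift;
	uint64_t lblock = offset >> shift;
	uint64_t lblock_stop = (offset + len + bsize - 1) >> shift;
	uint64_t b;

	/* Past EOF: report 1 only if some block there is really missing. */
	b = lblock > end_of_file ? lblock : end_of_file;
	for (; b < lblock_stop; b++) {
		if (!block_is_allocated(allocated, nblocks, b))
			return 1;
	}

	/* Blocks inside the file: same walk as the existing loop. */
	for (b = lblock; b < lblock_stop && b < end_of_file; b++) {
		if (!block_is_allocated(allocated, nblocks, b))
			return 1;
	}
	return 0;
}
```

In this model an ordinary append still exits after the first missing block past
the end of the file, but a range whose blocks all exist costs a full walk, so
the saving from the shortcut shrinks.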

Steve/Bob, your thoughts?

Comment 7 Steve Whitehouse 2008-02-26 23:01:20 UTC
I guess the question is why those blocks are beyond the end of the file and
apparently already allocated? I wonder if it's a result of truncate not
truncating to the correct boundary perhaps.

Provided we are sure that the fact that the blocks already exist is harmless,
then I'm happy just to comment out the warning.


Comment 8 Abhijith Das 2008-02-29 21:42:11 UTC
Created attachment 296422 [details]
program to recreate bug

Paths, filenames and numbers are hard-coded and there's no error checking
whatsoever.
Just make sure you run mkfs.gfs2 with a block size of 1024.

Comment 9 Abhijith Das 2008-03-08 22:31:00 UTC
Posted patch to comment out the assert warning to rhkernel-list
http://post-office.corp.redhat.com/archives/rhkernel-list/2008-March/msg00241.html

Comment 10 Don Zickus 2008-03-12 19:41:38 UTC
in kernel-$NEW_VER
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 11 Don Zickus 2008-03-12 20:00:17 UTC
in kernel-2.6.18-85.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 14 errata-xmlrpc 2008-05-21 15:09:32 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html


