432824 – GFS2: warning: assertion "al->al_alloced" failed in alloc_page_backing

Summary: GFS2: warning: assertion "al->al_alloced" failed in alloc_page_backing

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	Don Zickus
QA Contact:	GFS Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-02-14 16:40 UTC by Nate Straz
Modified:	2008-05-21 15:09 UTC
CC List:	5 users
Fixed In Version:	RHBA-2008-0314
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-05-21 15:09:32 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
program to recreate bug (1.23 KB, text/x-csrc) 2008-02-29 21:42 UTC, Abhijith Das

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2008:0314	0	normal	SHIPPED_LIVE	Updated kernel packages for Red Hat Enterprise Linux 5.2	2008-05-20 18:43:34 UTC

Description Nate Straz 2008-02-14 16:40:59 UTC

Description of problem:

This new backtrace showed up on all four of my nodes 4-5 times during a test
run.  I'm not sure which test case was running at the time.  Each node was
running an independent load with multiple test cases running at once.

GFS2: fsid=morph-cluster:brawl0.0: warning: assertion "al->al_alloced" failed
GFS2: fsid=morph-cluster:brawl0.0:   function = alloc_page_backing, file =
/builddir/build/BUILD/gfs2-kmod-1.79/_kmod_build_PAE/ops_vm.c, line = 94
 [<f8d99d62>] gfs2_assert_warn_i+0x7e/0x113 [gfs2]
 [<f8d92814>] gfs2_sharewrite_nopage+0x24c/0x2bb [gfs2]
 [<f8d9260b>] gfs2_sharewrite_nopage+0x43/0x2bb [gfs2]
 [<c045f2de>] __handle_mm_fault+0x1d0/0xb62
 [<f8d85f51>] gfs2_glock_nq+0x16b/0x18b [gfs2]
 [<c042de3a>] lock_timer_base+0x15/0x2f
 [<c04e2822>] prio_tree_insert+0x1b/0x1f2
 [<c0609726>] do_page_fault+0x2a5/0x5d3
 [<c0609481>] do_page_fault+0x0/0x5d3
 [<c0405a71>] error_code+0x39/0x40
 =======================

Version-Release number of selected component (if applicable):
kernel-2.6.18-79.el5
kmod-gfs2-1.79-1.4.el5

How reproducible:
Unknown

Comment 1 Nate Straz 2008-02-14 18:15:19 UTC

Raising the flags since this is a recent regression.

Comment 3 Nate Straz 2008-02-19 16:53:36 UTC

I ran through brawl again and found that the messages only showed up while the
tests were running on a file system with a 1k block size.

Comment 4 Nate Straz 2008-02-19 17:45:20 UTC

I ran the test cases from d_io one at a time and it looks like the tag
"genesis_reg" is the reproducer.

genesis -i 30s -n 1000 -d 100 -p 10  -L flock -s 1048576  -w /mnt/gfs2

Comment 5 Steve Whitehouse 2008-02-22 16:07:38 UTC

Take a look at gfs2_write_alloc_required() as I suspect that you'll find the
answer in the recent changes to that function.

Comment 6 Abhijith Das 2008-02-26 20:34:02 UTC

I traced the cause of this assert-warning to a code-change to
gfs2_write_alloc_required() as part of the patch to bug 253990.

@@ -1226,8 +1193,13 @@ int gfs2_write_alloc_required(struct gfs
 		do_div(lblock_stop, bsize);
 	} else {
 		unsigned int shift = sdp->sd_sb.sb_bsize_shift;
+		u64 end_of_file = (ip->i_di.di_size + sdp->sd_sb.sb_bsize - 1) >> shift;
 		lblock = offset >> shift;
 		lblock_stop = (offset + len + sdp->sd_sb.sb_bsize - 1) >> shift;
+		if (lblock_stop > end_of_file) {
+			*alloc_required = 1;
+			return 0;
+		}
 	}
 
 	for (; lblock < lblock_stop; lblock += extlen) {
		error = gfs2_extent_map(&ip->i_inode, lblock, &new, &dblock, &extlen);
		if (error)
			return error;

		if (!dblock) {
			*alloc_required = 1;
			return 0;
		}
	}

Here, we check whether the requested write extends beyond the end of the file.
If it does, we assume allocation is required and set alloc_required = 1. This
skips the loop of gfs2_extent_map calls below, which would otherwise determine
whether the underlying disk blocks are already allocated.

However, in the case where we trip this assert warning, the disk-blocks are
already alloced beyond the end of file, but we still set alloc_required = 1.
gfs2 then goes on to alloc_page_backing() to find that the blocks are already
alloced and trips the warning.
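
To make the arithmetic concrete, here is a small user-space sketch of just the
new early exit. It is not the kernel code: the helper names blocks_up and
early_exit_taken are made up for illustration, and it assumes the fault path
asks about one full 4096-byte page at a time, which the diff above does not
show.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to whole blocks, mirroring the expression
 * (x + sb_bsize - 1) >> sb_bsize_shift used in the patch. */
static uint64_t blocks_up(uint64_t bytes, unsigned int shift)
{
	uint64_t bsize = (uint64_t)1 << shift;

	assert(shift < 32);
	return (bytes + bsize - 1) >> shift;
}

/*
 * Hypothetical model of the early exit only. Returns 1 when the range
 * [offset, offset + len) ends past the last block covered by di_size.
 * Note that it never asks whether those blocks are already allocated.
 */
static int early_exit_taken(uint64_t di_size, uint64_t offset,
			    unsigned int len, unsigned int shift)
{
	uint64_t end_of_file = blocks_up(di_size, shift);
	uint64_t lblock_stop = blocks_up(offset + len, shift);

	return lblock_stop > end_of_file;
}
```

For a 1500-byte file and a 4096-byte request at offset 0, a 1k block size
(shift 10) gives end_of_file = 2 and lblock_stop = 4, so the early exit fires.
With a 4k block size (shift 12) both values are 1 and it does not, which would
be consistent with the warning only showing up on 1k file systems.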

One solution is to remove the assert-warning. There's a little bit of wasteful
work being done to determine if the blocks are already allocated, but it doesn't
break anything.

Another way is to amend the patch above to consider the case where blocks beyond
the end of the file are allocated, and if so, return alloc_required = 0.
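
A rough user-space model of this second option, for discussion only. The
struct toy_inode, toy_block_mapped and toy_write_alloc_required names are
invented here, and the allocated[] array is a stand-in for what
gfs2_extent_map would report. The sketch simply walks every block in the
range, including those past the end of the file, so it gives up the shortcut
the patch was trying to add.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the on-disk mapping: allocated[i] != 0
 * means logical block i already has a disk block behind it. */
struct toy_inode {
	const unsigned char *allocated;
	size_t nblocks;		/* number of entries in allocated[] */
};

static int toy_block_mapped(const struct toy_inode *ip, uint64_t lblock)
{
	return lblock < ip->nblocks && ip->allocated[lblock];
}

/*
 * Returns 1 if any block in [offset, offset + len) has no backing,
 * 0 if every block is already mapped. Blocks beyond EOF are checked
 * like any other instead of being assumed unallocated.
 */
static int toy_write_alloc_required(const struct toy_inode *ip,
				    uint64_t offset, unsigned int len,
				    unsigned int shift)
{
	uint64_t bsize = (uint64_t)1 << shift;
	uint64_t lblock = offset >> shift;
	uint64_t lblock_stop = (offset + len + bsize - 1) >> shift;

	assert(ip != NULL);
	for (; lblock < lblock_stop; lblock++)
		if (!toy_block_mapped(ip, lblock))
			return 1;
	return 0;
}
```

With four 1k blocks all mapped, a 4096-byte request at offset 0 reports that
no allocation is required, regardless of where the file size ends. If only the
first two blocks are mapped, it reports that allocation is required.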

Steve/Bob, your thoughts?

Comment 7 Steve Whitehouse 2008-02-26 23:01:20 UTC

I guess the question is why those blocks are beyond the end of the file and
apparently already allocated? I wonder if it's a result of truncate not
truncating to the correct boundary perhaps.

Provided we are sure that the fact that the blocks already exist is harmless,
then I'm happy just to comment out the warning.

Comment 8 Abhijith Das 2008-02-29 21:42:11 UTC

Created attachment 296422
program to recreate bug

Paths, filenames and numbers are hard-coded and there's no error checking
whatsoever.
Just make sure you run mkfs.gfs2 with a block size of 1024.

Comment 9 Abhijith Das 2008-03-08 22:31:00 UTC

Posted patch to comment out the assert warning to rhkernel-list
http://post-office.corp.redhat.com/archives/rhkernel-list/2008-March/msg00241.html

Comment 10 Don Zickus 2008-03-12 19:41:38 UTC

in kernel-$NEW_VER
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 11 Don Zickus 2008-03-12 20:00:17 UTC

in kernel-2.6.18-85.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 14 errata-xmlrpc 2008-05-21 15:09:32 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html
