Bug 465151 - GFS: madvise system call causes assertion
GFS: madvise system call causes assertion
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: gfs-kmod (Show other bugs)
All Linux
urgent Severity urgent
: rc
: ---
Assigned To: Chris Feist
Cluster QE
: ZStream
: 464837 (view as bug list)
Depends On:
Blocks: 465325 465327
  Show dependency treegraph
Reported: 2008-10-01 16:09 EDT by Abhijith Das
Modified: 2010-10-23 00:52 EDT (History)
8 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2009-01-20 16:18:04 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
RHEL5.3 version of the proposed patch (2.97 KB, patch)
2008-10-01 18:10 EDT, Abhijith Das
no flags Details | Diff

  None (edit)
Description Abhijith Das 2008-10-01 16:09:35 EDT
Description of problem:

Since the madvise system call was enabled by the patch to bug 429343, it's possible for a inode glock holder to never get dequeued through gfs_readpage. This causes an assertion (bug 464837)

GFS: fsid=cl102a:gfs1.1: warning: assertion "(gh->gh_flags & LM_FLAG_ANY) || (tmp_gh->gh_flags & LM_FLAG_ANY)" failed
GFS: fsid=cl102a:gfs1.1: function = add_to_queue
GFS: fsid=cl102a:gfs1.1: file = /builddir/build/BUILD/gfs-kmod-0.1.23/_kmod_build_/src/gfs/glock.c, line = 1418
GFS: fsid=cl102a:gfs1.1: time = 1222739610

I can't think of a way to make madvise work, and reverting the patch mentioned above and returning ENOSYS to madvise is the currently proposed solution to this problem.

I'll post a patch as soon as I verify that everything works as expected.
Comment 1 Chris Feist 2008-10-01 16:44:34 EDT
Setting acks.
Comment 2 Abhijith Das 2008-10-01 18:10:50 EDT
Created attachment 319171 [details]
RHEL5.3 version of the proposed patch

This patch reverts the patch to bz 429343. Don't log any warnings/errors and simply return ENOSYS when you arrive at gfs_readpage without the inode glock held. (madvise syscall case)
Comment 3 Kiersten (Kerri) Anderson 2008-10-02 09:56:31 EDT
Pushing out a 5.2.z release, so need to address in 5.3 as well.
Comment 4 Kiersten (Kerri) Anderson 2008-10-02 10:15:16 EDT
Guess this should be a blocker instead of an exception due to 5.2.z activity.
Comment 5 Chris Feist 2008-10-02 14:54:25 EDT
This has been committed and is being marked modified so we can get a zstream bug.
Comment 8 Abhijith Das 2008-10-07 11:27:52 EDT
*** Bug 464837 has been marked as a duplicate of this bug. ***
Comment 9 Chris Evich 2008-10-07 13:12:05 EDT
I'm probably completely misunderstanding this, but won't reversing the entire patch regress the 429343 fix?  

Isn't this:

+		return -ENOSYS;

all that we need to do?
Comment 10 Abhijith Das 2008-10-07 13:41:20 EDT

Take a look at https://bugzilla.redhat.com/show_bug.cgi?id=429343#c5
The "easier solution" is the one we've done here.

In the patch in bz 429343, doing

+  return -ENOSYS;

is not correct. We shouldn't attempt to grab the inode glock if it's not already held in gfs_readpage. Notice that we don't free (dequeue) the inode glock until gfs_readpage is called again. Returning AOP_TRUNCATED_PAGE when gfs_readpage is called through the madvise code, will not cause gfs_readpage to be called again and this inode glock will never be freed. Replacing AOP_TRUNCATED_PAGE with -ENOSYS will definitely not solve this and we'll end up with the same unfreed glock.

Hope this clears things for you.
Also, it's worth noting that the original bug 429343 was the assert in gfs_readpage.

if (gfs_assert_warn(sdp, gfs_glock_is_locked_by_me(ip->i_gl))) {
	return -ENOSYS;

After the revert+change with the patch in comment #2, the only change we'll have is
- if (gfs_assert_warn(sdp, gfs_glock_is_locked_by_me(ip->i_gl))) {
+ if (!gfs_glock_is_locked_by_me(ip->i_gl)) {
Comment 13 errata-xmlrpc 2009-01-20 16:18:04 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.


Note You need to log in before you can comment on or make changes to this bug.