241096 – GFS: bug in gfs truncate

Bug 241096 - GFS: bug in gfs truncate

Summary: GFS: bug in gfs truncate

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	gfs-kmod
Sub Component:
Version:	5.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Robert Peterson
QA Contact:	GFS Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2007-05-24 04:38 UTC by Vasily Averin
Modified:	2010-01-12 03:26 UTC (History)
CC List:	2 users (show)
Fixed In Version:	RHBA-2007-0577
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-11-07 20:06:59 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2007:0577	0	normal	SHIPPED_LIVE	gfs-kmod bug fix update	2007-10-30 15:36:56 UTC

Description Vasily Averin 2007-05-24 04:38:04 UTC

Alexey Kuznetsov <kuznet.ac.ru> from Virtuozzo/OpenVz linux kernel team
has discowered following issue:

[GFS] bug in gfs truncate

GFS enforces a strange invariant: tree of indirect blocks
must be constructed for all the blocks potentially allocateable
inside inode size. Technically this is a stupid requirement,
but both gfs kernel and user space gfs_fsck consider inodes
with bad size as totally invalid and destroy them.

It means that GFS has to be especially careful to maintain
correct inode size. One point of failure is truncate():
if quota does not allow to grow inode size, it leaves i_size
at wrong value. Subsequent write to inode moves wrong i_size
to disk size and height of tree remains not updated. Such inode
is invalid and it is destroyed by subsequent fsck.

diff -urp ../gfs-kmod-0.1.16-orig/gfs-kernel-0.1.16/src/gfs/ops_inode.c
gfs-kernel-0.1.16/src/gfs/ops_inode.c
--- ../gfs-kmod-0.1.16-orig/gfs-kernel-0.1.16/src/gfs/ops_inode.c	2007-05-20
19:18:19.000000000 +0400
+++ gfs-kernel-0.1.16/src/gfs/ops_inode.c	2007-05-20 00:43:03.000000000 +0400
@@ -1395,8 +1507,11 @@ gfs_setattr(struct dentry *dentry, struc
 		}
 
 		error = gfs_truncatei(ip, attr->ia_size, gfs_truncator_page);
-		if (error)
+		if (error) {
+			if (inode->i_size != ip->i_di.di_size)
+				i_size_write(inode, ip->i_di.di_size);
 			goto fail;
+		}
 
 		if ((sdp->sd_vfs->s_flags & MS_SYNCHRONOUS) &&
 		    !gfs_is_jdata(ip))

Comment 1 Robert Peterson 2007-06-26 14:51:45 UTC

I looked at the code and I don't see how this can be a problem.

The suggested code is fixing the inode size on error, but this is
the wrong place to do it.  All the functions, like gfs_truncatei
have error paths that are supposed to clean up after themselves and
as far as I can see, the code is doing that properly.

I ran this quick test to see if my theory was correct.  I put a 1M
quota on a gfs file system, then exceeded that quota.  The inode
information seemed to be correct.  Here's what I did:

[root@trin-10 root]# mount -tgfs /dev/trin_vg/hell /mnt/hell/
[root@trin-10 root]# gfs_quota limit -l 1 -u testmonkey -f /mnt/hell
[root@trin-10 root]# chmod 777 /mnt/hell
[root@trin-10 root]# su testmonkey
[testmonkey@trin-10 root]$ cd /mnt/hell
[testmonkey@trin-10 hell]$ mkdir monkey
[testmonkey@trin-10 hell]$ cd monkey/
[testmonkey@trin-10 monkey]$ dd if=/dev/zero of=/mnt/hell/1000K bs=1K count=1000
1000+0 records in
1000+0 records out
1024000 bytes (1.0 MB) copied, 0.20244 seconds, 5.1 MB/s
[testmonkey@trin-10 monkey]$ cp /home/devel/gfs2-2.6.git.tgz /mnt/hell/monkey/
GFS: fsid=bob_cluster2:test_gfs.0: quota exceeded for user 500
cp: writing `/mnt/hell/monkey/gfs2-2.6.git.tgz': Disk quota exceeded
[testmonkey@trin-10 monkey]$ exit
[root@trin-10 root]# ls -lr /mnt/hell/monkey/gfs2-2.6.git.tgz 
-rw-r--r-- 1 testmonkey testmonkey 16384 Jun 26 09:27
/mnt/hell/monkey/gfs2-2.6.git.tgz
[root@trin-10 ../gfs-kernel/src/gfs]# stat !$
stat /mnt/hell/monkey/gfs2-2.6.git.tgz
  File: `/mnt/hell/monkey/gfs2-2.6.git.tgz'
  Size: 16384           Blocks: 40         IO Block: 4096   regular file
Device: fe03h/65027d    Inode: 30          Links: 1
Access: (0644/-rw-r--r--)  Uid: (  500/testmonkey)   Gid: (  500/testmonkey)
Access: 2007-06-26 09:27:16.000000000 -0500
Modify: 2007-06-26 09:27:16.000000000 -0500
Change: 2007-06-26 09:27:16.000000000 -0500

Maybe I misunderstood what the problem is.  Please give an
example of how to recreate the quota problem and what bad value(s)
you are seeing.

Comment 2 Monakhov Dmitriy 2007-06-27 12:25:54 UTC

>All the functions, like gfs_truncatei
>have error paths that are supposed to clean up after themselves and
>as far as I can see, the code is doing that properly.
Actually error path just not exist. So we have to fix i_size explicitly.

Simple test case:
/* gcc  trunc.c -otrunc */
#include <unistd.h>
#include <unistd.h>
#include <fcntl.h>
int main(int argc, char** argv)
{
        int fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0666);
        lseek(fd, 1024*1024*1024, SEEK_SET);
        /* i_size = 0, di_size = 0 */
        ftruncate(fd, 1024*1024*1024); /* this have to fail because of quota*/
        /* after trucate failed we have:i_size = 1073741824, di_size = 0 */
        write(fd, "", 1);
        close(fd);
}

# mount /dev/vzvg/gfs  /mnt/
su testmonkey;
$ touch /mnt/file # create file for later test
$ dd if=/dev/zero of=/mnt/BIG_FILE # write until -EDQUOT
$ ./trunc /mnt/file # test itself 
$ exit
# sync;sync;sync # force commit journal
#umount /mnt/; gfs_fsck /dev/vzvg/gfs
Initializing fsck
Clearing journals (this may take a while).....
Journals cleared.
Starting pass1
Found unused inode marked in-use
Clear unused inode at block 30? (y/n) y
Pass1 complete      
Starting pass1b
Pass1b complete      
Starting pass1c
Pass1c complete      
Starting pass2
Pass2 complete      
Starting pass3
Pass3 complete      
Starting pass4
Pass4 complete      
Starting pass5
ondisk and fsck bitmaps differ at block 30
Fix bitmap for block 30? (y/n) y
Succeeded.
RG #1 free count inconsistent: is 63398 should be 63399
RG #1 used inode count inconsistent: is 8 should be 7
Update resource group counts? (y/n) y
Resource group counts updated
Pass5 complete      
Writing changes to disk

Comment 3 RHEL Program Management 2007-06-29 22:03:34 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 Robert Peterson 2007-06-29 22:06:07 UTC

After looking at the code path and how gfs calls vmtruncate, I agree
that this is the correct fix.  Crosswrite for gfs2 has already been
proposed, but not yet in the git tree yet.
The suggested patch was tested on system trin-10 and fixes the problem.
I'll try to squeeze it into RHEL5.1 but we're right at the cutoff, so
I don't know if it will make it there or 5.2.

Comment 6 Robert Peterson 2007-07-11 22:16:53 UTC

The fix was tested on system roth-01 using information in the bugzilla.
The fix was committed to RHEL5 and RHEL51 branches.  Changing the status
to modified.

Comment 9 errata-xmlrpc 2007-11-07 20:06:59 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0577.html

Note You need to log in before you can comment on or make changes to this bug.