Bug 210362 - gfs_quota hangs in do_list()
Summary: gfs_quota hangs in do_list()
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gfs
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Abhijith Das
QA Contact: GFS Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-10-11 17:53 UTC by Lenny Maiorani
Modified: 2010-01-12 03:13 UTC (History)
0 users

Fixed In Version: RHBA-2007-0139
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-05-10 21:00:16 UTC
Embargoed:


Attachments
strace of the process during the hang (7.71 MB, text/plain)
2006-10-11 17:53 UTC, Lenny Maiorani
gfs quota helper program - to fix quota file (5.15 KB, application/x-gzip)
2006-10-12 21:58 UTC, Abhijith Das


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2007:0139 0 normal SHIPPED_LIVE GFS bug fix update 2007-05-10 20:59:07 UTC
Red Hat Product Errata RHBA-2007:0142 0 normal SHIPPED_LIVE GFS-kernel bug fix update 2007-05-10 21:12:20 UTC

Description Lenny Maiorani 2006-10-11 17:53:53 UTC
Description of problem:
gfs_quota is hanging when run with these parameters:
'gfs_quota list -n -k -f /path'

Version-Release number of selected component (if applicable):
GFS-6.1.5-0.x86_64.rpm

How reproducible:
always on this specific filesystem

Steps to Reproduce:
1. Just run the command. We are unsure how the filesystem arrived at this state,
as we cannot reproduce it on any other filesystem.
  
Actual results:
gfs_quota hangs indefinitely

Expected results:
return the quota list

Additional info:
strace: 
open("/mnt/iGrid/frii_snap", O_RDONLY)  = 3
ioctl(3, 0x4723, 0x7fbfffe2fc)          = 0
ioctl(3, 0x472d, 0x7fbfffe2d0)          = 352
ioctl(3, 0x472d, 0x7fbfffe6b0)          = 88
fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a95557000
ioctl(3, 0x472d, 0x7fbfffe6b0)          = 88
ioctl(3, 0x472d, 0x7fbfffe6b0)          = 88
ioctl(3, 0x472d, 0x7fbfffe6b0)          = 88
ioctl(3, 0x472d, 0x7fbfffe6b0)          = 88
ioctl(3, 0x472d, 0x7fbfffe6b0)          = 88
ioctl(3, 0x472d, 0x7fbfffe6b0)          = 88
ioctl(3, 0x472d, 0x7fbfffe6b0)          = 88
ioctl(3, 0x472d, 0x7fbfffe6b0)          = 88
ioctl(3, 0x472d, 0x7fbfffe6b0)          = 88
ioctl(3, 0x472d, 0x7fbfffe6b0)          = 88
ioctl(3, 0x472d, 0x7fbfffe6b0)          = 88
ioctl(3, 0x472d, 0x7fbfffe6b0)          = 88
ioctl(3, 0x472d, 0x7fbfffe6b0)          = 88



From this I know that it is in an infinite loop in do_list() with ioctl()
returning 88.

Comment 1 Lenny Maiorani 2006-10-11 17:53:56 UTC
Created attachment 138263
strace of the process during the hang

Comment 2 Abhijith Das 2006-10-11 22:14:22 UTC
I think I know what the problem is here, but the only way to verify it would be
to let 'gfs_quota list ...' run until it finishes (yeah, I don't think it's
hung/looping). Can you please try running the command again (without the strace)
and leaving it to its misery for some time (maybe 'time' it for curiosity's sake)
and see if it terminates?

Theory:
The hidden quota file contains alternating user and group quota information.
Each info block is 88 bytes in GFS1. The first 88 bytes contain quota info
for uid=0. The second 88 bytes contain info for gid=0. The third and fourth 88
bytes are for uid=1 and gid=1 respectively, and so on.
For uid=x, the offset in the quota file for x's quota info will be 
off_uid_x = x * 2 * 88 
and for gid=y, the offset in the quota file for y's quota info will be 
off_gid_y = (y * 2 * 88) + 88
As is obvious, the maximum value for x or y will determine the size of the
quota file. For example, if you do something like 
'gfs_quota limit -u 10000000 -l 1048576 -k -f /mnt/gfs/' the size of the quota
file becomes 10000000 * 2 * 88 = 1760000000 bytes, which is roughly 1.76GB!!
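
The offset arithmetic above can be written out as a small Python sketch. The
function names are hypothetical and chosen only for illustration; the 88-byte
record size and the interleaved uid/gid layout are taken from the description
above.

```python
QUOTA_RECORD_SIZE = 88  # bytes per quota record in GFS1, per the theory above


def uid_offset(uid: int) -> int:
    """Byte offset of the quota record for a user ID."""
    return uid * 2 * QUOTA_RECORD_SIZE


def gid_offset(gid: int) -> int:
    """Byte offset of the quota record for a group ID."""
    return gid * 2 * QUOTA_RECORD_SIZE + QUOTA_RECORD_SIZE


def min_file_size(offset: int) -> int:
    """Smallest file size that still contains the whole record at this offset."""
    return offset + QUOTA_RECORD_SIZE
```

For example, uid_offset(10000000) is 1760000000, and the file has to be 88
bytes larger than that to hold the record itself.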

The way gfs_quota works:
It reads the uids first (odd numbered quota info blocks) followed by the gids
(even numbered quota info blocks). In effect, it makes two passes over the quota
file, stepping 88*2 bytes at a time and reading 88 bytes at each step.
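
To get a feel for how many reads such a scan needs, here is a rough Python
sketch of the two passes. It is an illustration only, not the gfs_quota source:
it assumes each pass stops once a full 88-byte record no longer fits in the
file, and all names are hypothetical.

```python
RECORD = 88          # bytes read per step
STRIDE = 2 * RECORD  # distance between consecutive uid (or gid) records


def pass_offsets(file_size, first_offset):
    """Yield the offsets one pass visits: first_offset, then every STRIDE bytes."""
    offset = first_offset
    while offset + RECORD <= file_size:
        yield offset
        offset += STRIDE


def reads_per_pass(file_size, first_offset):
    """Number of reads in one pass, computed without iterating."""
    if file_size < first_offset + RECORD:
        return 0
    return (file_size - first_offset - RECORD) // STRIDE + 1


def total_reads(file_size):
    """Reads for the uid pass (starting at 0) plus the gid pass (starting at 88)."""
    return reads_per_pass(file_size, 0) + reads_per_pass(file_size, RECORD)
```

Under these assumptions, a quota file that ends right after the record for
uid 10000000 (1760000088 bytes) needs about 20 million reads, each of which
shows up as one ioctl() in strace.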

For a large quota file, this operation might look like it's hung/looping. It's
not. The tons of strace output ('ioctl(3, 0x472d, 0x7fbfffe6b0) = 88') you see
actually make sense as gfs_quota moves along merrily across the quota file.

On your specific fs, this high-valued uid/gid could've been added intentionally
or accidentally or otherwise (worrisome). In any case, there are some things we
need to do to prevent this from happening:
a) check for valid existing uid/gid before adding into the quota file
b) set a max limit on the numerical uid/gid we support (and thereby the maximum
size of the quota file itself)
c) provide a way of resetting the quota file (either through gfs_fsck or
gfs_quota) in case such a thing should happen.

In your case, the damage is already done. If the data in the fs is not critical,
you can do a gfs_mkfs and life goes on as usual. If the data is critical, we
need to repair your filesystem by looking at the quota file and
editing/truncating it by hand.

Comment 3 Lenny Maiorani 2006-10-12 17:20:47 UTC
How can we look at the hidden quota file? 

The data on the fs is critical so gfs_mkfs is out of the question. 

Comment 4 Abhijith Das 2006-10-12 21:58:03 UTC
Created attachment 138388
gfs quota helper program - to fix quota file

Extract tarball on the gfs-machine and run 'make' to get the gfs_qh program.
README explains the output and behavior of the program.

Please run the following first:
[root@niobe-05 test]# ./gfs_qh stat <gfs mountpoint>

The di_size value shows the size of your quota file in bytes.

Next run:
[root@niobe-05 test]# ./gfs_qh dump -f /tmp/foo <gfs mountpoint>
If your quota file is very large, this could take a while. Please compress this
/tmp/foo file and attach to bz or email.

The program has two actions COPY and TRUNC which will overwrite (with another
quota file) and truncate the quota file (to specified size) respectively.
Truncating the quota file could cause loss of quota information and the
gfs_quota tool should be used subsequently to input the quota information
again. Please use these two functions carefully.

Comment 5 Lenny Maiorani 2006-10-12 22:38:01 UTC
This is a 22GB filesystem and it appears that the quota file is 755GB! Am I
reading this correctly?

I am not going to perform the DUMP because I don't know what the outcome might
be and regardless, I don't have space on that machine for a 755GB file.

Suggestions? Maybe truncate the file to 0 bytes and rebuild it?


[root@frii01 gfs_qh]# cat /tmp/gfs_qh_stat 
----------------------------
  *** GFS QUOTA DINODE ***
----------------------------
  mh_magic = 0x01161970
  mh_type = 4
  mh_generation = 16770
  mh_format = 400
  mh_incarn = 0
  no_formal_ino = 24
  no_addr = 24
  di_mode = 0600
  di_uid = 0
  di_gid = 0
  di_nlink = 1
  di_size = 755914243920
  di_blocks = 141
  di_atime = 1156274359
  di_mtime = 1160677746
  di_ctime = 1160677746
  di_major = 0
  di_minor = 0
  di_rgrp = 0
  di_goal_rgrp = 18022400
  di_goal_dblk = 0
  di_goal_mblk = 48604
  di_flags = 0x00000001
  di_payload_format = 1500
  di_type = 1
  di_height = 4
  di_incarn = 0
  di_pad = 0
  di_depth = 0
  di_entries = 0
  no_formal_ino = 0
  no_addr = 0
  di_eattr = 0
  di_reserved =
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00
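
The di_size value can be cross-checked against the record layout described in
comment 2. Below is a minimal Python sketch, assuming the file ends exactly on
an 88-byte record boundary; the helper name last_record is hypothetical.

```python
RECORD = 88  # bytes per quota record in GFS1, as stated in comment 2


def last_record(file_size):
    """Return (kind, id) for the final record of a quota file of this size.

    Assumes the file ends exactly on a record boundary. Even-indexed records
    (0-based) hold uids and odd-indexed records hold gids.
    """
    if file_size < RECORD or file_size % RECORD != 0:
        raise ValueError("size is not a whole number of records")
    index = file_size // RECORD - 1  # 0-based index of the last record
    kind = "gid" if index % 2 else "uid"
    return kind, index // 2


DI_SIZE = 755914243920  # di_size from the stat output above
```

last_record(DI_SIZE) evaluates to ("gid", 4294967294), and 4294967294 is
2^32 - 2. That is consistent with a quota entry having been written for a very
large group ID at some point, although the size alone does not prove it. With
di_blocks = 141, almost all of the 755GB would be unallocated holes.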


Comment 6 Abhijith Das 2006-10-13 15:06:59 UTC
Truncating the quota file seems like the only option right now. You will however
lose all quota-related limits/warns when you truncate it to zero.

You need to do these two steps and optionally add quota information using
gfs_quota after that.
[root@niobe-05 test]# ./gfs_qh trunc -t 0 /mnt/gfs/
[root@niobe-05 test]# gfs_quota init -f /mnt/gfs/



Comment 10 Abhijith Das 2007-01-10 18:04:13 UTC
CVSROOT:	/cvs/cluster
Module name:	cluster
Branch: 	RHEL4
Changes by:	adas	2007-01-10 17:53:54

Modified files:
	gfs-kernel/src/gfs: ioctl.c 
	gfs/gfs_quota  : Makefile main.c gfs_quota.h 
Added files:
	gfs/gfs_quota  : layout.c 

Log message:
	fix bz 210362: We don't run through the entire gfs_quota sparse file to do a
list operation anymore. We get the layout of the gfs_quota file on disk and only
read quota information off the data blocks that are actually in use. Also added
functionality to GFS_IOCTL_SUPER to provide the metadata of the hidden quota file.
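
The idea behind the fix can be sketched in Python as follows. This is only a
simplified model, not the code in layout.c: it takes the list of allocated
block numbers as given (in the real fix the layout comes from the filesystem),
assumes a hypothetical block size, and treats logical block b as covering bytes
b*block_size up to (b+1)*block_size, ignoring any per-block headers.

```python
RECORD = 88  # bytes per quota record in GFS1


def records_in_blocks(allocated_blocks, block_size):
    """Return sorted (kind, id) pairs for records overlapping allocated blocks.

    Records that fall entirely inside holes are never looked at, so the work
    is proportional to the number of allocated blocks, not to the file size.
    """
    indices = set()
    for block in allocated_blocks:
        start = block * block_size
        end = start + block_size           # exclusive end of this block
        first = start // RECORD            # first record touching the block
        last = (end - 1) // RECORD         # last record touching the block
        indices.update(range(first, last + 1))
    # Even record indices are uids, odd ones are gids.
    return [("gid" if i % 2 else "uid", i // 2) for i in sorted(indices)]
```

With a 4096-byte block size, a single allocated block touches 47 records, so a
file with a few hundred allocated blocks needs thousands of reads rather than
billions. A record that straddles two allocated blocks is counted once.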

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs-kernel/src/gfs/ioctl.c.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.7.2.4&r2=1.7.2.5
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs/gfs_quota/layout.c.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=NONE&r2=1.1.2.1
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs/gfs_quota/Makefile.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.3&r2=1.3.2.1
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs/gfs_quota/main.c.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.2&r2=1.2.2.1
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs/gfs_quota/gfs_quota.h.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.2&r2=1.2.2.1



Comment 11 Lenny Maiorani 2007-01-12 20:38:12 UTC
Is this backwards compatible? If I have a running filesystem with quota files
already existing, can I put this patch in and then have it not take so long? Do
I need to rebuild my quota files?

Comment 12 Abhijith Das 2007-01-12 20:44:44 UTC
Yes. The quota file itself hasn't changed so you should be able to apply these
patches and have 'gfs_quota list' work without taking long. Please note that
this fix includes a patch to the gfs-kernel (GFS filesystem kernel module) as
well, without which the 'gfs_quota list' functionality won't work.

Comment 15 Red Hat Bugzilla 2007-05-10 21:00:16 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0139.html


