Description of problem:
gfs_quota hangs when run with these parameters: 'gfs_quota list -n -k -f /path'

Version-Release number of selected component (if applicable):
GFS-6.1.5-0.x86_64.rpm

How reproducible:
Always on this specific filesystem.

Steps to Reproduce:
1. Just run the command. We are unsure how we arrived at this state, as we cannot get this to happen on any other filesystem.

Actual results:
gfs_quota hangs indefinitely.

Expected results:
The quota list is returned.

Additional info:
strace:
open("/mnt/iGrid/frii_snap", O_RDONLY) = 3
ioctl(3, 0x4723, 0x7fbfffe2fc) = 0
ioctl(3, 0x472d, 0x7fbfffe2d0) = 352
ioctl(3, 0x472d, 0x7fbfffe6b0) = 88
fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a95557000
ioctl(3, 0x472d, 0x7fbfffe6b0) = 88
ioctl(3, 0x472d, 0x7fbfffe6b0) = 88
ioctl(3, 0x472d, 0x7fbfffe6b0) = 88
ioctl(3, 0x472d, 0x7fbfffe6b0) = 88
ioctl(3, 0x472d, 0x7fbfffe6b0) = 88
ioctl(3, 0x472d, 0x7fbfffe6b0) = 88
ioctl(3, 0x472d, 0x7fbfffe6b0) = 88
ioctl(3, 0x472d, 0x7fbfffe6b0) = 88
ioctl(3, 0x472d, 0x7fbfffe6b0) = 88
ioctl(3, 0x472d, 0x7fbfffe6b0) = 88
ioctl(3, 0x472d, 0x7fbfffe6b0) = 88
ioctl(3, 0x472d, 0x7fbfffe6b0) = 88
ioctl(3, 0x472d, 0x7fbfffe6b0) = 88

From this I know that it is in an infinite loop in do_list(), with ioctl() returning 88.
Created attachment 138263 [details]
strace of the process during the hang
I think I know what the problem is here, but the only way to verify it would be to let 'gfs_quota list ...' run until it finishes. (Yes, I don't think it's hung/looping.) Can you please try running the command again (without the strace), leave it to its misery for some time (maybe 'time' it for curiosity's sake), and see if it terminates?

Theory: The hidden quota file contains alternating user and group quota information. Each info block is 88 bytes in GFS1. The first 88 bytes contain quota info for uid=0, the second 88 bytes contain info for gid=0, the third and fourth 88-byte blocks cover uid=1 and gid=1 respectively, and so on. For uid=x, the offset of x's quota info in the quota file is

off_uid_x = x * 2 * 88

and for gid=y, the offset of y's quota info is

off_gid_y = (y * 2 * 88) + 88

As is obvious, the maximum value of x or y determines the size of the quota file. For example, if you do something like

gfs_quota limit -u 10000000 -l 1048576 -k -f /mnt/gfs/

the size of the quota file becomes 10000000 * 2 * 88 = 1760000000 bytes, which is approximately 1.7 GB!

The way gfs_quota works: it reads the uids first (odd-numbered quota info blocks) followed by the gids (even-numbered quota info blocks), essentially making two passes over the quota file in 88*2-byte increments, reading 88 bytes each time. For a large quota file, this operation might seem hung/looping; it's not. The tons of strace output (ioctl(3, 0x472d, 0x7fbfffe6b0) = 88) you see actually makes sense as gfs_quota moves along merrily across the quota file.

On your specific fs, this high-valued uid/gid could have been added intentionally, accidentally, or otherwise (worrisome).
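The offset arithmetic above can be sketched as follows (Python used only for illustration; the real tool is written in C, and the 88-byte block size and interleaved uid/gid layout come straight from the comment above):

```python
# Sketch of the GFS1 quota-file layout described above: uid entries sit at
# even slots, gid entries at odd slots, 88 bytes per entry.

QUOTA_BLOCK = 88  # bytes per quota info block in GFS1


def uid_offset(uid):
    """Byte offset of uid's quota block in the hidden quota file."""
    return uid * 2 * QUOTA_BLOCK


def gid_offset(gid):
    """Byte offset of gid's quota block (interleaved after the uid block)."""
    return gid * 2 * QUOTA_BLOCK + QUOTA_BLOCK


def min_file_size(max_id):
    """Smallest quota file that can hold an entry for max_id (as a gid,
    the later of the two interleaved slots)."""
    return gid_offset(max_id) + QUOTA_BLOCK


# The example from the comment: 'gfs_quota limit -u 10000000 ...'
print(uid_offset(10000000))  # 1760000000, ~1.7 GB
```

This is why a single quota entry for a very large uid/gid balloons the (sparse) quota file far beyond the size of the filesystem itself.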
In any case, there are some things we need to do to prevent this from happening:
a) check for a valid existing uid/gid before adding it to the quota file
b) set a maximum limit on the numerical uid/gid we support (and thereby on the maximum size of the quota file itself)
c) provide a way of resetting the quota file (through gfs_fsck or gfs_quota) in case such a thing should happen

In your case, the damage is already done. If the data on the fs is not critical, you can do a gfs_mkfs and life goes on as usual. If the data is critical, we need to repair your filesystem by looking at the quota file and editing/truncating it by hand.
How can we look at the hidden quota file? The data on the fs is critical so gfs_mkfs is out of the question.
Created attachment 138388 [details]
gfs quota helper program - to fix quota file

Extract the tarball on the gfs machine and run 'make' to get the gfs_qh program. The README explains the output and behavior of the program.

Please run the following first:

[root@niobe-05 test]# ./gfs_qh stat <gfs mountpoint>

The di_size value shows the size of your quota file in bytes. Next run:

[root@niobe-05 test]# ./gfs_qh dump -f /tmp/foo <gfs mountpoint>

If your quota file is very large, this could take a while. Please compress this /tmp/foo file and attach it to the bz or email it.

The program has two further actions, COPY and TRUNC, which will overwrite the quota file (with another quota file) and truncate it (to a specified size), respectively. Truncating the quota file could cause loss of quota information, and the gfs_quota tool should then be used to enter the quota information again. Please use these two functions carefully.
This is a 22GB filesystem and it appears that the quota file is 755GB! Am I reading this correctly? I am not going to perform the DUMP because I don't know what the outcome might be, and regardless, I don't have space on that machine for a 755GB file. Suggestions? Maybe truncate the file to 0 bytes and rebuild it?

[root@frii01 gfs_qh]# cat /tmp/gfs_qh_stat
----------------------------
***  GFS QUOTA DINODE  ***
----------------------------
  mh_magic = 0x01161970
  mh_type = 4
  mh_generation = 16770
  mh_format = 400
  mh_incarn = 0
  no_formal_ino = 24
  no_addr = 24
  di_mode = 0600
  di_uid = 0
  di_gid = 0
  di_nlink = 1
  di_size = 755914243920
  di_blocks = 141
  di_atime = 1156274359
  di_mtime = 1160677746
  di_ctime = 1160677746
  di_major = 0
  di_minor = 0
  di_rgrp = 0
  di_goal_rgrp = 18022400
  di_goal_dblk = 0
  di_goal_mblk = 48604
  di_flags = 0x00000001
  di_payload_format = 1500
  di_type = 1
  di_height = 4
  di_incarn = 0
  di_pad = 0
  di_depth = 0
  di_entries = 0
  no_formal_ino = 0
  no_addr = 0
  di_eattr = 0
  di_reserved =
  00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00
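A back-of-the-envelope check (my own arithmetic, not from the thread) of the di_size value is telling. Note also that di_blocks = 141: the file is sparse, so only a handful of 4K blocks are actually allocated despite the 755GB apparent size.

```python
# Hypothetical analysis: with 88-byte entries interleaved as uid/gid pairs,
# each id consumes a 2*88-byte slot pair, so dividing the observed di_size
# by 176 hints at which uid/gid forced the file to this size.

DI_SIZE = 755914243920  # di_size reported by gfs_qh stat above
PAIR = 2 * 88           # bytes consumed per (uid, gid) slot pair

implied_id = DI_SIZE // PAIR
print(implied_id)               # 4294967295
print(implied_id == 2**32 - 1)  # True
```

The division comes out exactly 2^32 - 1, i.e. (uid_t)-1 on a 32-bit uid_t, which is consistent with a quota entry having been written for uid/gid -1 at some point (for example via an accidental chown/chgrp with -1). This is a guess, but it would explain how the file reached this size without anyone deliberately creating a ten-digit uid.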
Truncating the quota file seems like the only option right now. You will, however, lose all quota-related limits/warns when you truncate it to zero. You need to do these two steps, and optionally add quota information using gfs_quota after that:

[root@niobe-05 test]# ./gfs_qh trunc -t 0 /mnt/gfs/
[root@niobe-05 test]# gfs_quota init -f /mnt/gfs/
CVSROOT:	/cvs/cluster
Module name:	cluster
Branch:		RHEL4
Changes by:	adas	2007-01-10 17:53:54

Modified files:
	gfs-kernel/src/gfs: ioctl.c
	gfs/gfs_quota: Makefile main.c gfs_quota.h
Added files:
	gfs/gfs_quota: layout.c

Log message:
	fix bz 210362: We don't run through the entire gfs_quota sparse file to do a list operation anymore. We get the layout of the gfs_quota file on disk and only read quota information off the data blocks that are actually in use. Also added functionality to GFS_IOCTL_SUPER to provide the metadata of the hidden quota file.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs-kernel/src/gfs/ioctl.c.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.7.2.4&r2=1.7.2.5
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs/gfs_quota/layout.c.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=NONE&r2=1.1.2.1
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs/gfs_quota/Makefile.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.3&r2=1.3.2.1
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs/gfs_quota/main.c.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.2&r2=1.2.2.1
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs/gfs_quota/gfs_quota.h.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.2&r2=1.2.2.1
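The idea behind the fix can be sketched as follows. This is my own simulation, not the patch code: the actual patch obtains the quota file's block layout via GFS ioctls in C, while here a plain list of allocated block numbers stands in for that layout, and the 4096-byte block size is an assumption for illustration.

```python
# Simulation of the fix's approach: instead of scanning the whole sparse
# quota file in 2*88-byte strides, get the file's block layout first, then
# visit only allocated blocks and derive which uid/gid entries live there.

BLOCK = 4096  # assumed filesystem block size (illustrative)
ENTRY = 88    # quota info block size in GFS1


def ids_in_allocated_blocks(allocated_blocks):
    """Yield (kind, id) pairs for every quota entry whose 88-byte slot
    overlaps an allocated data block. allocated_blocks is an iterable of
    block numbers, standing in for what the layout query would report."""
    seen = set()
    for blk in allocated_blocks:
        start, end = blk * BLOCK, (blk + 1) * BLOCK
        first_slot = start // ENTRY
        last_slot = (end - 1) // ENTRY
        for slot in range(first_slot, last_slot + 1):
            # Even slots hold uid entries, odd slots hold gid entries.
            kind = "uid" if slot % 2 == 0 else "gid"
            key = (kind, slot // 2)
            if key not in seen:
                seen.add(key)
                yield key


# A quota file with only block 0 allocated yields a few dozen entries to
# read, instead of billions of 88-byte reads across a 755 GB hole.
entries = list(ids_in_allocated_blocks([0]))
print(len(entries))
```

With di_blocks = 141 as in the dump above, a list operation under this scheme touches only those 141 blocks, which is why the patched gfs_quota returns quickly even on a pathologically sized quota file.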
Is this backwards compatible? If I have a running filesystem with quota files already existing, can I put this patch in and then have it not take so long? Do I need to rebuild my quota files?
Yes. The quota file itself hasn't changed so you should be able to apply these patches and have 'gfs_quota list' work without taking long. Please note that this fix includes a patch to the gfs-kernel (GFS filesystem kernel module) as well, without which the 'gfs_quota list' functionality won't work.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0139.html