+++ This bug was initially created as a clone of Bug #1764129 +++

Description of problem:
When the quota_fsck script is run on a brick of a Gluster volume that has a quota accounting mismatch, it fails in certain scenarios with the following error:

python quota_fsck.py /bricks/brick4/v > output_brick
getfattr: Removing leading '/' from absolute path names
getfattr: Removing leading '/' from absolute path names
Traceback (most recent call last):
  File "quota_fsck.py", line 374, in <module>
    walktree(brick_path, hard_link_dict)
  File "quota_fsck.py", line 317, in walktree
    verify_file_xattr(pathname, stbuf)
  File "quota_fsck.py", line 236, in verify_file_xattr
    print_msg(QUOTA_SIZE_MISMATCH, path, xattr_dict, stbuf)
  File "quota_fsck.py", line 64, in print_msg
    print('%-24s %-60s %-12i %-12i' % {"Size Mismatch", path, xattr_dict['contri_size'],
KeyError: 'contri_size'
============================

Version-Release number of selected component (if applicable):
mainline; release-7 is affected as well.

How reproducible:

Steps to Reproduce:
1. Create a quota accounting mismatch.
2. Run the fsck script on the brick backend.
3. The script errors out.

Actual results:
The script aborts with the traceback shown in the description (KeyError: 'contri_size').

Expected results:
The script should not error out.
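The traceback shows `print_msg` indexing `xattr_dict['contri_size']` at the top level of the dictionary. In the xattr dictionaries printed later in this thread (for directories), `contri_size` appears only inside `xattr_dict['parents'][<parent gfid>]`. Assuming file entries follow the same layout, that would explain the `KeyError`. The sketch below reproduces the failing lookup on that sample layout and shows a defensive alternative; `contri_sizes` is a hypothetical helper, not the fix merged upstream.

```python
from typing import Any, Dict

# Sample xattr dict copied from the "Size Mismatch" output later in this bug.
SAMPLE = {
    'file_count': 0, 'dir_count': 1, 'version': u'2',
    'parents': {u'298f6dd8-fd81-434c-bd8f-6c5125553ca7': {
        'contri_file_count': 0, 'contri_size': 0, 'contri_dir_count': 1}},
    'size': 0,
}


def contri_sizes(xattr_dict: Dict[str, Any]) -> Dict[str, int]:
    """Hypothetical helper: map each parent gfid to its recorded contri_size.

    Uses .get() and a membership test so a dict without 'parents' (or a
    parent entry without 'contri_size') gives an empty or partial result
    instead of raising KeyError.
    """
    parents = xattr_dict.get("parents", {})
    return {gfid: entry["contri_size"]
            for gfid, entry in parents.items()
            if "contri_size" in entry}


if __name__ == "__main__":
    try:
        SAMPLE["contri_size"]  # the top-level lookup seen in the traceback
    except KeyError as exc:
        print("KeyError:", exc)
    print(contri_sizes(SAMPLE))
```

Returning a per-parent mapping, rather than a single number, avoids assuming how contributions from several parents (for example, hard links) should be combined.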
Additional info:

--- Additional comment from Worker Ant on 2019-10-22 10:06:02 UTC ---
REVIEW: https://review.gluster.org/23586 (Scripts: quota_fsck script KeyError: 'contri_size') posted (#2) for review on master by hari gowtham

--- Additional comment from Worker Ant on 2019-10-22 23:56:41 UTC ---
REVIEW: https://review.gluster.org/23586 (Scripts: quota_fsck script KeyError: 'contri_size') merged (#3) on master by Atin Mukherjee

--- Additional comment from Worker Ant on 2019-10-24 12:16:01 UTC ---
REVIEW: https://review.gluster.org/23608 (scripts: quota_fsck script TypeError: %d format:not dict) posted (#1) for review on master by hari gowtham

--- Additional comment from Worker Ant on 2019-11-06 13:12:14 UTC ---
REVIEW: https://review.gluster.org/23608 (scripts: quota_fsck script TypeError: %d format:not dict) merged (#2) on master by Amar Tumballi
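Judging from the traceback in the description, `print_msg` builds its `%` operand with braces (`{...}`, which is a set literal in Python) rather than a tuple, and the title of review 23608 mentions a `%d` format receiving a dict. The sketch below only reproduces those two general Python failure modes and shows a tuple-based formatter; `format_size_mismatch` is a hypothetical helper, not the code merged in reviews 23586 or 23608, and the column widths are simply copied from the traceback.

```python
def format_size_mismatch(path: str, contri_size: int, reported_size: int) -> str:
    """Hypothetical formatter for a 'Size Mismatch' report line.

    The % operand is a tuple, and each %i slot receives an int, which
    avoids both TypeErrors demonstrated below.
    """
    return '%-24s %-60s %-12i %-12i' % (
        "Size Mismatch", path, contri_size, reported_size)


if __name__ == "__main__":
    # 1. Braces build a set, which % treats as a single argument, so the
    #    second specifier has nothing left to consume.
    try:
        '%-24s %-60s' % {"Size Mismatch", "/some/path"}
    except TypeError as exc:
        print("set operand:", exc)

    # 2. A dict passed to an integer specifier is rejected.
    try:
        '%-12i' % ({'contri_size': 0},)
    except TypeError as exc:
        print("dict in %i slot:", exc)

    print(format_size_mismatch("/bricks/brick4/v/file1", 0, 4096))
```

The exact TypeError messages differ between Python versions, so the demo prints them rather than matching on their text.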
QE - do we have automation for this script?
Could you provide me the detailed steps to reproduce this issue?
(In reply to Arthy Loganathan from comment #9)
> Could you provide me the detailed steps to reproduce this issue?

Hi Arthy,

This appears to be a quota accounting mismatch issue, and I think the error was seen when the script (quota_fsck.py) was run against such a mismatch. We don't have a documented way of reproducing an accounting mismatch, but it is usually seen when quota is set on a directory that has many subdirectories and files under it. Roughly:

1. Create many files, with emphasis on the number of files, the depth of the directory tree, and file sizes, so that the quota crawl takes a long time.
2. Enable quota and set a limit.
3. Quota will take its time to crawl the backend. If, during that window, the quota daemon is killed or the underlying host goes down, the crawl stops in its tracks.
4. This can cause the accounting mismatch (provided the crawl hasn't finished its task).
5. The script can then be run.

This is only a rough way of re-creating an accounting mismatch.
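Step 1 can be approximated with a small script run against a client mount of the volume. The sketch below is hypothetical (`create_deep_tree` and its parameters are not part of Gluster or glustolibs): it builds a tree `depth` levels deep with `width` sibling directories per level, descending only through `dir0` so the total stays at `depth * width * files_per_dir` files. Whether a given tree keeps the quota crawl busy long enough to interrupt it depends on the setup.

```python
import os
import tempfile


def create_deep_tree(root: str, depth: int, width: int,
                     files_per_dir: int, file_size: int) -> int:
    """Hypothetical helper: create a deep directory tree under `root`.

    At each of `depth` levels, create `width` sibling directories named
    dir0..dir<width-1>, each holding `files_per_dir` files of `file_size`
    bytes. Only dir0 is descended into, so the tree grows deep without
    growing exponentially. Returns the number of files written.
    """
    payload = b"\0" * file_size
    current = root
    written = 0
    for _ in range(depth):
        for d in range(width):
            subdir = os.path.join(current, "dir%d" % d)
            os.makedirs(subdir, exist_ok=True)
            for f in range(files_per_dir):
                with open(os.path.join(subdir, "file%d" % f), "wb") as fh:
                    fh.write(payload)
                written += 1
        current = os.path.join(current, "dir0")  # go one level deeper
    return written


if __name__ == "__main__":
    # Demo in a throwaway local directory instead of a Gluster mount.
    with tempfile.TemporaryDirectory() as tmp:
        count = create_deep_tree(tmp, depth=3, width=2, files_per_dir=2, file_size=16)
        print(count, "files written")
```

For example, `create_deep_tree("/mnt/<volume>", depth=10, width=10, files_per_dir=10, file_size=1 << 20)` would write 1000 files of 1 MiB each; the mount path is a placeholder.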
1. gluster volume create testvol_replicated replica 3 10.70.46.157:/bricks/brick0/testvol_replicated_brick0 10.70.46.56:/bricks/brick0/testvol_replicated_brick1 10.70.47.142:/bricks/brick0/testvol_replicated_brick2 --mode=script
2. gluster volume start testvol_replicated --mode=script
3. Created I/O using:
   /usr/bin/env python /usr/share/glustolibs/io/scripts/file_dir_ops.py create_deep_dirs_with_files -d 10 -l 10 -n 10 -f 10 /mnt/testvol_replicated_glusterfs
4. gluster volume quota testvol_replicated enable
5. gluster volume quota testvol_replicated limit-usage / 4GB --mode=script
6. Kill the quota daemon.
7. python quota_fsck.py /bricks/brick0/testvol_replicated_brick0

No errors are seen while running quota_fsck.py against the mismatch. Sample output:

Size Mismatch /bricks/brick0/testvol_replicated_brick0/user1/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0 {'file_count': 0, 'dir_count': 1, 'version': u'2', 'parents': {u'298f6dd8-fd81-434c-bd8f-6c5125553ca7': {'contri_file_count': 0, 'contri_size': 0, 'contri_dir_count': 1}}, 'size': 0} 2655232 mismatch
Size Mismatch /bricks/brick0/testvol_replicated_brick0/user1/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir1 {'file_count': 0, 'dir_count': 1, 'version': u'2', 'parents': {u'298f6dd8-fd81-434c-bd8f-6c5125553ca7': {'contri_file_count': 0, 'contri_size': 0, 'contri_dir_count': 1}}, 'size': 0} 3197952 mismatch
Size Mismatch /bricks/brick0/testvol_replicated_brick0/user1/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir2 {'file_count': 0, 'dir_count': 1, 'version': u'2', 'parents': {u'298f6dd8-fd81-434c-bd8f-6c5125553ca7': {'contri_file_count': 0, 'contri_size': 0, 'contri_dir_count': 1}}, 'size': 0} 6302720 mismatch
Size Mismatch /bricks/brick0/testvol_replicated_brick0/user1/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir3 {'file_count': 0, 'dir_count': 1, 'version': u'2', 'parents': {u'298f6dd8-fd81-434c-bd8f-6c5125553ca7': {'contri_file_count': 0, 'contri_size': 0, 'contri_dir_count': 1}}, 'size': 0} 4206592 mismatch

Since the quota feature is deprecated, this scenario is not automated.

Verified the fix in glusterfs-server-6.0-47.el7rhgs.x86_64.
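Mismatch reports like the sample output in the verification comment are easier to triage programmatically. The sketch below assumes each mismatch line has the shape `Size Mismatch <path> <xattr dict> <integer> mismatch`, as in those sample lines; `parse_mismatches` and `Mismatch` are hypothetical names, the meaning of the trailing integer is not confirmed in this bug, and paths containing whitespace are not handled.

```python
import ast
import re
from typing import Any, Dict, Iterable, List, NamedTuple

# Assumed line layout, inferred from the verification output in this bug:
#   Size Mismatch <path> <xattr dict repr> <integer> mismatch
MISMATCH_RE = re.compile(r"^Size Mismatch\s+(\S+)\s+(\{.*\})\s+(\d+)\s+mismatch$")

SAMPLE_LINE = (
    "Size Mismatch /bricks/brick0/testvol_replicated_brick0/user1/dir0/dir0/"
    "dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir1 {'file_count': 0, 'dir_count': 1, "
    "'version': u'2', 'parents': {u'298f6dd8-fd81-434c-bd8f-6c5125553ca7': "
    "{'contri_file_count': 0, 'contri_size': 0, 'contri_dir_count': 1}}, "
    "'size': 0} 3197952 mismatch")


class Mismatch(NamedTuple):
    path: str
    xattrs: Dict[str, Any]
    reported_size: int  # the integer printed after the xattr dict


def parse_mismatches(lines: Iterable[str]) -> List[Mismatch]:
    """Hypothetical helper: collect 'Size Mismatch' records from script output."""
    records = []
    for line in lines:
        match = MISMATCH_RE.match(line.strip())
        if match is None:
            continue  # skip getfattr notices and any other lines
        path, dict_text, size = match.groups()
        # literal_eval parses the printed dict without executing code; Python 3
        # also accepts the u'' prefixes that a Python 2 run prints.
        records.append(Mismatch(path, ast.literal_eval(dict_text), int(size)))
    return records


if __name__ == "__main__":
    for rec in parse_mismatches([SAMPLE_LINE]):
        print(rec.path, rec.xattrs["size"], rec.reported_size)
```

To scan a saved report such as `output_brick`, pass an open file object, since iterating a file yields its lines.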
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5603