Bug 1786681

Summary: quota_fsck script KeyError: 'contri_size'
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: hari gowtham <hgowtham>
Component: quota
Assignee: Srijan Sivakumar <ssivakum>
Status: CLOSED ERRATA
QA Contact: Arthy Loganathan <aloganat>
Severity: low
Docs Contact:
Priority: unspecified
Version: rhgs-3.5
CC: bugs, giridhar.ramaraju, pprakash, puebele, rhs-bugs, rkothiya, sheggodu, ssivakum, storage-qa-internal
Target Milestone: ---
Keywords: Reopened, ZStream
Target Release: RHGS 3.5.z Batch Update 3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-6.0-38
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1764129
Environment:
Last Closed: 2020-12-17 04:50:48 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1764129
Bug Blocks:

Description hari gowtham 2019-12-27 06:58:22 UTC
+++ This bug was initially created as a clone of Bug #1764129 +++

Description of problem:
When the quota_fsck script is run on a gluster filesystem with a quota mismatch,
it errors out with the following error in a certain scenario.

python quota_fsck.py /bricks/brick4/v > output_brick
getfattr: Removing leading '/' from absolute path names
getfattr: Removing leading '/' from absolute path names
Traceback (most recent call last):
  File "quota_fsck.py", line 374, in <module>
    walktree(brick_path, hard_link_dict)
  File "quota_fsck.py", line 317, in walktree
    verify_file_xattr(pathname, stbuf)
  File "quota_fsck.py", line 236, in verify_file_xattr
    print_msg(QUOTA_SIZE_MISMATCH, path, xattr_dict, stbuf)
  File "quota_fsck.py", line 64, in print_msg
    print('%-24s %-60s %-12i %-12i' % {"Size Mismatch", path, xattr_dict['contri_size'],
KeyError: 'contri_size'
============================
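
The traceback ends in a plain dictionary lookup, xattr_dict['contri_size']. Judging by the xattr_dict values printed in comment 11 further down, contri_size can sit under each entry of 'parents' rather than at the top level, which is one plausible explanation for the KeyError. A minimal sketch of a lookup that tolerates that layout follows. The helper name contri_sizes is hypothetical, and this is an illustration only, not the merged patch.

```python
def contri_sizes(xattr_dict):
    """Return {parent_gfid: contri_size} for every parent that records one.

    Assumes the nested layout printed in comment 11; the real script's
    dictionary may be shaped differently in other scenarios.
    """
    sizes = {}
    for gfid, contri in xattr_dict.get('parents', {}).items():
        if 'contri_size' in contri:
            sizes[gfid] = contri['contri_size']
    return sizes


# Dictionary copied from the output in comment 11.
sample = {
    'file_count': 0,
    'dir_count': 1,
    'version': '2',
    'parents': {
        '298f6dd8-fd81-434c-bd8f-6c5125553ca7': {
            'contri_file_count': 0,
            'contri_size': 0,
            'contri_dir_count': 1,
        },
    },
    'size': 0,
}

print(contri_sizes(sample))
# A dictionary without 'parents' yields an empty result instead of raising.
print(contri_sizes({'size': 0}))
```

Using dict.get with a default keeps the caller from crashing when the key is absent, so a missing contribution can be reported rather than aborting the whole walk.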

Version-Release number of selected component (if applicable):
mainline, and on release-7 as well.

How reproducible:


Steps to Reproduce:
1. Create a quota mismatch.
2. Run the fsck script on the backend.
3. The script errors out.

Actual results:
python quota_fsck.py /bricks/brick4/v > output_brick
getfattr: Removing leading '/' from absolute path names
getfattr: Removing leading '/' from absolute path names
Traceback (most recent call last):
  File "quota_fsck.py", line 374, in <module>
    walktree(brick_path, hard_link_dict)
  File "quota_fsck.py", line 317, in walktree
    verify_file_xattr(pathname, stbuf)
  File "quota_fsck.py", line 236, in verify_file_xattr
    print_msg(QUOTA_SIZE_MISMATCH, path, xattr_dict, stbuf)
  File "quota_fsck.py", line 64, in print_msg
    print('%-24s %-60s %-12i %-12i' % {"Size Mismatch", path, xattr_dict['contri_size'],
KeyError: 'contri_size'
============================

Expected results:
The script should not error out.


Additional info:

--- Additional comment from Worker Ant on 2019-10-22 10:06:02 UTC ---

REVIEW: https://review.gluster.org/23586 (Scripts: quota_fsck script KeyError: 'contri_size') posted (#2) for review on master by hari gowtham

--- Additional comment from Worker Ant on 2019-10-22 23:56:41 UTC ---

REVIEW: https://review.gluster.org/23586 (Scripts: quota_fsck script KeyError: 'contri_size') merged (#3) on master by Atin Mukherjee

--- Additional comment from Worker Ant on 2019-10-24 12:16:01 UTC ---

REVIEW: https://review.gluster.org/23608 (scripts: quota_fsck script TypeError: %d format:not dict) posted (#1) for review on master by hari gowtham

--- Additional comment from Worker Ant on 2019-11-06 13:12:14 UTC ---

REVIEW: https://review.gluster.org/23608 (scripts: quota_fsck script TypeError: %d format:not dict) merged (#2) on master by Amar Tumballi
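
The title of the second review mentions "TypeError: %d format: not dict". The snippet below is a small illustration of the Python behaviour behind that kind of message, not a reproduction of the patch: the % operator expects a tuple with one value per conversion specifier, and handing an integer specifier a dict raises TypeError. The function names are hypothetical.

```python
def format_mismatch(path, contri_size, st_size):
    # A tuple supplies exactly one value per conversion specifier.
    return '%-24s %-60s %-12i %-12i' % (
        "Size Mismatch", path, contri_size, st_size)


def format_with_dict(xattr_dict):
    # %i needs a number; a dict in that position raises TypeError.
    try:
        return '%-12i' % (xattr_dict,)
    except TypeError as err:
        return 'TypeError: %s' % err


print(format_mismatch('/bricks/brick4/v/file', 0, 4096))
print(format_with_dict({'size': 0}))
```

Note that the traceback in the description shows the arguments wrapped in braces rather than parentheses, which Python reads as a set literal and not as a tuple.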

Comment 2 Yaniv Kaul 2020-05-05 14:21:03 UTC
QE - do we have automation for this script?

Comment 9 Arthy Loganathan 2020-09-15 10:44:15 UTC
Could you provide me the detailed steps to reproduce this issue?

Comment 10 Srijan Sivakumar 2020-09-15 11:57:24 UTC
 (In reply to Arthy Loganathan from comment #9)
> Could you provide me the detailed steps to reproduce this issue?

 Hi Arthy, this seems to be an accounting mismatch issue in quota, and I think that is when the error was seen while running the script (quota_fsck.py). We don't have a documented way of reproducing the accounting mismatch, but it is usually seen when quota is set on a directory that has many subdirectories and files under it.

 So the rough steps are:
 1. Create a large number of files, with emphasis on file count, directory depth, and total size, so that the quota crawl takes a long time.
 2. Then enable quota and set a limit.
 3. Quota will now take its time to crawl the backend. If the quota daemon is killed or the underlying host goes down during that window, the crawl stops in its tracks.
 4. This can cause the accounting mismatch (provided the crawl had not finished its task).
 5. The script can then be run.

Note that this is only a rough way of re-creating an accounting mismatch.
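
Step 1 above can be sketched in Python as follows. This is a hypothetical stand-in for whatever data generator is at hand, and the function name, parameters, and file naming are all assumptions. It only builds the tree; enabling quota and interrupting the crawl are still done with the gluster tooling.

```python
import os


def make_tree(root, depth, width, files_per_dir, file_size=1024):
    """Create a nested directory tree and return the number of files written.

    Each directory gets files_per_dir files of file_size bytes, and each
    directory above the deepest level gets width subdirectories.
    """
    created = 0
    os.makedirs(root, exist_ok=True)
    for i in range(files_per_dir):
        with open(os.path.join(root, "file%d" % i), "wb") as fh:
            fh.write(b"\0" * file_size)
        created += 1
    if depth > 0:
        for d in range(width):
            subdir = os.path.join(root, "dir%d" % d)
            created += make_tree(subdir, depth - 1, width,
                                 files_per_dir, file_size)
    return created
```

It would be pointed at a client mount of the volume, for example make_tree("/mnt/<volume>/user1", depth=5, width=3, files_per_dir=10), with the values raised until the crawl is slow enough to interrupt.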

Comment 11 Arthy Loganathan 2020-11-09 07:57:52 UTC

1. gluster volume create testvol_replicated replica 3 10.70.46.157:/bricks/brick0/testvol_replicated_brick0 10.70.46.56:/bricks/brick0/testvol_replicated_brick1 10.70.47.142:/bricks/brick0/testvol_replicated_brick2 --mode=script

2. gluster volume start testvol_replicated --mode=script

3. Created IO using,
   /usr/bin/env python /usr/share/glustolibs/io/scripts/file_dir_ops.py create_deep_dirs_with_files -d 10 -l 10 -n 10 -f 10 /mnt/testvol_replicated_glusterfs

4. gluster volume quota testvol_replicated enable

5. gluster volume quota testvol_replicated limit-usage / 4GB  --mode=script

6. Kill the quota daemon.

7. python quota_fsck.py /bricks/brick0/testvol_replicated_brick0

No errors are seen while running quota_fsck.py during mismatch.

           Size Mismatch /bricks/brick0/testvol_replicated_brick0/user1/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0 {'file_count': 0, 'dir_count': 1, 'version': u'2', 'parents': {u'298f6dd8-fd81-434c-bd8f-6c5125553ca7': {'contri_file_count': 0, 'contri_size': 0, 'contri_dir_count': 1}}, 'size': 0}      2655232
mismatch
           Size Mismatch /bricks/brick0/testvol_replicated_brick0/user1/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir1 {'file_count': 0, 'dir_count': 1, 'version': u'2', 'parents': {u'298f6dd8-fd81-434c-bd8f-6c5125553ca7': {'contri_file_count': 0, 'contri_size': 0, 'contri_dir_count': 1}}, 'size': 0}      3197952
mismatch
           Size Mismatch /bricks/brick0/testvol_replicated_brick0/user1/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir2 {'file_count': 0, 'dir_count': 1, 'version': u'2', 'parents': {u'298f6dd8-fd81-434c-bd8f-6c5125553ca7': {'contri_file_count': 0, 'contri_size': 0, 'contri_dir_count': 1}}, 'size': 0}      6302720
mismatch
           Size Mismatch /bricks/brick0/testvol_replicated_brick0/user1/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir0/dir3 {'file_count': 0, 'dir_count': 1, 'version': u'2', 'parents': {u'298f6dd8-fd81-434c-bd8f-6c5125553ca7': {'contri_file_count': 0, 'contri_size': 0, 'contri_dir_count': 1}}, 'size': 0}      4206592
mismatch
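
For anyone post-processing output like the lines above, here is a minimal sketch that splits one "Size Mismatch" line into its parts. The function name is hypothetical, the meaning of the trailing number is assumed to be a size the script computed, and the approach assumes the path itself contains no braces.

```python
import ast


def parse_mismatch_line(line):
    """Split a 'Size Mismatch' line into (path, xattr_dict, trailing_size)."""
    start = line.index("{")
    end = line.rindex("}") + 1
    head = line[:start].strip()
    # Drop the leading label; what remains is the brick path.
    path = head[len("Size Mismatch"):].strip()
    # The dictionary is printed as a Python literal, so literal_eval can
    # read it without executing arbitrary code.
    xattrs = ast.literal_eval(line[start:end])
    trailing_size = int(line[end:].strip())
    return path, xattrs, trailing_size


example = (
    "           Size Mismatch /bricks/brick0/testvol_replicated_brick0/user1 "
    "{'file_count': 0, 'dir_count': 1, 'version': u'2', 'parents': "
    "{u'298f6dd8-fd81-434c-bd8f-6c5125553ca7': {'contri_file_count': 0, "
    "'contri_size': 0, 'contri_dir_count': 1}}, 'size': 0}      2655232"
)
path, xattrs, trailing_size = parse_mismatch_line(example)
print(path, xattrs['size'], trailing_size)
```

The example path is shortened for readability; the dictionary and the trailing number are copied from the first output line.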

Since the quota feature is deprecated, this scenario is not automated.

Verified the fix in,
glusterfs-server-6.0-47.el7rhgs.x86_64

Comment 13 errata-xmlrpc 2020-12-17 04:50:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5603