Description of problem:
Output from du -sh does not agree with the reported free space from gluster quota list.
[root@node1 ~]# gluster volume quota volume1 list
Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
/node1 5.0TB 80%(4.0TB) 3.5TB 1.5TB No No
Actual usage from the host's perspective:
[root@host1 mount]# du -sh .
No split-brain issues
[root@node2 ~]# gluster volume heal volume1 info split-brain
Number of entries in split-brain: 0
Number of entries in split-brain: 0
Version-Release number of selected component (if applicable):
Moving the component to 'quota' as this issue seems to be with quota
There are two metrics here: used space and available space.
a) du -sh gives the used space (not affected by the quota limit).
This should approximately match the used space given by quota list on /
There is a discrepancy here, I believe: 593G vs 3.5T.
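A minimal sketch of how the two used-space figures could be compared once both are in hand. The helper names parse_size and usage_ratio are hypothetical, and the sketch assumes both tools print 1024-based units (so "TB" and "T" are treated alike); it is an illustration, not part of any Gluster tooling.

```python
import re

# Power of 1024 for each unit prefix; "B" or no unit means plain bytes.
_POWERS = {"": 0, "B": 0, "K": 1, "M": 2, "G": 3, "T": 4, "P": 5}


def parse_size(text):
    """Convert a human-readable size such as '3.5TB' or '593G' to bytes.

    Assumption: units are 1024-based in both du -h and quota list output.
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError("unrecognized size: %r" % text)
    number, unit = match.groups()
    prefix = unit[:1].upper()
    if prefix not in _POWERS:
        raise ValueError("unrecognized unit in: %r" % text)
    return float(number) * 1024 ** _POWERS[prefix]


def usage_ratio(quota_used, du_used):
    """Return how many times larger the quota figure is than the du figure."""
    return parse_size(quota_used) / parse_size(du_used)


if __name__ == "__main__":
    # Figures quoted in this report: quota list says 3.5TB, du -sh says 593G.
    print("quota/du ratio: %.2f" % usage_ratio("3.5TB", "593G"))
```

A ratio close to 1 would mean the accounting agrees; with the figures above the quota value is roughly six times the du value.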
b) df -h should give the available space. When a quota limit is set (with the quota-deem-statfs option on), the available space should honor the quota limit.
The available space in df -h does seem to be consistent with the configured limit (1.5T).
Without deem-statfs it shows 19T.
Filesystem Size Used Avail Use% Mounted on
cuarhstor-vip:/oracle_backups/cuaorc001a 5.0T 3.6T 1.5T 71% /backup
# gluster volume set oracle_backups features.quota-deem-statfs off
Back on host
cuarhstor-vip:/oracle_backups/cuaorc001a 25T 5.9T 19T 24% /backup
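The relationship described in (b) can be sketched as follows: with quota-deem-statfs on, df treats the hard limit as the filesystem size and reports the remainder of the limit as available space. The function name deemed_statfs is hypothetical and the sketch works in whole units (TB) rather than filesystem blocks, so it only approximates what the quota translator does.

```python
def deemed_statfs(hard_limit, used):
    """Return the size/used/avail figures df would be expected to show
    for a directory when quota-deem-statfs is on.

    Assumption: size is the hard limit and avail is limit minus used,
    clamped at zero once the limit is exceeded.
    """
    avail = max(hard_limit - used, 0.0)
    return {"size": hard_limit, "used": used, "avail": avail}


if __name__ == "__main__":
    # Figures from the quota list output above: 5.0TB limit, 3.5TB used.
    print(deemed_statfs(5.0, 3.5))
```

Because the available figure is derived from the quota's own used value, it can look consistent with the limit even when the used value itself is wrong, which is why (a) has to be checked separately.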
So, I would like you to confirm whether the used space accounting (a) is indeed the problem.
If that is the case, it would help to know:
1) The backend xattr values on the bricks.
# getfattr -d -m. -e hex path
We need this detail for each path where the used space does not match.
2) What operations were being run that led to this scenario.
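For item 1, the hex values printed by getfattr can be decoded with a short script like the sketch below. It assumes the trusted.glusterfs.quota.size value is stored big-endian and is either 8 bytes (size only, older releases) or 24 bytes (size, file count, directory count, newer releases); that layout is an assumption to verify against the installed version, and decode_quota_size is a hypothetical helper name.

```python
import struct


def decode_quota_size(hex_value):
    """Decode a quota size xattr value as printed by getfattr -e hex.

    Assumption: big-endian signed 64-bit fields, either one field (size)
    or three fields (size, file_count, dir_count).
    """
    text = hex_value.strip()
    if text.lower().startswith("0x"):
        text = text[2:]
    raw = bytes.fromhex(text)
    if len(raw) == 8:
        (size,) = struct.unpack(">q", raw)
        return {"size": size}
    if len(raw) == 24:
        size, file_count, dir_count = struct.unpack(">qqq", raw)
        return {"size": size, "file_count": file_count, "dir_count": dir_count}
    raise ValueError("unexpected xattr length: %d bytes" % len(raw))


if __name__ == "__main__":
    # Illustrative value only, not taken from the customer's bricks.
    print(decode_quota_size("0x0000000000000400"))
```

Summing the decoded size for the same directory across the bricks and comparing it with du on those bricks would show whether the stored accounting has drifted.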
From the logs I also see:
[2016-05-29 07:14:42.213879] W [quota.c:3439:quota_statfs_validate_cbk] 0-oracle_backups-quota: quota context is not present in inode (gfid:00000000-0000-0000-0000-000000000001)
[2016-05-29 07:14:45.322440] W [marker-quota.c:2070:mq_initiate_quota_txn] (-->/usr/lib64/glusterfs/126.96.36.199rhs/xlator/features/locks.so(pl_truncate_cbk+0xf1) [0x7f399ecb25f1] (-->/usr/lib64/glusterfs/188.8.131.52rhs/xlator/performance/io-threads.so(iot_ftruncate_cbk+0xcc) [0x7f399ea93c9c] (-->/usr/lib64/glusterfs/184.108.40.206rhs/xlator/features/marker.so(marker_ftruncate_cbk+0x16c) [0x7f399e66f3dc]))) 0-oracle_backups-marker: could not allocate contribution node for (<gfid:b5d0b684-5f33-4efd-a281-dfa784f30349>) parent: ((null))
[2016-05-29 12:01:05.125254] W [quota.c:3447:quota_statfs_validate_cbk] 0-oracle_backups-quota: size key not present in dict
We need to see why the quota context is not available for root.
When the size key is not available, there seems to be a bug in quota_statfs_validate_cbk, where we update the quota context with the wrong size. I will look further into this.
Thanks for looking into this. What exactly do you need from me to 'confirm if used space accounting is indeed the problem.'? Are you just interested in the problem from their perspective? Whether it's the used space or free space they're concerned with, or is there a specific test that you would like them to perform to determine the answer to your question?
(In reply to Cal Calhoun from comment #4)
> Hello Sanoj;
> Thanks for looking into this. What exactly do you need from me to
> 'confirm if used space accounting is indeed the problem.'? Are you just
> interested in the problem from their perspective? Whether it's the used
> space or free space they're concerned with, or is there a specific test that
> you would like them to perform to determine the answer to your question?
Correct, both actually. I wanted to know the user's perspective as well as the output of the getfattr command on the directories where the quota list accounting does not match the du accounting.
getfattr -d -m. -e hex <brick_path>/node1
@ Sanoj: Yes, the customer indicates that the output of the du -sh command run against the brick reflects approximately what they believe the correct size should be.
Thanks for looking into this.
Created attachment 1365825 [details]
Attaching scripts to determine if the issue is with accounting
Created attachment 1365826 [details]
log accounting script