Bug 1196026 - quota: Used field shows data in "PB" after deleting data from volume (happens again)
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: quota
Version: mainline
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Duplicates: 1336304
Depends On: 1061068
Blocks:
 
Reported: 2015-02-25 06:30 UTC by Vijaikumar Mallikarjuna
Modified: 2018-11-21 03:05 UTC (History)
9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1061068
Environment:
Last Closed: 2018-11-21 03:05:56 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Vijaikumar Mallikarjuna 2016-04-07 01:37:29 UTC
Patch: http://review.gluster.org/#/c/13907/ fixes the problem.

Comment 4 Manikandan 2016-06-29 05:26:24 UTC
*** Bug 1336304 has been marked as a duplicate of this bug. ***

Comment 5 Dan 2016-08-18 21:22:41 UTC
Good day.

We are still seeing this same issue on 3.8.1 on multiple volumes. Is this a confirmed fix? If so, which versions of gluster has it been added to?

When the quota value changes, there are no errors in the logs or any other information that indicates there is an issue.

Environment:
2 primary gluster nodes on RHEL7.2
2 secondary gluster nodes on RHEL7.2 configured for geo-replication


One volume configuration:
Volume Name: appdata
Type: Replicate
Volume ID: 3bffc422-1bf3-4593-bf2e-399e0b3e2a7f
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.66.102:/data/gluster/appdata/brick1
Brick2: 192.168.66.101:/data/gluster/appdata/brick1
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.enable-shared-storage: enable
# gluster volume status appdata
Status of volume: appdata
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.66.102:/data/gluster/
appdata/brick1                              49192     0          Y       8502 
Brick 192.168.66.101:/data/gluster/
appdata/brick1                              49194     0          Y       6600 
Self-heal Daemon on localhost               N/A       N/A        Y       22714
Quota Daemon on localhost                   N/A       N/A        Y       18396
Self-heal Daemon on 10.0.0.102              N/A       N/A        Y       7343 
Quota Daemon on 10.0.0.102                  N/A       N/A        Y       26234


Quota status post issue:
                  Path                   Hard-limit  Soft-limit      Used  Available  Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/                                          6.0GB     80%(4.8GB) 16384.0PB   6.0GB              No                   No


Could the 16384.0PB be the result of an evaluation expecting an unsigned integer but receiving a -1?
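A minimal sketch of that hypothesis (illustrative only, not GlusterFS source code, with made-up byte counts): if the accounted usage goes negative and is held in a 64-bit unsigned counter, it wraps to just under 2^64 bytes, and 2^64 bytes divided by 2^50 bytes-per-PB is exactly 16384 PB, matching the reported figure.

```shell
# Illustrative only (not GlusterFS code): simulate a 64-bit unsigned
# usage counter that underflows when more bytes are freed than were
# ever accounted for.
awk 'BEGIN {
    used  = 1024                  # bytes the quota believes are in use
    freed = 4096                  # bytes released by a delete
    w = used - freed              # goes negative...
    if (w < 0) w += 2^64          # ...and wraps as unsigned 64-bit would
    printf "%.1fPB\n", w / 2^50   # ~2^64 bytes / 2^50 = 16384.0 PB
}'
```

This prints 16384.0PB regardless of how small the underflow is, which would explain why the same figure shows up on differently sized volumes.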

We have just updated to 3.8.2 but have yet to work on replicating the issue.

Thank you in advance for your assistance.

Regards,
Dan

Comment 6 Manikandan 2016-08-19 05:26:23 UTC
Hi Dan,

Good day too.

Was there an 'rm -rf *' performed on the mount point? Are you testing on a fresh install of 3.8.1? Could you also let us know the steps to reproduce?

--
Regards,
Manikandan Selvaganesh.

Comment 7 Dan 2016-08-19 20:30:29 UTC
We have a few gluster volumes with this issue. One of them runs a series of 'find' commands that execute an 'rm' on the output. The other gluster volume has data manipulated by an application, and we are waiting on the code to verify.

The 3.8.1 install had been in place for a couple of weeks, and the affected volumes had been in use for a week or so before the issue occurred. The issue appears sporadically. We have disabled and re-enabled quotas between occurrences to get the volumes functional again.

Currently, a non-production volume has had its quota enabled since we patched to 3.8.2. The issue has yet to reappear.

Thank you for looking into this.

Regards,
Dan

Comment 8 Dan 2016-08-24 19:21:50 UTC
We just had the non-production volume hit the bug again sometime in the last 24 hours. Quota said we were using 16.3 XB of data, whilst df showed all 100GB used and du showed 18GB. On the geo-replicated volume, df showed 93GB used and du showed 18GB. The quota log on the primary volume stopped logging at log rotation on 8/21, so we have no information on the quota behavior. The quota had been enabled for over a week without issue until now.

We did verify that the other application is executing an 'rm' within sftp. So far all applications are executing deletes/removes.

I saw that 3.8.3 was just released. I have not read the release notes yet. Hopefully there is a fix for this. 

Please let me know if you have identified anything or if you need more data from me.

Thank you.

Regards,
Dan

Comment 9 Manikandan 2016-08-25 06:55:28 UTC
Hi,

Can you paste the output of 'gluster v info', 'gluster v quota <VOLNAME> list', and 'df -h'?

It would also be great if you could attach the logs on the system where you are hitting the issue.

--
Regards,
Manikandan Selvaganesh.

Comment 10 Dan 2016-08-31 22:08:42 UTC
We had the issue again. Here is the requested information.

# gluster volume info exports_nonprod
Volume Name: exports_nonprod
Type: Replicate
Volume ID: 6a5ac071-2f33-47b7-9630-ec644e723906
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.66.102:/data/gluster/exports_nonprod/brick1
Brick2: 192.168.66.101:/data/gluster/exports_nonprod/brick1
Options Reconfigured:
features.quota-deem-statfs: on
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
features.inode-quota: on
features.quota: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.enable-shared-storage: enable

# gluster volume quota exports_nonprod list 
                  Path                   Hard-limit  Soft-limit      Used  Available  Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/                                         50.0GB     80%(40.0GB) 16384.0PB  73.8GB              No                   No

# df -h /run/gluster/exports_nonprod
Filesystem                     Size  Used Avail Use% Mounted on
localhost:exports_nonprod   50G   50G     0 100% /run/gluster/exports_nonprod

# du -sh /run/gluster/exports_nonprod
18G	/run/gluster/exports_nonprod

# du -sh /data/gluster/exports_nonprod
19G	/data/gluster/exports_nonprod

# du -csh /data/gluster/exports_nonprod/brick1/.[!.]* /data/gluster/exports_nonprod/brick1/* | sort -h 
8.0K	/data/gluster/exports_nonprod/brick1/P
16K	/data/gluster/exports_nonprod/brick1/C
16K	/data/gluster/exports_nonprod/brick1/.trashcan
48K	/data/gluster/exports_nonprod/brick1/u
56K	/data/gluster/exports_nonprod/brick1/d
56K	/data/gluster/exports_nonprod/brick1/t
19G	/data/gluster/exports_nonprod/brick1/.glusterfs
19G	total


The other item I have found interesting: each time we have encountered this issue, the current log has only one entry.

# cat /var/log/glusterfs/quota-mount-exports_nonprod.log
[2016-08-28 08:42:12.082238] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

We are looking at gluster 3.8.3 currently as well.

Thank you again. I hope you can assist!

Regards,
Dan

Comment 11 Dan 2016-09-01 19:24:16 UTC
Looking at the main gluster servers today, 'df' showed use of the 50GB quota at 8.5GB whilst 'du' showed 21GB. The 21GB lines up with what our geo-replication servers report. Strange that it under-reports.

Regards,
Dan

Comment 12 Dan 2016-09-07 20:12:20 UTC
Any further news?

Regards,
Dan

Comment 14 Dan 2016-10-07 15:37:16 UTC
Any updates?

Thank you,
Dan

Comment 15 hari gowtham 2018-11-21 03:05:56 UTC
Hi,

The above issue is caused by negative accounting. It can happen because of a few combinations of operations. We have fixed the operations that we are aware of that cause this.

Since this is an accounting issue, you can disable and re-enable quota to get the accounting redone correctly, or use the quota fsck script to fix the issues.

fsck script: https://review.gluster.org/#/c/glusterfs/19179/
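The disable/enable cycle can be sketched as follows (the volume name and limit are examples taken from this report; note that disabling quota drops the configured limits, so they must be re-applied afterwards):

```shell
# Workaround sketch: cycle quota so the accounting is rebuilt from scratch.
# CAUTION: disabling quota removes the configured limits.
gluster volume quota exports_nonprod disable
gluster volume quota exports_nonprod enable
# re-apply the previous limit on the volume root
gluster volume quota exports_nonprod limit-usage / 50GB
```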

Closing this bug as quota is not being actively developed. If seen again, the above workaround can be used to fix it.

-Hari.

