Bug 848253 - quota: brick process kill allows quota limit cross
Summary: quota: brick process kill allows quota limit cross
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: Saurabh
URL:
Whiteboard:
Depends On: 821725
Blocks:
 
Reported: 2012-08-15 01:48 UTC by Vidya Sakar
Modified: 2016-01-19 06:10 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: When a quota limit is set on a distributed volume and a brick goes down while I/O is in progress, the effective quota limit can be exceeded because the distribute translator does not see the contribution from the offline brick.
Consequence: The quota limit gets exceeded.
Workaround (if any): Use replication if 100% consistency is needed when a node goes down.
Result: With replication, a single brick failure is tolerated and the quota limit is maintained as is.
Clone Of: 821725
Environment:
Last Closed: 2013-10-21 04:47:02 UTC
Embargoed:


Attachments: none

Description Vidya Sakar 2012-08-15 01:48:25 UTC
+++ This bug was initially created as a clone of Bug #821725 +++

Description of problem:
volume type: distribute-replicate(2x2)
number of nodes: 2
[root@RHS-71 ~]# gluster volume status dist-rep-quota
Status of volume: dist-rep-quota
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 172.17.251.71:/export/dr-q			24017	Y	4078
Brick 172.17.251.72:/export/drr-q			24010	Y	3183
Brick 172.17.251.71:/export/ddr-q			24018	Y	3252
Brick 172.17.251.72:/export/ddrr-q			24011	Y	3189
NFS Server on localhost					38467	Y	3965
Self-heal Daemon on localhost				N/A	Y	3942
NFS Server on 172.17.251.74				38467	Y	32306
Self-heal Daemon on 172.17.251.74			N/A	Y	32292
NFS Server on 172.17.251.73				38467	Y	6940
Self-heal Daemon on 172.17.251.73			N/A	Y	6926
NFS Server on 172.17.251.72				38467	Y	3475
Self-heal Daemon on 172.17.251.72			N/A	Y	3461

The problem is that the quota limit gets crossed when one of the bricks is brought down.

Version-Release number of selected component (if applicable):
3.3.0qa40


How reproducible:
always

Steps to Reproduce (a shell sketch of these steps follows below):
1. Set a quota limit of 2GB on the root of the volume.

2. From the NFS mount, add data (many files) inside a directory, up to 1GB in size.

3. Kill one of the brick processes using "kill <pid>"; in this case the brick brought down was "172.17.251.71:/export/dr-q".

4. From the NFS mount, keep adding data.
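
A minimal shell sketch of these steps; the volume name dist-rep-quota and the brick PID 4078 are taken from the status output above, while the NFS mount point /mnt/dist-rep-quota and the dd-based writes are illustrative assumptions:

# Step 1: enable quota and set a 2GB limit on the volume root
gluster volume quota dist-rep-quota enable
gluster volume quota dist-rep-quota limit-usage / 2GB

# Step 2: mount over NFS (gluster NFS is v3) and write ~1GB as many files
mount -t nfs -o vers=3 172.17.251.71:/dist-rep-quota /mnt/dist-rep-quota
mkdir -p /mnt/dist-rep-quota/dir1
for i in $(seq 1 100); do
    dd if=/dev/zero of=/mnt/dist-rep-quota/dir1/file$i bs=1M count=10
done

# Step 3: kill the brick process for 172.17.251.71:/export/dr-q (PID 4078 above)
kill 4078

# Step 4: keep writing from the NFS mount; usage eventually crosses the 2GB limit
for i in $(seq 101 400); do
    dd if=/dev/zero of=/mnt/dist-rep-quota/dir1/file$i bs=1M count=10
done
gluster volume quota dist-rep-quota list

With one brick down, the distribute translator no longer sees that brick's contribution, which is how the 2GB limit ends up being exceeded.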

  
Actual results:
The data is allowed to be added and the quota limit is crossed.

Expected results:
The quota limit should still be honored.

Additional info:
Even after bringing the brick process back, data addition kept succeeding until self-heal was triggered using "find . | xargs stat" over the NFS mount.
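
For reference, the self-heal trigger mentioned above can be run as follows from the NFS mount; /mnt/dist-rep-quota is an assumed mount point:

# stat every entry under the mount; this lookup traffic is what triggers
# self-heal of the brick that was offline
cd /mnt/dist-rep-quota
find . | xargs stat > /dev/null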

Comment 2 Amar Tumballi 2013-02-04 11:14:29 UTC
We need to have an extra flag set on a directory when the quota limit is reached on that directory.

It is extremely hard to keep the information about the lost brick for quota computation in a non-replicated setup. But we need an enhancement that sets a 'quota limit reached' flag in an xattr when usage is at ~95% of the limit value. That way, we will not miss the quota limit by a large margin.
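
A hypothetical illustration of such a marker follows. The xattr name trusted.glusterfs.quota.limit-reached is an assumption, not an existing key, and setting it by hand enforces nothing; it only sketches the kind of per-directory flag the enhancement describes:

# hypothetical: once a directory's usage crosses ~95% of its quota limit,
# mark it on each brick so writes can be refused locally even when another
# brick's contribution is unreachable
setfattr -n trusted.glusterfs.quota.limit-reached -v 1 /export/dr-q/dir1

# inspect the (assumed) marker
getfattr -n trusted.glusterfs.quota.limit-reached /export/dr-q/dir1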

This issue should just be marked as a Known Issue and handled by setting the right expectation with admins, rather than with a technical solution, which would make performance crawl and would never satisfy 100% of the user base.

Keeping this bug open till we have a better understanding of the requirement.

Comment 4 Vivek Agarwal 2013-10-17 08:58:34 UTC
Per discussion with Saurabh, moving this out of the u1 list as there is a design change.

Comment 5 Vivek Agarwal 2013-10-21 04:47:02 UTC
Per discussion with the PM and QE, not to be supported.

