Bug 1278419

Summary: Data Tiering: Data Loss: File migrations (flushing of data) to the cold tier fail on detach tier when quota limits are reached
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Nag Pavan Chilakam <nchilaka>
Component: tier
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED ERRATA
QA Contact: Anil Shah <ashah>
Severity: urgent
Docs Contact:
Priority: urgent
Version: rhgs-3.1
CC: byarlaga, nchilaka, rcyriac, rhs-bugs, sankarshan, storage-qa-internal, vmallika
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.1.2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1266841
Environment:
Last Closed: 2016-03-01 05:53:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1265623, 1266841, 1267812
Bug Blocks: 1260783, 1260923

Description Nag Pavan Chilakam 2015-11-05 12:35:41 UTC
+++ This bug was initially created as a clone of Bug #1266841 +++

Description of problem:
======================
When quota limits have been reached and a detach tier is then attempted,
the detach tier completes but fails to flush the data to the cold tier.
All the files are listed as failed.



Version-Release number of selected component (if applicable):
==========================================================
[root@zod ~]# rpm -qa|grep gluster
glusterfs-3.7.4-0.43.gitf139283.el7.centos.x86_64
glusterfs-fuse-3.7.4-0.43.gitf139283.el7.centos.x86_64
glusterfs-debuginfo-3.7.4-0.33.git1d02d4b.el7.centos.x86_64
glusterfs-api-3.7.4-0.43.gitf139283.el7.centos.x86_64
glusterfs-client-xlators-3.7.4-0.43.gitf139283.el7.centos.x86_64
glusterfs-server-3.7.4-0.43.gitf139283.el7.centos.x86_64
glusterfs-cli-3.7.4-0.43.gitf139283.el7.centos.x86_64
glusterfs-libs-3.7.4-0.43.gitf139283.el7.centos.x86_64
[root@zod ~]# gluster --version
glusterfs 3.7.4 built on Sep 19 2015 01:30:43
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@zod ~]# 


Steps to Reproduce:
==================
1. Create a tiered volume.
2. Enable quota and set a hard limit of, say, 10 GB on the volume root (/).
3. Enable CTR and set the demote frequency to, say, 1500 seconds.
4. Fill the volume up to the hard limit.
5. Issue a detach tier start (see the command sketch below).

It can be seen that the detach tier completes, but it effectively fails: all the files fail to be flushed to the cold tier. Only link files are created on the cold bricks; no actual data movement happens.
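A rough command sequence for the steps above (a sketch only: the volume name "angola" is inferred from the brick paths in this report, the server names and brick paths are placeholders, and the exact tier command syntax varies slightly across 3.7.x releases):

    # create and start a plain (cold) volume, then attach a hot tier
    gluster volume create angola replica 2 server1:/rhs/brick3/angola server2:/rhs/brick3/angola
    gluster volume start angola
    gluster volume attach-tier angola replica 2 server1:/rhs/brick7/angola_hot server2:/rhs/brick7/angola_hot

    # enable quota and set a 10 GB hard limit on the volume root
    gluster volume quota angola enable
    gluster volume quota angola limit-usage / 10GB

    # enable CTR and set the demote frequency to 1500 seconds
    gluster volume set angola features.ctr-enabled on
    gluster volume set angola cluster.tier-demote-frequency 1500

    # after filling the volume to the hard limit from a client mount:
    gluster volume detach-tier angola start
    gluster volume detach-tier angola status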





Example:
Cold brick:
/rhs/brick3/angola:
total 8
---------T. 2 root root    0 Sep 28 12:47 file.11
---------T. 2 root root    0 Sep 28 12:47 file.12
---------T. 2 root root    0 Sep 28 12:47 file.13
---------T. 2 root root    0 Sep 28 12:47 file.15
---------T. 2 root root    0 Sep 28 12:47 file.32
---------T. 2 root root    0 Sep 28 12:47 file.33
---------T. 2 root root    0 Sep 28 12:47 file.35
---------T. 2 root root    0 Sep 28 12:47 file.42
---------T. 2 root root    0 Sep 28 12:47 file.44
---------T. 2 root root    0 Sep 28 12:47 file.45
---------T. 2 root root    0 Sep 28 12:47 file.54
---------T. 2 root root    0 Sep 28 12:47 file.62
---------T. 2 root root    0 Sep 28 12:47 file.63
drwxr-xr-x. 2 root root 8192 Sep 28 12:46 hotdir1


hot brick:
/rhs/brick7/angola_hot:
total 1269588
-rw-r-Sr-T. 2 root root 100000000 Sep 28 12:40 file.31
-rw-r-Sr-T. 2 root root 100000000 Sep 28 12:41 file.38
-rw-r-Sr-T. 2 root root 100000000 Sep 28 12:41 file.41
-rw-r-Sr-T. 2 root root 100000000 Sep 28 12:41 file.43
-rw-r-Sr-T. 2 root root 100000000 Sep 28 12:41 file.47
-rw-r-Sr-T. 2 root root 100000000 Sep 28 12:41 file.49
-rw-r-Sr-T. 2 root root 100000000 Sep 28 12:41 file.52
-rw-r-Sr-T. 2 root root 100000000 Sep 28 12:41 file.57
-rw-r-Sr-T. 2 root root 100000000 Sep 28 12:44 file.6
-rw-r-Sr-T. 2 root root 100000000 Sep 28 12:41 file.64
-rw-r-Sr-T. 2 root root 100000000 Sep 28 12:41 file.66
-rw-r-Sr-T. 2 root root 100000000 Sep 28 12:42 file.68
-rw-r-Sr-T. 2 root root 100000000 Sep 28 12:44 file.8
drwxr-xr-x. 2 root root      8192 Sep 28 12:37 hotdir1
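The zero-byte, mode ---------T entries on the cold brick above are DHT-style link files rather than migrated data. One way to confirm this (a sketch, assuming direct access to the brick host; the exact linkto xattr name depends on the translator instance) is to dump the file's extended attributes on the brick:

    # run on the server hosting the cold brick; file.11 is taken from the listing above
    getfattr -d -m . -e hex /rhs/brick3/angola/file.11
    # a pure link file has zero size, the sticky bit (T) in its mode, and a
    # *.linkto xattr naming the subvolume that is supposed to hold the real data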






Mount point, after commit of detach tier:
========================================
[root@localhost angola]# du -sh *
0	file.1
0	file.10
0	file.11
0	file.12
0	file.13
0	file.14
0	file.15
0	file.2
0	file.3
0	file.31
0	file.32
0	file.33
0	file.34
0	file.35
0	file.36
0	file.37
0	file.38
0	file.39
0	file.4
0	file.40
0	file.41
0	file.42
0	file.43
0	file.44
0	file.45
0	file.46
0	file.47
0	file.48
0	file.49
0	file.5
0	file.50
0	file.51
0	file.52
0	file.53
0	file.54
0	file.55
0	file.56
0	file.57
0	file.58
0	file.59
0	file.6
0	file.60
0	file.61
0	file.62
0	file.63
0	file.64
0	file.65
0	file.66
0	file.67
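A quick way to correlate the zero sizes above with the quota state that triggered this (a sketch; "angola" is again the assumed volume name):

    # shows the configured hard limit on / and how much of it is used
    gluster volume quota angola list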

--- Additional comment from nchilaka on 2015-09-28 03:46:18 EDT ---

sosreports.eng.blr.redhat.com:/home/repo/sosreports/bug.1266841

--- Additional comment from Vijaikumar Mallikarjuna on 2015-09-28 05:55:29 EDT ---

Hi Nag Pavan,

What is the rebalance status? Does it show any failure counts? In this case it is expected that files which failed to migrate can be lost when the commit is performed.
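The failure counts referred to here appear in the detach status output (a sketch; exact output columns vary by release, and "angola" is the assumed volume name):

    # the "failures" column counts files that could not be migrated off the hot tier
    gluster volume detach-tier angola status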

--- Additional comment from nchilaka on 2015-09-28 07:17:19 EDT ---

Yes, there are failures under rebalance.
But given that detach tier should simply move files between bricks, and the end user does not care how that is done internally, we should not see any failures.
Maybe I should change the title.
In short: "quota limits must not stop data from being moved from the hot to the cold tier during a detach tier".

--- Additional comment from Vijaikumar Mallikarjuna on 2015-10-01 02:29:18 EDT ---

Patch submitted: http://review.gluster.org/#/c/12266/

Comment 4 Anil Shah 2015-12-03 09:33:02 UTC
Bug verified on build glusterfs-3.7.5-8.el7rhgs.x86_64

Comment 6 errata-xmlrpc 2016-03-01 05:53:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html