Bug 1291566

Summary: first file created after hot tier full fails to create, but later ends up as a stale erroneous file (file with ???????????)
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Joseph Elwin Fernandes <josferna>
Component: tier Assignee: Joseph Elwin Fernandes <josferna>
Status: CLOSED CURRENTRELEASE QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: high Docs Contact:
Priority: high    
Version: rhgs-3.1 CC: dlambrig, nchilaka, rcyriac, rhs-bugs, rkavunga, sankarshan, storage-qa-internal
Target Milestone: --- Keywords: Reopened, Triaged, ZStream
Target Release: RHGS 3.1.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.8rc2 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1289163
: 1293348 (view as bug list) Environment:
Last Closed: 2016-06-16 13:50:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1289163    
Bug Blocks: 1277154, 1293348    
Attachments:
Description Flags
qe validation log none

Comment 1 Vijay Bellur 2015-12-15 08:05:53 UTC
REVIEW: http://review.gluster.org/12969 (tier/dht : Multiple issues in HOT-TIER full) posted (#1) for review on master by Joseph Fernandes

Comment 2 Vijay Bellur 2015-12-20 14:21:36 UTC
REVIEW: http://review.gluster.org/12969 (tier/dht : Properly free file descriptors during data migration) posted (#2) for review on master by Joseph Fernandes

Comment 3 Vijay Bellur 2015-12-21 13:20:13 UTC
COMMIT: http://review.gluster.org/12969 committed in master by Dan Lambright (dlambrig) 
------
commit 9691ea1b203c82386ececc3c5ea9adad39304d7b
Author: Joseph Fernandes <josferna>
Date:   Tue Dec 15 13:32:29 2015 +0530

    tier/dht : Properly free file descriptors during data migration
    
    While tier migration, free src and dst fd's when create of
    destination or open of source fails.
    
    Change-Id: I62978a669c6c9fbab5fed9df2716b9b2ba00ddf1
    BUG: 1291566
    Signed-off-by: Joseph Fernandes <josferna>
    Reviewed-on: http://review.gluster.org/12969
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: N Balachandran <nbalacha>
    Reviewed-by: Dan Lambright <dlambrig>
    Tested-by: Dan Lambright <dlambrig>
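
The cleanup described in the commit message can be illustrated with a minimal sketch. This is not the GlusterFS patch (which is C code in tier/dht); it is a hypothetical Python analogue using plain POSIX-style file descriptors, and the function name `open_migration_fds` is made up for illustration. It only shows the general idea: if opening the source or creating the destination fails, whatever was already opened is closed before the error propagates.

```python
import os


def open_migration_fds(src_path, dst_path):
    """Open the source and create the destination for a copy.

    Hypothetical helper: returns (src_fd, dst_fd) on success. If either
    step fails, any descriptor that was already opened is closed before
    the error is re-raised, so no descriptor is leaked.
    """
    src_fd = None
    dst_fd = None
    try:
        src_fd = os.open(src_path, os.O_RDONLY)
        # O_EXCL makes the create fail if the destination already exists.
        dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        return src_fd, dst_fd
    except OSError:
        # Release whichever descriptors were obtained before the failure.
        for fd in (src_fd, dst_fd):
            if fd is not None:
                os.close(fd)
        raise
```

On success the caller owns both descriptors and is responsible for closing them after the copy.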

Comment 7 Vivek Agarwal 2015-12-23 06:46:32 UTC
*** Bug 1289163 has been marked as a duplicate of this bug. ***

Comment 8 Nag Pavan Chilakam 2015-12-24 13:16:44 UTC
Following are my findings on the latest build (where the fix is supposed to be available):
[root@zod dummy]# rpm -qa|grep gluster
glusterfs-api-3.7.5-13.el7rhgs.x86_64
glusterfs-client-xlators-3.7.5-13.el7rhgs.x86_64
glusterfs-server-3.7.5-13.el7rhgs.x86_64
glusterfs-3.7.5-13.el7rhgs.x86_64
glusterfs-cli-3.7.5-13.el7rhgs.x86_64
glusterfs-debuginfo-3.7.5-12.el7rhgs.x86_64
glusterfs-libs-3.7.5-13.el7rhgs.x86_64
glusterfs-fuse-3.7.5-13.el7rhgs.x86_64
[root@zod dummy]# 


I used the same steps to validate and found the following issues
(refer to the steps mentioned at the beginning):
1) At steps 3 and 4, when I create a file that exceeds the disk capacity, the operation previously failed with an input/output error once the disk limit was hit. Now that no longer happens; instead, the mount point hangs. A CLEAR CASE OF REGRESSION

2) Keeping the state as it is, I opened another terminal on the same client and tried step 5. This is what I see:
  a) A database entry is made for the file even though the file create fails, as shown below:
[root@rhs-client1 bztest]# touch x1
touch: cannot touch ‘x1’: No space left on device
[root@rhs-client1 bztest]# ls
gogy5  gogy7  gogy8  gony  leg1  leg2  leg3  leg4  leg5  new1  new2  new3  new4  tile1  tile2  tile3  tile4  x10  x2  x3  x4  x5  x6  x7  x8  x9

====>database entry is there <========================

>>>>>>>>>>>> HOTBRICK#2 <<<<<<<<==
cd1833b2-abfe-446a-8090-87abba9a7a6c|1450961884|272248|0|0|0|0|0|0|0|0
de8c1aad-8b0f-46dd-8890-9c6af4588b5e|1450962132|339551|0|0|0|0|0|0|0|0
5ba281ad-e194-4d75-a0b3-f68bd409020a|1450962204|62682|0|0|0|0|0|0|0|0
cd1833b2-abfe-446a-8090-87abba9a7a6c|00000000-0000-0000-0000-000000000001|new1|0|0
de8c1aad-8b0f-46dd-8890-9c6af4588b5e|00000000-0000-0000-0000-000000000001|x2|0|0
5ba281ad-e194-4d75-a0b3-f68bd409020a|00000000-0000-0000-0000-000000000001|x1|0|0
###############################
Thu Dec 24 18:37:48 IST 2015
/dummy/brick101/bztest_hot:



  b) However, I don't see the file getting created.


Conclusion: It is a partial fix
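
The check in 2a can be done mechanically. Below is a rough sketch, assuming (from the dump above only, not from the CTR schema) that the 5-field rows have the form GFID|parent GFID|file name|... and that the 11-field rows are per-file records. The function names are hypothetical.

```python
def names_in_db_dump(lines):
    """Return {gfid: name} for rows that look like GFID|parent GFID|name|...

    Assumption: in the dump, rows with 5 '|'-separated fields carry the
    file name in the third field; all other rows are ignored.
    """
    names = {}
    for line in lines:
        fields = line.strip().split("|")
        if len(fields) == 5:
            gfid, _parent_gfid, name = fields[:3]
            names[gfid] = name
    return names


def stale_entries(db_lines, visible_names):
    """Names present in the database dump but not visible on the mount."""
    recorded = set(names_in_db_dump(db_lines).values())
    return sorted(recorded - set(visible_names))
```

Fed the HOTBRICK#2 rows and the `ls` output from this comment, `stale_entries` returns `['x1']`: the entry is in the database, but the file is not on the mount.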

Comment 9 Nag Pavan Chilakam 2015-12-24 13:23:00 UTC
Created attachment 1109206 [details]
qe validation log

Comment 10 Nag Pavan Chilakam 2015-12-28 05:32:44 UTC
Moving the BZ to failed_qa because it is only a partial fix.

Comment 11 Joseph Elwin Fernandes 2015-12-30 06:52:46 UTC
(In reply to nchilaka from comment #8)
> Following is my finding on the latest build(where the fix is supposed to be
> availbale):[root@zod dummy]# rpm -qa|grep gluster
> glusterfs-api-3.7.5-13.el7rhgs.x86_64
> glusterfs-client-xlators-3.7.5-13.el7rhgs.x86_64
> glusterfs-server-3.7.5-13.el7rhgs.x86_64
> glusterfs-3.7.5-13.el7rhgs.x86_64
> glusterfs-cli-3.7.5-13.el7rhgs.x86_64
> glusterfs-debuginfo-3.7.5-12.el7rhgs.x86_64
> glusterfs-libs-3.7.5-13.el7rhgs.x86_64
> glusterfs-fuse-3.7.5-13.el7rhgs.x86_64
> [root@zod dummy]# 
> 
> 
> I used the same steps to validate and found the following
> observations/issues:
> (refer the steps mentioned at the beginning) 
> 1) at step 3 and 4, now when I create a file and it exceeds the disk
> capacity previously it used to fail out after the disk limit is hit saying
> input/output error, But now I don't see that happening, instead the mount
> point hangs there. A CLEAR CASE OF REGRESSION

  I suggest creating a new bug for this.

> 
> 2)Keeping the state as it is, I opened another terminal of same client and
> now tried step 5  and following is what i see:
>   a) the database entry is made for the file even though file create fails
> as below:
> [root@rhs-client1 bztest]# touch x1
> touch: cannot touch ‘x1’: No space left on device
> [root@rhs-client1 bztest]# ls
> gogy5  gogy7  gogy8  gony  leg1  leg2  leg3  leg4  leg5  new1  new2  new3 
> new4  tile1  tile2  tile3  tile4  x10  x2  x3  x4  x5  x6  x7  x8  x9
> 
> ====>database entry is there <========================
> 
> >>>>>>>>>>>> HOTBRICK#2 <<<<<<<<==
> cd1833b2-abfe-446a-8090-87abba9a7a6c|1450961884|272248|0|0|0|0|0|0|0|0
> de8c1aad-8b0f-46dd-8890-9c6af4588b5e|1450962132|339551|0|0|0|0|0|0|0|0
> 5ba281ad-e194-4d75-a0b3-f68bd409020a|1450962204|62682|0|0|0|0|0|0|0|0
> cd1833b2-abfe-446a-8090-87abba9a7a6c|00000000-0000-0000-0000-
> 000000000001|new1|0|0
> de8c1aad-8b0f-46dd-8890-9c6af4588b5e|00000000-0000-0000-0000-
> 000000000001|x2|0|0
> 5ba281ad-e194-4d75-a0b3-f68bd409020a|00000000-0000-0000-0000-
> 000000000001|x1|0|0
> ###############################
> Thu Dec 24 18:37:48 IST 2015
> /dummy/brick101/bztest_hot:
> 
> 
> 
> b)however, I don't see the file getting created.
> 
> 
> Conclusion: It is a partial fix

 This is clearly a result of recording the database entry in the wind path and not in the unwind path. It cannot be fixed right now, as it would require substantial code changes in CTR. Bug https://bugzilla.redhat.com/show_bug.cgi?id=1289118 was deferred for the same reason.
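
To make the wind/unwind distinction concrete, here is a toy model. It is not CTR code: `ToyBrick` and both functions are hypothetical names, and a Python set stands in for the database. "Wind" means the request is on its way down to the brick, before the outcome is known. "Unwind" means the reply is on its way back, when success or failure is known.

```python
import errno


class ToyBrick:
    """Stand-in for a brick that may be out of space (hypothetical)."""

    def __init__(self, full):
        self.full = full
        self.files = set()

    def create(self, name):
        if self.full:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.files.add(name)


def create_record_on_wind(brick, db, name):
    # The entry is recorded before the create is attempted, so it
    # remains in the database even when the create fails.
    db.add(name)
    brick.create(name)


def create_record_on_unwind(brick, db, name):
    # The create runs first; if it raises, the record step is skipped
    # and the database stays consistent with the brick.
    brick.create(name)
    db.add(name)
```

On a full brick both functions raise ENOSPC, but only the wind-path variant leaves an entry for `x1` in the database, which matches what comment 8 reports.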

Comment 12 Joseph Elwin Fernandes 2015-12-31 10:11:30 UTC
(In reply to Joseph Elwin Fernandes from comment #11)
> (In reply to nchilaka from comment #8)
> > 1) at step 3 and 4, now when I create a file and it exceeds the disk
> > capacity previously it used to fail out after the disk limit is hit saying
> > input/output error, But now I don't see that happening, instead the mount
> > point hangs there. A CLEAR CASE OF REGRESSION
> 
>   Suggest you to create a new bug for this.

I was able to reproduce this issue and am looking into it. Please create a new bug for it.

Comment 13 Nag Pavan Chilakam 2016-01-04 06:30:44 UTC
Raised a new bug, "1295293 - first file created after hot tier full fails to create, but gets database entry", for the former part, which was not fixed.

As the latter part was fixed and a new bug has now been raised for the former part of the problem, moving the BZ to verified.

Comment 14 Nag Pavan Chilakam 2016-01-04 06:31:44 UTC
changing the title according to my previous comment

Comment 16 errata-xmlrpc 2016-03-01 06:03:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html

Comment 17 Niels de Vos 2016-06-16 13:50:45 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user