Bug 1293224

Summary: Disperse: Disperse volume (cold vol) crashes while writing files on tier volume
Product: [Community] GlusterFS
Reporter: Ashish Pandey <aspandey>
Component: disperse
Assignee: Ashish Pandey <aspandey>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.7.6
CC: bugs, pkarampu
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.7.7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1293223
Environment:
Last Closed: 2016-04-19 07:22:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1293223, 1293228
Bug Blocks:

Description Ashish Pandey 2015-12-21 06:39:57 UTC
+++ This bug was initially created as a clone of Bug #1293223 +++

Description of problem:

The disperse volume crashes while multiple threads write multiple files on a FUSE-mounted tier volume.

Version-Release number of selected component (if applicable):
[root@apandey glusterfs]# glusterfs --version
glusterfs 3.8dev built on Dec 21 2015 10:49:16
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.


How reproducible:
100%

Steps to Reproduce:
1. Create a tier volume with a 2 x (4+2) distributed-disperse cold tier and a 6 x 2 distributed-replicate hot tier.
2. Mount it through FUSE.
3. Start writing files from multiple threads on the mount point:
crefi --multi -n 10 -b 10 -d 10 --max=1024k --min=5k --random -T 5 -t text -I 5 --fop=create /mnt/gfs
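If crefi is not at hand, a comparable multi-threaded create workload can be approximated with a short script. This is only a rough sketch and not a replacement for the crefi run above: the function name write_files, its parameters, and the directory layout are hypothetical, and only the thread count (5) and the size range (5k to 1024k) are taken from the command line.

```python
import os
import random
from concurrent.futures import ThreadPoolExecutor


def write_files(root, threads=5, files_per_thread=10,
                min_size=5 * 1024, max_size=1024 * 1024):
    """Create files of random size under root from several threads.

    Hypothetical helper: it only approximates the crefi create workload.
    Returns the list of paths that were written.
    """
    def worker(tid):
        # Each thread writes into its own subdirectory.
        directory = os.path.join(root, f"thread{tid}")
        os.makedirs(directory, exist_ok=True)
        paths = []
        for i in range(files_per_thread):
            path = os.path.join(directory, f"file{i}.txt")
            size = random.randint(min_size, max_size)
            with open(path, "wb") as handle:
                handle.write(b"x" * size)
            paths.append(path)
        return paths

    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(worker, range(threads))
        return [path for paths in results for path in paths]
```

Calling write_files("/mnt/gfs") would target the mount point used in this report.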

Actual results:
After some time (1 to 30 minutes), the disperse volume crashes.


Expected results:
No crash should occur, and all read, write, and modify operations should succeed.

Additional info:
[root@apandey glusterfs]# gluster v info
 
Volume Name: vol
Type: Tier
Volume ID: a9007561-0c50-463c-b37d-59f3992f339e
Status: Started
Number of Bricks: 24
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 6 x 2 = 12
Brick1: apandey:/brick/gluster/r12
Brick2: apandey:/brick/gluster/r11
Brick3: apandey:/brick/gluster/r10
Brick4: apandey:/brick/gluster/r9
Brick5: apandey:/brick/gluster/r8
Brick6: apandey:/brick/gluster/r7
Brick7: apandey:/brick/gluster/r6
Brick8: apandey:/brick/gluster/r5
Brick9: apandey:/brick/gluster/r4
Brick10: apandey:/brick/gluster/r3
Brick11: apandey:/brick/gluster/r2
Brick12: apandey:/brick/gluster/r1
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick13: apandey:/brick/gluster/v1
Brick14: apandey:/brick/gluster/v2
Brick15: apandey:/brick/gluster/v3
Brick16: apandey:/brick/gluster/v4
Brick17: apandey:/brick/gluster/v5
Brick18: apandey:/brick/gluster/v6
Brick19: apandey:/brick/gluster/v7
Brick20: apandey:/brick/gluster/v8
Brick21: apandey:/brick/gluster/v9
Brick22: apandey:/brick/gluster/v10
Brick23: apandey:/brick/gluster/v11
Brick24: apandey:/brick/gluster/v12
Options Reconfigured:
cluster.tier-demote-frequency: 60
cluster.tier-promote-frequency: 60
cluster.write-freq-threshold: 1
cluster.read-freq-threshold: 1
features.record-counters: on
cluster.watermark-hi: 5
cluster.watermark-low: 1
cluster.tier-mode: cache
features.ctr-enabled: on
diagnostics.client-log-level: WARNING
performance.readdir-ahead: on

Comment 1 Vijay Bellur 2015-12-22 11:46:03 UTC
REVIEW: http://review.gluster.org/13066 (cluster/ec: Get size and config for invalid inode) posted (#1) for review on release-3.7 by Ashish Pandey (aspandey)

Comment 2 Vijay Bellur 2016-01-13 12:27:59 UTC
COMMIT: http://review.gluster.org/13066 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit b6b68fb15efc614e3718cbc42c6231ee9ac2593b
Author: Ashish Pandey <aspandey>
Date:   Mon Dec 21 16:04:20 2015 +0530

    cluster/ec: Get size and config for invalid inode
    
    Problem:
    After creating an inode and before linking it
    to the inode table, if there is a setattr request
    for that file, it fails and leads to a crash.
    Before the inode is linked to the inode table, ia_type is IA_INVAL,
    which causes have_size and have_config to be zero.
    
    Solution:
    Check and get size and config if an inode is invalid
    
    master-
    http://review.gluster.org/#/c/13039/
    
    Change-Id: I0c0e564940b1b9f351369a76ab14f6b4aa81f23b
    BUG: 1293224
    Signed-off-by: Ashish Pandey <aspandey>
    Reviewed-on: http://review.gluster.org/13066
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: Pranith Kumar Karampuri <pkarampu>
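
For readers unfamiliar with the code, the decision described in the commit message above can be sketched roughly as follows. This is an illustration in Python, not the actual C patch: the IaType enum and the needs_size_and_config function are hypothetical names, and only IA_INVAL, have_size, and have_config come from the commit message.

```python
from enum import Enum


class IaType(Enum):
    """Hypothetical stand-in for the inode types mentioned in this report."""
    IA_INVAL = 0   # inode created but not yet linked to the inode table
    IA_IFREG = 1   # regular file
    IA_IFDIR = 2   # directory


def needs_size_and_config(ia_type, have_size, have_config):
    """Return True when size and config still have to be fetched.

    Assumption: after the fix, an IA_INVAL inode is treated like a
    regular file, since it may turn out to be one once it is linked.
    """
    if have_size and have_config:
        return False
    return ia_type in (IaType.IA_IFREG, IaType.IA_INVAL)
```

Under this sketch, a setattr on a not yet linked inode triggers the fetch instead of proceeding with have_size and have_config both zero.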

Comment 3 Vijay Bellur 2016-01-14 08:48:00 UTC
REVIEW: http://review.gluster.org/13239 (cluster/ec: Handle non-existent config xattr for non regular files) posted (#1) for review on release-3.7 by Xavier Hernandez (xhernandez)

Comment 4 Vijay Bellur 2016-01-14 08:55:04 UTC
REVIEW: http://review.gluster.org/13239 (cluster/ec: Handle non-existent config xattr for non regular files) posted (#2) for review on release-3.7 by Xavier Hernandez (xhernandez)

Comment 5 Vijay Bellur 2016-01-15 10:03:17 UTC
REVIEW: http://review.gluster.org/13239 (cluster/ec: Handle non-existent config xattr for non regular files) posted (#3) for review on release-3.7 by Xavier Hernandez (xhernandez)

Comment 6 Vijay Bellur 2016-01-20 07:04:03 UTC
COMMIT: http://review.gluster.org/13239 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit 626534e94b4ac07b99a2cc479f004935664a09a2
Author: Xavier Hernandez <xhernandez>
Date:   Thu Jan 14 09:36:33 2016 +0100

    cluster/ec: Handle non-existent config xattr for non regular files
    
    Since we now try to get the 'trusted.ec.config' xattr for inodes of
    type IA_INVAL (these inodes will be set to some valid type later),
    if that inode corresponds to a non regular file, the xattr won't
    exist and we will handle this as an error when it's not.
    
    This patch solves the problem by only considering errors for inodes
    that are already known to be regular files.
    
    > Change-Id: Id72f314e209459236d75cf087fc51e09943756b4
    > BUG: 1293223
    > Signed-off-by: Xavier Hernandez <xhernandez>
    > Reviewed-on: http://review.gluster.org/13238
    
    Change-Id: I48a475ce889607e9b909f699b5d7f75b0657cb22
    BUG: 1293224
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/13239
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: Pranith Kumar Karampuri <pkarampu>
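
The rule introduced by this second patch can be illustrated with a small sketch. It is written in Python for brevity and is not the actual change: the function name missing_config_is_error and the string constants are hypothetical, chosen only to mirror the inode types named in the commit message.

```python
# Hypothetical string constants mirroring the inode types in this report.
IA_INVAL = "IA_INVAL"
IA_IFREG = "IA_IFREG"
IA_IFDIR = "IA_IFDIR"


def missing_config_is_error(ia_type, config_found):
    """Decide whether an absent 'trusted.ec.config' xattr is an error.

    Sketch of the rule from the commit message: the absence is only an
    error for inodes already known to be regular files. For IA_INVAL
    inodes the type is still unknown, so the absence is tolerated.
    """
    if config_found:
        return False
    return ia_type == IA_IFREG
```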

Comment 7 Vijay Bellur 2016-02-15 10:07:46 UTC
REVIEW: http://review.gluster.org/13447 (cluster/ec: Fix invalid config check for directories) posted (#1) for review on release-3.7 by Xavier Hernandez (xhernandez)

Comment 8 Vijay Bellur 2016-03-02 11:03:00 UTC
COMMIT: http://review.gluster.org/13447 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit 6662b0a9cf19fdbe75e67f70a98df44d4852467c
Author: Xavier Hernandez <xhernandez>
Date:   Mon Feb 15 10:59:29 2016 +0100

    cluster/ec: Fix invalid config check for directories
    
    The trusted.ec.config xattr is not defined for directories. However
    sometimes it could be requested because the inode type of a directory
    can temporarily be IA_INVAL.
    
    Requesting such xattr using the xattrop fop when it doesn't exist,
    returns a config value full of 0's, which is invalid and caused some
    fops to fail.
    
    This patch filters out this case by ignoring config xattr == 0.
    
    > Change-Id: Ied51c35b313ea8c3eeae27812f9bae61d3808e92
    > Reviewed-on: http://review.gluster.org/13446
    > BUG: 1293223
    > Signed-off-by: Xavier Hernandez <xhernandez>
    
    Change-Id: I42d06119d8f51c34ddb910380af7acd670f6244e
    BUG: 1293224
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/13447
    Smoke: Gluster Build System <jenkins.com>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Ashish Pandey <aspandey>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
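
The filter described in this third patch can be sketched as follows. This is a Python illustration rather than the committed C code: the function name parse_config is hypothetical, and the encoding of the xattr as a single 8-byte big-endian integer is an assumption made for the example.

```python
import struct


def parse_config(raw):
    """Return the config value, or None when the xattr is effectively absent.

    Assumption: the xattr payload is 8 bytes, read here as one big-endian
    unsigned integer. An all-zero value is what xattrop returns when the
    xattr does not exist, so it is ignored instead of being rejected as
    an invalid configuration.
    """
    (value,) = struct.unpack(">Q", raw)
    if value == 0:
        return None
    return value
```

A caller would treat a None result as "no config available" and skip validation for that inode.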

Comment 9 Mike McCune 2016-03-28 22:17:27 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation. Please contact mmccune with any questions.

Comment 10 Kaushal 2016-04-19 07:22:29 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.9, please open a new bug report.

glusterfs-3.7.9 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-March/025922.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

Comment 11 Kaushal 2016-04-19 07:51:26 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.7, please open a new bug report.

glusterfs-3.7.7 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-February/025292.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user