Bug 1236065

Summary: Disperse volume: FUSE I/O error after self healing the failed disk files
Product: [Community] GlusterFS
Component: disperse
Version: mainline
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Reporter: Xavi Hernandez <jahernan>
Assignee: Xavi Hernandez <jahernan>
CC: bugs, mdfakkeer, pkarampu
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Clone Of: 1235964
Bug Blocks: 1235964
Type: Bug
Regression: ---
Mount Type: ---
Last Closed: 2016-06-16 13:17:09 UTC

Description Xavi Hernandez 2015-06-26 13:02:21 UTC
+++ This bug was initially created as a clone of Bug #1235964 +++

Description of problem:
In a 3 x (4 + 2) = 18 distributed-disperse volume, input/output errors occur
on some files on the FUSE mount after simulating the following scenario:

1.   Simulate a disk failure by killing the brick pid and re-adding the same
disk after formatting the drive
2.   Try to read the recovered or healed file after 2 bricks/nodes have been
brought down

Version-Release number of selected component (if applicable):

glusterfs 3.7.2 built on Jun 19 2015 16:33:27
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU
General Public License

How reproducible:

Steps to Reproduce:

1. Create a 3 x (4 + 2) distributed-disperse volume across the nodes
2. FUSE mount it on the client and start creating files/directories with mkdir and rsync/dd
3. Simulate a disk failure by killing the brick pid of one disk on a node, then add the same disk again after formatting the drive
4. Start the volume with force
5. Self-heal creates the file names with 0 bytes on the newly formatted drive
6. Wait for self-heal to finish, but self-heal does not progress and the files stay at 0 bytes
7. Try to read the same file from the client; the 0-byte file now starts to recover and recovery completes. Get the md5sum of the file with all nodes up and the result is correct
8. Now bring down 2 of the nodes
9. Now try to get the md5sum of the same recovered file; the client throws an I/O error (see the sketch after this list)
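
A rough shell sketch of the sequence above (volume, host and path names are taken from the report; the backing device and the pgrep match are placeholders/heuristics, and the heal commands assume the gluster CLI of this release):

    # -- on the client --
    mount -t glusterfs 10.1.2.1:/vaulttest21 /mnt/gluster
    dd if=/dev/urandom of=/mnt/gluster/up1 bs=1M count=3200   # large enough to spread across stripes

    # -- on node003: simulate failure of the /media/disk2 brick --
    gluster volume status vaulttest21 | grep '10.1.2.3:/media/disk2'   # note the brick PID
    kill $(pgrep -f 'glusterfsd.*disk2')                               # heuristic match on the brick path
    umount /media/disk2
    mkfs.xfs -f /dev/sdX                                               # placeholder for the device backing disk2
    mount /dev/sdX /media/disk2

    # -- bring the re-created brick back online and watch the heal --
    gluster volume start vaulttest21 force
    gluster volume heal vaulttest21 info

    # -- on the client: trigger heal by reading, then read again with 2 nodes down --
    md5sum /mnt/gluster/up1     # correct checksum while all nodes are up
    # ... take node005 and node006 offline ...
    md5sum /mnt/gluster/up1     # on 3.7.2 this fails with Input/output error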


Actual results:
I/O error on the recovered file

Expected results:
There should not be any I/O error

Additional info:

admin@node001:~$ sudo gluster volume info

Volume Name: vaulttest21
Type: Distributed-Disperse
Volume ID: ac6a374d-a0a2-405c-823d-0672fd92f0af
Status: Started
Number of Bricks: 3 x (4 + 2) = 18
Transport-type: tcp
Bricks:
Brick1: 10.1.2.1:/media/disk1
Brick2: 10.1.2.2:/media/disk1
Brick3: 10.1.2.3:/media/disk1
Brick4: 10.1.2.4:/media/disk1
Brick5: 10.1.2.5:/media/disk1
Brick6: 10.1.2.6:/media/disk1
Brick7: 10.1.2.1:/media/disk2
Brick8: 10.1.2.2:/media/disk2
Brick9: 10.1.2.3:/media/disk2
Brick10: 10.1.2.4:/media/disk2
Brick11: 10.1.2.5:/media/disk2
Brick12: 10.1.2.6:/media/disk2
Brick13: 10.1.2.1:/media/disk3
Brick14: 10.1.2.2:/media/disk3
Brick15: 10.1.2.3:/media/disk3
Brick16: 10.1.2.4:/media/disk3
Brick17: 10.1.2.5:/media/disk3
Brick18: 10.1.2.6:/media/disk3
Options Reconfigured:
performance.readdir-ahead: on
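
For reference, a layout like the one above would normally come from a create command along these lines (a sketch, not taken from the report; the brick order determines which six bricks form each (4+2) subvolume, and force is needed because the bricks sit on mount-point roots):

    gluster volume create vaulttest21 disperse-data 4 redundancy 2 \
        10.1.2.1:/media/disk1 10.1.2.2:/media/disk1 10.1.2.3:/media/disk1 \
        10.1.2.4:/media/disk1 10.1.2.5:/media/disk1 10.1.2.6:/media/disk1 \
        10.1.2.1:/media/disk2 10.1.2.2:/media/disk2 10.1.2.3:/media/disk2 \
        10.1.2.4:/media/disk2 10.1.2.5:/media/disk2 10.1.2.6:/media/disk2 \
        10.1.2.1:/media/disk3 10.1.2.2:/media/disk3 10.1.2.3:/media/disk3 \
        10.1.2.4:/media/disk3 10.1.2.5:/media/disk3 10.1.2.6:/media/disk3 \
        force
    gluster volume start vaulttest21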

*_After simulating the disk failure (node3, disk2) and adding the disk again
after formatting the drive:_*

admin@node003:~$ date

Thu Jun 25 *16:21:58* IST 2015


admin@node003:~$ ls -l -h /media/disk2

total 1.6G

drwxr-xr-x 3 root root   22 Jun 25 16:18 1

*-rw-r--r-- 2 root root    0 Jun 25 16:17 up1*

*-rw-r--r-- 2 root root    0 Jun 25 16:17 up2*

-rw-r--r-- 2 root root 797M Jun 25 16:03 up3

-rw-r--r-- 2 root root 797M Jun 25 16:04 up4

--

admin@node003:~$ date

Thu Jun 25 *16:25:09* IST 2015


admin@node003:~$ ls -l -h  /media/disk2

total 1.6G

drwxr-xr-x 3 root root   22 Jun 25 16:18 1

*-rw-r--r-- 2 root root    0 Jun 25 16:17 up1*

*-rw-r--r-- 2 root root    0 Jun 25 16:17 up2*

-rw-r--r-- 2 root root 797M Jun 25 16:03 up3

-rw-r--r-- 2 root root 797M Jun 25 16:04 up4


admin@node003:~$ date

Thu Jun 25 *16:41:25* IST 2015


admin@node003:~$  ls -l -h  /media/disk2

total 1.6G

drwxr-xr-x 3 root root   22 Jun 25 16:18 1

-rw-r--r-- 2 root root    0 Jun 25 16:17 up1

-rw-r--r-- 2 root root    0 Jun 25 16:17 up2

-rw-r--r-- 2 root root 797M Jun 25 16:03 up3

-rw-r--r-- 2 root root 797M Jun 25 16:04 up4


*After waiting nearly 20 minutes, self-heal still has not recovered the full
data chunk. Then try to read the file using md5sum:*

root@mas03:/mnt/gluster# time md5sum up1
4650543ade404ed5a1171726e76f8b7c  up1

real    1m58.010s
user    0m6.243s
sys     0m0.778s

*The corrupted chunk starts growing:*

admin@node003:~$ ls -l -h  /media/disk2
total 2.6G
drwxr-xr-x 3 root root   22 Jun 25 16:18 1
-rw-r--r-- 2 root root 797M Jun 25 15:57 up1
-rw-r--r-- 2 root root    0 Jun 25 16:17 up2
-rw-r--r-- 2 root root 797M Jun 25 16:03 up3
-rw-r--r-- 2 root root 797M Jun 25 16:04 up4

*_To verify the healed file after the two nodes 5 & 6 were taken offline:_*

root@mas03:/mnt/gluster# time md5sum up1
md5sum: up1: *Input/output error*
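
One way to inspect what the cluster thinks of the fragment at this point (a sketch; it assumes the disperse xlator's trusted.ec.* extended attributes and that heal info is available for disperse volumes in this release):

    # pending heals as seen by gluster
    gluster volume heal vaulttest21 info

    # on each server, compare the EC metadata of the fragment; a brick whose
    # trusted.ec.version / trusted.ec.size differ from its peers is still treated as bad
    getfattr -d -m . -e hex /media/disk2/up1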

Comment 1 Anand Avati 2015-08-05 22:06:01 UTC
REVIEW: http://review.gluster.org/11844 (cluster/ec: Fix tracking of good bricks) posted (#1) for review on master by Xavier Hernandez (xhernandez)

Comment 2 Anand Avati 2015-08-06 07:24:36 UTC
REVIEW: http://review.gluster.org/11844 (cluster/ec: Fix tracking of good bricks) posted (#2) for review on master by Xavier Hernandez (xhernandez)

Comment 3 Anand Avati 2015-08-06 17:12:27 UTC
COMMIT: http://review.gluster.org/11844 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 7298b622ab39c2e78d6d745ae8b6e8413e1d9f1a
Author: Xavier Hernandez <xhernandez>
Date:   Wed Aug 5 23:42:41 2015 +0200

    cluster/ec: Fix tracking of good bricks
    
    The bitmask of good and bad bricks was kept in the context of the
    corresponding inode or fd. This was problematic when an external
    process (another client or the self-heal process) did heal the
    bricks but no one changed the bitmask of other clients.
    
    This patch removes the bitmask stored in the context and calculates
    which bricks are healthy after locking them and doing the initial
    xattrop. After that, it's updated using the result of each fop.
    
    Change-Id: I225e31cd219a12af4ca58871d8a4bb6f742b223c
    BUG: 1236065
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/11844
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
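
To double-check the fix from a client after upgrading, one possible sequence (a sketch; it simply re-runs the failing step with a fresh mount so no stale in-memory mask survives):

    # on the client, running a build that contains this patch
    umount /mnt/gluster
    mount -t glusterfs 10.1.2.1:/vaulttest21 /mnt/gluster

    # after the formatted brick has been healed (by the self-heal daemon or another client),
    # take two other nodes of the same subvolume offline and read the file again
    md5sum /mnt/gluster/up1     # expected: correct checksum, no Input/output error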

Comment 4 Niels de Vos 2016-06-16 13:17:09 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user