Bug 1190058

Summary: folder "trusted.ec.version" can't be healed after lookup

Product: [Community] GlusterFS
Component: disperse
Version: mainline
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: unspecified
Keywords: Triaged
Assignee: Xavi Hernandez <jahernan>
Reporter: SuperGC <download007>
CC: bugs, gluster-bugs
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-02-11 07:18:27 UTC

Description SuperGC 2015-02-06 08:21:54 UTC
Description of problem:

On a disperse volume, the "trusted.ec.version" xattr sometimes differs between bricks and is not healed by a lookup, which causes lookups to fail when a brick server goes down.

Version-Release number of selected component (if applicable):
3.6.2

How reproducible:


Steps to Reproduce:
1. Create a disperse volume (2+1):

[root@localhost ~]# gluster volume info
 
Volume Name: test
Type: Disperse
Volume ID: 24bcea9a-31b3-4333-a17d-776d27d89e8a
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.130.132:/data/brick1
Brick2: 192.168.130.132:/data/brick2
Brick3: 192.168.130.132:/data/brick3

2. Change the trusted.ec.version of one brick manually:
[root@localhost ~]# setfattr -n trusted.ec.version -v 0x0000000000000003 /data/brick2/
In fact, "trusted.ec.version" was changed to an invalid value during my tests when I killed a brick server process, but I can't reproduce that every time, so here I change it manually.

[root@localhost ~]# getfattr -m . -d -e hex /data/brick{1,2,3}
getfattr: Removing leading '/' from absolute path names
# file: data/brick1
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x000000000000000c
trusted.gfid=0x00000000000000000000000000000001

# file: data/brick2
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x0000000000000003
trusted.gfid=0x00000000000000000000000000000001

# file: data/brick3
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x000000000000000c
trusted.gfid=0x00000000000000000000000000000001


3. Mount the disperse volume at /home/mnt.

4. Run ll /home/mnt, then check the xattrs on the bricks again:
[root@localhost ~]# getfattr -m . -d -e hex /data/brick{1,2,3}
getfattr: Removing leading '/' from absolute path names
# file: data/brick1
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x000000000000000c
trusted.gfid=0x00000000000000000000000000000001

# file: data/brick2
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x0000000000000003
trusted.gfid=0x00000000000000000000000000000001

# file: data/brick3
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x000000000000000c
trusted.gfid=0x00000000000000000000000000000001

Actual results:
On data/brick2, trusted.ec.version is still the invalid value 0x0000000000000003.

Expected results:
On data/brick2, trusted.ec.version should have been healed to 0x000000000000000c.

Additional info:
If none of the bricks is down, ll /home/mnt reports no error. But if the /data/brick3 brick server process is killed, ll /home/mnt fails with an Input/output error, as shown below.
[root@localhost ~]# gluster volume status
Status of volume: test
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 192.168.130.132:/data/brick1                      49152   Y       6158
Brick 192.168.130.132:/data/brick2                      49153   Y       4031
Brick 192.168.130.132:/data/brick3                      N/A     N       N/A
NFS Server on localhost                                 2049    Y       10921
 
Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@localhost ~]# ll /home/mnt 
ls: cannot access /home/mnt: Input/output error

I hope it can be fixed. Thank you!

Comment 1 Xavi Hernandez 2015-02-10 14:26:42 UTC
This is not a bug. It's how it works.

If you have a 2 + 1 dispersed volume, and one of the bricks is damaged (brick2 has been manually damaged here), then a failure on another brick (in this case brick3) makes the file or directory inaccessible (you see an Input/output error).

If you need to support more than one brick failure, you should use a bigger configuration, like a 4 + 2.

Regarding the healing problem, the current version of dispersed volumes does not automatically heal files and directories. They must be healed manually. The recommended way to do a full volume heal is this:

    find <mount point> -d -exec getfattr -h -n trusted.ec.heal {} \;

It's expected that version 3.7 will integrate automatic self-healing for dispersed volumes.

Comment 2 SuperGC 2015-02-11 07:18:27 UTC
(In reply to Xavier Hernandez from comment #1)
> This is not a bug. It's how it works.
> 
> If you have a 2 + 1 dispersed volume, and one of the bricks is damaged
> (brick2 has been manually damaged here), then a failure on another brick
> (in this case brick3) makes the file or directory inaccessible (you see
> an Input/output error).
> 
> If you need to support more than one brick failure, you should use a bigger
> configuration, like a 4 + 2.
> 
> Regarding the healing problem, the current version of dispersed volumes
> does not automatically heal files and directories. They must be healed
> manually. The recommended way to do a full volume heal is this:
> 
>     find <mount point> -d -exec getfattr -h -n trusted.ec.heal {} \;
> 
> It's expected that version 3.7 will integrate automatic self-healing for
> dispersed volumes.

Thanks, that solves my problem!

In 3.6.2, files are healed automatically but folders are not, so I thought it was a bug.

Comment 3 Xavi Hernandez 2015-02-11 08:20:03 UTC
Files are automatically healed on first access. Directories, however, are a bit more complex because they need to be healed recursively; otherwise there were occasions where an 'ls' returned old or invalid contents.

For this reason, the preferred way to heal directories is to use the 'find' command in post-order/depth-first mode (the -d flag). This heals the directory contents first and then the directory itself.

Sorry for the inconvenience.