Description of problem:
A disperse volume's "trusted.ec.version" xattr sometimes differs between bricks and is not healed by a lookup, which causes lookups to fail when a brick server goes down.

Version-Release number of selected component (if applicable): 3.6.2

How reproducible:

Steps to Reproduce:
1. Create a (2 + 1) disperse volume:

[root@localhost ~]# gluster volume info

Volume Name: test
Type: Disperse
Volume ID: 24bcea9a-31b3-4333-a17d-776d27d89e8a
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.130.132:/data/brick1
Brick2: 192.168.130.132:/data/brick2
Brick3: 192.168.130.132:/data/brick3

2. Change the trusted.ec.version of one brick manually:

[root@localhost ~]# setfattr -n trusted.ec.version -v 0x0000000000000003 /data/brick2/

(In my tests, "trusted.ec.version" actually got changed to an invalid value after I killed some brick server processes, but I can't reproduce that every time, so I changed it manually here.)

[root@localhost ~]# getfattr -m . -d -e hex /data/brick{1,2,3}
getfattr: Removing leading '/' from absolute path names
# file: data/brick1
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x000000000000000c
trusted.gfid=0x00000000000000000000000000000001
# file: data/brick2
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x0000000000000003
trusted.gfid=0x00000000000000000000000000000001
# file: data/brick3
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x000000000000000c
trusted.gfid=0x00000000000000000000000000000001

3. Mount the disperse volume at /home/mnt.
4. ll /home/mnt, then inspect the xattrs again:

[root@localhost ~]# getfattr -m . -d -e hex /data/brick{1,2,3}
getfattr: Removing leading '/' from absolute path names
# file: data/brick1
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x000000000000000c
trusted.gfid=0x00000000000000000000000000000001
# file: data/brick2
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x0000000000000003
trusted.gfid=0x00000000000000000000000000000001
# file: data/brick3
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x000000000000000c
trusted.gfid=0x00000000000000000000000000000001

Actual results:
data/brick2 trusted.ec.version=0x0000000000000003 is still invalid.

Expected results:
data/brick2 trusted.ec.version is healed to 0x000000000000000c.

Additional info:
If none of the bricks is down, "ll /home/mnt" reports no error. But if the /data/brick3 brick server process is killed, "ll /home/mnt" fails with an Input/output error:

[root@localhost ~]# gluster volume status
Status of volume: test
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 192.168.130.132:/data/brick1                      49152   Y       6158
Brick 192.168.130.132:/data/brick2                      49153   Y       4031
Brick 192.168.130.132:/data/brick3                      N/A     N       N/A
NFS Server on localhost                                 2049    Y       10921

Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks

[root@localhost ~]# ll /home/mnt
ls: cannot access /home/mnt: Input/output error

I hope it can be fixed, thank you.
This is not a bug. It's how it works.

If you have a 2 + 1 dispersed volume and one of the bricks is damaged (brick2 has been manually damaged here), another failure on another brick (in this case brick3) makes the file or directory inaccessible (you see an Input/Output error).

If you need to tolerate more than one brick failure, you should use a bigger configuration, like 4 + 2.

Regarding the healing problem: the current version of dispersed volumes does not automatically heal its files and directories. They must be healed manually. The recommended way to do a full volume heal is this:

find <mount point> -d -exec getfattr -h -n trusted.ec.heal {} \;

It's expected that version 3.7 will integrate automatic self-healing for dispersed volumes.
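As a runnable sketch of what that command does, the snippet below exercises the same post-order traversal on a scratch directory, with a plain echo standing in for the getfattr call that triggers the heal (no gluster volume is touched; the mount point in the comment is just the example path from this report):

```shell
# Real invocation (run as root against a glusterfs client mount; reading
# the virtual trusted.ec.heal xattr on an entry triggers its heal):
#   find /home/mnt -d -exec getfattr -h -n trusted.ec.heal {} \;
#
# Demonstration of the traversal only, on a throwaway tree:
demo=$(mktemp -d)
mkdir -p "$demo/dir1/sub"
touch "$demo/dir1/sub/file1" "$demo/dir1/file2"
# -depth (-d is the older spelling) visits a directory's contents
# before the directory itself, so entries would be healed bottom-up.
out=$(find "$demo" -depth)
printf '%s\n' "$out"
rm -rf "$demo"
```

Note that the top-level directory is printed last: with -d, each directory is processed only after everything inside it.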
(In reply to Xavier Hernandez from comment #1)
> This is not a bug. It's how it works.
>
> If you have a 2 + 1 dispersed volume, and one of the bricks is damaged
> (brick2 has been manually damaged), another failure on another brick (in
> this case brick3) makes the file or directory inaccessible (you see an
> Input/Output error).
>
> If you need to support more than one brick failure, you should use a bigger
> configuration, like a 4 + 2.
>
> Regarding the healing problem, current version of dispersed volumes do not
> automatically heal its files and directories. They must be manually healed.
> The recommended way to do a full volume heal is this:
>
> find <mount point> -d -exec getfattr -h -n trusted.ec.heal {} \;
>
> It's expected that version 3.7 will integrate automatic self-healing for
> dispersed volumes.

Thanks, that solves my problem! In 3.6.2, files are healed automatically but folders are not, so I thought it was a bug.
Files are automatically healed on first access. Directories are a bit more complex because they need to be recursively healed; otherwise there were occasions where an 'ls' returned old or invalid contents. For this reason the preferred way to heal directories is to use the 'find' command in post-order, or depth-first, mode (the -d flag). This heals directory contents first and then the directory itself. Sorry for the inconvenience.
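The difference the -d flag makes can be seen with only a scratch directory (no gluster involved): the default traversal lists a directory before its contents, while -depth lists the contents first, which is the order a recursive heal needs.

```shell
# Pre-order vs post-order traversal of the same scratch tree. In a real
# heal, post-order guarantees a directory's entries are healed before
# the directory itself.
tree=$(mktemp -d)
mkdir -p "$tree/parent"
touch "$tree/parent/child"

pre=$(find "$tree")          # default pre-order: parent before child
post=$(find "$tree" -depth)  # -depth (aka -d): child before parent

printf 'pre-order:\n%s\n' "$pre"
printf 'post-order:\n%s\n' "$post"
rm -rf "$tree"
```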