Bug 1236050
Field | Value
---|---
Summary | Disperse volume: fuse mount hung after self healing
Product | [Community] GlusterFS
Reporter | Backer <mdfakkeer>
Component | disperse
Assignee | Pranith Kumar K <pkarampu>
Status | CLOSED CURRENTRELEASE
Severity | high
Priority | high
Version | 3.7.2
CC | bugs, gluster-bugs, jahernan, pkarampu, rkavunga
Target Milestone | ---
Keywords | Triaged
Target Release | ---
Hardware | x86_64
OS | Linux
Fixed In Version | glusterfs-3.7.4
Doc Type | Bug Fix
Story Points | ---
Clones | 1251446 (view as bug list)
Last Closed | 2015-09-09 09:38:04 UTC
Type | Bug
Regression | ---
Mount Type | ---
Documentation | ---
Bug Depends On | 1251446
Bug Blocks | 1248533
Attachments | Different test scenarios and result (attachment 1059799)
**Description** (Backer, 2015-06-26 12:35:22 UTC)
**Pranith Kumar K:**

hi Backer, could you try this test with 3.7.3, please? We fixed 2-3 hang bugs, so it would be great if you could let us know whether it still happens. Meanwhile Xavi and I are going to work on bug 1235964, which you raised. Do you hang out on the #gluster IRC channel? It would be great to get your feedback on 3.7.3 and what you think about the stability of EC. Based on our tests in the lab, we feel EC is almost ready for production with the 3.7.3 release.

Pranith

**Backer:**

I have tested 3.7.3 as well as the 3.7.2 nightly build (glusterfs-3.7.2-20150726.b639cb9.tar.gz) for the I/O error and hang issue. I found that 3.7.3 has a data corruption issue which is not present in the 3.7.2 nightly build (glusterfs-3.7.2-20150707.36f24f5.tar.gz). Data gets corrupted after replacing a failed drive and running the self-heal. We also see data corruption after recovery from a node failure, when the unavailable data chunks are rebuilt by the proactive self-heal daemon.

You can reproduce the bug with the following steps (a consolidated sketch follows this comment):

1. Create a 3x(4+2) disperse volume across nodes.
2. FUSE-mount it on the client and start creating files/directories with mkdir and rsync/dd.
3. Bring down two of the nodes (nodes 5 and 6).
4. Write some files (e.g. filenew1, filenew2). The files will be available only on four nodes (nodes 1, 2, 3 and 4).
5. Calculate the md5sum of filenew1 and filenew2.
6. Bring the two failed nodes (5 and 6) back up; proactive self-healing will recreate the unavailable data chunks on them.
7. Once self-healing finishes, bring down another two nodes (nodes 1 and 2).
8. Get the md5sum of the same recovered files again; the md5sum values will not match.

This bug is not present in the 3.7.2 nightly build (glusterfs-3.7.2-20150707.36f24f5.tar.gz).

I would also like to know why proactive self-healing does not happen after replacing failed drives; I have to run the volume heal command manually to heal the unavailable files.
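For orientation, the eight reproduction steps above can be condensed into a rough shell outline. This is only a sketch: the volume name, brick paths, node names and file sizes are placeholder assumptions, not values from the report, and taking a node down is left as a comment because the report does not say how it was done.

```sh
# Placeholder names throughout: ecvol, node1..node6, /bricks/ec{1..3}.
# 18 bricks with "disperse 6 redundancy 2" -> 3 x (4+2) subvolumes,
# each subvolume spread across all six nodes.
gluster volume create ecvol disperse 6 redundancy 2 \
    node{1..6}:/bricks/ec1 node{1..6}:/bricks/ec2 node{1..6}:/bricks/ec3 force
gluster volume start ecvol
mount -t glusterfs node1:/ecvol /mnt/ecvol

# Steps 3-5: with node5 and node6 down, write files and record checksums.
#   (take node5/node6 offline here, e.g. by powering them off)
dd if=/dev/urandom of=/mnt/ecvol/filenew1 bs=1M count=100
dd if=/dev/urandom of=/mnt/ecvol/filenew2 bs=1M count=100
md5sum /mnt/ecvol/filenew1 /mnt/ecvol/filenew2 > /tmp/sums.before

# Step 6: bring node5/node6 back and let (or trigger) self-heal.
gluster volume heal ecvol
gluster volume heal ecvol info      # wait until all bricks show 0 entries

# Steps 7-8: take node1 and node2 down, then re-check the checksums.
md5sum -c /tmp/sums.before          # a mismatch here is the reported corruption
```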
**Pranith Kumar K:**

hi Backer, thanks for the quick reply. Based on your comment, I am assuming no hangs are observed. Auto-healing after replace-brick/disk replacement is something we are working on for 3.7.4; until then you need to execute `gluster volume heal ec2 full` (a short sketch of this appears after the transcript below).

As for the data corruption bug, I am not able to re-create it. Let me know if I missed any step:

```
root@localhost - ~ 14:48:24 :) ⚡ glusterd && gluster volume create ec2 disperse 6 redundancy 2 `hostname`:/home/gfs/ec_{0..5} force && gluster volume start ec2 && mount -t glusterfs `hostname`:/ec2 /mnt/ec2
volume create: ec2: success: please start the volume to access data
volume start: ec2: success

# I disabled perf-xlators so that reads are served from the bricks always
root@localhost - ~ 14:48:38 :( ⚡ ~/.scripts/disable-perf-xl.sh ec2
+ gluster volume set ec2 performance.quick-read off
volume set: success
+ gluster volume set ec2 performance.io-cache off
volume set: success
+ gluster volume set ec2 performance.write-behind off
volume set: success
+ gluster volume set ec2 performance.stat-prefetch off
volume set: success
+ gluster volume set ec2 performance.read-ahead off
volume set: success
+ gluster volume set ec2 performance.open-behind off
volume set: success

root@localhost - ~ 14:48:47 :) ⚡ cd /mnt/ec2/
root@localhost - /mnt/ec2 14:48:59 :) ⚡ gluster v status
Status of volume: ec2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick localhost.localdomain:/home/gfs/ec_0  49152     0          Y       14828
Brick localhost.localdomain:/home/gfs/ec_1  49153     0          Y       14846
Brick localhost.localdomain:/home/gfs/ec_2  49155     0          Y       14864
Brick localhost.localdomain:/home/gfs/ec_3  49156     0          Y       14882
Brick localhost.localdomain:/home/gfs/ec_4  49157     0          Y       14900
Brick localhost.localdomain:/home/gfs/ec_5  49158     0          Y       14918
NFS Server on localhost                     2049      0          Y       14937

Task Status of Volume ec2
------------------------------------------------------------------------------
There are no active volume tasks

root@localhost - /mnt/ec2 14:49:02 :) ⚡ kill -9 14918 14900
root@localhost - /mnt/ec2 14:49:11 :) ⚡ dd if=/dev/urandom of=1.txt bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.153835 s, 13.6 MB/s
root@localhost - /mnt/ec2 14:49:15 :) ⚡ md5sum 1.txt
5ead68d0a60b8134f7daf0e8d1afe19c  1.txt
root@localhost - /mnt/ec2 14:49:23 :) ⚡ gluster v start ec2 force
volume start: ec2: success
root@localhost - /mnt/ec2 14:49:35 :) ⚡ gluster v heal ec2
Launching heal operation to perform index self heal on volume ec2 has been successful
Use heal info commands to check status
root@localhost - /mnt/ec2 14:49:39 :) ⚡ gluster v heal ec2 info
Brick localhost.localdomain:/home/gfs/ec_0/
/1.txt
Number of entries: 1
Brick localhost.localdomain:/home/gfs/ec_1/
/1.txt
Number of entries: 1
Brick localhost.localdomain:/home/gfs/ec_2/
/1.txt
Number of entries: 1
Brick localhost.localdomain:/home/gfs/ec_3/
/1.txt
Number of entries: 1
Brick localhost.localdomain:/home/gfs/ec_4/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_5/
Number of entries: 0
root@localhost - /mnt/ec2 14:49:45 :) ⚡ gluster v heal ec2
Launching heal operation to perform index self heal on volume ec2 has been successful
Use heal info commands to check status
root@localhost - /mnt/ec2 14:49:47 :) ⚡ gluster v heal ec2 info
Brick localhost.localdomain:/home/gfs/ec_0/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_1/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_2/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_3/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_4/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_5/
Number of entries: 0
root@localhost - /mnt/ec2 14:49:51 :) ⚡ gluster v status
Status of volume: ec2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick localhost.localdomain:/home/gfs/ec_0  49152     0          Y       14828
Brick localhost.localdomain:/home/gfs/ec_1  49153     0          Y       14846
Brick localhost.localdomain:/home/gfs/ec_2  49155     0          Y       14864
Brick localhost.localdomain:/home/gfs/ec_3  49156     0          Y       14882
Brick localhost.localdomain:/home/gfs/ec_4  49157     0          Y       15173
Brick localhost.localdomain:/home/gfs/ec_5  49158     0          Y       15191
NFS Server on localhost                     2049      0          Y       15211

Task Status of Volume ec2
------------------------------------------------------------------------------
There are no active volume tasks

root@localhost - /mnt/ec2 14:49:56 :) ⚡ kill -9 14828 14846
root@localhost - /mnt/ec2 14:50:03 :) ⚡ md5sum 1.txt
5ead68d0a60b8134f7daf0e8d1afe19c  1.txt
root@localhost - /mnt/ec2 14:50:06 :) ⚡ cd
root@localhost - ~ 14:50:13 :) ⚡ umount /mnt/ec2
root@localhost - ~ 14:50:16 :) ⚡ mount -t glusterfs `hostname`:/ec2 /mnt/ec2
root@localhost - ~ 14:50:19 :) ⚡ md5sum /mnt/ec2/1.txt
5ead68d0a60b8134f7daf0e8d1afe19c  /mnt/ec2/1.txt
```
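As a side note on the manual full heal mentioned at the top of the previous comment: on 3.7.x a replaced brick or disk is not healed automatically, so a full-crawl heal has to be triggered by hand. A minimal sketch, with placeholder volume and brick names rather than values from this report:

```sh
# Minimal sketch (volume name and brick paths are placeholders).
# After replacing a failed brick/disk on 3.7.x, trigger a full self-heal:
gluster volume replace-brick ecvol node5:/bricks/old node5:/bricks/new commit force
gluster volume heal ecvol full      # full crawl instead of the index-only heal
gluster volume heal ecvol info      # repeat until every brick reports 0 entries
```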
**Backer:**

Created attachment 1059799 [details]: Different test scenarios and result
I am getting inconsistent test results after disabling and re-enabling the perf-xlators; please refer to the attachment.

```
root@gfs-tst-08:/home/qubevaultadmin# gluster --version
glusterfs 3.7.3 built on Jul 31 2015 17:03:01
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

root@gfs-tst-08:/home/gfsadmin# gluster volume info
Volume Name: vaulttest39
Type: Disperse
Volume ID: fcbed6b5-0654-489c-a29e-d18f737ac2f7
Status: Started
Number of Bricks: 1 x (3 + 1) = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.2.238:/media/disk1
Brick2: 10.1.2.238:/media/disk2
Brick3: 10.1.2.238:/media/disk3
Brick4: 10.1.2.238:/media/disk4
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.io-cache: off
performance.write-behind: off
performance.stat-prefetch: off
performance.read-ahead: off
performance.open-behind: off

gfsadmin@gfs-tst-08:~$ sudo gluster volume status
Status of volume: vaulttest39
Gluster process                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1   49152     0          Y       1560
Brick 10.1.2.238:/media/disk2   49153     0          Y       1568
Brick 10.1.2.238:/media/disk3   49154     0          Y       1576
Brick 10.1.2.238:/media/disk4   49155     0          Y       1582
NFS Server on localhost         2049      0          Y       1544

Task Status of Volume vaulttest39
------------------------------------------------------------------------------
There are no active volume tasks

gfsadmin@gfs-tst-08:~$ sudo kill -9 1560
gfsadmin@gfs-tst-08:~$ sudo gluster volume status
Status of volume: vaulttest39
Gluster process                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1   N/A       N/A        N       N/A
Brick 10.1.2.238:/media/disk2   49153     0          Y       1568
Brick 10.1.2.238:/media/disk3   49154     0          Y       1576
Brick 10.1.2.238:/media/disk4   49155     0          Y       1582
NFS Server on localhost         2049      0          Y       1544

Task Status of Volume vaulttest39
------------------------------------------------------------------------------
There are no active volume tasks

root@gfs-tst-09:/mnt/gluster# dd if=/dev/urandom of=2.txt bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.226147 s, 9.3 MB/s
root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
cd9db53f9c090958ff8c033161576b95  2.txt

gfsadmin@gfs-tst-08:~$ ls -l -h /media/disk{1..4}
/media/disk1:
total 960K
-rw-r--r-- 2 root root 683K Aug 6 13:58 1.txt

/media/disk2:
total 1.9M
-rw-r--r-- 2 root root 683K Aug 6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug 6 13:59 2.txt

/media/disk3:
total 1.9M
-rw-r--r-- 2 root root 683K Aug 6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug 6 13:59 2.txt

/media/disk4:
total 1.9M
-rw-r--r-- 2 root root 683K Aug 6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug 6 13:59 2.txt

root@gfs-tst-08:/home/gfsadmin# gluster v start vaulttest39 force
volume start: vaulttest39: success
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest39
Launching heal operation to perform index self heal on volume vaulttest39 has been successful
Use heal info commands to check status
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest39 info
Brick gfs-tst-08:/media/disk1/
Number of entries: 0
Brick gfs-tst-08:/media/disk2/
Number of entries: 0
Brick gfs-tst-08:/media/disk3/
Number of entries: 0
Brick gfs-tst-08:/media/disk4/
Number of entries: 0
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest39
Launching heal operation to perform index self heal on volume vaulttest39 has been successful
Use heal info commands to check status
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest39 info
Brick gfs-tst-08:/media/disk1/
Number of entries: 0
Brick gfs-tst-08:/media/disk2/
Number of entries: 0
Brick gfs-tst-08:/media/disk3/
Number of entries: 0
Brick gfs-tst-08:/media/disk4/
Number of entries: 0
root@gfs-tst-08:/home/gfsadmin# ls -l -h /media/disk{1..4}
/media/disk1:
total 1004K
-rw-r--r-- 2 root root 683K Aug 6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug 6 13:59 2.txt

/media/disk2:
total 1.9M
-rw-r--r-- 2 root root 683K Aug 6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug 6 13:59 2.txt

/media/disk3:
total 1.9M
-rw-r--r-- 2 root root 683K Aug 6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug 6 13:59 2.txt

/media/disk4:
total 1.9M
-rw-r--r-- 2 root root 683K Aug 6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug 6 13:59 2.txt

root@gfs-tst-08:/home/gfsadmin# gluster volume status
Status of volume: vaulttest39
Gluster process                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1   49152     0          Y       1721
Brick 10.1.2.238:/media/disk2   49153     0          Y       1568
Brick 10.1.2.238:/media/disk3   49154     0          Y       1576
Brick 10.1.2.238:/media/disk4   49155     0          Y       1582
NFS Server on localhost         2049      0          Y       1740

Task Status of Volume vaulttest39
------------------------------------------------------------------------------
There are no active volume tasks

root@gfs-tst-08:/home/gfsadmin# kill -9 1582
root@gfs-tst-08:/home/gfsadmin# gluster volume status
Status of volume: vaulttest39
Gluster process                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1   49152     0          Y       1721
Brick 10.1.2.238:/media/disk2   49153     0          Y       1568
Brick 10.1.2.238:/media/disk3   49154     0          Y       1576
Brick 10.1.2.238:/media/disk4   N/A       N/A        N       N/A
NFS Server on localhost         2049      0          Y       1740

Task Status of Volume vaulttest39
------------------------------------------------------------------------------
There are no active volume tasks

root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
cd9db53f9c090958ff8c033161576b95  2.txt
root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
cd9db53f9c090958ff8c033161576b95  2.txt
root@gfs-tst-09:/mnt/gluster# ls
1.txt  2.txt
root@gfs-tst-09:/mnt/gluster# ls
1.txt  2.txt
root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
70b40a7e3f5dc85345e466968416cde1  2.txt
root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
70b40a7e3f5dc85345e466968416cde1  2.txt
root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
70b40a7e3f5dc85345e466968416cde1  2.txt
root@gfs-tst-09:/mnt/gluster#
```

I have created a new volume once again and confirmed the bug.
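The before/after checksum comparison Backer repeats by hand above can be scripted so each brick-kill/heal cycle is checked the same way. A minimal sketch, assuming the /mnt/gluster FUSE mount and the *.txt test files used in the transcripts; the report then continues with the vaulttest52 test below.

```sh
# Minimal sketch (assumes the /mnt/gluster FUSE mount from the transcripts).
cd /mnt/gluster
md5sum *.txt > /tmp/checksums.baseline   # record sums while all bricks are up

# ... kill a brick PID, write more data, run "gluster volume start <vol> force",
# ... trigger "gluster volume heal <vol>" and wait for heal info to drain ...

md5sum -c /tmp/checksums.baseline || echo "checksum mismatch: possible corruption"
```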
```
root@gfs-tst-08:/home/gfsadmin# gluster volume create vaulttest52 disperse-data 3 redundancy 1 10.1.2.238:/media/disk{1..4} force
root@gfs-tst-08:/home/gfsadmin# gluster v start vaulttest52
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1   49172     0          Y       1574
Brick 10.1.2.238:/media/disk2   49173     0          Y       1582
Brick 10.1.2.238:/media/disk3   49174     0          Y       1595
Brick 10.1.2.238:/media/disk4   49175     0          Y       1590
NFS Server on localhost         2049      0          Y       1558

Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks

root@gfs-tst-08:/home/gfsadmin# gluster v info
Volume Name: vaulttest52
Type: Disperse
Volume ID: 0b0b3f8f-acb9-4e2c-a029-fcb89f85b1e7
Status: Started
Number of Bricks: 1 x (3 + 1) = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.2.238:/media/disk1
Brick2: 10.1.2.238:/media/disk2
Brick3: 10.1.2.238:/media/disk3
Brick4: 10.1.2.238:/media/disk4
Options Reconfigured:
performance.readdir-ahead: on

gfsadmin@gfs-tst-09:/mnt/gluster$ sudo dd if=/dev/urandom of=1.txt bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.208704 s, 10.0 MB/s
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 1.txt
1233b5321315c05abb4668cc9a1d9d25  1.txt

root@gfs-tst-08:/home/gfsadmin# ls -l -h /media/disk{1..4}
/media/disk1:
total 960K
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt

/media/disk2:
total 960K
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt

/media/disk3:
total 960K
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt

/media/disk4:
total 960K
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt

root@gfs-tst-08:/home/gfsadmin# kill -9 1574
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1   N/A       N/A        N       N/A
Brick 10.1.2.238:/media/disk2   49173     0          Y       1582
Brick 10.1.2.238:/media/disk3   49174     0          Y       1595
Brick 10.1.2.238:/media/disk4   49175     0          Y       1590
NFS Server on localhost         2049      0          Y       1558

Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks

gfsadmin@gfs-tst-09:/mnt/gluster$ sudo dd if=/dev/urandom of=2.txt bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.205401 s, 10.2 MB/s
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt
9c8b37847622efbf2ec75c683166de97  2.txt

root@gfs-tst-08:/home/gfsadmin# ls -l -h /media/disk{1..4}
/media/disk1:
total 960K
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt

/media/disk2:
total 1.9M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt

/media/disk3:
total 1.9M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt

/media/disk4:
total 1.4M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt

root@gfs-tst-08:/home/gfsadmin# gluster v start vaulttest52 force
volume start: vaulttest52: success
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1   49172     0          Y       1739
Brick 10.1.2.238:/media/disk2   49173     0          Y       1582
Brick 10.1.2.238:/media/disk3   49174     0          Y       1595
Brick 10.1.2.238:/media/disk4   49175     0          Y       1590
NFS Server on localhost         2049      0          Y       1758

Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks

root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest52
Launching heal operation to perform index self heal on volume vaulttest52 has been successful
Use heal info commands to check status
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest52 info
Brick gfs-tst-08:/media/disk1/
Number of entries: 0
Brick gfs-tst-08:/media/disk2/
Number of entries: 0
Brick gfs-tst-08:/media/disk3/
Number of entries: 0
Brick gfs-tst-08:/media/disk4/
Number of entries: 0
root@gfs-tst-08:/home/gfsadmin# ls -l -h /media/disk{1..4}
/media/disk1:
total 728K
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt

/media/disk2:
total 1.4M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt

/media/disk3:
total 1.4M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt

/media/disk4:
total 1.4M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt

root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1   49172     0          Y       1739
Brick 10.1.2.238:/media/disk2   49173     0          Y       1582
Brick 10.1.2.238:/media/disk3   49174     0          Y       1595
Brick 10.1.2.238:/media/disk4   49175     0          Y       1590
NFS Server on localhost         2049      0          Y       1758

Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks

root@gfs-tst-08:/home/gfsadmin# kill -9 1590
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1   49172     0          Y       1739
Brick 10.1.2.238:/media/disk2   49173     0          Y       1582
Brick 10.1.2.238:/media/disk3   49174     0          Y       1595
Brick 10.1.2.238:/media/disk4   N/A       N/A        N       N/A
NFS Server on localhost         2049      0          Y       1758

Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks

gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt
96f6f469f4b743b4a575fdc408b5f007  2.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt
96f6f469f4b743b4a575fdc408b5f007  2.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt
96f6f469f4b743b4a575fdc408b5f007  2.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ ls
1.txt  2.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ ls
1.txt  2.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ ls
1.txt  2.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt
96f6f469f4b743b4a575fdc408b5f007  2.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt
96f6f469f4b743b4a575fdc408b5f007  2.txt

===================================== MD5SUM has been changed ====================================

root@gfs-tst-08:/home/gfsadmin# gluster v start vaulttest52 force
volume start: vaulttest52: success
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1   49172     0          Y       1739
Brick 10.1.2.238:/media/disk2   49173     0          Y       1582
Brick 10.1.2.238:/media/disk3   49174     0          Y       1595
Brick 10.1.2.238:/media/disk4   49175     0          Y       1852
NFS Server on localhost         2049      0          Y       1871

Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks

====================================== disabled perf-xlators =====================================

root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.quick-read off gluster volume set vaulttest52 performance.io-cache off gluster volume set vaulttest52 performance.write-behind off gluster volume set vaulttest52 performance.stat-prefetch off gluster volume set vaulttest52 performance.read-ahead off gluster volume set vaulttest52 performance.open-behind off
volume set: success
root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.io-cache off
volume set: success
root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.write-behind off
volume set: success
root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.stat-prefetch off
volume set: success
root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.read-ahead off
volume set: success
root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.open-behind off
volume set: success
root@gfs-tst-08:/home/gfsadmin# gluster v info
Volume Name: vaulttest52
Type: Disperse
Volume ID: 0b0b3f8f-acb9-4e2c-a029-fcb89f85b1e7
Status: Started
Number of Bricks: 1 x (3 + 1) = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.2.238:/media/disk1
Brick2: 10.1.2.238:/media/disk2
Brick3: 10.1.2.238:/media/disk3
Brick4: 10.1.2.238:/media/disk4
Options Reconfigured:
performance.open-behind: off
performance.read-ahead: off
performance.stat-prefetch: off
performance.write-behind: off
performance.io-cache: off
performance.quick-read: off
performance.readdir-ahead: on

root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1   49172     0          Y       1739
Brick 10.1.2.238:/media/disk2   49173     0          Y       1582
Brick 10.1.2.238:/media/disk3   49174     0          Y       1595
Brick 10.1.2.238:/media/disk4   49175     0          Y       1852
NFS Server on localhost         2049      0          Y       1871

Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks

root@gfs-tst-08:/home/gfsadmin# kill -9 1852
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1   49172     0          Y       1739
Brick 10.1.2.238:/media/disk2   49173     0          Y       1582
Brick 10.1.2.238:/media/disk3   49174     0          Y       1595
Brick 10.1.2.238:/media/disk4   N/A       N/A        N       N/A
NFS Server on localhost         2049      0          Y       1871

Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks

gfsadmin@gfs-tst-09:/mnt/gluster$ sudo dd if=/dev/urandom of=3.txt bs=5M count=10
10+0 records in
10+0 records out
52428800 bytes (52 MB) copied, 5.40714 s, 9.7 MB/s
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
fa9d9d3e298d01c8cf54855968784b83  3.txt

root@gfs-tst-08:/home/gfsadmin# gluster v start vaulttest52 force
volume start: vaulttest52: success
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1   49172     0          Y       1739
Brick 10.1.2.238:/media/disk2   49173     0          Y       1582
Brick 10.1.2.238:/media/disk3   49174     0          Y       1595
Brick 10.1.2.238:/media/disk4   49175     0          Y       2017
NFS Server on localhost         N/A       N/A        N       N/A

Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks

root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest52
Launching heal operation to perform index self heal on volume vaulttest52 has been successful
Use heal info commands to check status
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest52 info
Brick gfs-tst-08:/media/disk1/
Number of entries: 0
Brick gfs-tst-08:/media/disk2/
Number of entries: 0
Brick gfs-tst-08:/media/disk3/
Number of entries: 0
Brick gfs-tst-08:/media/disk4/
Number of entries: 0
root@gfs-tst-08:/home/gfsadmin# ls -l -h /media/disk{1..4}
/media/disk1:
total 33M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt
-rw-r--r-- 2 root root  17M Aug 6 19:26 3.txt

/media/disk2:
total 34M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt
-rw-r--r-- 2 root root  17M Aug 6 19:26 3.txt

/media/disk3:
total 34M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt
-rw-r--r-- 2 root root  17M Aug 6 19:26 3.txt

/media/disk4:
total 1.4M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt
-rw-r--r-- 2 root root  17M Aug 6 19:26 3.txt

root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1   49172     0          Y       1739
Brick 10.1.2.238:/media/disk2   49173     0          Y       1582
Brick 10.1.2.238:/media/disk3   49174     0          Y       1595
Brick 10.1.2.238:/media/disk4   49175     0          Y       2017
NFS Server on localhost         2049      0          Y       2036

Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks

root@gfs-tst-08:/home/gfsadmin# kill -9 1582
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1   49172     0          Y       1739
Brick 10.1.2.238:/media/disk2   N/A       N/A        N       N/A
Brick 10.1.2.238:/media/disk3   49174     0          Y       1595
Brick 10.1.2.238:/media/disk4   49175     0          Y       2017
NFS Server on localhost         2049      0          Y       2036

Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks

gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
fa9d9d3e298d01c8cf54855968784b83  3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
fa9d9d3e298d01c8cf54855968784b83  3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
fa9d9d3e298d01c8cf54855968784b83  3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
fa9d9d3e298d01c8cf54855968784b83  3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ ls
1.txt  2.txt  3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ ls
1.txt  2.txt  3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ ls
1.txt  2.txt  3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
ea50603ce500b29c73dca6a9c733eb7a  3.txt
gfsadmin@gfs-tst-09:/$ sudo umount /mnt/gluster
gfsadmin@gfs-tst-09:/$ sudo mount -t glusterfs 10.1.2.238:/vaulttest52 /mnt/gluster/
gfsadmin@gfs-tst-09:/$ cd /mnt/gluster/
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
ea50603ce500b29c73dca6a9c733eb7a  3.txt
```

After running `ls` in the mounted directory, the md5sum hash changed.

**Xavier Hernandez:**

(In reply to Backer from comment #6) Thanks for the detailed description. We have been able to identify the cause of this problem. Self-heal doesn't correctly heal files on volumes where the number of data bricks is not a power of 2. I'll send a patch to solve this.

REVIEW: http://review.gluster.org/11869 (cluster/ec: Fix write size in self-heal) posted (#1) for review on release-3.7 by Xavier Hernandez (xhernandez)

Can you check if the last patch solves the problem?

COMMIT: http://review.gluster.org/11869 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu)

```
commit fc3da7299dc2adaf66076bfbfebe4a87582f7008
Author: Xavier Hernandez <xhernandez>
Date:   Fri Aug 7 12:37:52 2015 +0200

    cluster/ec: Fix write size in self-heal

    Self-heal was always using a fixed block size to heal a file. This was
    incorrect for dispersed volumes with a number of data bricks not being
    a power of 2.

    This patch adjusts the block size to a multiple of the stripe size of
    the volume. It also propagates errors detected during the data heal to
    stop healing the file and not mark it as healed.

    This is a backport of http://review.gluster.org/11862

    Change-Id: I5104ae4bfed8585ca40cb45831ca20582566370c
    BUG: 1236050
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/11869
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
```

(An illustrative calculation of this stripe alignment appears at the end of this report.)

**Backer:**

The issue has been solved after applying the patch (http://review.gluster.org/11869).

**Pranith Kumar K:**

Thanks, Backer, for the confirmation and for the help with a reproducible test case. This patch is merged now.

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.4, please open a new bug report.

glusterfs-3.7.4 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12496
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
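The commit above attributes the corruption to self-heal writing in fixed-size blocks that are not a multiple of the volume's stripe size when the data-brick count is not a power of two. The shell arithmetic below is only an illustration of that alignment argument; the chunk size and heal block size are assumed example values, not constants taken from the GlusterFS source.

```sh
# Illustration only: why a fixed heal block size misaligns on a 3+1 volume.
chunk_size=512                               # bytes per brick per stripe (assumed)
data_bricks=3                                # e.g. the 3+1 disperse volume above
stripe_size=$((chunk_size * data_bricks))    # 1536 bytes per full stripe

heal_block=$((128 * 1024))                   # a fixed heal block size (assumed)
echo $((heal_block % stripe_size))           # 512 -> each heal write ends mid-stripe

# With 4 data bricks the stripe would be 2048 and 131072 % 2048 = 0, which is
# why only non-power-of-2 data-brick counts were affected.

# The fix described in the commit message: use a block size that is a multiple
# of the stripe size, e.g. by rounding the fixed size down to a whole stripe.
aligned=$(( heal_block / stripe_size * stripe_size ))
echo "$aligned $((aligned % stripe_size))"   # 130560 0 -> heals whole stripes only
```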