Bug 1236050
| Summary: | Disperse volume: fuse mount hung after self healing | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Backer <mdfakkeer> | ||||
| Component: | disperse | Assignee: | Pranith Kumar K <pkarampu> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 3.7.2 | CC: | bugs, gluster-bugs, jahernan, pkarampu, rkavunga | ||||
| Target Milestone: | --- | Keywords: | Triaged | ||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | glusterfs-3.7.4 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| : | 1251446 (view as bug list) | Environment: | |||||
| Last Closed: | 2015-09-09 09:38:04 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 1251446 | ||||||
| Bug Blocks: | 1248533 | ||||||
| Attachments: |
Description
Backer
2015-06-26 12:35:22 UTC
Hi Backer,
Could you try this test with 3.7.3, please? We fixed 2-3 hang bugs, so it would be great if you could let us know whether it still happens. Meanwhile, Xavi and I are going to work on bug 1235964, which you raised. Do you hang out on #gluster IRC? It would be great to get your feedback on 3.7.3 and hear what you think about the stability of EC. Based on our tests in the lab, we feel EC is almost ready for production with the 3.7.3 release.
Pranith
I have tested 3.7.3 as well as the 3.7.2 nightly build (glusterfs-3.7.2-20150726.b639cb9.tar.gz) for the I/O error and hang issues. I found that 3.7.3 has a data corruption issue that is not present in the 3.7.2 nightly build (glusterfs-3.7.2-20150707.36f24f5.tar.gz). Data is corrupted after replacing the failed drive and running self-heal. We also see data corruption after recovering from a node failure, once the unavailable data chunks have been copied by the proactive self-heal daemon.
You can reproduce the bug through the following steps.
Steps to reproduce:
1. Create a 3x(4+2) disperse volume across nodes.
2. FUSE mount on the client and start creating files/directories with mkdir and rsync/dd.
3. Bring down 2 of the nodes (node 5 & 6).
4. Write some files (e.g. filenew1, filenew2). The files will be available only on 4 nodes (node 1, 2, 3 & 4).
5. Calculate the md5sum of filenew1 and filenew2.
6. Bring the 2 failed/down nodes (node 5 & 6) back up.
7. Proactive self-healing will create the unavailable data chunks on the 2 nodes (node 5 & 6).
8. Once self-healing has finished, bring down another two nodes (node 1 & 2).
9. Get the md5sum of the same recovered files; there will be a mismatch in the md5sum value.
This bug is not present in the 3.7.2 nightly build (glusterfs-3.7.2-20150707.36f24f5.tar.gz).
I would also like to know why proactive self-healing does not happen after replacing the failed drives. I have to run the volume heal command manually to heal the unavailable files.
Hi Backer,
Thanks for the quick reply. Based on your comment, I am assuming no hangs are observed. Auto-healing after replace-brick/disk replacement is something we are working on for 3.7.4; until then you need to execute "gluster volume heal ec2 full".
As for the data corruption bug, I am not able to re-create it:
Let me know if I missed any step:
root@localhost - ~
14:48:24 :) ⚡ glusterd && gluster volume create ec2 disperse 6 redundancy 2 `hostname`:/home/gfs/ec_{0..5} force && gluster volume start ec2 && mount -t glusterfs `hostname`:/ec2 /mnt/ec2
volume create: ec2: success: please start the volume to access data
volume start: ec2: success
#I disabled perf-xlators so that reads are served from the bricks always
root@localhost - ~
14:48:38 :( ⚡ ~/.scripts/disable-perf-xl.sh ec2
+ gluster volume set ec2 performance.quick-read off
volume set: success
+ gluster volume set ec2 performance.io-cache off
volume set: success
+ gluster volume set ec2 performance.write-behind off
volume set: success
+ gluster volume set ec2 performance.stat-prefetch off
volume set: success
+ gluster volume set ec2 performance.read-ahead off
volume set: success
+ gluster volume set ec2 performance.open-behind off
volume set: success
root@localhost - ~
14:48:47 :) ⚡ cd /mnt/ec2/
root@localhost - /mnt/ec2
14:48:59 :) ⚡ gluster v status
Status of volume: ec2
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick localhost.localdomain:/home/gfs/ec_0 49152 0 Y 14828
Brick localhost.localdomain:/home/gfs/ec_1 49153 0 Y 14846
Brick localhost.localdomain:/home/gfs/ec_2 49155 0 Y 14864
Brick localhost.localdomain:/home/gfs/ec_3 49156 0 Y 14882
Brick localhost.localdomain:/home/gfs/ec_4 49157 0 Y 14900
Brick localhost.localdomain:/home/gfs/ec_5 49158 0 Y 14918
NFS Server on localhost 2049 0 Y 14937
Task Status of Volume ec2
------------------------------------------------------------------------------
There are no active volume tasks
root@localhost - /mnt/ec2
14:49:02 :) ⚡ kill -9 14918 14900
root@localhost - /mnt/ec2
14:49:11 :) ⚡ dd if=/dev/urandom of=1.txt bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.153835 s, 13.6 MB/s
root@localhost - /mnt/ec2
14:49:15 :) ⚡ md5sum 1.txt
5ead68d0a60b8134f7daf0e8d1afe19c 1.txt
root@localhost - /mnt/ec2
14:49:23 :) ⚡ gluster v start ec2 force
volume start: ec2: success
root@localhost - /mnt/ec2
14:49:35 :) ⚡ gluster v heal ec2
Launching heal operation to perform index self heal on volume ec2 has been successful
Use heal info commands to check status
root@localhost - /mnt/ec2
14:49:39 :) ⚡ gluster v heal ec2 info
Brick localhost.localdomain:/home/gfs/ec_0/
/1.txt
Number of entries: 1
Brick localhost.localdomain:/home/gfs/ec_1/
/1.txt
Number of entries: 1
Brick localhost.localdomain:/home/gfs/ec_2/
/1.txt
Number of entries: 1
Brick localhost.localdomain:/home/gfs/ec_3/
/1.txt
Number of entries: 1
Brick localhost.localdomain:/home/gfs/ec_4/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_5/
Number of entries: 0
root@localhost - /mnt/ec2
14:49:45 :) ⚡ gluster v heal ec2
Launching heal operation to perform index self heal on volume ec2 has been successful
Use heal info commands to check status
root@localhost - /mnt/ec2
14:49:47 :) ⚡ gluster v heal ec2 info
Brick localhost.localdomain:/home/gfs/ec_0/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_1/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_2/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_3/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_4/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_5/
Number of entries: 0
root@localhost - /mnt/ec2
14:49:51 :) ⚡ gluster v status
Status of volume: ec2
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick localhost.localdomain:/home/gfs/ec_0 49152 0 Y 14828
Brick localhost.localdomain:/home/gfs/ec_1 49153 0 Y 14846
Brick localhost.localdomain:/home/gfs/ec_2 49155 0 Y 14864
Brick localhost.localdomain:/home/gfs/ec_3 49156 0 Y 14882
Brick localhost.localdomain:/home/gfs/ec_4 49157 0 Y 15173
Brick localhost.localdomain:/home/gfs/ec_5 49158 0 Y 15191
NFS Server on localhost 2049 0 Y 15211
Task Status of Volume ec2
------------------------------------------------------------------------------
There are no active volume tasks
root@localhost - /mnt/ec2
14:49:56 :) ⚡ kill -9 14828 14846
root@localhost - /mnt/ec2
14:50:03 :) ⚡ md5sum 1.txt
5ead68d0a60b8134f7daf0e8d1afe19c 1.txt
root@localhost - /mnt/ec2
14:50:06 :) ⚡ cd
root@localhost - ~
14:50:13 :) ⚡ umount /mnt/ec2
root@localhost - ~
14:50:16 :) ⚡ mount -t glusterfs `hostname`:/ec2 /mnt/ec2
root@localhost - ~
14:50:19 :) ⚡ md5sum /mnt/ec2/1.txt
5ead68d0a60b8134f7daf0e8d1afe19c /mnt/ec2/1.txt
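The check performed by hand in the session above (record a checksum, heal, take down a different pair of bricks, compare again) could be scripted roughly as follows. This is a minimal sketch and not part of the original report: the function names `record_md5` and `verify_md5` are hypothetical, and the only assumption is that `md5sum` and `awk` are available on the client.

```shell
# Hypothetical helpers (not from the bug report) for comparing a file's
# checksum before and after a heal cycle.

# Print only the MD5 hash of a file.
record_md5() {
    md5sum "$1" | awk '{print $1}'
}

# Return 0 if the file still has the expected hash, 1 otherwise.
verify_md5() {
    local file=$1 expected=$2 actual
    actual=$(record_md5 "$file")
    if [ "$actual" = "$expected" ]; then
        echo "OK $file"
    else
        echo "MISMATCH $file: expected $expected, got $actual"
        return 1
    fi
}
```

On the mount, one would run `before=$(record_md5 1.txt)` right after writing the file, and `verify_md5 1.txt "$before"` once the heal has finished and other bricks have been taken down.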
Created attachment 1059799 [details]
Different test scenarios and result
I am getting random test results after disabling and enabling the perf-xlators. Please refer to the attachment.
root@gfs-tst-08:/home/qubevaultadmin# gluster --version
glusterfs 3.7.3 built on Jul 31 2015 17:03:01
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
root@gfs-tst-08:/home/gfsadmin# gluster volume info
Volume Name: vaulttest39
Type: Disperse
Volume ID: fcbed6b5-0654-489c-a29e-d18f737ac2f7
Status: Started
Number of Bricks: 1 x (3 + 1) = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.2.238:/media/disk1
Brick2: 10.1.2.238:/media/disk2
Brick3: 10.1.2.238:/media/disk3
Brick4: 10.1.2.238:/media/disk4
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.io-cache: off
performance.write-behind: off
performance.stat-prefetch: off
performance.read-ahead: off
performance.open-behind: off
gfsadmin@gfs-tst-08:~$ sudo gluster volume status
Status of volume: vaulttest39
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1 49152 0 Y 1560
Brick 10.1.2.238:/media/disk2 49153 0 Y 1568
Brick 10.1.2.238:/media/disk3 49154 0 Y 1576
Brick 10.1.2.238:/media/disk4 49155 0 Y 1582
NFS Server on localhost 2049 0 Y 1544
Task Status of Volume vaulttest39
------------------------------------------------------------------------------
There are no active volume tasks
gfsadmin@gfs-tst-08:~$ sudo kill -9 1560
gfsadmin@gfs-tst-08:~$ sudo gluster volume status
Status of volume: vaulttest39
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1 N/A N/A N N/A
Brick 10.1.2.238:/media/disk2 49153 0 Y 1568
Brick 10.1.2.238:/media/disk3 49154 0 Y 1576
Brick 10.1.2.238:/media/disk4 49155 0 Y 1582
NFS Server on localhost 2049 0 Y 1544
Task Status of Volume vaulttest39
------------------------------------------------------------------------------
There are no active volume tasks
root@gfs-tst-09:/mnt/gluster# dd if=/dev/urandom of=2.txt bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.226147 s, 9.3 MB/s
root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
cd9db53f9c090958ff8c033161576b95 2.txt
gfsadmin@gfs-tst-08:~$ ls -l -h /media/disk{1..4}
/media/disk1:
total 960K
-rw-r--r-- 2 root root 683K Aug 6 13:58 1.txt
/media/disk2:
total 1.9M
-rw-r--r-- 2 root root 683K Aug 6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug 6 13:59 2.txt
/media/disk3:
total 1.9M
-rw-r--r-- 2 root root 683K Aug 6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug 6 13:59 2.txt
/media/disk4:
total 1.9M
-rw-r--r-- 2 root root 683K Aug 6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug 6 13:59 2.txt
root@gfs-tst-08:/home/gfsadmin# gluster v start vaulttest39 force
volume start: vaulttest39: success
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest39
Launching heal operation to perform index self heal on volume vaulttest39 has been successful
Use heal info commands to check status
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest39 info
Brick gfs-tst-08:/media/disk1/
Number of entries: 0
Brick gfs-tst-08:/media/disk2/
Number of entries: 0
Brick gfs-tst-08:/media/disk3/
Number of entries: 0
Brick gfs-tst-08:/media/disk4/
Number of entries: 0
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest39
Launching heal operation to perform index self heal on volume vaulttest39 has been successful
Use heal info commands to check status
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest39 info
Brick gfs-tst-08:/media/disk1/
Number of entries: 0
Brick gfs-tst-08:/media/disk2/
Number of entries: 0
Brick gfs-tst-08:/media/disk3/
Number of entries: 0
Brick gfs-tst-08:/media/disk4/
Number of entries: 0
root@gfs-tst-08:/home/gfsadmin# ls -l -h /media/disk{1..4}
/media/disk1:
total 1004K
-rw-r--r-- 2 root root 683K Aug 6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug 6 13:59 2.txt
/media/disk2:
total 1.9M
-rw-r--r-- 2 root root 683K Aug 6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug 6 13:59 2.txt
/media/disk3:
total 1.9M
-rw-r--r-- 2 root root 683K Aug 6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug 6 13:59 2.txt
/media/disk4:
total 1.9M
-rw-r--r-- 2 root root 683K Aug 6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug 6 13:59 2.txt
root@gfs-tst-08:/home/gfsadmin# gluster volume status
Status of volume: vaulttest39
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1 49152 0 Y 1721
Brick 10.1.2.238:/media/disk2 49153 0 Y 1568
Brick 10.1.2.238:/media/disk3 49154 0 Y 1576
Brick 10.1.2.238:/media/disk4 49155 0 Y 1582
NFS Server on localhost 2049 0 Y 1740
Task Status of Volume vaulttest39
------------------------------------------------------------------------------
There are no active volume tasks
root@gfs-tst-08:/home/gfsadmin# kill -9 1582
root@gfs-tst-08:/home/gfsadmin# gluster volume status
Status of volume: vaulttest39
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1 49152 0 Y 1721
Brick 10.1.2.238:/media/disk2 49153 0 Y 1568
Brick 10.1.2.238:/media/disk3 49154 0 Y 1576
Brick 10.1.2.238:/media/disk4 N/A N/A N N/A
NFS Server on localhost 2049 0 Y 1740
Task Status of Volume vaulttest39
------------------------------------------------------------------------------
There are no active volume tasks
root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
cd9db53f9c090958ff8c033161576b95 2.txt
root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
cd9db53f9c090958ff8c033161576b95 2.txt
root@gfs-tst-09:/mnt/gluster# ls
1.txt 2.txt
root@gfs-tst-09:/mnt/gluster# ls
1.txt 2.txt
root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
70b40a7e3f5dc85345e466968416cde1 2.txt
root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
70b40a7e3f5dc85345e466968416cde1 2.txt
root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
70b40a7e3f5dc85345e466968416cde1 2.txt
root@gfs-tst-09:/mnt/gluster#
I have created a new volume once again and confirmed the bug.
root@gfs-tst-08:/home/gfsadmin# gluster volume create vaulttest52 disperse-data 3 redundancy 1 10.1.2.238:/media/disk{1..4} force
root@gfs-tst-08:/home/gfsadmin# gluster v start vaulttest52
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1 49172 0 Y 1574
Brick 10.1.2.238:/media/disk2 49173 0 Y 1582
Brick 10.1.2.238:/media/disk3 49174 0 Y 1595
Brick 10.1.2.238:/media/disk4 49175 0 Y 1590
NFS Server on localhost 2049 0 Y 1558
Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks
root@gfs-tst-08:/home/gfsadmin# gluster v info
Volume Name: vaulttest52
Type: Disperse
Volume ID: 0b0b3f8f-acb9-4e2c-a029-fcb89f85b1e7
Status: Started
Number of Bricks: 1 x (3 + 1) = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.2.238:/media/disk1
Brick2: 10.1.2.238:/media/disk2
Brick3: 10.1.2.238:/media/disk3
Brick4: 10.1.2.238:/media/disk4
Options Reconfigured:
performance.readdir-ahead: on
gfsadmin@gfs-tst-09:/mnt/gluster$ sudo dd if=/dev/urandom of=1.txt bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.208704 s, 10.0 MB/s
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 1.txt
1233b5321315c05abb4668cc9a1d9d25 1.txt
root@gfs-tst-08:/home/gfsadmin# ls -l -h /media/disk{1..4}
/media/disk1:
total 960K
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
/media/disk2:
total 960K
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
/media/disk3:
total 960K
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
/media/disk4:
total 960K
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
root@gfs-tst-08:/home/gfsadmin# kill -9 1574
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1 N/A N/A N N/A
Brick 10.1.2.238:/media/disk2 49173 0 Y 1582
Brick 10.1.2.238:/media/disk3 49174 0 Y 1595
Brick 10.1.2.238:/media/disk4 49175 0 Y 1590
NFS Server on localhost 2049 0 Y 1558
Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks
gfsadmin@gfs-tst-09:/mnt/gluster$ sudo dd if=/dev/urandom of=2.txt bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.205401 s, 10.2 MB/s
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt
9c8b37847622efbf2ec75c683166de97 2.txt
root@gfs-tst-08:/home/gfsadmin# ls -l -h /media/disk{1..4}
/media/disk1:
total 960K
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
/media/disk2:
total 1.9M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt
/media/disk3:
total 1.9M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt
/media/disk4:
total 1.4M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt
root@gfs-tst-08:/home/gfsadmin# gluster v start vaulttest52 force
volume start: vaulttest52: success
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1 49172 0 Y 1739
Brick 10.1.2.238:/media/disk2 49173 0 Y 1582
Brick 10.1.2.238:/media/disk3 49174 0 Y 1595
Brick 10.1.2.238:/media/disk4 49175 0 Y 1590
NFS Server on localhost 2049 0 Y 1758
Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest52
Launching heal operation to perform index self heal on volume vaulttest52 has been successful
Use heal info commands to check status
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest52 info
Brick gfs-tst-08:/media/disk1/
Number of entries: 0
Brick gfs-tst-08:/media/disk2/
Number of entries: 0
Brick gfs-tst-08:/media/disk3/
Number of entries: 0
Brick gfs-tst-08:/media/disk4/
Number of entries: 0
root@gfs-tst-08:/home/gfsadmin# ls -l -h /media/disk{1..4}
/media/disk1:
total 728K
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt
/media/disk2:
total 1.4M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt
/media/disk3:
total 1.4M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt
/media/disk4:
total 1.4M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1 49172 0 Y 1739
Brick 10.1.2.238:/media/disk2 49173 0 Y 1582
Brick 10.1.2.238:/media/disk3 49174 0 Y 1595
Brick 10.1.2.238:/media/disk4 49175 0 Y 1590
NFS Server on localhost 2049 0 Y 1758
Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks
root@gfs-tst-08:/home/gfsadmin# kill -9 1590
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1 49172 0 Y 1739
Brick 10.1.2.238:/media/disk2 49173 0 Y 1582
Brick 10.1.2.238:/media/disk3 49174 0 Y 1595
Brick 10.1.2.238:/media/disk4 N/A N/A N N/A
NFS Server on localhost 2049 0 Y 1758
Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt
96f6f469f4b743b4a575fdc408b5f007 2.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt
96f6f469f4b743b4a575fdc408b5f007 2.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt
96f6f469f4b743b4a575fdc408b5f007 2.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ ls
1.txt 2.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ ls
1.txt 2.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ ls
1.txt 2.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt
96f6f469f4b743b4a575fdc408b5f007 2.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt
96f6f469f4b743b4a575fdc408b5f007 2.txt
=====================================
MD5SUM has been changed
====================================
root@gfs-tst-08:/home/gfsadmin# gluster v start vaulttest52 force
volume start: vaulttest52: success
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1 49172 0 Y 1739
Brick 10.1.2.238:/media/disk2 49173 0 Y 1582
Brick 10.1.2.238:/media/disk3 49174 0 Y 1595
Brick 10.1.2.238:/media/disk4 49175 0 Y 1852
NFS Server on localhost 2049 0 Y 1871
Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks
======================================
disabled perf-xlators
=====================================
root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.quick-read off
gluster volume set vaulttest52 performance.io-cache off
gluster volume set vaulttest52 performance.write-behind off
gluster volume set vaulttest52 performance.stat-prefetch off
gluster volume set vaulttest52 performance.read-ahead off
gluster volume set vaulttest52 performance.open-behind off
volume set: success
root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.io-cache off
volume set: success
root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.write-behind off
volume set: success
root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.stat-prefetch off
volume set: success
root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.read-ahead off
volume set: success
root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.open-behind off
volume set: success
root@gfs-tst-08:/home/gfsadmin# gluster v info
Volume Name: vaulttest52
Type: Disperse
Volume ID: 0b0b3f8f-acb9-4e2c-a029-fcb89f85b1e7
Status: Started
Number of Bricks: 1 x (3 + 1) = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.2.238:/media/disk1
Brick2: 10.1.2.238:/media/disk2
Brick3: 10.1.2.238:/media/disk3
Brick4: 10.1.2.238:/media/disk4
Options Reconfigured:
performance.open-behind: off
performance.read-ahead: off
performance.stat-prefetch: off
performance.write-behind: off
performance.io-cache: off
performance.quick-read: off
performance.readdir-ahead: on
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1 49172 0 Y 1739
Brick 10.1.2.238:/media/disk2 49173 0 Y 1582
Brick 10.1.2.238:/media/disk3 49174 0 Y 1595
Brick 10.1.2.238:/media/disk4 49175 0 Y 1852
NFS Server on localhost 2049 0 Y 1871
Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks
root@gfs-tst-08:/home/gfsadmin# kill -9 1852
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1 49172 0 Y 1739
Brick 10.1.2.238:/media/disk2 49173 0 Y 1582
Brick 10.1.2.238:/media/disk3 49174 0 Y 1595
Brick 10.1.2.238:/media/disk4 N/A N/A N N/A
NFS Server on localhost 2049 0 Y 1871
Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks
gfsadmin@gfs-tst-09:/mnt/gluster$ sudo dd if=/dev/urandom of=3.txt bs=5M count=10
10+0 records in
10+0 records out
52428800 bytes (52 MB) copied, 5.40714 s, 9.7 MB/s
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
fa9d9d3e298d01c8cf54855968784b83 3.txt
root@gfs-tst-08:/home/gfsadmin# gluster v start vaulttest52 force
volume start: vaulttest52: success
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1 49172 0 Y 1739
Brick 10.1.2.238:/media/disk2 49173 0 Y 1582
Brick 10.1.2.238:/media/disk3 49174 0 Y 1595
Brick 10.1.2.238:/media/disk4 49175 0 Y 2017
NFS Server on localhost N/A N/A N N/A
Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest52
Launching heal operation to perform index self heal on volume vaulttest52 has been successful
Use heal info commands to check status
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest52 info
Brick gfs-tst-08:/media/disk1/
Number of entries: 0
Brick gfs-tst-08:/media/disk2/
Number of entries: 0
Brick gfs-tst-08:/media/disk3/
Number of entries: 0
Brick gfs-tst-08:/media/disk4/
Number of entries: 0
root@gfs-tst-08:/home/gfsadmin# ls -l -h /media/disk{1..4}
/media/disk1:
total 33M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt
-rw-r--r-- 2 root root 17M Aug 6 19:26 3.txt
/media/disk2:
total 34M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt
-rw-r--r-- 2 root root 17M Aug 6 19:26 3.txt
/media/disk3:
total 34M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt
-rw-r--r-- 2 root root 17M Aug 6 19:26 3.txt
/media/disk4:
total 1.4M
-rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt
-rw-r--r-- 2 root root 17M Aug 6 19:26 3.txt
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1 49172 0 Y 1739
Brick 10.1.2.238:/media/disk2 49173 0 Y 1582
Brick 10.1.2.238:/media/disk3 49174 0 Y 1595
Brick 10.1.2.238:/media/disk4 49175 0 Y 2017
NFS Server on localhost 2049 0 Y 2036
Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks
root@gfs-tst-08:/home/gfsadmin# kill -9 1582
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1 49172 0 Y 1739
Brick 10.1.2.238:/media/disk2 N/A N/A N N/A
Brick 10.1.2.238:/media/disk3 49174 0 Y 1595
Brick 10.1.2.238:/media/disk4 49175 0 Y 2017
NFS Server on localhost 2049 0 Y 2036
Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
fa9d9d3e298d01c8cf54855968784b83 3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
fa9d9d3e298d01c8cf54855968784b83 3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
fa9d9d3e298d01c8cf54855968784b83 3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
fa9d9d3e298d01c8cf54855968784b83 3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ ls
1.txt 2.txt 3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ ls
1.txt 2.txt 3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ ls
1.txt 2.txt 3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
ea50603ce500b29c73dca6a9c733eb7a 3.txt
gfsadmin@gfs-tst-09:/$ sudo umount /mnt/gluster
gfsadmin@gfs-tst-09:/$ sudo mount -t glusterfs 10.1.2.238:/vaulttest52 /mnt/gluster/
gfsadmin@gfs-tst-09:/$ cd /mnt/gluster/
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
ea50603ce500b29c73dca6a9c733eb7a 3.txt
After running the ls command in the local directory, the md5sum hash changed.
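The per-brick file sizes in the listings above (683K for a 2 MiB file and 17M for a 50 MiB file on a 3 + 1 volume) can be reproduced with a little arithmetic. The sketch below assumes that each stripe stores 512 bytes on every brick and that a file is padded to a whole number of stripes. The 512-byte figure is an assumption that happens to match the listings, not something stated in this report, and `fragment_size` is a hypothetical helper name.

```shell
# fragment_size FILE_BYTES DATA_BRICKS
# Bytes stored on each brick, assuming 512 bytes per brick per stripe
# (an assumption, see above) and padding to a whole number of stripes.
fragment_size() {
    local bytes=$1 data=$2 chunk=512
    local stripe=$(( chunk * data ))
    local stripes=$(( (bytes + stripe - 1) / stripe ))
    echo $(( stripes * chunk ))
}

fragment_size 2097152 3     # 2 MiB file  -> 699392 bytes (683 KiB)
fragment_size 52428800 3    # 50 MiB file -> 17476608 bytes (about 16.7 MiB)
```

With 3 data bricks the stripe is 1536 bytes under this assumption, so neither file size divides evenly and the last stripe is padded.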
(In reply to Backer from comment #6)
Thanks for the detailed description. We have been able to identify the cause of this problem. Self-heal doesn't correctly heal files on volumes where the number of data bricks is not a power of 2. I'll send a patch to solve this.
REVIEW: http://review.gluster.org/11869 (cluster/ec: Fix write size in self-heal) posted (#1) for review on release-3.7 by Xavier Hernandez (xhernandez)
Can you check if the last patch solves the problem?
COMMIT: http://review.gluster.org/11869 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu)
------
commit fc3da7299dc2adaf66076bfbfebe4a87582f7008
Author: Xavier Hernandez <xhernandez>
Date: Fri Aug 7 12:37:52 2015 +0200
cluster/ec: Fix write size in self-heal
Self-heal was always using a fixed block size to heal a file. This was incorrect for dispersed volumes with a number of data bricks not being a power of 2.
This patch adjusts the block size to a multiple of the stripe size of the volume. It also propagates errors detected during the data heal to stop healing the file and not mark it as healed.
This is a backport if http//review.gluster.org/11862
Change-Id: I5104ae4bfed8585ca40cb45831ca20582566370c
BUG: 1236050
Signed-off-by: Xavier Hernandez <xhernandez>
Reviewed-on: http://review.gluster.org/11869
Tested-by: Gluster Build System <jenkins.com>
Tested-by: NetBSD Build System <jenkins.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu>
The issue has been solved after applying the attached patch (http://review.gluster.org/11869).
Thanks, Backer, for the confirmation and for the help with a reproducible test case. This patch is merged now.
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.4, please open a new bug report.
glusterfs-3.7.4 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.
[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12496
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
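The fix described in the commit message, healing in blocks that are a multiple of the stripe size instead of a fixed size, comes down to a rounding step. The sketch below is an illustration rather than the actual patch: it assumes a stripe of 512 bytes per data brick, uses 128 KiB purely as an example of a fixed heal block size, and `heal_block_size` is a hypothetical name. Whether the real code rounds up or down is not stated in this report; this version rounds down.

```shell
# heal_block_size FIXED_BYTES DATA_BRICKS
# Largest multiple of the stripe size that does not exceed FIXED_BYTES
# (but at least one stripe). Assumes 512 bytes per data brick per stripe.
heal_block_size() {
    local fixed=$1 data=$2
    local stripe=$(( 512 * data ))
    local blocks=$(( fixed / stripe ))
    [ "$blocks" -ge 1 ] || blocks=1
    echo $(( blocks * stripe ))
}

heal_block_size 131072 4   # 131072: already a multiple of the 2048-byte stripe
heal_block_size 131072 3   # 130560: 131072 is not a multiple of 1536
```

Under these assumptions a power-of-2 block size is already stripe-aligned when the number of data bricks is a power of 2, which is consistent with the 4 + 2 test in this thread showing no mismatch while the 3 + 1 volume did.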