+++ This bug was initially created as a clone of Bug #1236050 +++

Description of problem:
In a 3 x (4 + 2) = 18 distributed disperse volume, the FUSE mount point hung after self-heal of the files and folders on a failed disk.

Version-Release number of selected component (if applicable):
glusterfs 3.7.2 built on Jun 19 2015 16:33:27
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY. You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

How reproducible:
100%

Steps to Reproduce:
1. Create a 3 x (4 + 2) distributed disperse volume across the nodes.
2. FUSE-mount on the client and start creating files/directories in the following hierarchy:
   /mountpoint/folder1/file1
   /mountpoint/folder2/file2
   /mountpoint/folder3/file3
3. Simulate a disk failure by killing the pid of the brick holding file2 on any one node, then add the same disk back after formatting the drive.
4. Start the volume with force.
5. Self-heal recreates file2 with 0 bytes on the newly formatted drive.
6. Wait for self-heal to finish, but it never completes; file2 remains at 0 bytes.
7. Try to read file2 from the client; the 0-byte file is now recovered and recovery completes. Get the md5sum of file2 on all storage nodes; the result is positive.
8. Bring down 2 of the nodes other than the one with the failed drive.
9. Try to ls the mount point; the mount point hangs.

Actual results:
The mount point hung.

Expected results:
The mount point should list all the folders.

Additional info:

admin@node001:~$ sudo gluster volume info
Volume Name: vaulttest21
Type: Distributed-Disperse
Volume ID: ac6a374d-a0a2-405c-823d-0672fd92f0af
Status: Started
Number of Bricks: 3 x (4 + 2) = 18
Transport-type: tcp
Bricks:
Brick1: 10.1.2.1:/media/disk1
Brick2: 10.1.2.2:/media/disk1
Brick3: 10.1.2.3:/media/disk1
Brick4: 10.1.2.4:/media/disk1
Brick5: 10.1.2.5:/media/disk1
Brick6: 10.1.2.6:/media/disk1
Brick7: 10.1.2.1:/media/disk2
Brick8: 10.1.2.2:/media/disk2
Brick9: 10.1.2.3:/media/disk2
Brick10: 10.1.2.4:/media/disk2
Brick11: 10.1.2.5:/media/disk2
Brick12: 10.1.2.6:/media/disk2
Brick13: 10.1.2.1:/media/disk3
Brick14: 10.1.2.2:/media/disk3
Brick15: 10.1.2.3:/media/disk3
Brick16: 10.1.2.4:/media/disk3
Brick17: 10.1.2.5:/media/disk3
Brick18: 10.1.2.6:/media/disk3
Options Reconfigured:
performance.readdir-ahead: on

root@mas03:/mnt/gluster# ls -R
.:
test1  test2  test3

./test1:
testfile1

./test2:
testfile8

./test3:
testfile10

Simulate a disk failure and add the same disk back again. After recovery, run ls on the client mount point; the mount point hangs.
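For reference, the "Expected results" above follow from the dispersed-volume arithmetic: each 4+2 subvolume stores k=4 data fragments plus m=2 redundancy fragments, and a read needs any k of the k+m bricks, so with only two nodes down every subvolume still has enough fragments. A minimal sketch of that arithmetic in plain shell (illustrative only; k and m here are the data/redundancy counts, not gluster options):

```shell
#!/bin/sh
# Each file in a (4+2) dispersed subvolume is stored as k=4 data fragments
# plus m=2 redundancy fragments; a read needs any k of the k+m bricks.
k=4
m=2
for down in 0 1 2 3; do
    up=$((k + m - down))
    if [ "$up" -ge "$k" ]; then
        echo "bricks down: $down -> readable ($up of $((k + m)) fragments left)"
    else
        echo "bricks down: $down -> unreadable ($up of $((k + m)) fragments left)"
    fi
done
```

With two whole nodes down (step 8), each subvolume loses exactly two bricks and still has the four fragments a read needs, which is why the ls hang is a bug rather than expected unavailability.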
node001:~$ sudo gluster volume get vaulttest21 all
Option Value
------ -----
cluster.lookup-unhashed on
cluster.lookup-optimize off
cluster.min-free-disk 10%
cluster.min-free-inodes 5%
cluster.rebalance-stats off
cluster.subvols-per-directory (null)
cluster.readdir-optimize off
cluster.rsync-hash-regex (null)
cluster.extra-hash-regex (null)
cluster.dht-xattr-name trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid off
cluster.rebal-throttle normal
cluster.local-volume-name (null)
cluster.weighted-rebalance on
cluster.entry-change-log on
cluster.read-subvolume (null)
cluster.read-subvolume-index -1
cluster.read-hash-mode 1
cluster.background-self-heal-count 16
cluster.metadata-self-heal on
cluster.data-self-heal on
cluster.entry-self-heal on
cluster.self-heal-daemon on
cluster.heal-timeout 600
cluster.self-heal-window-size 1
cluster.data-change-log on
cluster.metadata-change-log on
cluster.data-self-heal-algorithm (null)
cluster.eager-lock on
cluster.quorum-type none
cluster.quorum-count (null)
cluster.choose-local true
cluster.self-heal-readdir-size 1KB
cluster.post-op-delay-secs 1
cluster.ensure-durability on
cluster.consistent-metadata no
cluster.stripe-block-size 128KB
cluster.stripe-coalesce true
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL
diagnostics.client-sys-log-level CRITICAL
diagnostics.brick-logger (null)
diagnostics.client-logger (null)
diagnostics.brick-log-format (null)
diagnostics.client-log-format (null)
diagnostics.brick-log-buf-size 5
diagnostics.client-log-buf-size 5
diagnostics.brick-log-flush-timeout 120
diagnostics.client-log-flush-timeout 120
performance.cache-max-file-size 0
performance.cache-min-file-size 0
performance.cache-refresh-timeout 1
performance.cache-priority
performance.cache-size 32MB
performance.io-thread-count 16
performance.high-prio-threads 16
performance.normal-prio-threads 16
performance.low-prio-threads 16
performance.least-prio-threads 1
performance.enable-least-priority on
performance.least-rate-limit 0
performance.cache-size 128MB
performance.flush-behind on
performance.nfs.flush-behind on
performance.write-behind-window-size 1MB
performance.nfs.write-behind-window-size 1MB
performance.strict-o-direct off
performance.nfs.strict-o-direct off
performance.strict-write-ordering off
performance.nfs.strict-write-ordering off
performance.lazy-open yes
performance.read-after-open no
performance.read-ahead-page-count 4
performance.md-cache-timeout 1
features.encryption off
encryption.master-key (null)
encryption.data-key-size 256
encryption.block-size 4096
network.frame-timeout 1800
network.ping-timeout 42
network.tcp-window-size (null)
features.lock-heal off
features.grace-timeout 10
network.remote-dio disable
client.event-threads 2
network.ping-timeout 42
network.tcp-window-size (null)
network.inode-lru-limit 16384
auth.allow *
auth.reject (null)
transport.keepalive (null)
server.allow-insecure (null)
server.root-squash off
server.anonuid 65534
server.anongid 65534
server.statedump-path /var/run/gluster
server.outstanding-rpc-limit 64
features.lock-heal off
features.grace-timeout (null)
server.ssl (null)
auth.ssl-allow *
server.manage-gids off
client.send-gids on
server.gid-timeout 300
server.own-thread (null)
server.event-threads 2
performance.write-behind on
performance.read-ahead on
performance.readdir-ahead on
performance.io-cache on
performance.quick-read on
performance.open-behind on
performance.stat-prefetch on
performance.client-io-threads off
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true
features.file-snapshot off
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
network.compression off
network.compression.window-size -15
network.compression.mem-level 8
network.compression.min-size 0
network.compression.compression-level -1
network.compression.debug false
features.limit-usage (null)
features.quota-timeout 0
features.default-soft-limit 80%
features.soft-timeout 60
features.hard-timeout 5
features.alert-time 86400
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no
debug.log-file no
debug.exclude-ops (null)
debug.include-ops (null)
debug.error-gen off
debug.error-failure (null)
debug.error-number (null)
debug.random-failure off
debug.error-fops (null)
nfs.enable-ino32 no
nfs.mem-factor 15
nfs.export-dirs on
nfs.export-volumes on
nfs.addr-namelookup off
nfs.dynamic-volumes off
nfs.register-with-portmap on
nfs.outstanding-rpc-limit 16
nfs.port 2049
nfs.rpc-auth-unix on
nfs.rpc-auth-null on
nfs.rpc-auth-allow all
nfs.rpc-auth-reject none
nfs.ports-insecure off
nfs.trusted-sync off
nfs.trusted-write off
nfs.volume-access read-write
nfs.export-dir
nfs.disable false
nfs.nlm on
nfs.acl on
nfs.mount-udp off
nfs.mount-rmtab /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd /sbin/rpc.statd
nfs.server-aux-gids off
nfs.drc off
nfs.drc-size 0x20000
nfs.read-size (1 * 1048576ULL)
nfs.write-size (1 * 1048576ULL)
nfs.readdir-size (1 * 1048576ULL)
nfs.exports-auth-enable (null)
nfs.auth-refresh-interval-sec (null)
nfs.auth-cache-ttl-sec (null)
features.read-only off
features.worm off
storage.linux-aio off
storage.batch-fsync-mode reverse-fsync
storage.batch-fsync-delay-usec 0
storage.owner-uid -1
storage.owner-gid -1
storage.node-uuid-pathinfo off
storage.health-check-interval 30
storage.build-pgfid off
storage.bd-aio off
cluster.server-quorum-type off
cluster.server-quorum-ratio 0
changelog.changelog off
changelog.changelog-dir (null)
changelog.encoding ascii
changelog.rollover-time 15
changelog.fsync-interval 5
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off
features.barrier disable
features.barrier-timeout 120
features.trash off
features.trash-dir .trashcan
features.trash-eliminate-path (null)
features.trash-max-filesize 5MB
features.trash-internal-op off
cluster.enable-shared-storage disable
features.ctr-enabled off
features.record-counters off
features.ctr_link_consistency off
locks.trace (null)
cluster.disperse-self-heal-daemon enable
cluster.quorum-reads no
client.bind-insecure (null)
ganesha.enable off
features.shard off
features.shard-block-size 4MB
features.scrub-throttle lazy
features.scrub-freq biweekly
features.expiry-time 120
features.cache-invalidation off
features.cache-invalidation-timeout 60

--- Additional comment from Pranith Kumar K on 2015-08-05 03:54:28 CEST ---

hi Backer,
Could you try this test with 3.7.3, please? We fixed 2-3 hang bugs, so it would be great if you could let us know whether it still happens. Meanwhile Xavi and I are going to work on 1235964, which you raised. Do you hang out on #gluster IRC? It would be great to get your feedback on 3.7.3 and what you think about the stability of EC. Based on our tests in the lab, we feel EC is almost ready for production with the 3.7.3 release.

Pranith

--- Additional comment from Backer on 2015-08-05 10:09:17 CEST ---

I have tested 3.7.3 as well as the 3.7.2 nightly build (glusterfs-3.7.2-20150726.b639cb9.tar.gz) for the I/O error and hang issue. I found that 3.7.3 has a data corruption issue which is not present in the 3.7.2 nightly build (glusterfs-3.7.2-20150707.36f24f5.tar.gz). Data was corrupted after replacing the failed drive and running the self-heal. We also see data corruption after recovery from a node failure, when the unavailable data chunks have been copied by the proactive self-heal daemon. You can reproduce the bug with the following steps.

Steps to reproduce:
1. Create a 3 x (4 + 2) disperse volume across the nodes.
2. FUSE-mount on the client and start creating files/directories with mkdir and rsync/dd.
3. Bring down 2 of the nodes (nodes 5 & 6).
4. Write some files (e.g. filenew1, filenew2). The files will be available only on 4 nodes (nodes 1, 2, 3 & 4).
5. Calculate the md5sum of filenew1 and filenew2.
6. Bring up the 2 failed/down nodes (nodes 5 & 6).
7. Proactive self-heal will create the unavailable data chunks on the 2 nodes (nodes 5 & 6).
8. Once self-heal finishes, bring down another two nodes (nodes 1 & 2).
9. Try to get the md5sum of the same recovered file; there will be a mismatch in the md5sum value.

But this bug is not present in the 3.7.2 nightly build (glusterfs-3.7.2-20150707.36f24f5.tar.gz).

Also, I would like to know why proactive self-healing does not happen after replacing the failed drives. I have to run the volume heal command manually to heal the unavailable files.

--- Additional comment from Pranith Kumar K on 2015-08-05 11:24:47 CEST ---

hi Backer,
Thanks for the quick reply. Based on your comment, I am assuming no hangs are observed. Auto-healing after replace-brick/disk replacement is something we are working on for 3.7.4; until then you need to execute "gluster volume heal ec2 full".
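The corruption checks in these reproductions all boil down to comparing a file's checksum before and after a failure/heal cycle. A generic sketch of that comparison in plain shell (illustrative only: a temporary file stands in for a file on the FUSE mount, and the node down/up and heal actions are placeholders):

```shell
#!/bin/sh
# Record a baseline checksum, then re-check it after the failure/heal cycle.
set -eu
f=$(mktemp)                      # stand-in for e.g. /mnt/gluster/filenew1
dd if=/dev/urandom of="$f" bs=1024 count=64 2>/dev/null
baseline=$(md5sum "$f" | awk '{print $1}')

# ... in a real test: bring nodes down, let self-heal run, bring other nodes down ...

current=$(md5sum "$f" | awk '{print $1}')
if [ "$baseline" = "$current" ]; then
    echo "OK: checksum unchanged"
else
    echo "MISMATCH: $baseline -> $current"
fi
rm -f "$f"
```

Against a real volume, the baseline would be taken at step 5 and the re-check at step 9, so a mismatch is caught automatically instead of by eyeballing repeated md5sum runs.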
As for the data corruption bug, I am not able to re-create it. Let me know if I missed any step:

root@localhost - ~ 14:48:24 :) ⚡ glusterd && gluster volume create ec2 disperse 6 redundancy 2 `hostname`:/home/gfs/ec_{0..5} force && gluster volume start ec2 && mount -t glusterfs `hostname`:/ec2 /mnt/ec2
volume create: ec2: success: please start the volume to access data
volume start: ec2: success

# I disabled perf-xlators so that reads are served from the bricks always
root@localhost - ~ 14:48:38 :( ⚡ ~/.scripts/disable-perf-xl.sh ec2
+ gluster volume set ec2 performance.quick-read off
volume set: success
+ gluster volume set ec2 performance.io-cache off
volume set: success
+ gluster volume set ec2 performance.write-behind off
volume set: success
+ gluster volume set ec2 performance.stat-prefetch off
volume set: success
+ gluster volume set ec2 performance.read-ahead off
volume set: success
+ gluster volume set ec2 performance.open-behind off
volume set: success

root@localhost - ~ 14:48:47 :) ⚡ cd /mnt/ec2/
root@localhost - /mnt/ec2 14:48:59 :) ⚡ gluster v status
Status of volume: ec2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick localhost.localdomain:/home/gfs/ec_0  49152     0          Y       14828
Brick localhost.localdomain:/home/gfs/ec_1  49153     0          Y       14846
Brick localhost.localdomain:/home/gfs/ec_2  49155     0          Y       14864
Brick localhost.localdomain:/home/gfs/ec_3  49156     0          Y       14882
Brick localhost.localdomain:/home/gfs/ec_4  49157     0          Y       14900
Brick localhost.localdomain:/home/gfs/ec_5  49158     0          Y       14918
NFS Server on localhost                     2049      0          Y       14937

Task Status of Volume ec2
------------------------------------------------------------------------------
There are no active volume tasks

root@localhost - /mnt/ec2 14:49:02 :) ⚡ kill -9 14918 14900
root@localhost - /mnt/ec2 14:49:11 :) ⚡ dd if=/dev/urandom of=1.txt bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.153835 s, 13.6 MB/s
root@localhost - /mnt/ec2 14:49:15 :) ⚡ md5sum 1.txt
5ead68d0a60b8134f7daf0e8d1afe19c  1.txt
root@localhost - /mnt/ec2 14:49:23 :) ⚡ gluster v start ec2 force
volume start: ec2: success
root@localhost - /mnt/ec2 14:49:35 :) ⚡ gluster v heal ec2
Launching heal operation to perform index self heal on volume ec2 has been successful
Use heal info commands to check status
root@localhost - /mnt/ec2 14:49:39 :) ⚡ gluster v heal ec2 info
Brick localhost.localdomain:/home/gfs/ec_0/
/1.txt
Number of entries: 1
Brick localhost.localdomain:/home/gfs/ec_1/
/1.txt
Number of entries: 1
Brick localhost.localdomain:/home/gfs/ec_2/
/1.txt
Number of entries: 1
Brick localhost.localdomain:/home/gfs/ec_3/
/1.txt
Number of entries: 1
Brick localhost.localdomain:/home/gfs/ec_4/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_5/
Number of entries: 0

root@localhost - /mnt/ec2 14:49:45 :) ⚡ gluster v heal ec2
Launching heal operation to perform index self heal on volume ec2 has been successful
Use heal info commands to check status
root@localhost - /mnt/ec2 14:49:47 :) ⚡ gluster v heal ec2 info
Brick localhost.localdomain:/home/gfs/ec_0/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_1/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_2/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_3/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_4/
Number of entries: 0
Brick localhost.localdomain:/home/gfs/ec_5/
Number of entries: 0

root@localhost - /mnt/ec2 14:49:51 :) ⚡ gluster v status
Status of volume: ec2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick localhost.localdomain:/home/gfs/ec_0  49152     0          Y       14828
Brick localhost.localdomain:/home/gfs/ec_1  49153     0          Y       14846
Brick localhost.localdomain:/home/gfs/ec_2  49155     0          Y       14864
Brick localhost.localdomain:/home/gfs/ec_3  49156     0          Y       14882
Brick localhost.localdomain:/home/gfs/ec_4  49157     0          Y       15173
Brick localhost.localdomain:/home/gfs/ec_5  49158     0          Y       15191
NFS Server on localhost                     2049      0          Y       15211

Task Status of Volume ec2
------------------------------------------------------------------------------
There are no active volume tasks

root@localhost - /mnt/ec2 14:49:56 :) ⚡ kill -9 14828 14846
root@localhost - /mnt/ec2 14:50:03 :) ⚡ md5sum 1.txt
5ead68d0a60b8134f7daf0e8d1afe19c  1.txt
root@localhost - /mnt/ec2 14:50:06 :) ⚡ cd
root@localhost - ~ 14:50:13 :) ⚡ umount /mnt/ec2
root@localhost - ~ 14:50:16 :) ⚡ mount -t glusterfs `hostname`:/ec2 /mnt/ec2
root@localhost - ~ 14:50:19 :) ⚡ md5sum /mnt/ec2/1.txt
5ead68d0a60b8134f7daf0e8d1afe19c  /mnt/ec2/1.txt

--- Additional comment from Backer on 2015-08-06 10:46:11 CEST ---

--- Additional comment from Backer on 2015-08-06 10:49:10 CEST ---

I am getting random test results after disabling and enabling the perf-xlators. Please refer to the attachment.

root@gfs-tst-08:/home/qubevaultadmin# gluster --version
glusterfs 3.7.3 built on Jul 31 2015 17:03:01
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY. You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
root@gfs-tst-08:/home/gfsadmin# gluster volume info
Volume Name: vaulttest39
Type: Disperse
Volume ID: fcbed6b5-0654-489c-a29e-d18f737ac2f7
Status: Started
Number of Bricks: 1 x (3 + 1) = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.2.238:/media/disk1
Brick2: 10.1.2.238:/media/disk2
Brick3: 10.1.2.238:/media/disk3
Brick4: 10.1.2.238:/media/disk4
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.io-cache: off
performance.write-behind: off
performance.stat-prefetch: off
performance.read-ahead: off
performance.open-behind: off

gfsadmin@gfs-tst-08:~$ sudo gluster volume status
Status of volume: vaulttest39
Gluster process                        TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1          49152     0          Y       1560
Brick 10.1.2.238:/media/disk2          49153     0          Y       1568
Brick 10.1.2.238:/media/disk3          49154     0          Y       1576
Brick 10.1.2.238:/media/disk4          49155     0          Y       1582
NFS Server on localhost                2049      0          Y       1544

Task Status of Volume vaulttest39
------------------------------------------------------------------------------
There are no active volume tasks

gfsadmin@gfs-tst-08:~$ sudo kill -9 1560
gfsadmin@gfs-tst-08:~$ sudo gluster volume status
Status of volume: vaulttest39
Gluster process                        TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1          N/A       N/A        N       N/A
Brick 10.1.2.238:/media/disk2          49153     0          Y       1568
Brick 10.1.2.238:/media/disk3          49154     0          Y       1576
Brick 10.1.2.238:/media/disk4          49155     0          Y       1582
NFS Server on localhost                2049      0          Y       1544

Task Status of Volume vaulttest39
------------------------------------------------------------------------------
There are no active volume tasks

root@gfs-tst-09:/mnt/gluster# dd if=/dev/urandom of=2.txt bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.226147 s, 9.3 MB/s
root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
cd9db53f9c090958ff8c033161576b95  2.txt

gfsadmin@gfs-tst-08:~$ ls -l -h /media/disk{1..4}
/media/disk1:
total 960K
-rw-r--r-- 2 root root 683K Aug  6 13:58 1.txt

/media/disk2:
total 1.9M
-rw-r--r-- 2 root root 683K Aug  6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug  6 13:59 2.txt

/media/disk3:
total 1.9M
-rw-r--r-- 2 root root 683K Aug  6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug  6 13:59 2.txt

/media/disk4:
total 1.9M
-rw-r--r-- 2 root root 683K Aug  6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug  6 13:59 2.txt

root@gfs-tst-08:/home/gfsadmin# gluster v start vaulttest39 force
volume start: vaulttest39: success
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest39
Launching heal operation to perform index self heal on volume vaulttest39 has been successful
Use heal info commands to check status
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest39 info
Brick gfs-tst-08:/media/disk1/
Number of entries: 0
Brick gfs-tst-08:/media/disk2/
Number of entries: 0
Brick gfs-tst-08:/media/disk3/
Number of entries: 0
Brick gfs-tst-08:/media/disk4/
Number of entries: 0
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest39
Launching heal operation to perform index self heal on volume vaulttest39 has been successful
Use heal info commands to check status
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest39 info
Brick gfs-tst-08:/media/disk1/
Number of entries: 0
Brick gfs-tst-08:/media/disk2/
Number of entries: 0
Brick gfs-tst-08:/media/disk3/
Number of entries: 0
Brick gfs-tst-08:/media/disk4/
Number of entries: 0

root@gfs-tst-08:/home/gfsadmin# ls -l -h /media/disk{1..4}
/media/disk1:
total 1004K
-rw-r--r-- 2 root root 683K Aug  6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug  6 13:59 2.txt

/media/disk2:
total 1.9M
-rw-r--r-- 2 root root 683K Aug  6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug  6 13:59 2.txt

/media/disk3:
total 1.9M
-rw-r--r-- 2 root root 683K Aug  6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug  6 13:59 2.txt

/media/disk4:
total 1.9M
-rw-r--r-- 2 root root 683K Aug  6 13:58 1.txt
-rw-r--r-- 2 root root 683K Aug  6 13:59 2.txt

root@gfs-tst-08:/home/gfsadmin# gluster volume status
Status of volume: vaulttest39
Gluster process                        TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1          49152     0          Y       1721
Brick 10.1.2.238:/media/disk2          49153     0          Y       1568
Brick 10.1.2.238:/media/disk3          49154     0          Y       1576
Brick 10.1.2.238:/media/disk4          49155     0          Y       1582
NFS Server on localhost                2049      0          Y       1740

Task Status of Volume vaulttest39
------------------------------------------------------------------------------
There are no active volume tasks

root@gfs-tst-08:/home/gfsadmin# kill -9 1582
root@gfs-tst-08:/home/gfsadmin# gluster volume status
Status of volume: vaulttest39
Gluster process                        TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1          49152     0          Y       1721
Brick 10.1.2.238:/media/disk2          49153     0          Y       1568
Brick 10.1.2.238:/media/disk3          49154     0          Y       1576
Brick 10.1.2.238:/media/disk4          N/A       N/A        N       N/A
NFS Server on localhost                2049      0          Y       1740

Task Status of Volume vaulttest39
------------------------------------------------------------------------------
There are no active volume tasks

root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
cd9db53f9c090958ff8c033161576b95  2.txt
root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
cd9db53f9c090958ff8c033161576b95  2.txt
root@gfs-tst-09:/mnt/gluster# ls
1.txt  2.txt
root@gfs-tst-09:/mnt/gluster# ls
1.txt  2.txt
root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
70b40a7e3f5dc85345e466968416cde1  2.txt
root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
70b40a7e3f5dc85345e466968416cde1  2.txt
root@gfs-tst-09:/mnt/gluster# md5sum 2.txt
70b40a7e3f5dc85345e466968416cde1  2.txt

--- Additional comment from Backer on 2015-08-06 16:07:10 CEST ---

I have created a new volume once again and confirmed the bug.
root@gfs-tst-08:/home/gfsadmin# gluster volume create vaulttest52 disperse-data 3 redundancy 1 10.1.2.238:/media/disk{1..4} force root@gfs-tst-08:/home/gfsadmin# gluster v start vaulttest52 root@gfs-tst-08:/home/gfsadmin# gluster v status Status of volume: vaulttest52 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 10.1.2.238:/media/disk1 49172 0 Y 1574 Brick 10.1.2.238:/media/disk2 49173 0 Y 1582 Brick 10.1.2.238:/media/disk3 49174 0 Y 1595 Brick 10.1.2.238:/media/disk4 49175 0 Y 1590 NFS Server on localhost 2049 0 Y 1558 Task Status of Volume vaulttest52 ------------------------------------------------------------------------------ There are no active volume tasks root@gfs-tst-08:/home/gfsadmin# gluster v info Volume Name: vaulttest52 Type: Disperse Volume ID: 0b0b3f8f-acb9-4e2c-a029-fcb89f85b1e7 Status: Started Number of Bricks: 1 x (3 + 1) = 4 Transport-type: tcp Bricks: Brick1: 10.1.2.238:/media/disk1 Brick2: 10.1.2.238:/media/disk2 Brick3: 10.1.2.238:/media/disk3 Brick4: 10.1.2.238:/media/disk4 Options Reconfigured: performance.readdir-ahead: on gfsadmin@gfs-tst-09:/mnt/gluster$ sudo dd if=/dev/urandom of=1.txt bs=1M count=2 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.208704 s, 10.0 MB/s gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 1.txt 1233b5321315c05abb4668cc9a1d9d25 1.txt root@gfs-tst-08:/home/gfsadmin# ls -l -h /media/disk{1..4} /media/disk1: total 960K -rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt /media/disk2: total 960K -rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt /media/disk3: total 960K -rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt /media/disk4: total 960K -rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt root@gfs-tst-08:/home/gfsadmin# kill -9 1574 root@gfs-tst-08:/home/gfsadmin# gluster v status Status of volume: vaulttest52 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 
10.1.2.238:/media/disk1 N/A N/A N N/A Brick 10.1.2.238:/media/disk2 49173 0 Y 1582 Brick 10.1.2.238:/media/disk3 49174 0 Y 1595 Brick 10.1.2.238:/media/disk4 49175 0 Y 1590 NFS Server on localhost 2049 0 Y 1558 Task Status of Volume vaulttest52 ------------------------------------------------------------------------------ There are no active volume tasks gfsadmin@gfs-tst-09:/mnt/gluster$ sudo dd if=/dev/urandom of=2.txt bs=1M count=2 2+0 records in 2+0 records out 2097152 bytes (2.1 MB) copied, 0.205401 s, 10.2 MB/s gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt 9c8b37847622efbf2ec75c683166de97 2.txt root@gfs-tst-08:/home/gfsadmin# ls -l -h /media/disk{1..4} /media/disk1: total 960K -rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt /media/disk2: total 1.9M -rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt -rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt /media/disk3: total 1.9M -rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt -rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt /media/disk4: total 1.4M -rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt -rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt root@gfs-tst-08:/home/gfsadmin# gluster v start vaulttest52 force volume start: vaulttest52: success root@gfs-tst-08:/home/gfsadmin# gluster v status Status of volume: vaulttest52 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 10.1.2.238:/media/disk1 49172 0 Y 1739 Brick 10.1.2.238:/media/disk2 49173 0 Y 1582 Brick 10.1.2.238:/media/disk3 49174 0 Y 1595 Brick 10.1.2.238:/media/disk4 49175 0 Y 1590 NFS Server on localhost 2049 0 Y 1758 Task Status of Volume vaulttest52 ------------------------------------------------------------------------------ There are no active volume tasks root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest52 Launching heal operation to perform index self heal on volume vaulttest52 has been successful Use heal info commands to check status root@gfs-tst-08:/home/gfsadmin# gluster v heal 
vaulttest52 info Brick gfs-tst-08:/media/disk1/ Number of entries: 0 Brick gfs-tst-08:/media/disk2/ Number of entries: 0 Brick gfs-tst-08:/media/disk3/ Number of entries: 0 Brick gfs-tst-08:/media/disk4/ Number of entries: 0 root@gfs-tst-08:/home/gfsadmin# ls -l -h /media/disk{1..4} /media/disk1: total 728K -rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt -rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt /media/disk2: total 1.4M -rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt -rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt /media/disk3: total 1.4M -rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt -rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt /media/disk4: total 1.4M -rw-r--r-- 2 root root 683K Aug 6 19:14 1.txt -rw-r--r-- 2 root root 683K Aug 6 19:16 2.txt root@gfs-tst-08:/home/gfsadmin# gluster v status Status of volume: vaulttest52 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 10.1.2.238:/media/disk1 49172 0 Y 1739 Brick 10.1.2.238:/media/disk2 49173 0 Y 1582 Brick 10.1.2.238:/media/disk3 49174 0 Y 1595 Brick 10.1.2.238:/media/disk4 49175 0 Y 1590 NFS Server on localhost 2049 0 Y 1758 Task Status of Volume vaulttest52 ------------------------------------------------------------------------------ There are no active volume tasks root@gfs-tst-08:/home/gfsadmin# kill -9 1590 root@gfs-tst-08:/home/gfsadmin# gluster v status Status of volume: vaulttest52 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 10.1.2.238:/media/disk1 49172 0 Y 1739 Brick 10.1.2.238:/media/disk2 49173 0 Y 1582 Brick 10.1.2.238:/media/disk3 49174 0 Y 1595 Brick 10.1.2.238:/media/disk4 N/A N/A N N/A NFS Server on localhost 2049 0 Y 1758 Task Status of Volume vaulttest52 ------------------------------------------------------------------------------ There are no active volume tasks gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt 
96f6f469f4b743b4a575fdc408b5f007 2.txt gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt 96f6f469f4b743b4a575fdc408b5f007 2.txt gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt 96f6f469f4b743b4a575fdc408b5f007 2.txt gfsadmin@gfs-tst-09:/mnt/gluster$ ls 1.txt 2.txt gfsadmin@gfs-tst-09:/mnt/gluster$ ls 1.txt 2.txt gfsadmin@gfs-tst-09:/mnt/gluster$ ls 1.txt 2.txt gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt 96f6f469f4b743b4a575fdc408b5f007 2.txt gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 2.txt 96f6f469f4b743b4a575fdc408b5f007 2.txt ===================================== MD5SUM has ben changed ==================================== root@gfs-tst-08:/home/gfsadmin# gluster v start vaulttest52 force volume start: vaulttest52: success root@gfs-tst-08:/home/gfsadmin# gluster v status Status of volume: vaulttest52 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 10.1.2.238:/media/disk1 49172 0 Y 1739 Brick 10.1.2.238:/media/disk2 49173 0 Y 1582 Brick 10.1.2.238:/media/disk3 49174 0 Y 1595 Brick 10.1.2.238:/media/disk4 49175 0 Y 1852 NFS Server on localhost 2049 0 Y 1871 Task Status of Volume vaulttest52 ------------------------------------------------------------------------------ There are no active volume tasks ====================================== disabled perf-xlators ===================================== root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.quick-read off gluster volume set vaulttest52 performance.io-cache off gluster volume set vaulttest52 performance.write-behind off gluster volume set vaulttest52 performance.stat-prefetch off gluster volume set vaulttest52 performance.read-ahead off gluster volume set vaulttest52 performance.open-behind off volume set: success root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.io-cache off volume set: success root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.write-behind off 
volume set: success root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.stat-prefetch off volume set: success root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.read-ahead off volume set: success root@gfs-tst-08:/home/gfsadmin# gluster volume set vaulttest52 performance.open-behind off volume set: success root@gfs-tst-08:/home/gfsadmin# gluster v info Volume Name: vaulttest52 Type: Disperse Volume ID: 0b0b3f8f-acb9-4e2c-a029-fcb89f85b1e7 Status: Started Number of Bricks: 1 x (3 + 1) = 4 Transport-type: tcp Bricks: Brick1: 10.1.2.238:/media/disk1 Brick2: 10.1.2.238:/media/disk2 Brick3: 10.1.2.238:/media/disk3 Brick4: 10.1.2.238:/media/disk4 Options Reconfigured: performance.open-behind: off performance.read-ahead: off performance.stat-prefetch: off performance.write-behind: off performance.io-cache: off performance.quick-read: off performance.readdir-ahead: on root@gfs-tst-08:/home/gfsadmin# gluster v status Status of volume: vaulttest52 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 10.1.2.238:/media/disk1 49172 0 Y 1739 Brick 10.1.2.238:/media/disk2 49173 0 Y 1582 Brick 10.1.2.238:/media/disk3 49174 0 Y 1595 Brick 10.1.2.238:/media/disk4 49175 0 Y 1852 NFS Server on localhost 2049 0 Y 1871 Task Status of Volume vaulttest52 ------------------------------------------------------------------------------ There are no active volume tasks root@gfs-tst-08:/home/gfsadmin# kill -9 1852 root@gfs-tst-08:/home/gfsadmin# gluster v status Status of volume: vaulttest52 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 10.1.2.238:/media/disk1 49172 0 Y 1739 Brick 10.1.2.238:/media/disk2 49173 0 Y 1582 Brick 10.1.2.238:/media/disk3 49174 0 Y 1595 Brick 10.1.2.238:/media/disk4 N/A N/A N N/A NFS Server on localhost 2049 0 Y 1871 Task Status of Volume vaulttest52 
------------------------------------------------------------------------------
There are no active volume tasks

gfsadmin@gfs-tst-09:/mnt/gluster$ sudo dd if=/dev/urandom of=3.txt bs=5M count=10
10+0 records in
10+0 records out
52428800 bytes (52 MB) copied, 5.40714 s, 9.7 MB/s
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
fa9d9d3e298d01c8cf54855968784b83  3.txt

root@gfs-tst-08:/home/gfsadmin# gluster v start vaulttest52 force
volume start: vaulttest52: success
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1               49172     0          Y       1739
Brick 10.1.2.238:/media/disk2               49173     0          Y       1582
Brick 10.1.2.238:/media/disk3               49174     0          Y       1595
Brick 10.1.2.238:/media/disk4               49175     0          Y       2017
NFS Server on localhost                     N/A       N/A        N       N/A

Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks

root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest52
Launching heal operation to perform index self heal on volume vaulttest52 has been successful
Use heal info commands to check status
root@gfs-tst-08:/home/gfsadmin# gluster v heal vaulttest52 info
Brick gfs-tst-08:/media/disk1/
Number of entries: 0

Brick gfs-tst-08:/media/disk2/
Number of entries: 0

Brick gfs-tst-08:/media/disk3/
Number of entries: 0

Brick gfs-tst-08:/media/disk4/
Number of entries: 0

root@gfs-tst-08:/home/gfsadmin# ls -l -h /media/disk{1..4}
/media/disk1:
total 33M
-rw-r--r-- 2 root root 683K Aug  6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug  6 19:16 2.txt
-rw-r--r-- 2 root root  17M Aug  6 19:26 3.txt

/media/disk2:
total 34M
-rw-r--r-- 2 root root 683K Aug  6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug  6 19:16 2.txt
-rw-r--r-- 2 root root  17M Aug  6 19:26 3.txt

/media/disk3:
total 34M
-rw-r--r-- 2 root root 683K Aug  6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug  6 19:16 2.txt
-rw-r--r-- 2 root root  17M Aug  6 19:26 3.txt

/media/disk4:
total 1.4M
-rw-r--r-- 2 root root 683K Aug  6 19:14 1.txt
-rw-r--r-- 2 root root 683K Aug  6 19:16 2.txt
-rw-r--r-- 2 root root  17M Aug  6 19:26 3.txt

root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1               49172     0          Y       1739
Brick 10.1.2.238:/media/disk2               49173     0          Y       1582
Brick 10.1.2.238:/media/disk3               49174     0          Y       1595
Brick 10.1.2.238:/media/disk4               49175     0          Y       2017
NFS Server on localhost                     2049      0          Y       2036

Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks

root@gfs-tst-08:/home/gfsadmin# kill -9 1582
root@gfs-tst-08:/home/gfsadmin# gluster v status
Status of volume: vaulttest52
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.2.238:/media/disk1               49172     0          Y       1739
Brick 10.1.2.238:/media/disk2               N/A       N/A        N       N/A
Brick 10.1.2.238:/media/disk3               49174     0          Y       1595
Brick 10.1.2.238:/media/disk4               49175     0          Y       2017
NFS Server on localhost                     2049      0          Y       2036

Task Status of Volume vaulttest52
------------------------------------------------------------------------------
There are no active volume tasks

gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
fa9d9d3e298d01c8cf54855968784b83  3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
fa9d9d3e298d01c8cf54855968784b83  3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
fa9d9d3e298d01c8cf54855968784b83  3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
fa9d9d3e298d01c8cf54855968784b83  3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ ls
1.txt  2.txt  3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ ls
1.txt  2.txt  3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ ls
1.txt  2.txt  3.txt
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
ea50603ce500b29c73dca6a9c733eb7a  3.txt
gfsadmin@gfs-tst-09:/$ sudo umount /mnt/gluster
gfsadmin@gfs-tst-09:/$ sudo mount -t glusterfs 10.1.2.238:/vaulttest52 /mnt/gluster/
gfsadmin@gfs-tst-09:/$ cd /mnt/gluster/
gfsadmin@gfs-tst-09:/mnt/gluster$ md5sum 3.txt
ea50603ce500b29c73dca6a9c733eb7a  3.txt

After running ls on the mount point, the md5sum hash of the file changed, and it stays changed even after unmounting and remounting the volume.
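The manual before/after comparison above can be scripted. This is a minimal sketch, not part of the bug report: `check_same` is a hypothetical helper that compares two checksum strings, and the commented commands show how the before/after values would be captured on a live mount (paths and volume name taken from the transcript).

```shell
#!/bin/sh
# Sketch of the remount consistency check done by hand above.
check_same() {
    if [ "$1" = "$2" ]; then
        echo "checksum stable"
    else
        echo "checksum changed: $1 -> $2"
    fi
}

# On a live cluster (assumed paths from the transcript):
# before=$(md5sum /mnt/gluster/3.txt | awk '{print $1}')
# sudo umount /mnt/gluster
# sudo mount -t glusterfs 10.1.2.238:/vaulttest52 /mnt/gluster/
# after=$(md5sum /mnt/gluster/3.txt | awk '{print $1}')
# check_same "$before" "$after"

# The two hashes actually observed in the transcript:
check_same fa9d9d3e298d01c8cf54855968784b83 ea50603ce500b29c73dca6a9c733eb7a
```

A stable volume should always report "checksum stable" here; the transcript's hashes demonstrate the corruption.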
REVIEW: http://review.gluster.org/11862 (cluster/ec: Fix write size in self-heal) posted (#1) for review on master by Xavier Hernandez (xhernandez)
COMMIT: http://review.gluster.org/11862 committed in master by Pranith Kumar Karampuri (pkarampu)

------

commit 289d00369f0ddb78f534735f7d3bf86268adac60
Author: Xavier Hernandez <xhernandez>
Date:   Fri Aug 7 12:37:52 2015 +0200

    cluster/ec: Fix write size in self-heal

    Self-heal was always using a fixed block size to heal a file. This was
    incorrect for dispersed volumes whose number of data bricks is not a
    power of 2. This patch adjusts the block size to a multiple of the
    stripe size of the volume.

    It also propagates errors detected during the data heal, so that
    healing of the file stops and the file is not marked as healed.

    Change-Id: I9ee3fde98a9e5d6116fd096ceef88686fd1d28e2
    BUG: 1251446
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/11862
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
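The alignment issue the commit describes can be illustrated with shell arithmetic. This is only a sketch, not the actual cluster/ec code: `heal_block_size` and the 512-byte fragment size are illustrative assumptions. The point is that with 3 data bricks (not a power of 2) a fixed heal block such as 128 KiB is not a multiple of the stripe size, so the last partial stripe of each block gets encoded incorrectly; rounding to a stripe multiple avoids that.

```shell
#!/bin/sh
# Sketch (not the real ec translator code): round the self-heal write size
# down to a multiple of the volume stripe size (fragment size x data bricks).
heal_block_size() {
    preferred=$1
    stripe=$2
    blocks=$((preferred / stripe))
    [ "$blocks" -eq 0 ] && blocks=1   # never write less than one full stripe
    echo $((blocks * stripe))
}

# 3 data bricks with an assumed 512-byte fragment => 1536-byte stripe.
# A fixed 128 KiB (131072-byte) block is not stripe-aligned; align it:
heal_block_size 131072 1536   # prints 130560, i.e. 85 whole stripes
```

With 4 data bricks (a power of 2) the 128 KiB default happens to be stripe-aligned, which is why the bug only shows up for brick counts like 3 + 1.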
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report. glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user