I have configured three posix volumes on 4 servers, each replicated three-way and distributed across the mirrors, and mounted this from one ESX host.

1. Create a VM in that datastore, run 'dd if=/dev/zero of=/tmp/tmp_file bs=1M count=1000', get 22.3 MB/s
2. Bring down glusterfsd on one of the servers storing the vmdk, run the same command, get 14.5 MB/s
3. Bring down glusterfsd on another server storing the vmdk, run the same command, get 14.2 MB/s

It seems that bringing replicas down affects performance. Volfiles and logs attached.
(In reply to comment #0)
> Created an attachment (id=215) [details]
> Volfiles and logs
>
> I have configured three posix volumes on 4 servers, each replicated
> three-way and distributed across the mirrors, and mounted this from one
> ESX host. Create a VM in that datastore, run
> 'dd if=/dev/zero of=/tmp/tmp_file bs=1M count=1000', get 22.3 MB/s

I'm assuming you are running this dd inside the VM. These differences can simply be due to the VM's kernel caching. Can you try dd with "oflag=direct" and see if you still see the problem? You might also want to run "sync" before each dd to clear out the cache.

The only other thing that could slow things down when a server is down is the client's reconnection attempts. Can you run the client in debug/trace mode and see how often it tries to reconnect?
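A minimal sketch of the cache-bypassing measurement suggested above. The target path ./testfile is a stand-in; point it at a file on the Gluster-backed datastore inside the VM:

```shell
# Flush dirty pages first so data cached by a previous run does not
# skew the next measurement.
sync

# oflag=direct opens the output file with O_DIRECT, bypassing the
# guest's page cache, so the reported throughput reflects the real
# storage path rather than memory writes.
dd if=/dev/zero of=./testfile bs=1M count=1000 oflag=direct
```

For the reconnection question, one way (an assumption about the client options of that era, not confirmed by this report) is to mount the client with a verbose log level, e.g. `glusterfs --log-level=TRACE ...`, and grep the client log for reconnect attempts while a server is down.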
Without fix:

[root@brick4 mnt1]# dd if=/dev/zero of=testfile count=10k bs=128k
10240+0 records in
10240+0 records out
1342177280 bytes (1.3 GB) copied, 31.3399 seconds, 42.8 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile count=10k bs=128k
10240+0 records in
10240+0 records out
1342177280 bytes (1.3 GB) copied, 58.3153 seconds, 23.0 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile count=10k bs=128k
10240+0 records in
10240+0 records out
1342177280 bytes (1.3 GB) copied, 93.5311 seconds, 14.4 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile count=10k bs=128k
dd: closing output file `testfile': Interrupted system call
[root@brick4 mnt1]#

With fix:

1342177280 bytes (1.3 GB) copied, 27.4226 seconds, 48.9 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile1 count=10k bs=128k
10240+0 records in
10240+0 records out
1342177280 bytes (1.3 GB) copied, 28.4585 seconds, 47.2 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile1 count=10k bs=128k
10240+0 records in
10240+0 records out
1342177280 bytes (1.3 GB) copied, 27.7122 seconds, 48.4 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile1 count=10k bs=128k
10240+0 records in
10240+0 records out
<replica down>
1342177280 bytes (1.3 GB) copied, 24.9811 seconds, 53.7 MB/s
[root@brick4 mnt1]#
> With fix:

Where is the fix?
> Where is the fix?

The fix is available at: http://patches.gluster.com/patch/4226/
*** This bug has been marked as a duplicate of bug 960 ***