Bug 762690 (GLUSTER-958) - Bringing replica down reduces performance
Summary: Bringing replica down reduces performance
Keywords:
Status: CLOSED DUPLICATE of bug 762692
Alias: GLUSTER-958
Product: GlusterFS
Classification: Community
Component: replicate
Version: nfs-alpha
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Pavan Vilas Sondur
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-05-27 06:00 UTC by Shehjar Tikoo
Modified: 2015-12-01 16:45 UTC
CC: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: RTP
Mount Type: nfs
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Volfiles and logs (6.70 KB, application/x-compressed-tar)
2010-05-27 03:00 UTC, Shehjar Tikoo

Description Shehjar Tikoo 2010-05-27 06:00:56 UTC
I have configured three posix vols on 4 servers, replicated each three-way,
distributed across the mirrors, and mounted this from one ESX host. I created
a VM in that datastore and ran
'dd if=/dev/zero of=/tmp/tmp_file bs=1M count=1000', getting 22.3 MB/s.

Bring down glusterfsd on one of the servers storing the vmdk, run the same
command: 14.5 MB/s.

Bring down glusterfsd on another server storing the vmdk, run the same
command: 14.2 MB/s.
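
(Bringing a replica down here means stopping the glusterfsd brick process on
one of the servers; a hypothetical sketch, with a made-up host name:

    ssh server2 'killall glusterfsd'    # server2 holds one copy of the vmdk
)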

It seems that bringing replicas down reduces performance.

Volfiles and logs attached.
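
For readers without the attachment, a minimal sketch of the kind of client
volfile graph this describes (illustrative only; all volume and subvolume
names here are hypothetical, the real volfiles are in the attachment):

    # server*-posix stand in for protocol/client volumes pointing at the
    # posix bricks (definitions omitted)
    volume mirror-0
      type cluster/replicate
      subvolumes server1-posix server2-posix server3-posix
    end-volume

    # mirror-1 and mirror-2 are defined the same way over the remaining bricks
    volume dist
      type cluster/distribute
      subvolumes mirror-0 mirror-1 mirror-2
    end-volume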

Comment 1 Vikas Gorur 2010-05-27 14:52:55 UTC
(In reply to comment #0)
> Created an attachment (id=215)
> Volfiles and logs
> [...]
> 'dd if=/dev/zero of=/tmp/tmp_file bs=1M count=1000', getting 22.3 MB/s

I'm assuming you are running this dd inside the VM. These differences can simply be due to the VM's kernel caching. Can you try dd with "oflag=direct" and see if you still see the problem? You might also want to try doing "sync" before each dd to clear out the cache.
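
For example, something like this inside the VM (same file and sizes as the
original run):

    sync    # flush dirty pages so cached data doesn't skew the next run
    dd if=/dev/zero of=/tmp/tmp_file bs=1M count=1000 oflag=direct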

The only other thing that could slow things down when a server is down is the client's reconnection attempts. Can you run the client in debug/trace and see how often it tries to reconnect?
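
A rough sketch of how to capture that (log and volfile paths are assumptions):

    # Mount by hand with verbose client logging
    glusterfs --log-level=TRACE --log-file=/var/log/glusterfs/client.log \
              -f /etc/glusterfs/client.vol /mnt1

    # Then count reconnection attempts in the log
    grep -ci connect /var/log/glusterfs/client.log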

Comment 2 Pavan Vilas Sondur 2010-08-20 05:49:48 UTC
Without fix:

[root@brick4 mnt1]#
[root@brick4 mnt1]# dd if=/dev/zero of=testfile count=10k bs=128k
10240+0 records in
10240+0 records out
1342177280 bytes (1.3 GB) copied, 31.3399 seconds, 42.8 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile count=10k bs=128k
10240+0 records in
10240+0 records out
1342177280 bytes (1.3 GB) copied, 58.3153 seconds, 23.0 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile count=10k bs=128k
10240+0 records in
10240+0 records out
1342177280 bytes (1.3 GB) copied, 93.5311 seconds, 14.4 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile count=10k bs=128k
dd: closing output file `testfile': Interrupted system call
[root@brick4 mnt1]# 


With fix:


1342177280 bytes (1.3 GB) copied, 27.4226 seconds, 48.9 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile1 count=10k bs=128k
10240+0 records in
10240+0 records out
1342177280 bytes (1.3 GB) copied, 28.4585 seconds, 47.2 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile1 count=10k bs=128k
10240+0 records in
10240+0 records out
1342177280 bytes (1.3 GB) copied, 27.7122 seconds, 48.4 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile1 count=10k bs=128k
10240+0 records in
10240+0 records out

<replica down>

1342177280 bytes (1.3 GB) copied, 24.9811 seconds, 53.7 MB/s
[root@brick4 mnt1]#

Comment 3 Vikas Gorur 2010-08-20 14:46:09 UTC
 
> With fix:

Where is the fix?

Comment 4 Vijay Bellur 2010-08-20 14:57:39 UTC
> Where is the fix?

Fix is available at: http://patches.gluster.com/patch/4226/

Comment 5 Vijay Bellur 2010-08-24 03:18:04 UTC

*** This bug has been marked as a duplicate of bug 960 ***

