| Summary: | Bringing replica down reduces performance |
|---|---|
| Product: | [Community] GlusterFS |
| Component: | replicate |
| Status: | CLOSED DUPLICATE |
| Severity: | medium |
| Priority: | low |
| Version: | nfs-alpha |
| Reporter: | Shehjar Tikoo <shehjart> |
| Assignee: | Pavan Vilas Sondur <pavan> |
| CC: | gluster-bugs, vijay, vikas |
| Hardware: | All |
| OS: | Linux |
| Regression: | RTP |
| Mount Type: | nfs |
| Doc Type: | Bug Fix |
| Attachments: | Volfiles and logs (id=215) |
**Description** (Shehjar Tikoo, 2010-05-27 06:00:56 UTC)
(In reply to comment #0)

> Created an attachment (id=215) [details]
> Volfiles and logs
>
> I have configured three posix vols on 4 servers, replicated each three
> way, distributed across mirrors, and mounted this from one ESX. Created a
> VM in that datastore, ran 'dd if=/dev/zero of=/tmp/tmp_file bs=1M
> count=1000', got 22.3 MB/s

I'm assuming you are running this dd inside the VM. These differences can simply be due to the VM's kernel caching. Can you try dd with "oflag=direct" and see if you still see the problem? You might also want to try doing "sync" before each dd to clear out the cache.

The only other thing that could slow things down when a server is down is the client's reconnection attempts. Can you run the client in debug/trace mode and see how often it tries to reconnect?

Without fix:

```
[root@brick4 mnt1]# dd if=/dev/zero of=testfile count=10k bs=128k
10240+0 records in
10240+0 records out
1342177280 bytes (1.3 GB) copied, 31.3399 seconds, 42.8 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile count=10k bs=128k
10240+0 records in
10240+0 records out
1342177280 bytes (1.3 GB) copied, 58.3153 seconds, 23.0 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile count=10k bs=128k
10240+0 records in
10240+0 records out
1342177280 bytes (1.3 GB) copied, 93.5311 seconds, 14.4 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile count=10k bs=128k
dd: closing output file `testfile': Interrupted system call
[root@brick4 mnt1]#
```

With fix:

```
1342177280 bytes (1.3 GB) copied, 27.4226 seconds, 48.9 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile1 count=10k bs=128k
10240+0 records in
10240+0 records out
1342177280 bytes (1.3 GB) copied, 28.4585 seconds, 47.2 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile1 count=10k bs=128k
10240+0 records in
10240+0 records out
1342177280 bytes (1.3 GB) copied, 27.7122 seconds, 48.4 MB/s
[root@brick4 mnt1]# dd if=/dev/zero of=testfile1 count=10k bs=128k
10240+0 records in
10240+0 records out
<replica down>
1342177280 bytes (1.3 GB) copied, 24.9811 seconds, 53.7 MB/s
[root@brick4 mnt1]#
```
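The cache-bypassing measurement suggested above could be scripted roughly as follows. This is a hedged sketch, not from the bug report: the target path and transfer size are illustrative, and `oflag=direct` requires a filesystem that supports O_DIRECT.

```shell
# Flush dirty pages first so earlier cached writes don't skew the run.
sync
# Write with O_DIRECT so the guest page cache cannot inflate the reported
# throughput; path and size here are illustrative placeholders.
dd if=/dev/zero of=/tmp/direct_test.img bs=1M count=64 oflag=direct
rm -f /tmp/direct_test.img
```

Comparing this against a plain `dd` run (no `oflag=direct`) would show whether the throughput difference comes from the VM's kernel caching or from the replica actually being down.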
---

> With fix:

Where is the fix?

---

> Where is the fix?

Fix is available at: http://patches.gluster.com/patch/4226/