Bug 1103155 - Perf regression with Denali
Summary: Perf regression with Denali
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.0
Assignee: Kotresh HR
QA Contact: Ben Turner
URL:
Whiteboard:
Depends On: 1111171
Blocks: 1105891
Reported: 2014-05-30 11:42 UTC by Neependra Khare
Modified: 2015-12-03 00:39 UTC (History)

Fixed In Version: glusterfs-3.6.0.25-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1105891 (view as bug list)
Environment:
Last Closed: 2014-09-22 19:39:52 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2014:1278 0 normal SHIPPED_LIVE Red Hat Storage Server 3.0 bug fix and enhancement update 2014-09-22 23:26:55 UTC

Description Neependra Khare 2014-05-30 11:42:40 UTC
Description of problem:

With the following workload we are seeing up to 10x slower performance with Denali as compared to Corbet.

4 x Initial workload:
- Smallfile workload 
- Operation = Create 
- Threads = 24 
- Average filesize = 108 KB compressed 
- Files = 10000/thread 

Post initial sync workload
- Smallfile workload 
- Operation = Create 
- Threads = 24 
- Average filesize = 108 KB compressed 
- Files = 10000/thread 

Version-Release number of selected component (if applicable):
Corbet release : 3.4.0.59rhs
Denali release : 3.6.0.5

How reproducible:
- Every time geo-rep is running


Steps to Reproduce:
1. Create 2x2 setup on master and slave end 
2. Create files with initial workload on master  
3. Start Geo-Rep
4. Wait till initial files get synced 
5. Create post initial sync workload on master
6. Wait to get it synced to slave site

Actual results:
- For Corbet
Initial Workload Master = 23:51 - 00:30 = 39 mins 
Geo-Rep Start = 00:39 
Initial Geo-Rep End = 03:38 
Initial Sync = 03:38 - 00:39 = 179 mins 
Post Initial Sync Geo-rep starts = 04:12 
Post Initial Sync Geo-rep end = 04:39 
Post Initial Sync duration = 04:39 - 4:12 = 27 mins 

- For Denali 
Initial Workload Master = 1:40 - 1:00 = 40 mins 
Geo-Rep Start = 04:50 
Geo-Rep End = 08:48 
Initial Sync = 08:48 - 04:50 = 238 mins
Post Initial Sync Geo-rep starts = 09:06 
Post Initial Sync Geo-rep end = 13:36 
Post Initial Sync duration = 13:36 - 09:06 = 270 mins
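
The durations above can be re-derived from the clock times. The following is only a minimal sketch, assuming every timestamp is an HH:MM wall-clock time and that an end time earlier than the start time means the run crossed midnight. The helper name minutes_between is illustrative and is not part of any geo-rep tooling.

```python
def minutes_between(start: str, end: str) -> int:
    """Elapsed minutes from start to end (both HH:MM), wrapping past midnight."""
    start_h, start_m = map(int, start.split(":"))
    end_h, end_m = map(int, end.split(":"))
    return ((end_h * 60 + end_m) - (start_h * 60 + start_m)) % (24 * 60)


# Clock times copied from the "Actual results" section.
corbet_workload = minutes_between("23:51", "00:30")  # crosses midnight
corbet_initial = minutes_between("00:39", "03:38")
corbet_post = minutes_between("04:12", "04:39")
denali_initial = minutes_between("04:50", "08:48")
denali_post = minutes_between("09:06", "13:36")

print(corbet_workload, corbet_initial, corbet_post)  # 39 179 27
print(denali_initial, denali_post)                   # 238 270
print(denali_post / corbet_post)                     # 10.0
```

The last ratio is where the "up to 10x" figure in the description comes from: the post initial sync phase took 270 mins on Denali against 27 mins on Corbet.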

Expected results:
There should not be any perf regression with Denali. 

Additional info:
1. We saw the following logs:
[2014-05-28 09:36:09.835500] W [master(/mnt/brick/brick):278:regjob] <top>: Rsync: .gfid/78b751b3-1270-4ad5-b2ec-dcf9e2d88846 [errcode: 23]
[2014-05-28 09:36:09.836548] W [master(/mnt/brick/brick):278:regjob] <top>: Rsync: .gfid/7648b46e-b003-48e8-a694-6b3de752e747 [errcode: 23]
[2014-05-28 09:36:09.837604] W [master(/mnt/brick/brick):278:regjob] <top>: Rsync: .gfid/94ac7ea0-2cb8-4721-9efa-09bef7fdf4b3 [errcode: 23]
[2014-05-28 09:36:09.838660] W [master(/mnt/brick/brick):278:regjob] <top>: Rsync: .gfid/e81b4660-7dee-4ed6-8e5e-5f2f9f045ec8 [errcode: 23]
[2014-05-28 09:36:09.839720] W [master(/mnt/brick/brick):278:regjob] <top>: Rsync: .gfid/b49dd8a3-2ed1-469c-ab61-93efa5c86183 [errcode: 23]
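
For reference, rsync documents exit code 23 as "Partial transfer due to error". To count how many files are affected, warnings of this shape can be extracted from the geo-rep log. This is a rough sketch only: the regular expression is inferred from the five lines quoted above and is an assumption about the log format, and parse_rsync_warnings is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the warning lines quoted in this report (assumption).
RSYNC_WARNING = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] W \[master\((?P<brick>[^)]+)\):\d+:regjob\] "
    r"<top>: Rsync: \.gfid/(?P<gfid>[0-9a-f-]{36}) \[errcode: (?P<errcode>\d+)\]$"
)


def parse_rsync_warnings(lines):
    """Return (gfid, errcode) pairs for every line that matches the pattern."""
    failures = []
    for line in lines:
        match = RSYNC_WARNING.match(line.strip())
        if match:
            failures.append((match.group("gfid"), int(match.group("errcode"))))
    return failures


sample = [
    "[2014-05-28 09:36:09.836548] W [master(/mnt/brick/brick):278:regjob] "
    "<top>: Rsync: .gfid/7648b46e-b003-48e8-a694-6b3de752e747 [errcode: 23]",
    "[2014-05-28 09:36:09.837604] W [master(/mnt/brick/brick):278:regjob] "
    "<top>: Rsync: .gfid/94ac7ea0-2cb8-4721-9efa-09bef7fdf4b3 [errcode: 23]",
    "an unrelated log line that should be ignored",
]

failures = parse_rsync_warnings(sample)
errors_by_code = Counter(code for _, code in failures)
print(len(failures), dict(errors_by_code))  # 2 {23: 2}
```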

2. Visualizations of the geo-rep runs are available at:
- http://perf19.perf.lab.eng.bos.redhat.com:3838/may14/georepCorbet26May_compress/
- http://perf19.perf.lab.eng.bos.redhat.com:3838/may14/georepDenali27May_compress/

Comment 6 Neependra Khare 2014-06-05 14:41:35 UTC
"SSH + TAR" is giving better results as compared to "rsync"
http://perf19.perf.lab.eng.bos.redhat.com:3838/june14/georepDenali3June_custom_build_compress_tar/

Comment 8 Kotresh HR 2014-06-20 13:51:25 UTC
This patch fixes stat on '.gfid' returning EINVAL, which was the cause of
the rsync errors with code 23.

Upstream Patch:
http://review.gluster.org/8011

Downstream Patch:
https://code.engineering.redhat.com/gerrit/#/c/27398/
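
One way to check whether a given build still shows the symptom is to stat the '.gfid' directory on the mount geo-rep uses and look at the errno. The sketch below is only illustrative: stat_error_name is a hypothetical helper, and the mount path in the usage comment is a placeholder, not a path taken from this report.

```python
import errno
import os


def stat_error_name(path):
    """Return the symbolic errno name if os.stat(path) fails, else None."""
    try:
        os.stat(path)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


# Hypothetical usage on a geo-rep auxiliary mount (placeholder path):
#   stat_error_name("/path/to/aux-mount/.gfid")
# A result of "EINVAL" would match the behaviour described in this comment;
# None would mean the stat call succeeded.
print(stat_error_name("."))  # None, the current directory can be stat'ed
```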

Comment 14 Neependra Khare 2014-07-08 12:54:24 UTC
- With Corbett
Initial Sync (xsync)  = 179 mins 
Post Initial sync  (changelog)  = 27 mins

- With Denali (3.6.0.19-1)
Initial Sync (xsync)  = 210 mins 
Post Initial sync  (changelog) = 27 mins    

- With Denali (3.6.0.24-1)
Initial Sync (xsync) = 208 mins 
Post Initial sync  (changelog) = 24 mins 
http://perf19.perf.lab.eng.bos.redhat.com:3838/july14/georepDenaliJuly8_georep_3.6.0.24-1/

- With Denali (3.6.0.22-1 + Aravinda's patch, http://review.gluster.org/#/c/8124/)
Initial Sync (xsync) = 193 mins 
Post Initial sync  (changelog) = 23 mins 
http://perf19.perf.lab.eng.bos.redhat.com:3838/july14/georepDenaliJuly2_georep_3.6.0.22-1_customMasterPy/

So with Aravinda's patch we definitely see an improvement, especially with xsync.

With patch 8124 (http://review.gluster.org/#/c/8124), Denali performs ~14% better than Corbett with changelog.

With Denali we still see a ~7% perf regression for xsync as compared to Corbett.
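
The two percentages follow from the minute counts listed earlier in this comment. A small sketch of the arithmetic, assuming Corbett is the baseline and using the 3.6.0.22-1 + patch 8124 numbers for Denali (percent_change is an illustrative helper name):

```python
def percent_change(baseline, measured):
    """Relative change of measured versus baseline, in percent.

    Positive means slower (more minutes), negative means faster.
    """
    return (measured - baseline) / baseline * 100.0


# Sync durations in minutes, copied from this comment.
corbett = {"xsync": 179, "changelog": 27}
denali_patched = {"xsync": 193, "changelog": 23}

for mode in ("xsync", "changelog"):
    change = percent_change(corbett[mode], denali_patched[mode])
    print(f"{mode}: {change:+.1f}%")
# xsync: +7.8%
# changelog: -14.8%
```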

Comment 19 errata-xmlrpc 2014-09-22 19:39:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html

