Bug 764097 (GLUSTER-2365) - performance issues for distribute-replicate volume on AMI-AWS
Summary: performance issues for distribute-replicate volume on AMI-AWS
Keywords:
Status: CLOSED NOTABUG
Alias: GLUSTER-2365
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.1.2
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: tcp
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-02-03 11:58 UTC by Saurabh
Modified: 2011-03-17 05:09 UTC
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Saurabh 2011-02-03 11:58:19 UTC
Some performance issues were observed for a distribute-replicate volume.

Mount: fuse

Operation: metadata-intensive operations
Result:
3.1.2 (AWS): 456s
AMI-AWS (3.1.2): 665.77s (roughly 46% slower)
Average of three runs: (647.50 + 689.62 + 660.21) / 3 = 665.77s


Operation: large-file write
Result:
3.1.2 (AWS): 53s
AMI-AWS (3.1.2): 77.20s (roughly 46% slower)
Average of three runs: (58.85 + 55.95 + 116.81) / 3 = 77.20s

Operation: large-file create
Result:
3.1.2 (AWS): 47s
AMI-AWS (3.1.2): 64.84s
Average of three runs: (31.54 + 37.68 + 125.32) / 3 = 64.84s

Operation: small-file read/reread
Result:
3.1.2 (AWS): 625s / 608s
AMI-AWS (3.1.2): 831.87s / 814.53s



Mount: nfs
Operation: small-file read/reread
3.1.2 (AWS): 845s / 439s
AMI-AWS (3.1.2): 1103.94s / 610.18s
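The averaged figures above are plain three-run means. As a quick check (values taken from the large-file write runs quoted above), and worth noting because the run-to-run spread is wide:

```shell
# Recomputing the large-file write average from the three quoted runs.
# Note the spread between runs (55.95s vs 116.81s): the variance is
# comparable in size to the reported regression itself.
awk 'BEGIN { printf "%.2f\n", (58.85 + 55.95 + 116.81) / 3 }'   # prints 77.20
```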

Comment 1 tcp 2011-02-03 16:40:07 UTC
While the performance measurements are being made, can we collect the following statistics on both the client and the server, for every run and for the entire duration of the run? (Do this for both the baseline, AWS in this case, and the test run, AMI-AWS.)

1. iostat -xcdhn 5
2. ethstats [you can get this via apt-get]
3. mpstat -A 5 (also run "cat /proc/interrupts" once and save the output)
4. vmstat 5
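One possible way to wire those four collectors around a single benchmark run. This is only a sketch: the function name and log layout are mine, not from this bug, and ethstats in particular is not installed everywhere, hence the "command -v" guards.

```shell
#!/usr/bin/env bash
# Sketch: wrap one benchmark run with the collectors listed above, so each
# run leaves behind a directory of logs for the baseline/test comparison.
collect_stats() {   # collect_stats LOG_DIR -- WORKLOAD...
    local dir=$1
    shift 2                                  # drop LOG_DIR and the "--"
    mkdir -p "$dir"
    cat /proc/interrupts > "$dir/interrupts.before" 2>/dev/null
    local pids=()
    command -v iostat   > /dev/null && { iostat -xcdhn 5 > "$dir/iostat.log"   2>&1 & pids+=("$!"); }
    command -v ethstats > /dev/null && { ethstats        > "$dir/ethstats.log" 2>&1 & pids+=("$!"); }
    command -v mpstat   > /dev/null && { mpstat -A 5     > "$dir/mpstat.log"   2>&1 & pids+=("$!"); }
    command -v vmstat   > /dev/null && { vmstat 5        > "$dir/vmstat.log"   2>&1 & pids+=("$!"); }
    "$@"                                     # run the actual benchmark
    [ "${#pids[@]}" -gt 0 ] && kill "${pids[@]}" 2>/dev/null
    wait 2>/dev/null || true
}

# e.g.: collect_stats stats-run1 -- ./run_metadata_test.sh
```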

Comment 2 Vikas Gorur 2011-02-03 20:02:16 UTC
More questions to add to those from Pavan:

- How many servers?
- Which AMI/distribution are you using for the AWS case?
- Are the EBS volumes in the AWS case RAID'ed?

The performance of EBS volumes can vary considerably from instance to instance. Can you also get baseline numbers for the export directories in each case by running:

dd if=/dev/zero of=/export/directory/file bs=1M count=512 oflag=direct

Capturing iostat will also help answer this question.
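A repeated version of that dd baseline can expose the instance-to-instance variation directly. The helper name and the run count below are mine, not from this bug; oflag=direct bypasses the page cache so the numbers reflect the volume itself.

```shell
#!/usr/bin/env bash
# Sketch: repeat the dd baseline a few times per export directory, keeping
# only dd's throughput summary line, so run-to-run EBS variation is visible.
dd_baseline() {   # dd_baseline DIR [RUNS] [EXTRA_DD_FLAGS]
    local dir=$1 runs=${2:-3} flags=${3:-}
    local i
    for i in $(seq "$runs"); do
        dd if=/dev/zero of="$dir/ddtest.$i" bs=1M count=512 $flags 2>&1 | tail -n 1
    done
    rm -f "$dir"/ddtest.*     # clean up the scratch files
}

# On each server's brick, e.g.:
# dd_baseline /export/directory 3 oflag=direct
```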

Finally, are you trying to figure out from these tests how the AMI version of GlusterFS compares to just regular GlusterFS installed on an EC2 instance? Do you have any reason to believe there will be a difference?

Comment 3 tcp 2011-02-04 06:32:48 UTC
(In reply to comment #2)
> More questions to add to those from Pavan:
> 
> - How many servers?
> - Which AMI/distribution are you using for the AWS case?
> - Are the EBS volumes in the AWS case RAID'ed?
> 
> The performance of EBS volumes can vary considerably from instance to instance.
> Can you also get baseline numbers for the export directories in each case by
> running:
> 
> dd if=/dev/zero of=/export/directory/file bs=1M count=512 oflag=direct
> 
> Capturing iostat will also help answer this question.
> 
> Finally, are you trying to figure out from these tests how the AMI version of
> GlusterFS compares to just regular GlusterFS installed on an EC2 instance? Do
> you have any reason to believe there will be a difference?

To answer your question above: I think the baseline in this test was using CentOS. The statistics will hopefully point at the bottlenecks. But what concerns me is the degree of variation.

Pavan

Comment 4 tcp 2011-02-04 16:37:56 UTC
After running another round of perf tests (baseline on AWS with CentOS and the test run on the Gluster AMI), the results were very different from those that led Saurabh to raise this bug. What comes out very clearly, though, is the possibility of high variation in the perf numbers: for some tests I see ~4x variation from one setup to the other, even though the system configurations of the two setups are more or less the same.

NOTE: No EBS was used for this test run.

I think this bug can be closed. I'd like to hear from Saurabh on this matter before closing it out.

The detailed logs, along with some scripts for collecting statistics from the servers remotely, are available. They are too bulky to attach; if you are interested in taking a look, please email me and I'll send the logs/scripts.

Pavan

Comment 5 tcp 2011-03-17 02:09:56 UTC
As mentioned in the previous update, the results are not consistently reproducible. This is not a bug.

