Description of problem:
I have a 4-node cluster that has been running GFS2 on top of an EMC SAN for a while now, and for the past couple of months we have been randomly experiencing heavy write slowdowns. The write rate drops from about 30 MB/s to 10 kB/s. It can affect one node or several at the same time. Unmounting and remounting solves the problem on the affected node, but after some random interval (hours, days) it happens again.

Version-Release number of selected component (if applicable):
CentOS 5.6 x86_64

How reproducible:
Random

Steps to Reproduce:
1. Just wait until it happens.

Actual results:
File writes are 10 kB/s at most, regardless of file size.

Expected results:
About 30 MB/s, as usual.

Additional info:
Operating system: CentOS 5.6 x86_64
Kernel: 2.6.18-238.9.1.el5
cman: cman-2.0.115-68.el5_6.3
gfs2-utils: gfs2-utils-0.1.62-28.el5_6.1
3 nodes on 4 Gb Fibre Channel
1 node on 1 Gb iSCSI
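For anyone wanting to quantify the slowdown, here is a minimal sketch of a sequential-write test. The mount point is an assumption: set GFS_MNT to your GFS2 mount (e.g. /mnt/gfs); it defaults to /tmp here only so the script can be dry-run anywhere.

```shell
#!/bin/sh
# Hedged sketch: quick sequential-write throughput check.
# GFS_MNT is an assumed mount point; set it to your GFS2 mount.
GFS_MNT=${GFS_MNT:-/tmp}
TESTFILE="$GFS_MNT/ddtest.$$"
# conv=fsync forces the data to disk before dd reports a rate,
# so page-cache speed does not mask a slow device.
dd if=/dev/zero of="$TESTFILE" bs=1M count=50 conv=fsync 2>&1
rm -f "$TESTFILE"
```

On a healthy node this should report something near the usual 30 MB/s; on an affected node the reported rate should collapse accordingly.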
Hi Ramiro, Have you opened a ticket with Red Hat support (assuming you have an RH subscription) or with EMC? Which EMC storage array do you have? Thanks!
Hi, all systems are running CentOS 5.6 and I don't have any RH subscription at the moment. Our storage is an EMC CLARiiON CX3-10. Thank you.
Hi Ramiro, Red Hat Bugzilla is meant primarily to track issues our customers have with RHEL. If you use CentOS, we certainly appreciate hearing about the issues, but you are probably better off posting to the upstream lists. For GFS2, use: cluster-devel. As an EMC customer, you can probably open a ticket through them. One thing to try to avoid is thrashing locks between nodes. For example, running find on multiple nodes, creating files in a shared directory from more than one node, or possibly running a backup application could all contribute to uneven performance. Thanks!
Hi Ric, I created this report because I was advised to do so on the linux-cluster list. Also, some months ago, Steven Whitehouse posted some guidelines on how one should report an issue, and he said that members of the community should report potential bugs here. I apologize if I'm out of line here; I'm just trying to follow protocol in order to get some assistance. As for EMC, we don't have support right now. But even if we did, I don't think this is related to the storage, since unmounting and remounting solves the problem, and other nodes connected to the same storage with other filesystems (ext3) don't have any problems. I've gone through all our applications (web apps) and found nothing that could be compromising performance. The only application that might affect performance is a backup client (Amanda), which runs once a day. But it has been running for a while now and we saw no problems in the past. Thank you for your time.
Yes, I would like to encourage people to report bugs here. Provided it's clear what is being reported, that's fine. There are a couple of likely causes, so let me ask some questions to try to narrow down the possibilities:

1. How full is the filesystem in question?

2. Are you able to identify a file on the filesystem which was created while the write speed was very slow? If so, and assuming it is a reasonable size (say 1000 blocks or more), take a look at it with the filefrag tool. That will show you all of its extents. If the extents are very small (e.g. only a few blocks each) then you may have hit a known problem. In that case we have already fixed it and I can close this bug as a dup of that one. This is more likely if the filesystem is getting close to being full; you won't hit this bug on a filesystem that is nearly empty. Also note that this particular bug affects neither Fedora/upstream nor RHEL 6.

If that turns out not to be the case, then the next most likely issue is contention between the nodes, as Ric mentioned above. We have a document which describes how to deal with that situation, but I'm afraid it is only accessible to customers with an RHN account at the moment.
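The two checks above can be sketched as follows; the mount point and file name are placeholders, so substitute a mount point and a recently written file from your own filesystem:

```shell
#!/bin/sh
# Hedged sketch of the two diagnostic checks: fullness and fragmentation.
# MOUNT and the file name are examples; substitute your own.
MOUNT=${MOUNT:-/}
# 1. How full is the filesystem? (-P gives the stable POSIX column layout)
df -P "$MOUNT" | awk 'NR==2 {print "used:", $5}'
# 2. How fragmented is a recently written file of ~1000+ blocks?
#    Many small extents (a few blocks each) point at the known allocator bug.
if command -v filefrag >/dev/null 2>&1; then
    filefrag "$MOUNT/some-recent-file.tar.gz"
fi
```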
Hi Steve,

1. The filesystem is 60% full.

2. I've copied a file (2632 blocks) on an affected node and here is what I've got:

From the affected node:
filefrag /mnt/gfs/xymon-4.3.0.tar.gz
/mnt/gfs/xymon-4.3.0.tar.gz: 652 extents found

The same file copied from an unaffected node to another location:
filefrag /mnt/gfs/tmp/xymon-4.3.0.tar.gz
/mnt/gfs/tmp/xymon-4.3.0.tar.gz: 157 extents found

A file created in 2009 of about the same size:
filefrag /mnt/gfs/awstats/data/awstats022009.test.txt
/mnt/gfs/awstats/data/awstats022009.test.txt: 1 extent found

3. Where can I find that document? I don't have any entitlements right now, but maybe I can access it.

Thank you.
Sorry, the filesystem is actually 66% full: total: 200G, used: 132G, free: 69G.
It looks like you have more extents than I'd expect for only ~2600 blocks on the affected node, and your fs is probably full enough to have hit the problem. The document is, I'm afraid, only available to those with an RHN account, but the same information has been repeated by myself (and others) many times on the mailing lists. It is a question of taking the cache behaviour of GFS2 into account in order to get the most from it. I'm going to close this bz as a dup of the existing, known issue that is almost certainly causing the problem you've reported. I would also suggest that you consider either moving to a paid support contract with Red Hat or using Fedora. In the latter case, the particular problem has been fixed for a long time now, and Fedora is much more up to date. The problem with CentOS is that it does not pick up updates in a timely manner, so you are likely to run into issues that have long since been fixed in other distros.
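To put rough numbers on that, using the figures from the filefrag output above: 2632 blocks spread over 652 extents averages only about 4 blocks per extent on the affected node, versus roughly 16 blocks per extent for the copy made on the unaffected node (integer shell arithmetic):

```shell
#!/bin/sh
# Average extent sizes from the filefrag output above (integer arithmetic).
echo "affected copy:   $((2632 / 652)) blocks/extent"   # ~4
echo "unaffected copy: $((2632 / 157)) blocks/extent"   # ~16
```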
*** This bug has been marked as a duplicate of bug 683155 ***
Steve, is this problem fixed in RHEL 5? Is Fedora OK for production use? Isn't Fedora's policy to give updates for only 13 months on each release? Cheers
Yes, it will be fixed in RHEL 5.7, and the fix has also been released in the z-streams for 5.4.z and 5.6.z; however, to the best of my knowledge these z releases do not make their way into CentOS, so there will be no CentOS release with this fix until 5.7. Yes, it is Fedora policy to only release updates for a fairly limited period of time. My point was not that Fedora is suitable for production use, but rather that it gets fixes much more quickly than CentOS, which has to wait both for the RHEL process and then for its own processes before updates are ready. So Fedora is good for evaluation use, and also if you are willing to do your own support. CentOS has its uses, but it does appear to be a bit behind the times with regard to GFS2. I don't often use any of the Debian-based distros (Ubuntu, etc.), but they may potentially offer a solution closer to what you are looking for (a balance between update frequency and stability), though I can't vouch for that. If, on the other hand, you want the support done for you, then at the risk of sounding like an advert, there is really no substitute for paid-for distros.