Bug 142342 - GFS causes unreasonably slow system performance w/ 2.4.21-20.0.1.EL
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gfs
Version: 3
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Kiersten (Kerri) Anderson
QA Contact: GFS Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-12-09 01:58 UTC by Tolga Tarhan
Modified: 2010-01-12 03:01 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2006-05-08 15:35:11 UTC
Embargoed:



Description Tolga Tarhan 2004-12-09 01:58:06 UTC
Description of problem:
GFS 6.0.2-12 (and related packages) running on
kernel-smp-2.4.21-20.0.1.EL causes unreasonably slow system
performance.  For example, copying 2.3 GB of data to a completely
empty ext3 filesystem takes just over 3 minutes, whereas the same
copy to GFS takes over 30 minutes (I stopped it at 30 minutes; it
had not finished).

The tests were performed using lock_nolock (performance is even worse
with lock_gulm).  Both the ext3 partition and the GFS partition are
on the same physical media (a RAID 5 array on a SAN).  The command
used to perform the test was "cp -av /usr /test/gfs".

I've opened a support ticket about this, but support's responses have 
been pathetic, at best.  I'm hoping the GFS developers that might 
read bug postings can be of more help.

Clearly there is some expected performance hit by using GFS, but it's 
certainly not supposed to be TEN TIMES slower than ext3.

Some research indicates that this bug could be related to bug 132639 
and/or 121434, which are about various kswapd issues that were 
introduced in U3.  Some of the symptoms are similar: most notably, 
the problem is triggered by I/O and the system does not seem to 
recover until it is rebooted.

I am going to try the same tests using the beta kernel which contains 
a fix for bug 132639 (2.4.21-25.EL).  I have to compile the GFS 
modules from source because the ones on RHN won't load in 
2.4.21-25.EL.  I'll post results here in a few hours.


Version-Release number of selected component:
GFS-6.0.2-12 and friends


How reproducible:
Every time


Steps to Reproduce:
1. Create an ext3 and a GFS partition on the same media.
2. Try "cp -av /usr /gfspartition" and "cp -av /usr /ext3partition"
3. Observe that ext3 is literally ten times faster.
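
The steps above can be scripted so that both copies are timed the same way.  The following is a minimal sketch, not the procedure anyone in this report actually used.  It assumes Python 3 and a `cp` that supports `-a`, it drops `-v` so that terminal output does not skew the timing, and the function names `time_copy` and `compare_targets` are hypothetical.

```python
import subprocess
import time


def time_copy(src, dst):
    """Run `cp -a src dst` and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if cp exits non-zero.
    subprocess.run(["cp", "-a", src, dst], check=True)
    return time.perf_counter() - start


def compare_targets(src, targets):
    """Copy src to each mount point in turn; return {mount point: seconds}."""
    return {dst: time_copy(src, dst) for dst in targets}
```

For the case in this report the call would be compare_targets("/usr", ["/ext3partition", "/gfspartition"]), with both filesystems freshly created and mounted.  Page-cache state affects the numbers, so the source tree should be in the same state (cold or warm) before each run.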

Comment 1 Derek Anderson 2004-12-09 18:58:45 UTC
Posted below are the results of my attempt to recreate this.  The
numbers are in line with what I would expect to see.  With no locking,
GFS was actually faster than ext3 in these runs.  Of course
different runs produce some variability, but nothing on the order of
magnitude described in the original report.

Please provide hardware information or anything else that may help to
determine why you are seeing these kinds of performance numbers.

hardware: 3 node x86 cluster; all lock servers, one master, two slaves
kernel:   2.4.21-20.0.1.ELsmp
GFS:      GFS-6.0.2-12
command:  time cp -av /usr /mnt/<fstype>

=======================
ON A SLAVE LOCK SERVER:
=======================
ext3:
-----
real    3m6.400s
user    0m2.270s
sys     0m25.880s

gfs (lock_gulm):
----------------
real    5m5.389s
user    0m3.190s
sys     0m53.540s

gfs (lock_nolock):
------------------
real    2m34.451s
user    0m2.850s
sys     0m43.780s

==========================
ON THE MASTER LOCK SERVER:
==========================
ext3:
-----
real    3m42.878s
user    0m2.170s
sys     0m24.440s

gfs (lock_gulm):
----------------
real    4m59.828s
user    0m2.850s
sys     0m58.930s

gfs (lock_nolock):
------------------
real    3m11.067s
user    0m2.250s
sys     0m36.540s
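
To put the timings above on a common scale, the `real` values can be converted to seconds and divided by the ext3 time from the same node.  A small sketch of that arithmetic follows.  The figures are copied from the results above; `parse_time` and `slowdown` are hypothetical helper names, and the parser only handles the `XmY.YYYs` form that `time` printed here.

```python
import re

# `real` times copied from the results above.
RESULTS = {
    "slave": {"ext3": "3m6.400s", "lock_gulm": "5m5.389s", "lock_nolock": "2m34.451s"},
    "master": {"ext3": "3m42.878s", "lock_gulm": "4m59.828s", "lock_nolock": "3m11.067s"},
}


def parse_time(text):
    """Convert a string such as '3m6.400s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def slowdown(node, fstype):
    """Return the wall-clock time of fstype relative to ext3 on the same node."""
    times = RESULTS[node]
    return parse_time(times[fstype]) / parse_time(times["ext3"])


for node in RESULTS:
    for fstype in ("lock_gulm", "lock_nolock"):
        print(f"{node:6s} {fstype:11s} {slowdown(node, fstype):.2f}x ext3")
```

On these numbers lock_gulm takes roughly 1.3 to 1.6 times as long as ext3 and lock_nolock takes slightly less time than ext3, which is far from the factor of ten in the original description.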

Comment 2 Kiersten (Kerri) Anderson 2005-10-11 21:51:47 UTC
Is this still a problem with your environment?

Comment 3 Tolga Tarhan 2005-10-11 21:54:59 UTC
We gave up on GFS because this issue got no attention from Red Hat.

Comment 4 Kiersten (Kerri) Anderson 2005-10-11 22:48:37 UTC
Looks like there was some miscommunication.  In your first submission, you said
you were going to run further tests based on the issue in bugzilla 132639.  Did
you run those tests and what were the results?  Since we weren't able to
reproduce your problem on our equipment, we were waiting on you for both the
results of your tests and details on the hardware you are using.

Are you interested in pursuing this further and running with the latest RHEL3 U6
and GFS 6.0 version?

Comment 5 Kiersten (Kerri) Anderson 2006-05-08 15:35:11 UTC
Closing this bug due to lack of any information to make progress.

