Description of problem:
GFS 6.0.2-12 (and related packages) running on
kernel-smp-2.4.21-20.0.1-EL causes unreasonably slow system
performance. For example, copying 2.3 GB of data onto a completely
empty ext3 filesystem takes just over 3 minutes, whereas GFS takes
OVER 30 minutes (I stopped it at 30; it wasn't done yet).
The tests were performed using lock_nolock (it's even worse if
lock_gulm is used), and both the ext3 partition and the GFS partition
are on the same physical media (a RAID 5 array on a SAN). The
command used to perform the test was "cp -av /usr /test/gfs".
I've opened a support ticket about this, but support's responses have
been pathetic, at best. I'm hoping the GFS developers who read bug
postings can be of more help.
Clearly there is some expected performance hit by using GFS, but it's
certainly not supposed to be TEN TIMES slower than ext3.
Some research indicates that this bug could be related to bug 132639
and/or bug 121434, which concern various kswapd issues introduced in
U3. Some of the symptoms are similar; most notably, the problem is
triggered by I/O and the system doesn't seem to recover until you
reboot.
I am going to try the same tests using the beta kernel which contains
a fix for bug 132639 (2.4.21-25.EL). I have to compile the GFS
modules from source because the ones on RHN won't load in
2.4.21-25.EL. I'll post results here in a few hours.
Version-Release number of selected component:
GFS-6.0.2-12 and friends
Steps to Reproduce:
1. Create an ext3 and a GFS partition on the same media.
2. Try "cp -av /usr /gfspartition" and "cp -av /usr /ext3partition"
3. Observe that ext3 is literally ten times faster.
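The reproduction steps above can be sketched as a small timing script. The destination paths below (and the tiny demo source tree it builds by default) are placeholder assumptions for illustration; the actual run copied /usr onto ext3 and GFS mounts on the same RAID 5 LUN.

```shell
#!/bin/sh
# Hedged sketch of the comparison in the steps above. Paths are
# placeholders so the script can run anywhere; in the real test,
# SRC=/usr and DESTS would name the ext3 and GFS mount points.
set -e
SRC=${SRC:-}
if [ -z "$SRC" ]; then
    # No source tree given: build a tiny demo tree instead of /usr.
    SRC=$(mktemp -d)
    for i in 1 2 3; do echo "sample data $i" > "$SRC/file$i"; done
fi
# For the actual comparison, set e.g. DESTS="/mnt/ext3 /mnt/gfs".
DESTS=${DESTS:-/tmp/gfs_copy_demo}
for DEST in $DESTS; do
    mkdir -p "$DEST"
    rm -rf "$DEST/testcopy"
    start=$(date +%s)
    cp -a "$SRC" "$DEST/testcopy"
    end=$(date +%s)
    echo "$DEST: copied in $((end - start))s"
done
```

With DESTS pointing at the two mounts, the per-destination timings should show the roughly 10x gap described above if the bug is present.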
Posted below are the results of my attempt to recreate this. The
numbers are in line with what I would expect to see. With no locking,
GFS was actually faster than ext3 in these runs. Of course different
runs produce some variability, but nothing like the order of
magnitude described in the original bug.
Please provide hardware information or anything else that may help to
determine why you are seeing these kinds of performance numbers.
hardware: 3 node x86 cluster; all lock servers, one master, two slaves
command: time cp -av /usr /mnt/<fstype>
ON A SLAVE LOCK SERVER:
ON THE MASTER LOCK SERVER:
Is this still a problem with our environment?
We gave up on GFS because this issue got no attention from Red Hat.
Looks like there was some miscommunication. In your first submission, you said
you were going to run further tests based on the issue in bugzilla 132639. Did
you run those tests, and what were the results? Since we weren't able to
reproduce your problem on our equipment, we were waiting on you for both the
results of your tests and details on the hardware you are using.
Are you interested in pursuing this further and running with the latest RHEL3 U6
and GFS 6.0 version?
Closing this bug due to lack of any information to make progress.