Description of problem:
GFS 6.0.2-12 (and related packages) running on
kernel-smp-2.4.21-20.0.1-EL causes unreasonably slow system
performance. For example, copying 2.3 GB of data onto a completely
empty ext3 filesystem takes just over 3 minutes, whereas GFS takes
OVER 30 minutes (I stopped it at 30; it wasn't done yet).
The tests were performed using lock_nolock (it's even worse if
lock_gulm is used), and both the ext3 partition and the GFS partition
are on the same physical media (a RAID 5 array on a SAN). The
command used to perform the test was "cp -av /usr /test/gfs".
I've opened a support ticket about this, but support's responses have
been pathetic, at best. I'm hoping the GFS developers who read bug
postings can be of more help.
Clearly there is some expected performance hit by using GFS, but it's
certainly not supposed to be TEN TIMES slower than ext3.
Some research indicates that this bug could be related to bug 132639
and/or bug 121434, which concern various kswapd issues introduced in
U3. Some of the symptoms are similar; most notably, the problem is
triggered by I/O and the system doesn't seem to recover until you
reboot.
I am going to try the same tests using the beta kernel which contains
a fix for bug 132639 (2.4.21-25.EL). I have to compile the GFS
modules from source because the ones on RHN won't load in
2.4.21-25.EL. I'll post results here in a few hours.
Version-Release number of selected component:
GFS-6.0.2-12 and friends
Steps to Reproduce:
1. Create an ext3 and a GFS partition on the same media.
2. Try "cp -av /usr /gfspartition" and "cp -av /usr /ext3partition"
3. Observe that ext3 is literally ten times faster.
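The reproduction steps above can be sketched as a small timing script. The destination paths below (and the tiny demo source tree it builds by default) are placeholder assumptions for illustration; the actual run copied /usr onto ext3 and GFS mounts on the same RAID 5 LUN.

```shell
#!/bin/sh
# Hedged sketch of the comparison in the steps above. Paths are
# placeholders so the script can run anywhere; in the real test,
# SRC=/usr and DESTS would name the ext3 and GFS mount points.
set -e
SRC=${SRC:-}
if [ -z "$SRC" ]; then
    # No source tree given: build a tiny demo tree instead of /usr.
    SRC=$(mktemp -d)
    for i in 1 2 3; do echo "sample data $i" > "$SRC/file$i"; done
fi
# For the actual comparison, set e.g. DESTS="/mnt/ext3 /mnt/gfs".
DESTS=${DESTS:-/tmp/gfs_copy_demo}
for DEST in $DESTS; do
    mkdir -p "$DEST"
    rm -rf "$DEST/testcopy"
    start=$(date +%s)
    cp -a "$SRC" "$DEST/testcopy"
    end=$(date +%s)
    echo "$DEST: copied in $((end - start))s"
done
```

With DESTS pointing at the two mounts, the per-destination timings should show the roughly 10x gap described above if the bug is present.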
Posted below are the results of my attempt to recreate this. The
numbers are in line with what I would expect to see. With no locking,
GFS was actually faster than ext3 in these runs. Of course different
runs produce some variability, but nothing like the order of
magnitude described in the original bug.
Please provide hardware information or anything else that may help to
determine why you are seeing these kinds of performance numbers.
hardware: 3 node x86 cluster; all lock servers, one master, two slaves
command: time cp -av /usr /mnt/<fstype>
ON A SLAVE LOCK SERVER:
ON THE MASTER LOCK SERVER:
Is this still a problem with our environment?
We gave up on GFS because this issue got no attention from Red Hat.
Looks like there was some miscommunication. In your first submission, you said
you were going to run further tests based on the issue in bugzilla 132639. Did
you run those tests, and what were the results? Since we weren't able to
reproduce your problem on our equipment, we were waiting on you for both the
results of your tests and details on the hardware you are using.
Are you interested in pursuing this further and running with the latest RHEL3 U6
and GFS 6.0 version?
Closing this bug due to lack of any information to make progress.