| Summary: | Severe GFS2 performance regression between 2.6.18-164.2.1 and 2.6.18-194.17.1 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Justin I. Nevill <jnevill> |
| Component: | kernel | Assignee: | Robert Peterson <rpeterso> |
| Status: | CLOSED DUPLICATE | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 5.5 | CC: | mdimaio, rpeterso, rwheeler |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-04-06 17:17:46 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |

Description
Justin I. Nevill
2011-04-05 17:28:57 UTC

I recommend the customer try the kernel modules located on my people page:
http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/kernel-2.6.18-248.el5.bz656032.x86_64.rpm

This kernel contains a bunch of fixes that may improve their performance, the most notable of which is the patch to DLM that sets the TCP_NODELAY bit by default. Let me know how this goes. There is still an outstanding issue with this kernel for which we have bug #690555, but those issues are rare. I'll set the NEEDINFO bit until I hear if this kernel helps.

Robert,

I really appreciate your quick response on this! We got the necessary approvals and such done at the customer this afternoon, and the test kernel resolved the issue entirely! It's even faster than the original (-164) testing:

* 2.6.18-248.bz656032

job_name: startup_time / production_time / total_time / transactions

e_V_P: 0:02 / 0:00 / 0:00:12 / 1844

s_G_A_f_ST: 0:03 / 2:35 / 0:02:50 / 922664

s_G_AT_f_ST: 0:02 / 0:00 / 0:00:12 / 0

E_R_T_C: 0:02 / 0:00 / 0:00:07 / 27

Customer wants hotfix asap and backport to 5.5. I'll read through #690555 in the morning to see if there's any concern with not having it included. I'll get those requests in tomorrow as well. Should I do that against this BZ, or do you have another that this one dupes and that resulted in the test kernel RPM you linked?

Thanks,
Justin

Since that kernel fixed their performance issue, and since that kernel contains a number of patches for a number of bugs, the problem they reported could be any number of them:

1. bug #690239 - gfs2: creating large files suddenly slow to a crawl
2. bug #604139 - flock performance with DLM in RHEL 5.5
3. bug #650494 - Bouncing locks in a cluster is slow in GFS2
4. bug #656032 - GFS2 filesystem hang caused by incorrect lock order

It's hard to say which case is causing their specific problem. I'm guessing it's really #2, bug #604139, but I have no proof. The problem goes well beyond flocks. We could use the process of elimination to figure out which problem is their primary one, but it seems hardly worth the time it would take, since they're going to want the fixes for all four bugs anyway. I'm going to mark this as a duplicate of bug #604139, since it best fits that symptom.

*** This bug has been marked as a duplicate of bug 604139 ***
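
For readers unfamiliar with the option mentioned in the thread: TCP_NODELAY disables Nagle's algorithm on a TCP socket, so small writes are sent immediately instead of being held back and coalesced. The DLM patch itself is kernel code and is not reproduced here. The following is only a user-space Python sketch of what setting the option looks like, and the function names are hypothetical.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create an unconnected TCP socket with Nagle's algorithm disabled.

    Illustrative helper only; this is not the DLM kernel change.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value turns TCP_NODELAY on for this socket.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    s = make_nodelay_socket()
    print("TCP_NODELAY enabled:", nodelay_enabled(s))
    s.close()
```

The option matters most for traffic made up of many small request/response messages, where waiting to batch data adds latency to every exchange.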