| Field | Value |
|---|---|
| Summary: | customer reports interactivity regression between 6.0 and 6.1 kernels with high IO |
| Product: | Red Hat Enterprise Linux 6 |
| Component: | kernel |
| Version: | 6.1 |
| Status: | CLOSED DUPLICATE |
| Severity: | high |
| Priority: | high |
| Reporter: | Steven Dake <sdake> |
| Assignee: | Larry Woodman <lwoodman> |
| QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| CC: | aquini, asalkeld, cluster-maint, cww, djansa, fdinitto, jcastillo, lhh, nicolas, omer.sen, rnelson, sbradley, sdake, vgoyal |
| Target Milestone: | rc |
| Keywords: | Regression |
| Hardware: | All |
| OS: | Linux |
| Doc Type: | Bug Fix |
| Clone Of: | 709758 |
| Bug Depends On: | 709758 |
| Last Closed: | 2011-10-17 19:53:52 UTC |
How easy is it to reproduce this problem? If I can reproduce it myself, I can try to find which 6.1 kernel version the problem first appeared in. If I can't reproduce it, maybe I can give someone a series of 6.1 development/interim kernels so we can try to pinpoint the change that's responsible for the regression.

Larry Woodman

You can also refer to bug # 709758 and our support case # 477247 for this issue. But basically, when we discovered this, our cluster was just beginning to take shape, and the setup was as simple as:

- 8-node cluster
- Fibre Channel attached storage
- Some service (like apache/php, with no traffic) running on top of gfs2, where the FS is mounted on more than one node.

Wait a few hours, and that was enough to see a good spike in CPU; during the issue, the ssh console was not usable at all (i.e. slow as hell to respond).

Is this bug a duplicate of BZ709758?

Larry

Larry, 709758 is the corosync part of this bug, where we converted mutex to spinlocks. After we did this, the customer still had problems where the system would periodically become unusable, but only with RHEL6.1 kernels. The customer was able to identify that the system misbehaved only with a specific kernel (with the updated packages from 709758). The RHEL process requires separate bugs for different components even if they have the same problem.

Comment #22 should read "converted spinlocks to mutexes".

*** This bug has been marked as a duplicate of bug 710265 ***
Created attachment 521997: sar data