Bug 717010 - customer reports interactivity regression between 6.0 and 6.1 kernels with high IO
Status: CLOSED DUPLICATE of bug 710265
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: All Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Larry Woodman
QA Contact: Red Hat Kernel QE team
Keywords: Regression
Depends On: 709758
Blocks:
Reported: 2011-06-27 14:09 EDT by Steven Dake
Modified: 2016-04-26 10:45 EDT (History)
CC List: 14 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 709758
Environment:
Last Closed: 2011-10-17 15:53:52 EDT

Attachments
sar data (541.89 KB, application/x-gzip)
2011-09-07 16:16 EDT, Ricky Nelson

Comment 13 Ricky Nelson 2011-09-07 16:16:41 EDT
Created attachment 521997
sar data
Comment 19 Larry Woodman 2011-09-13 14:14:00 EDT
How easy is it to reproduce this problem?  If I can reproduce it myself, I can try to find the 6.1 kernel version in which the problem first appeared.  If I can't reproduce it, maybe I can give someone a series of 6.1 development/interim kernels so we can try to pinpoint the change that's responsible for the regression.

Larry Woodman
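
Narrowing the regression down across a series of interim kernels amounts to a binary search over the ordered list of builds. Below is a minimal sketch of that search, assuming the builds can be ordered oldest to newest and that the regression, once introduced, is present in every later build. The build labels and the `is_bad` check are hypothetical stand-ins; this bug does not list the actual interim kernel versions.

```python
from typing import Callable, Optional, Sequence


def first_bad_build(
    builds: Sequence[str],
    is_bad: Callable[[str], bool],
) -> Optional[str]:
    """Return the earliest build for which is_bad() is true, or None.

    Assumes builds are ordered oldest to newest and that the regression,
    once it appears, stays in every later build.
    """
    lo, hi = 0, len(builds)  # the search window is builds[lo:hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid + 1  # regression is after mid
    return builds[lo] if lo < len(builds) else None


if __name__ == "__main__":
    # Hypothetical labels standing in for interim 6.1 kernel builds.
    builds = [f"build-{n:02d}" for n in range(1, 17)]
    # Stand-in for "boot this kernel, run the workload, check whether ssh stalls".
    print(first_bad_build(builds, lambda b: b >= "build-11"))
```

Because each check here means booting a kernel and waiting for the problem to show up, halving the candidate range on every step keeps the number of test runs to roughly log2 of the number of builds.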
Comment 20 Nicolas Ross 2011-09-13 14:36:07 EDT
You can also refer to bug # 709758 and our support case # 477247 for this issue.

But basically, when we discovered this, our cluster was only just being set up, and the configuration was as simple as:

- 8-node cluster
- Fibre Channel attached storage
- Some services (like apache/php, with no traffic) running on top of gfs2, where the FS is mounted on more than one node.

Wait a few hours, and that was enough to see a good spike in CPU. During the issue, the ssh console was not usable at all (i.e. slow as hell to respond).
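
Since the spike only shows up after a few hours and the ssh console is unusable while it lasts, it can help to log CPU usage unattended. The following is a rough sketch of one way to do that by sampling the aggregate `cpu` line of `/proc/stat`. The function names, the 90% threshold and the 5-second interval are assumptions for illustration, not something taken from this bug or from the attached sar data.

```python
import time
from typing import Tuple


def parse_cpu_line(line: str) -> Tuple[int, int]:
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Field order: user nice system idle iowait irq softirq steal
    values = [int(v) for v in fields[1:9]]
    if len(values) < 5:
        raise ValueError("expected at least user/nice/system/idle/iowait")
    idle = values[3] + values[4]  # idle + iowait are counted as not busy
    total = sum(values)
    return total - idle, total


def busy_percent(prev: Tuple[int, int], cur: Tuple[int, int]) -> float:
    """Percentage of non-idle time between two (busy, total) samples."""
    delta_total = cur[1] - prev[1]
    if delta_total <= 0:
        return 0.0
    return 100.0 * (cur[0] - prev[0]) / delta_total


def sample() -> Tuple[int, int]:
    """Read the current aggregate counters from /proc/stat (Linux only)."""
    with open("/proc/stat") as stat:
        return parse_cpu_line(stat.readline())


def watch(threshold: float = 90.0, interval: float = 5.0, samples: int = 1000) -> None:
    """Print a timestamped line whenever CPU usage exceeds the threshold."""
    prev = sample()
    for _ in range(samples):
        time.sleep(interval)
        cur = sample()
        pct = busy_percent(prev, cur)
        if pct >= threshold:
            print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} cpu busy {pct:.1f}%")
        prev = cur
```

Calling `watch()` from a session started before the problem appears, with output redirected to a file on local disk, would give timestamps to line up against sar data. Note that this sketch counts iowait as idle time, so a spike that is purely IO wait would not register as busy here.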
Comment 21 Larry Woodman 2011-09-19 10:14:04 EDT
Is this bug a duplicate of BZ709758?

Larry
Comment 22 Steven Dake 2011-09-19 11:09:07 EDT
Larry,

709758 is the corosync part of this bug where we converted mutex to spinlocks.  After we did this, the customer still had problems where the system would periodically become unusable, but only with RHEL6.1 kernels.  The customer was able to establish that the system misbehaves only with a specific kernel (with the updated packages from 709758 installed).

The RHEL process requires separate bugs for different components even if they have the same problem.
Comment 23 Steven Dake 2011-09-19 14:33:06 EDT
Comment #22 should read "converted spinlocks to mutexes".
Comment 29 Ricky Nelson 2011-10-17 15:53:52 EDT

*** This bug has been marked as a duplicate of bug 710265 ***
