Created attachment 521997: sar data
How easy is it to reproduce this problem? If I can reproduce it myself, I can try to see which 6.1 kernel version the problem started happening in. If I can't reproduce it, maybe I can give someone a series of 6.1 development/interim kernels so we can try to pinpoint the change that's responsible for the regression. Larry Woodman
You can also refer to bug #709758 and our support case #477247 for this issue. Basically, when we discovered this, our cluster was just beginning to take shape. The setup was as simple as:
- 8-node cluster
- Fibre Channel attached storage
- Some service (like apache/php, with no traffic) running on top of gfs2, where the FS is mounted on more than one node.
Wait a few hours, and that was enough to see a large CPU spike. During the issue, the ssh console was not usable at all (i.e. slow as hell to respond).
Is this bug a duplicate of BZ709758? Larry
Larry, 709758 is the corosync part of this bug where we converted mutex to spinlocks. After we did this, the customer still had problems: the system would periodically become unusable, but only with RHEL6.1 kernels. The customer was able to identify that the system misbehaved only with a specific kernel (with the updated packages from 709758). The RHEL process requires separate bugs for different components even if they have the same problem.
Comment #22 should read "converted spinlocks to mutexes".
*** This bug has been marked as a duplicate of bug 710265 ***