Created attachment 574102: Excerpt from /var/log/messages

Description of problem:
Heavy local input/output activity causes nfsd, lockd and rpcbind to become unresponsive for about 1-2 minutes. The problem was first observed after an upgrade to kernel 2.6.32-220.4.2.el6.x86_64 and persists with kernel 2.6.32-220.7.1.el6.x86_64. Downgrading to kernel 2.6.32-131.21.1.el6.x86_64 resolves the problem, which suggests a regression introduced in RHEL 6.2.

Details:
The problem has been observed on a production NFS file server that uses XFS to store user home directories. At night it runs rsnapshot, which takes around 35 minutes to complete. In the middle of this run, clients report that the NFS server is unresponsive, for example:

  Mar 8 03:22:46 giga kernel: [2528650.152883] lockd: server wind not responding, still trying
  Mar 8 03:24:41 giga kernel: [2528764.520874] lockd: server wind OK
  Mar 9 03:57:55 giga kernel: [2616988.229604] rpcbind: server wind not responding, timed out
  Mar 21 04:00:30 giga kernel: [3648349.365498] nfs: server wind not responding, still trying
  Mar 21 04:01:39 giga kernel: [3648418.079378] nfs: server wind OK
  Mar 21 04:01:39 work dovecot: dovecot: chdir(/share/wind/john) blocked for 21 secs

On one occasion the affected machine reported a soft lockup of nfsd and lockd, which appears to be caused by waiting on a mutex in XFS-related code. Please see the attached system log for details.

The workaround is to switch back to kernel 2.6.32-131.21.1.el6.x86_64.

Version-Release number of selected component (if applicable):
kernel 2.6.32-220.7.1.el6.x86_64
kernel 2.6.32-220.4.2.el6.x86_64

How reproducible:
Intermittent, but fairly frequent.

Steps to Reproduce:
Because the affected machine is a production server, I was unable to perform the stress testing needed to find a reliable way to reproduce this problem.
Can you please open a support ticket through Red Hat support? They will help you gather information and debug first-level issues, and then work with development if that is required. If you don't have a support agreement in place, it is best to raise these issues on the upstream community lists. Thanks!
Ric, thank you for your advice. However, please keep in mind that this is a bug report, not a service request. As I stated above, I was able to work around this issue and do not need assistance at this time. I will gladly cooperate in efforts to fix this bug if Red Hat is interested in fixing it.
Not really advice: I manage all of the file system developers, and we use Bugzilla to support our customers. If you have a support contract, you should work with our support team, since they do a lot of the work and often resolve issues before they reach the core development team. If you don't have a support contract, please take the support request to the community lists. All of our developers are very active in helping out on community issues, but of course, our customers do take priority. Thanks!
Since the RHEL 6.3 External Beta has begun and this bug remains unresolved, it has been rejected, as it has not been proposed as an exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Please reopen if you can work with Red Hat support to gather data.