Bug 695387

Summary: NFS Workloads triggering System Hang
Product: Red Hat Enterprise Linux 5
Reporter: sstephens
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED WONTFIX
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Docs Contact:
Priority: unspecified
Version: 5.4
CC: pm-rhel, rwheeler, sstephens
Target Milestone: rc
Flags: pm-rhel: needinfo? (sstephens)
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-03 12:29:22 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
Description Flags
cacti graphs of nfs performance none

Description sstephens 2011-04-11 14:45:18 UTC
Created attachment 491251
cacti graphs of nfs performance

Description of problem:

     Under increased NFS load (more than roughly 1.5k write_req/sec together with more than 50 Mbit/sec of bandwidth utilization), the system hangs and stops responding to all interrupts. It recovers after 15-30 minutes with no evidence of the root cause: dmesg and the other system logs record no errors, and no core files are created.

Version-Release number of selected component (if applicable):

     kernel-2.6.18-194.26.1.el5 x86_64

How reproducible:
     
     Whenever the write_req rate reaches or exceeds roughly 1.5k requests/sec, seemingly in combination with interface bandwidth usage above 50 megabits/sec.
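
     As a rough illustration of the condition above, the following Python sketch turns two samples of cumulative counters into per-second rates and checks them against the two thresholds quoted in this report. The Sample fields and the function names are hypothetical; the report does not say how the counters behind the Cacti graphs are collected, and the thresholds themselves are approximate.

```python
from dataclasses import dataclass

# Approximate thresholds quoted in the report.
WRITE_REQ_PER_SEC = 1500
BANDWIDTH_MBIT_PER_SEC = 50


@dataclass
class Sample:
    """One reading of cumulative counters (hypothetical layout)."""
    timestamp: float   # seconds
    write_reqs: int    # cumulative NFS write request count
    bytes_total: int   # cumulative bytes on the network interface


def rates(prev: Sample, curr: Sample) -> tuple[float, float]:
    """Return (write requests/sec, Mbit/sec) between two samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    writes_per_sec = (curr.write_reqs - prev.write_reqs) / elapsed
    mbit_per_sec = (curr.bytes_total - prev.bytes_total) * 8 / 1_000_000 / elapsed
    return writes_per_sec, mbit_per_sec


def in_reported_hang_range(prev: Sample, curr: Sample) -> bool:
    """True when both rates are in the range the report associates with hangs."""
    writes_per_sec, mbit_per_sec = rates(prev, curr)
    return (writes_per_sec >= WRITE_REQ_PER_SEC
            and mbit_per_sec > BANDWIDTH_MBIT_PER_SEC)
```

     Feeding it two consecutive samples taken a known number of seconds apart shows whether a given interval falls in the range where the hangs were observed.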

Steps to Reproduce:
1. Use IOzone (http://www.iozone.org/) to run benchmark tests from an NFS client against the NFS mount point served by the affected server.
2. Set up the tests to produce 1.5k or more write requests/sec with 2 MB file sizes.
3. The NFS server becomes unresponsive.
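
The steps above could be scripted along these lines. This is only a sketch: the report gives the 2 MB file size but not the exact IOzone command line, so the record size, the choice of the write/rewrite test, the -c and -e timing options, the test file name, and the mount path in the usage note are all assumptions. A single run like this is not guaranteed to reach 1.5k write requests/sec; several clients or runs in parallel may be needed.

```python
import os


def build_iozone_cmd(mount_point: str,
                     file_size: str = "2m",
                     record_size: str = "4k") -> list[str]:
    """Build an argv list for a write-focused IOzone run on an NFS mount."""
    # Hypothetical test file name inside the mount point.
    test_file = os.path.join(mount_point, "iozone.tmp")
    return [
        "iozone",
        "-i", "0",          # test 0: write/rewrite
        "-s", file_size,    # file size (2 MB, per the report)
        "-r", record_size,  # record size (assumed; not given in the report)
        "-c",               # include close() in the timing
        "-e",               # include flush in the timing
        "-f", test_file,    # file to write on the NFS mount
    ]
```

On the NFS client, the returned list can be passed to subprocess.run(), for example subprocess.run(build_iozone_cmd("/mnt/nfs-under-test"), check=True), where /mnt/nfs-under-test is a placeholder for the real mount point.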
  
Actual results:

     The NFS server becomes unresponsive until the test ends or the write requests are allowed to taper off.

Expected results:

     The NFS server should be able to handle this traffic level. The same hardware was upgraded from RHEL4 to RHEL5, and its physical memory was increased. NFS request traffic is unchanged from before the OS upgrade. Since the upgrade to RHEL5, the server hangs during periods of increased NFS load.

Additional info:
     The hardware is a Dell PowerEdge 1850 with 8 GB of RAM. It is connected to an EMC CLARiiON CX-500 via QLogic Fibre Channel HBAs. The connections to the SAN are managed by PowerPath, version 5.3 SP1. The kernel is at the highest level supported by the PowerPath drivers. I'm attaching the graphs of NFS server performance from Cacti. The gaps in the graphs are the periods when the host stops responding.

Comment 1 Ric Wheeler 2011-04-11 14:53:03 UTC
It would be great if you could file this via the Red Hat support channels - our support people are great at helping triage and pull together data. If you don't have a subscription, you can always post a summary of the issue to linux-nfs (our NFS team follows that list very closely).

RH bugzilla is not meant to be used as a front line support tool, thanks!

Comment 2 RHEL Program Management 2014-03-07 13:41:50 UTC
This bug/component is not included in scope for RHEL-5.11.0, which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX at the end of the RHEL5.11 development phase (Apr 22, 2014). Please contact your account manager or support representative in case you need to escalate this bug.

Comment 3 RHEL Program Management 2014-06-03 12:29:22 UTC
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in the RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).