Bug 208419

Summary: SPECsfs NFS V3 workload on RHEL4 (optimized and instrumented with lockmeter) using TCP/IP and EXT3 shows major lock contention in sunrpc code
Product: Red Hat Enterprise Linux 4
Reporter: Barry Marson <bmarson>
Component: kernel
Assignee: Peter Staubach <staubach>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 4.4
CC: dshaks, jbaron, steved
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-02-05 15:05:51 UTC
Bug Depends On: 208327    
Bug Blocks: 430698    

Description Barry Marson 2006-09-28 14:29:42 UTC
+++ This bug was initially created as a clone of Bug #208327 +++

This bug almost certainly exists in all of RHEL4. I have seen the same system
time and OProfile data in runs on RHEL4 (U3 and U4), with virtually the same
hot functions, and after briefly looking at the RHEL4 sunrpc sources I'm
reasonably comfortable saying it's here too.

Barry

Description of problem:

When SPECsfs is run on a fully optimized RHEL5 beta1-based kernel
(RHEL5-2.6.17-1.2519.4.21.el5.f2.opt created by Don Zickus) instrumented
with lockmeter (latest patches from HP), profiling shows excessive CPU cycles
in the sunrpc routines. Lockmeter shows excessive misses and spinning. Below
is a snippet of the lockstat output at its highest contention.

System: Linux bigi.hpperf.rdu.redhat.com 2.6.17-1.2519.4.21el5.f2.opt.lockmeter2 #1 SMP Fri Sep 22 02:32:12 EDT 2006 x86_64
Total counts

All (8) CPUs

Start time: Fri Sep 22 06:59:26 2006
End   time: Fri Sep 22 07:00:26 2006

Symbols from:  /proc/kallsyms

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
SPINLOCKS         HOLD            WAIT
  UTIL  CON    MEAN(  MAX )   MEAN(  MAX )(% CPU)     TOTAL NOWAIT SPIN RJECT  NAME

        7.5%  1.3us( 198ms)   11us(  60ms)(23.9%) 136775128 92.5%  7.5% 0.00% *TOTAL*

.
.
.

 45.2% 45.5%  1.4us(  83us)   12us( 322us)(21.9%)  19017532 54.5% 45.5%    0% [0xffff81022f1bc0f0]
  8.1% 48.2%  1.8us(  59us)   12us( 236us)( 3.2%)   2732401 51.8% 48.2%    0%   svc_recv+0x22a
  3.9% 62.0%  0.9us(  67us)   12us( 278us)( 4.1%)   2644539 38.0% 62.0%    0%   svc_recv+0x322
  1.7% 44.3%  0.7us(  46us)   11us( 232us)( 1.5%)   1433024 55.7% 44.3%    0%   svc_recv+0x416
  1.5% 61.3%  0.2us(  52us)   12us( 279us)( 6.6%)   4262578 38.7% 61.3%    0%   svc_reserve+0x25
 29.1% 40.1%  3.4us(  83us)   13us( 322us)( 5.5%)   5212589 59.9% 40.1%    0%   svc_sock_enqueue+0x2e
 0.75% 13.4%  0.2us(  40us)   13us( 262us)(0.96%)   2732401 86.6% 13.4%    0%   svc_sock_release+0x106

.
.
.

The spinlock calls in the hot functions above all appear to take the same
lock: sv_lock in struct svc_serv.

The server system is an HP DL580 4-socket Xeon Extreme with HT disabled,
yielding 8 logical CPUs. It has 16GB of RAM and 4 NICs, all using jumbo
frames, connected to 4 clients on private VLANs. Four HP MSA1000s are
directly connected to two dual-port QLogic FC adapters.

The benchmark runs 64 processes per client, spread evenly across the 16 EXT3
filesystems exported by the server. Filesystem options include -J size=4.

Extensive profiling data has been gathered and is available.  The data includes:

    I/O rates
    VMSTAT
    SLABINFO
    NETSTAT
    OPROFILE
    LOCKMETER

and can be uploaded; each run produces a large log file.

Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:
1. Run the SPECsfs benchmark on the optimized kernel patched with lockmeter.
  
Actual results:

See above

Expected results:


Additional info:

Comment 1 RHEL Program Management 2007-05-09 09:32:34 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 RHEL Program Management 2008-02-05 15:05:51 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.