Description of problem:

Cached performance of "stat" and "read" small-file workloads is an order of
magnitude worse than uncached performance. Note: I have not finished isolating
this problem.

configuration:

hardware:
  2 Dell R510 servers, each with 2 Westmere 6-core sockets, 16-GB mem
  4 Dell R610 clients with Intel 10-Gbps card, 2 Westmere sockets + 16 GB mem
  Brocade 10-Gbps network switch, jumbo frames enabled

Clients mount servers with NFS options "nfsvers=3,tcp"; servers are using
Gluster-NFS. No Gluster-specific or NFS-specific tuning is done.

workload:

ssh -o StrictHostKeyChecking=no -nx gprfc025 "cd /root/smallfile-v1.8 ; \
  ./smallfile_cli.py --top /mnt/glnfs/smf.d --operation stat --threads 4 \
  --file-size 4 --record-size 4 --files 16384 \
  --host-set gprfc025,gprfc026,gprfc027,gprfc028" \
  > /shared/benchmarks/gluster_test/logs/12-03-19-09-00-04/smallfile.12-03-19-11-18-06

The workload generator is documented at:
https://home.corp.redhat.com/wiki/smallfile-distributed-io-benchmark

Version-Release number of selected component (if applicable):

servers running RHSSA-3.2-20111115.1-SSA-x86_64-DVD1.iso (RHEL6.1-based)
clients running RHEL6.2

Volume Name: testfs
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gprfs001-10ge:/vol
Brick2: gprfs002-10ge:/vol

How reproducible:

Steps to Reproduce:
1. Create files using the above tool.
2. Drop caches, then run the above workload.
3. Run it AGAIN, so metadata is cached in the server by XFS.

Actual results:

The initial uncached run shows the "stat" workload running at around 600
files/sec, and iostat shows that the disk is the bottleneck. The cached "stat"
run gets only 60 files/sec.

Expected results:

The cached "stat" run should be much faster than the uncached run.
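For scale, the run size and the size of the regression work out as follows (a quick sketch; all numbers are taken directly from the command line and results above, nothing else is assumed):

```python
# Workload size implied by the smallfile_cli.py command line above.
threads_per_host = 4      # --threads 4
files_per_thread = 16384  # --files 16384
hosts = 4                 # --host-set gprfc025,gprfc026,gprfc027,gprfc028

total_files = threads_per_host * files_per_thread * hosts
print("total files per run:", total_files)  # 262144

# Observed "stat" rates from Actual results above.
uncached_rate = 600  # files/sec, first (disk-bound) run
cached_rate = 60     # files/sec, second (server-cached) run

slowdown = uncached_rate / cached_rate
print("cached run is %.0fx slower" % slowdown)  # 10x
```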
Additional info:

The "top" utility shows very high CPU utilization in glusterfsd, about 6 cores:

[root@gprfs001 ~]# top
top - 15:28:50 up 6 days, 15:05,  1 user,  load average: 4.84, 4.94, 4.16
Tasks: 328 total,   2 running, 326 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.7%us, 47.8%sy,  0.0%ni, 44.2%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  49545232k total, 39829596k used,  9715636k free,     9584k buffers
Swap: 51658744k total,        0k used, 51658744k free,  4034336k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
18078 root      20   0  606m  33m 1884 S 604.7  0.1 654:06.31 glusterfsd
18085 root      20   0  555m 383m 2372 R  42.8  0.8  54:20.66 glusterfs
...

The "strace" utility shows that this workload is indeed doing only 2 things to
XFS:

[root@gprfs001 ~]# strace -c -f -p 18078
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
...
  1.98    0.392300           1    351005       120 lgetxattr
  1.87    0.368941           1    350684           lstat
...

"perf top" shows that, for at least one of the glusterfsd threads, significant
time is being spent in the kernel, inside XFS and the VFS:

-------------------------------------------------------------------------------------------------------------------
   PerfTop:    5891 irqs/sec  kernel:91.4%  exact:  0.0% [1000Hz cycles],  (target_pid: 18079)
-------------------------------------------------------------------------------------------------------------------

             samples  pcnt function                 DSO
             _______ _____ ________________________ ____________________________________________________________

            13866.00 21.1% __d_lookup               [kernel.kallsyms]
             8234.00 12.6% _atomic_dec_and_lock     [kernel.kallsyms]
             5713.00  8.7% dput                     [kernel.kallsyms]
             5657.00  8.6% _spin_lock               [kernel.kallsyms]
             5165.00  7.9% path_init                [kernel.kallsyms]
             4101.00  6.3% __link_path_walk         [kernel.kallsyms]
             3007.00  4.6% xfs_inherit_acl          /lib/modules/2.6.32-131.17.1.el6.x86_64/kernel/fs/xfs/xfs.ko
             1996.00  3.0% path_walk                [kernel.kallsyms]
             1424.00  2.2% strncpy_from_user        [kernel.kallsyms]
             1281.00  2.0% copy_user_generic_string [kernel.kallsyms]
             1175.00  1.8% __follow_mount           [kernel.kallsyms]
             1062.00  1.6% _read_lock               [kernel.kallsyms]
              851.00  1.3% down_read                [kernel.kallsyms]
              749.00  1.1% security_inode_getattr   [kernel.kallsyms]
              709.00  1.1% do_lookup                [kernel.kallsyms]
              481.00  0.7% lookup_mnt               [kernel.kallsyms]
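When comparing the cached and uncached runs, the per-syscall totals from a saved `strace -c` summary can be pulled out with a short script. This is just a convenience sketch; it assumes the column layout shown in the strace output above (%time, seconds, usecs/call, calls, optional errors, syscall), and the `syscall_calls` helper name is mine:

```python
# Extract per-syscall call counts from an `strace -c` summary table.
STRACE_SUMMARY = """\
  1.98    0.392300           1    351005       120 lgetxattr
  1.87    0.368941           1    350684           lstat
"""

def syscall_calls(summary):
    """Return {syscall_name: calls} for each data row of the summary."""
    counts = {}
    for line in summary.splitlines():
        fields = line.split()
        # Data rows: %time seconds usecs/call calls [errors] syscall.
        # Skip headers/separators by requiring a numeric first column.
        if len(fields) >= 5 and fields[0].replace(".", "").isdigit():
            counts[fields[-1]] = int(fields[3])
    return counts

print(syscall_calls(STRACE_SUMMARY))
# {'lgetxattr': 351005, 'lstat': 350684}
```

Note that the ~351K lgetxattr/lstat pairs are consistent with roughly one xattr fetch plus one stat per file per run against the brick filesystem.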
Cannot reproduce in Gluster V3.3 beta qa33 release.