Description of problem:

Cached performance of "stat" and "read" small-file workloads is an order of
magnitude worse than uncached performance. Note: I have not finished isolating
this problem.

configuration:

hardware:
  2 Dell R510 servers, each with 2 Westmere 6-core sockets, 16-GB mem
  4 Dell R610 clients with Intel 10-Gbps card, 2 Westmere sockets + 16 GB mem
  Brocade 10-Gbps network switch, jumbo frames enabled

Clients mount servers with NFS options "nfsvers=3,tcp"; servers are using
Gluster-NFS. No Gluster-specific or NFS-specific tuning is done.

workload:

ssh -o StrictHostKeyChecking=no -nx gprfc025 "cd /root/smallfile-v1.8 ; \
  ./smallfile_cli.py --top /mnt/glnfs/smf.d --operation stat --threads 4 \
  --file-size 4 --record-size 4 --files 16384 \
  --host-set gprfc025,gprfc026,gprfc027,gprfc028" \
  > /shared/benchmarks/gluster_test/logs/12-03-19-09-00-04/smallfile.12-03-19-11-18-06

The workload generator is documented at:
https://home.corp.redhat.com/wiki/smallfile-distributed-io-benchmark

Version-Release number of selected component (if applicable):

servers running RHSSA-3.2-20111115.1-SSA-x86_64-DVD1.iso (RHEL6.1-based)
clients running RHEL6.2

Volume Name: testfs
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gprfs001-10ge:/vol
Brick2: gprfs002-10ge:/vol

How reproducible:

Steps to Reproduce:
1. Create files using the above tool.
2. Drop caches, then run the above workload.
3. Run it AGAIN, so metadata is cached in the server by XFS.

Actual results:

The initial uncached run shows the "stat" workload running at around 600
files/sec, and iostat shows that the disk is the bottleneck. The cached "stat"
run gets only 60 files/sec.

Expected results:

The cached "stat" run should be much faster than the uncached run.
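For scale, the run size and the size of the regression work out as follows (a quick sketch; all numbers are taken directly from the command line and results above, nothing else is assumed):

```python
# Workload size implied by the smallfile_cli.py command line above.
threads_per_host = 4      # --threads 4
files_per_thread = 16384  # --files 16384
hosts = 4                 # --host-set gprfc025,gprfc026,gprfc027,gprfc028

total_files = threads_per_host * files_per_thread * hosts
print("total files per run:", total_files)  # 262144

# Observed "stat" rates from Actual results above.
uncached_rate = 600  # files/sec, first (disk-bound) run
cached_rate = 60     # files/sec, second (server-cached) run

slowdown = uncached_rate / cached_rate
print("cached run is %.0fx slower" % slowdown)  # 10x
```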
Additional info:

The "top" utility shows very high CPU utilization in glusterfsd, about 6 cores:

[root@gprfs001 ~]# top
top - 15:28:50 up 6 days, 15:05,  1 user,  load average: 4.84, 4.94, 4.16
Tasks: 328 total,   2 running, 326 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.7%us, 47.8%sy,  0.0%ni, 44.2%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  49545232k total, 39829596k used,  9715636k free,     9584k buffers
Swap: 51658744k total,        0k used, 51658744k free,  4034336k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
18078 root      20   0  606m  33m 1884 S 604.7  0.1 654:06.31 glusterfsd
18085 root      20   0  555m 383m 2372 R  42.8  0.8  54:20.66 glusterfs
...

The "strace" utility shows that this workload is indeed doing only 2 things to
XFS:

[root@gprfs001 ~]# strace -c -f -p 18078
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
...
  1.98    0.392300           1    351005       120 lgetxattr
  1.87    0.368941           1    350684           lstat
...

"perf top" shows that, for at least one of the glusterfsd threads, significant
time is being spent in the kernel, inside XFS and the VFS:

-------------------------------------------------------------------------------------------------------------------
   PerfTop:    5891 irqs/sec  kernel:91.4%  exact:  0.0% [1000Hz cycles],  (target_pid: 18079)
-------------------------------------------------------------------------------------------------------------------

             samples  pcnt function                 DSO
             _______ _____ ________________________ ____________________________________________________________

            13866.00 21.1% __d_lookup               [kernel.kallsyms]
             8234.00 12.6% _atomic_dec_and_lock     [kernel.kallsyms]
             5713.00  8.7% dput                     [kernel.kallsyms]
             5657.00  8.6% _spin_lock               [kernel.kallsyms]
             5165.00  7.9% path_init                [kernel.kallsyms]
             4101.00  6.3% __link_path_walk         [kernel.kallsyms]
             3007.00  4.6% xfs_inherit_acl          /lib/modules/2.6.32-131.17.1.el6.x86_64/kernel/fs/xfs/xfs.ko
             1996.00  3.0% path_walk                [kernel.kallsyms]
             1424.00  2.2% strncpy_from_user        [kernel.kallsyms]
             1281.00  2.0% copy_user_generic_string [kernel.kallsyms]
             1175.00  1.8% __follow_mount           [kernel.kallsyms]
             1062.00  1.6% _read_lock               [kernel.kallsyms]
              851.00  1.3% down_read                [kernel.kallsyms]
              749.00  1.1% security_inode_getattr   [kernel.kallsyms]
              709.00  1.1% do_lookup                [kernel.kallsyms]
              481.00  0.7% lookup_mnt               [kernel.kallsyms]
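When comparing the cached and uncached runs, the per-syscall totals from a saved `strace -c` summary can be pulled out with a short script. This is just a convenience sketch; it assumes the column layout shown in the strace output above (%time, seconds, usecs/call, calls, optional errors, syscall), and the `syscall_calls` helper name is mine:

```python
# Extract per-syscall call counts from an `strace -c` summary table.
STRACE_SUMMARY = """\
  1.98    0.392300           1    351005       120 lgetxattr
  1.87    0.368941           1    350684           lstat
"""

def syscall_calls(summary):
    """Return {syscall_name: calls} for each data row of the summary."""
    counts = {}
    for line in summary.splitlines():
        fields = line.split()
        # Data rows: %time seconds usecs/call calls [errors] syscall.
        # Skip headers/separators by requiring a numeric first column.
        if len(fields) >= 5 and fields[0].replace(".", "").isdigit():
            counts[fields[-1]] = int(fields[3])
    return counts

print(syscall_calls(STRACE_SUMMARY))
# {'lgetxattr': 351005, 'lstat': 350684}
```

Note that the ~351K lgetxattr/lstat pairs are consistent with roughly one xattr fetch plus one stat per file per run against the brick filesystem.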
Cannot reproduce in Gluster V3.3 beta qa33 release.