Bug 731585 - ext3/ext4 mbcache causes high CPU load [RHEL6]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.2
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Assignee: Eric Sandeen
QA Contact: Eryu Guan
URL:
Whiteboard:
Depends On: 729261
Blocks:
Reported: 2011-08-17 21:45 UTC by Eric Sandeen
Modified: 2011-12-06 14:05 UTC (History)

Fixed In Version: kernel-2.6.32-206.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 729261
Environment:
Last Closed: 2011-12-06 14:05:00 UTC


Links
System: Red Hat Product Errata
ID: RHSA-2011:1530
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise Linux 6 kernel security, bug fix and enhancement update
Last Updated: 2011-12-06 01:45:35 UTC

Description Eric Sandeen 2011-08-17 21:45:42 UTC
+++ This bug was initially created as a clone of Bug #729261 +++

Description of problem:

We copied small files on FhGFS file systems and noticed an unusually high CPU load on the fhgfs-meta server (6 x E5520 @ 2.27GHz cores saturated). Application and kernel profiling showed that this is due to ext3/ext4 mbcache usage.

CPU: Intel Core/i7, speed 2266.81 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        app name                 symbol name
836662   48.7378  vmlinux                  mb_cache_entry_insert
368013   21.4377  vmlinux                  mb_cache_entry_free
103116    6.0068  vmlinux                  mb_cache_entry_release
93262     5.4328  vmlinux                  mb_cache_entry_get
72118     4.2011  vmlinux                  mb_cache_entry_find_first
45358     2.6422  vmlinux                  mb_cache_shrink_fn

This issue was already exposed in the past by the Lustre file system, which simply added a patch to disable the mbcache:

https://bugzilla.lustre.org/show_bug.cgi?id=22771


Unlike Lustre, the FhGFS file system does not need any kernel patches, so our users depend on upstream kernels. We would therefore like to see either the patch attached to the Lustre bugzilla included in the RHEL kernel, or at least commit 3a48ee8a4ad26c3a538b6fc11a86a8f80c3dce18 (mbcache: Limit the maximum number of cache entries), which has landed in upstream Linux and which reportedly "At least partially solves https://bugzilla.lustre.org/show_bug.cgi?id=22771".


Version-Release number of selected component (if applicable):


How reproducible:

Easily with FhGFS or Lustre, with an inode size of 128 bytes on the meta server.

Install FhGFS; the meta server should be on ext3 or ext4 with XATTR usage enabled. Furthermore, in order to reproduce, the inode size should be limited to 128 bytes.


Steps to Reproduce:
1. Install FhGFS
2. meta server on ext3 or ext4 with 128 Bytes inode size, enable XATTR usage
3. Fill the filesystem with small files and introduce a high memory usage (server has at least 12 GB memory).
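
A minimal sketch of one way to approximate step 3 without FhGFS, assuming the target directory sits on the 128-byte-inode ext3/ext4 file system, that it is mounted with user_xattr, and that setfattr from the attr package is installed. The function names, the user.test attribute name, and the example path are hypothetical and not part of the original report; the only point is that every small file carries its own distinct xattr value.

```shell
#!/bin/sh
# Hypothetical helpers; names and the attribute "user.test" are placeholders.

# Print 16 random bytes as 32 hex characters, so each file gets a distinct value.
rand_hex() {
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# Create $2 empty files under directory $1, each with one unique user xattr.
# Needs setfattr and a file system that accepts user.* attributes.
fill_with_xattr_files() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        f="$dir/file.$i"
        : > "$f" || return 1
        setfattr -n user.test -v "$(rand_hex)" "$f" || return 1
        i=$((i + 1))
    done
}

# Example (not run here): fill_with_xattr_files /mnt/test2 500000
```

Watching "perf top" or OProfile while this runs should show whether the mbcache symbols start to dominate, as in the profile above.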
  
Actual results:

Top will show high CPU usage of fhgfs-meta, and OProfile profiling will show that it is due to mbcache.

Expected results:

Low CPU usage of fhgfs-meta.

Additional info:

--- Additional comment from rwheeler@redhat.com on 2011-08-09 05:46:48 EDT ---

Do you see the same issue in RHEL6?

Thanks!

--- Additional comment from bernd.schubert@itwm.fraunhofer.de on 2011-08-09 07:35:29 EDT ---

I have not tested it yet, but I guess so, as commit 3a48ee8a4ad26c3a538b6fc11a86a8f80c3dce18 is not included in 2.6.32-131.6.1.el6 yet. I will test with that kernel version as soon as possible and then will report results here.

Cheers,
Bernd

--- Additional comment from bernd.schubert@itwm.fraunhofer.de on 2011-08-09 10:56:18 EDT ---

So I can reproduce it with the 2.6.32-131.6.1.el6 kernel. Right now I'm simply using tar 

(cd /mnt/tmp/fhgfs_meta && /root/tar -cf - . --xattrs --sparse) | (cd /mnt/tmp2/fhgfs_meta/ && /root/tar -xpf -)

to copy files from /mnt/tmp, which has 512-byte ext4 inodes, to /mnt/tmp2, which only has 128-byte ext4 inodes.

After about 300,000 inodes "perf top" shows 30% mb_cache_entry_insert() and 24% __mb_cache_entry_find(). While I'm writing this up, the numbers are increasing and are now already at:

------------------------------------------------------------------------------
   PerfTop:   23470 irqs/sec  kernel:92.3% [100000 cycles],  (all, 2 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

           119181.00 - 37.2% : mb_cache_entry_insert    [mbcache]
            81295.00 - 25.4% : __mb_cache_entry_find    [mbcache]
             3456.00 -  1.1% : __d_lookup
             3364.00 -  1.1% : avc_has_perm_noaudit
             1967.00 -  0.6% : __link_path_walk
             1880.00 -  0.6% : inode_has_perm
             1619.00 -  0.5% : _spin_lock


(with about 470,000 copied files). I guess it will have 80-90% mbcache once tar is almost done (we have about 16,000,000 files on that test file system).
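
To put one number on the snapshot above, the percentages of the lines tagged [mbcache] can be summed. This is only a sketch: the function name mbcache_share is hypothetical, and it assumes the "samples - pcnt : function [module]" line layout shown in the perf top output above.

```shell
# Sum the pcnt column of all perf top lines attributed to the mbcache module.
# Field 3 is the percentage (e.g. "37.2%") in the layout shown above.
mbcache_share() {
    awk '/\[mbcache\]/ { sub(/%/, "", $3); sum += $3 } END { printf "%.1f\n", sum }'
}

mbcache_share <<'EOF'
           119181.00 - 37.2% : mb_cache_entry_insert    [mbcache]
            81295.00 - 25.4% : __mb_cache_entry_find    [mbcache]
             3456.00 -  1.1% : __d_lookup
EOF
# prints 62.6
```

So at about 470,000 copied files, the two mbcache functions together already account for roughly 63% of the samples.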

--- Additional comment from bernd.schubert@itwm.fraunhofer.de on 2011-08-09 11:45:48 EDT ---

I now tested with a recent 3.1-git kernel and with that version "perf top" does not show anything related to the mbcache.

--- Additional comment from esandeen@redhat.com on 2011-08-17 17:42:33 EDT ---

Ok, patch backports without trouble.  I thought we'd have kabi issues but I guess mbcache isn't on the whitelist after all.

Comment 1 Eric Sandeen 2011-08-19 23:29:49 UTC
Testing on RHEL6/ext3/128b inodes, I can clearly see this with a modified fs_mark which creates random xattrs on each file:

Count    Files/sec  App Overhead  CREAT (Min/Avg/Max)        XATTR (Min/Avg/Max)
 50000    5140.2       332081      26      146  1367275      11       40      348
100000    2724.9       371951      39      199  1346862      37      157     8252
150000    1706.2       492138      51      243   873915      66      329     8252
200000    1153.0       653330      59      306  1889238     100      543     1438
250000     841.4       748322      78      352   637922     170      816   281106
300000     589.6        83160     109      492   640129     262     1181  1571113
350000     585.6       844656     114      448   644288     293     1236  2079622

<and basically tanks>

With the patch in place, much better:

Count    Files/sec  App Overhead   CREAT (Min/Avg/Max)       XATTR (Min/Avg/Max)
 50000    6023.9       316174      37      144  1088330     11       13      207
100000    4559.8       322382      31      197  1969389     11       13      407
150000    4656.8       317585      38      193  1035780     12       13      219
200000    4960.6       317284      38      180   864101     12       13      247
250000    3250.8       319098      38      285  4462162     12       13      237
300000    3437.7       317023      39      269  4336143     11       13      285
350000    3197.5       317611      39      291  1554244     12       13      249
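
As a rough worked comparison of the two runs, the Files/sec figures can be divided row by row. The helper name speedup is hypothetical; the inputs are the first and last rows of the two tables above.

```shell
# speedup PATCHED UNPATCHED: ratio of two Files/sec figures, two decimals.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

speedup 6023.9 5140.2   # 50000 files:  prints 1.17
speedup 3197.5 585.6    # 350000 files: prints 5.46
```

The gap widens with the file count: roughly 1.2x at 50,000 files and roughly 5.5x at 350,000 files in this particular test.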

Comment 3 RHEL Product and Program Management 2011-09-30 22:10:48 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 4 Aristeu Rozanski 2011-10-05 15:32:40 UTC
Patch(es) available on kernel-2.6.32-206.el6

Comment 8 Eryu Guan 2011-10-19 10:11:31 UTC
Reproduced on 2.6.32-131.0.15.el6 kernel
mkfs -t ext4 -I 512 /dev/sda5           # ext4 with 512 inode size
mkfs -t ext4 -I 128 /dev/sda6           # ext4 with 128 inode size
mount -t ext4 -o user_xattr /dev/sda5 /mnt/test1
mount -t ext4 -o user_xattr /dev/sda6 /mnt/test2

# Create 50,000 files with the file name as an xattr (I used a modified fs_mark)
fs_mark -k -n 50000 -r 16 -d /mnt/test1

# Do the tar workload
(cd /mnt/test1 && tar -cf - . --xattrs --sparse) | (cd /mnt/test2 && tar -xpf -)

# At the same time, on another terminal
perf top

mb_cache_entry_insert accounts for about 25% in perf top on 2.6.32-131.0.15.el6, but only around 1% on 2.6.32-209.el6.

Set it to VERIFIED

Comment 9 errata-xmlrpc 2011-12-06 14:05:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html

