Bug 651890 - [LTC 5.7 FEAT] Large memory machine spends huge amount of time in sysfs add of memory nodes (performance/boot)
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Hardware: ppc64 All
Priority: high Severity: high
Target Milestone: beta
Target Release: 5.7
Assigned To: Steve Best
QA Contact: Red Hat Kernel QE team
Keywords: FutureFeature, OtherQA, Reopened
Depends On:
Blocks: 618260 ibm5.7feat/ibm5.7features 668558
Reported: 2010-11-10 10:01 EST by IBM Bug Proxy
Modified: 2011-02-14 10:02 EST

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2011-02-14 10:02:20 EST


External Trackers
Tracker                        ID     Priority  Status  Summary  Last Updated
IBM Linux Technology Center    68164  None      None    None     Never

Description IBM Bug Proxy 2010-11-10 10:01:31 EST
1. Feature Overview:
Feature Id: [68164]
a. Name of Feature: [LTC 5.7 FEAT] Large memory machine spends huge amount of time in sysfs add of
memory nodes (performance/boot)
b. Feature Description
We have noticed very long boot times on PowerPC64 machines with a lot of RAM (> 512GB). Almost all
of that time is spent in memory_dev_init(). Durations for that function at two RAM sizes:

0.5TB RAM - 1 minute
1.5TB RAM - 30 minutes

The backtrace looks like:

c000000000248ee0 .__sysfs_add_one+0x28/0x128
c0000000002492a8 .sysfs_add_one+0x38/0x188
c000000000249c88 .create_dir+0x70/0x138
c000000000249d98 .sysfs_create_dir+0x48/0x78
c00000000032bad8 .kobject_add_internal+0x140/0x308
c00000000032beb4 .kobject_init_and_add+0x4c/0x68
c00000000046c2c0 .sysdev_register+0xa0/0x220
c00000000047b1dc .add_memory_block+0x124/0x1e8
c0000000008d1f28 .memory_dev_init+0xf4/0x168

With 1TB RAM we have about 64k memory nodes, and the problem is that sysfs has an O(n^2) issue
with duplicate entry detection:

int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
{
        struct sysfs_inode_attrs *ps_iattr;

        /* duplicate check: searches the existing children of the parent */
        if (sysfs_find_dirent(acxt->parent_sd, sd->s_name))
                return -EEXIST;
        /* ... */
}

struct sysfs_dirent *sysfs_find_dirent(struct sysfs_dirent *parent_sd,
                                       const unsigned char *name)
{
        struct sysfs_dirent *sd;

        /* linear walk of the sibling list, one strcmp per entry visited */
        for (sd = parent_sd->s_dir.children; sd; sd = sd->s_sibling)
                if (!strcmp(sd->s_name, name))
                        return sd;
        return NULL;
}

So with 64k nodes, each add towards the end walks a list of nearly 64k entries and does a strcmp
on every one of them.
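
To put a number on the quadratic cost, here is a minimal user-space sketch. It is not kernel code.
struct entry, find_entry(), count_compares(), expected_compares() and entries_for_ram() are
hypothetical names made up for illustration, and the 16MB block size is an assumption taken from
the report's own figure (1TB / 64k entries). The sketch prepends new entries for simplicity. That
ordering is not taken from the kernel, and it does not change the count, because a failed lookup
visits every entry either way.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a directory entry; NOT the kernel's sysfs_dirent. */
struct entry {
        char name[32];
        struct entry *sibling;
};

/* Linear lookup with the same shape as sysfs_find_dirent().
 * Adds the number of strcmp() calls it makes to *compares. */
static struct entry *find_entry(struct entry *head, const char *name,
                                uint64_t *compares)
{
        struct entry *e;

        for (e = head; e; e = e->sibling) {
                (*compares)++;
                if (!strcmp(e->name, name))
                        return e;
        }
        return NULL;
}

/* Add n uniquely named entries, running the duplicate check before each add.
 * Returns the total number of strcmp() calls made. */
static uint64_t count_compares(unsigned int n)
{
        struct entry *head = NULL, *e;
        uint64_t compares = 0;
        unsigned int i;

        for (i = 0; i < n; i++) {
                e = malloc(sizeof(*e));
                assert(e != NULL);
                snprintf(e->name, sizeof(e->name), "memory%u", i);
                if (find_entry(head, e->name, &compares)) {
                        free(e);        /* not reached: names are unique */
                        continue;
                }
                e->sibling = head;      /* prepend; order does not affect the count */
                head = e;
        }
        while (head) {                  /* free the list */
                e = head->sibling;
                free(head);
                head = e;
        }
        return compares;
}

/* Closed form: the i-th add scans i existing entries, so the sum is n(n-1)/2. */
static uint64_t expected_compares(uint64_t n)
{
        return n * (n - 1) / 2;
}

/* Number of entries for a given RAM size, assuming one entry per block. */
static uint64_t entries_for_ram(uint64_t ram_bytes, uint64_t block_bytes)
{
        return ram_bytes / block_bytes;
}
```

Under these assumptions, entries_for_ram(1ULL << 40, 16ULL << 20) gives 65536 entries, and
expected_compares(65536) gives 2147450880, so registering the memory entries for 1TB costs roughly
2.1 billion strcmp calls in the duplicate check alone.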

2. Feature Details:
Sponsor: Power Virtualization
Architectures: ppc64

Arch Specificity: both
Affects Kernel Modules: No
Delivery Mechanism: Backport
Category: kernel
Request Type: Package - Update Version
d. Upstream Acceptance: In Progress
Sponsor Priority: P3
f. Severity: normal
IBM Confidential: No
Code Contribution: IBM code
g. Component Version Target: ---
h. Package - Version Update

3. Business Case
Customers purchasing large Power systems will experience extremely long boot times without this
patch, which will result in service calls. 

4. Primary contact at Red Hat:
John Jarvis, jjarvis@redhat.com

5. Primary contacts at Partner:
Project Management Contact:
Michael W. Wortman, wortman@us.ibm.com

Technical contact(s):
Nathan D. Fontenot, nfonteno@us.ibm.com
Comment 1 John Jarvis 2010-12-06 15:33:03 EST
IBM is signed up to test and provide feedback, setting OtherQA
Comment 3 CAI Qian 2011-01-26 01:47:02 EST
Is this the patchset for this upstream?

This is not ppc64 specific though.
Comment 6 RHEL Product and Program Management 2011-01-30 21:05:22 EST
Quality Engineering Management has reviewed and declined this request.  You may
appeal this decision by reopening this request.
