Bug 651890 - [LTC 5.7 FEAT] Large memory machine spends huge amount of time in sysfs add of memory nodes (performance/boot)
Summary: [LTC 5.7 FEAT] Large memory machine spends huge amount of time in sysfs add o...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.7
Hardware: ppc64
OS: All
Priority: high
Severity: high
Target Milestone: beta
Target Release: 5.7
Assignee: Steve Best
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: ibm5.7feat, ibm5.7features 618260 668558
 
Reported: 2010-11-10 15:01 UTC by IBM Bug Proxy
Modified: 2011-02-14 15:02 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-02-14 15:02:20 UTC
Target Upstream Version:
Embargoed:




Links
System                       ID     Private  Priority  Status  Summary  Last Updated
IBM Linux Technology Center  68164  0        None      None    None     Never

Description IBM Bug Proxy 2010-11-10 15:01:31 UTC
1. Feature Overview:
Feature Id: [68164]
a. Name of Feature: [LTC 5.7 FEAT] Large memory machine spends huge amount of time in sysfs add of
memory nodes (performance/boot)
b. Feature Description
We have noticed very long boot times for PowerPC64 machines with a lot of RAM (> 512GB). Almost all
of that time is spent in memory_dev_init(). Durations for that function by RAM size:

0.5TB RAM - 1 minute
1.5TB RAM - 30 minutes

The backtrace looks like:

c000000000248ee0 .__sysfs_add_one+0x28/0x128
c0000000002492a8 .sysfs_add_one+0x38/0x188
c000000000249c88 .create_dir+0x70/0x138
c000000000249d98 .sysfs_create_dir+0x48/0x78
c00000000032bad8 .kobject_add_internal+0x140/0x308
c00000000032beb4 .kobject_init_and_add+0x4c/0x68
c00000000046c2c0 .sysdev_register+0xa0/0x220
c00000000047b1dc .add_memory_block+0x124/0x1e8
c0000000008d1f28 .memory_dev_init+0xf4/0x168

With 1TB RAM we have about 64k memory nodes, and the problem is that sysfs does O(n^2) work in
duplicate entry detection:

int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
{
        struct sysfs_inode_attrs *ps_iattr;

        if (sysfs_find_dirent(acxt->parent_sd, sd->s_name))
                return -EEXIST;

...

struct sysfs_dirent *sysfs_find_dirent(struct sysfs_dirent *parent_sd,
                                       const unsigned char *name)
{
        struct sysfs_dirent *sd;

        for (sd = parent_sd->s_dir.children; sd; sd = sd->s_sibling)
                if (!strcmp(sd->s_name, name))
                        return sd;
        return NULL;
}

So with 64k nodes, towards the end we are walking a 64k-entry list and doing a strcmp on each entry.
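
To make the cost concrete, here is a minimal user-space sketch of the same pattern. It is not kernel
code: struct entry, find_entry() and count_compares() are hypothetical stand-ins that only mirror the
shape of sysfs_find_dirent() above. Adding n uniquely named entries, with a linear duplicate check
before each add, costs n(n-1)/2 strcmp calls in total.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sysfs_dirent: a name plus a sibling link. */
struct entry {
        char name[32];
        struct entry *sibling;
};

/* Linear duplicate check in the style of sysfs_find_dirent(); counts strcmp calls. */
static struct entry *find_entry(struct entry *children, const char *name,
                                unsigned long long *cmps)
{
        struct entry *e;

        for (e = children; e; e = e->sibling) {
                (*cmps)++;
                if (!strcmp(e->name, name))
                        return e;
        }
        return NULL;
}

/*
 * Add n uniquely named entries to one parent list, checking for a duplicate
 * before each add. Returns the total number of strcmp calls performed.
 */
static unsigned long long count_compares(unsigned long n)
{
        struct entry *children = NULL, *e;
        unsigned long long cmps = 0;
        unsigned long i;

        for (i = 0; i < n; i++) {
                e = malloc(sizeof(*e));
                assert(e != NULL);
                snprintf(e->name, sizeof(e->name), "memory%lu", i);
                if (find_entry(children, e->name, &cmps)) {
                        free(e);        /* never taken here: names are unique */
                        continue;
                }
                e->sibling = children;  /* link at the head of the list */
                children = e;
        }

        while (children) {              /* release the list */
                e = children->sibling;
                free(children);
                children = e;
        }
        return cmps;
}
```

Under these assumptions, count_compares(65536) would come to 65536 * 65535 / 2 = 2147450880
comparisons, and tripling n multiplies the count by roughly nine.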


2. Feature Details:
Sponsor: Power Virtualization
Architectures: ppc64

Arch Specificity: both
Affects Kernel Modules: No
Delivery Mechanism: Backport
Category: kernel
Request Type: Package - Update Version
d. Upstream Acceptance: In Progress
Sponsor Priority: P3
f. Severity: normal
IBM Confidential: No
Code Contribution: IBM code
g. Component Version Target: ---
h. Package - Version Update

3. Business Case
Customers purchasing large Power systems will experience extremely long boot times without this
patch, which will result in service calls. 

4. Primary contact at Red Hat:
John Jarvis, jjarvis

5. Primary contacts at Partner:
Project Management Contact:
Michael W. Wortman, wortman.com

Technical contact(s):
Nathan D. Fontenot, nfonteno.com

Comment 1 John Jarvis 2010-12-06 20:33:03 UTC
IBM is signed up to test and provide feedback; setting OtherQA.

Comment 3 Qian Cai 2011-01-26 06:47:02 UTC
Is this the patchset for this upstream?
http://marc.info/?l=linux-mm&m=129554141716331&w=2

This is not ppc64 specific though.

Comment 6 RHEL Program Management 2011-01-31 02:05:22 UTC
Quality Engineering Management has reviewed and declined this request.  You may
appeal this decision by reopening this request.

