1. Feature Overview:
Feature Id: [68164]
a. Name of Feature: [LTC 5.7 FEAT] Large memory machine spends huge amount of time in sysfs add of
memory nodes (performance/boot)
b. Feature Description
We have noticed very long boot times on PowerPC64 machines with a lot of RAM (> 512GB). Almost all of
that time is spent in memory_dev_init(). Measured durations of that function by RAM size:
0.5TB RAM - 1 minute
1.5TB RAM - 30 minutes
The backtrace looks like:
c000000000248ee0 .__sysfs_add_one+0x28/0x128
c0000000002492a8 .sysfs_add_one+0x38/0x188
c000000000249c88 .create_dir+0x70/0x138
c000000000249d98 .sysfs_create_dir+0x48/0x78
c00000000032bad8 .kobject_add_internal+0x140/0x308
c00000000032beb4 .kobject_init_and_add+0x4c/0x68
c00000000046c2c0 .sysdev_register+0xa0/0x220
c00000000047b1dc .add_memory_block+0x124/0x1e8
c0000000008d1f28 .memory_dev_init+0xf4/0x168
With 1TB of RAM we have about 64k memory nodes. The problem is that sysfs checks for a duplicate
entry with a linear search on every add, so registering n nodes costs O(n^2) overall:
int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
{
	struct sysfs_inode_attrs *ps_iattr;
	if (sysfs_find_dirent(acxt->parent_sd, sd->s_name))
		return -EEXIST;
	...
struct sysfs_dirent *sysfs_find_dirent(struct sysfs_dirent *parent_sd,
				       const unsigned char *name)
{
	struct sysfs_dirent *sd;
	for (sd = parent_sd->s_dir.children; sd; sd = sd->s_sibling)
		if (!strcmp(sd->s_name, name))
			return sd;
	return NULL;
}
So with 64k nodes, each add towards the end walks a list of roughly 64k entries and does a strcmp()
on every one of them.
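To make the cost concrete, the following is a minimal user-space sketch, not kernel code. It mimics
the duplicate check with a singly linked sibling list and counts the strcmp() calls needed to add n
uniquely named entries. The struct entry type, the count_duplicate_checks() function and the
"memory%u" naming are assumptions made for illustration only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sysfs_dirent: a name plus a sibling link. */
struct entry {
	char name[32];
	struct entry *sibling;
};

/*
 * Add n uniquely named entries to one "directory", walking the whole
 * sibling list to look for a duplicate before each add.
 * Returns the total number of strcmp() calls, or 0 if allocation fails.
 */
unsigned long long count_duplicate_checks(unsigned int n)
{
	struct entry *children = NULL;
	unsigned long long compares = 0;
	unsigned int i;
	int ok = 1;

	for (i = 0; i < n; i++) {
		struct entry *sd, *new_sd;
		char name[32];
		int dup = 0;

		snprintf(name, sizeof(name), "memory%u", i);

		/* Linear duplicate detection, one strcmp() per existing entry. */
		for (sd = children; sd; sd = sd->sibling) {
			compares++;
			if (!strcmp(sd->name, name)) {
				dup = 1;
				break;
			}
		}
		if (dup)
			continue;

		new_sd = malloc(sizeof(*new_sd));
		if (!new_sd) {
			ok = 0;
			break;
		}
		strcpy(new_sd->name, name);
		new_sd->sibling = children;
		children = new_sd;
	}

	/* Free the list before returning. */
	while (children) {
		struct entry *next = children->sibling;
		free(children);
		children = next;
	}
	return ok ? compares : 0;
}
```

Because every name is unique, the i-th add compares against all i existing entries, so the total is
n(n-1)/2. For n = 65536 that works out to 2,147,450,880 string comparisons, before counting any
cache effects from walking the list.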
2. Feature Details:
Sponsor: Power Virtualization
Architectures: ppc64
Arch Specificity: both
Affects Kernel Modules: No
Delivery Mechanism: Backport
Category: kernel
Request Type: Package - Update Version
d. Upstream Acceptance: In Progress
Sponsor Priority: P3
f. Severity: normal
IBM Confidential: No
Code Contribution: IBM code
g. Component Version Target: ---
h. Package - Version Update
3. Business Case
Customers purchasing large Power systems will experience extremely long boot times without this
patch, which will result in service calls.
4. Primary contact at Red Hat:
John Jarvis, jjarvis
5. Primary contacts at Partner:
Project Management Contact:
Michael W. Wortman, wortman.com
Technical contact(s):
Nathan D. Fontenot, nfonteno.com