Description of problem:
During boot, udevd is started and tries to throttle the number of worker
threads. This throttling is based on the number of processes running on
the system, which is read from /proc/stat. With a large number of processors,
the contents of /proc/stat no longer fit in the fixed-size read buffer, so the
process count is computed incorrectly. The following patch enlarges the buffer:
--- udevd.c.orig	2006-07-14 10:42:37.740751746 -0500
+++ udevd.c	2006-07-14 10:43:10.397527171 -0500
@@ -306,7 +306,7 @@
 static int running_processes(void)
 {
 	int f;
-	static char buf[4096];
+	static char buf[32768];
 	int len;
 	int running;
 	const char *pos;
Additionally, a change to the boot.udev script raises the limit on
concurrent processes. Currently, the limit is set to 64 processes with
16 of them allowed to run at once. boot.udev can be changed to
export UDEVD_MAX_CHILDS=4096
The results below are on a 256 cpu machine with 2000 LUNs. Note that
with both modifications in place the boot time drops by a *factor* of 125.
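The factor of 125 follows directly from the timings reported below (64:53 stock versus 0:31 with both changes); a quick arithmetic check:

```python
# Sanity-check the claimed speedup using the timings reported below.
before = 64 * 60 + 53   # 64:53 boot with stock udev, in seconds
after = 31              # 0:31 with both changes applied
print(f"speedup: {before // after}x")  # prints "speedup: 125x"
```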
Version-Release number of selected component (if applicable):
-- I want to test this with RC1 on the 64p Altix.
How reproducible:
Every boot on large systems.
Steps to Reproduce:
1. Just boot
Without these changes, a 256 cpu machine booting with 2000 LUNs
attached took 64:53.
With the buffer size change, that came down to 11:31.
With the change to boot.udev that time came down to 0:31.
The max_childs can be set by either environment variable or a udevcontrol
command. The max_childs_running can only be set by environment variable.
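A sketch of both settings (the UDEVD_MAX_CHILDS_RUNNING variable name and the 512 value are assumptions for illustration, not part of the attached patch):

```shell
# Both limits can be set in the environment before udevd starts,
# e.g. from boot.udev (values are illustrative; pick limits to suit
# the machine):
export UDEVD_MAX_CHILDS=4096
export UDEVD_MAX_CHILDS_RUNNING=512

# max_childs (but not max_childs_running) can also be changed on a
# running daemon:
#   udevcontrol max_childs=4096
```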
I have tested the above patch on a 64p 1TB HP Superdome and it does indeed speed
up udev dramatically.
I believe setting the buffer to 64k or, better yet, making it malloc the
buffer and repeatedly grow the allocation whenever a read completely fills
it will cover all the different sizes of Altix systems we are currently
shipping.
Created attachment 149299 [details]
Patch sent to firstname.lastname@example.org
The increase of the buffer to 32768 didn't work on a machine with 1024
apparent cpus. This patch does dynamic allocation so that large systems
will work without further changes.
George will post.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
release.
George -- Although you will be posting the fix, I am reassigning the bug to
the maintainer so he doesn't lose track of it.
Created attachment 153906 [details]
Patch modifies how /proc/stat is processed.
The patch has been sent upstream to udev-devel-list.
Simple and clear fix, Devel-ACK.
Read ya, Phil
Created attachment 154301 [details]
Patch against udev-110 accepted upstream.
No need to be in NEEDINFO for Comment #7.
Red Hat Product Management and Engineering have evaluated this request and
currently plan to include it in the next Red Hat Enterprise Linux minor release.
Please note that its inclusion depends upon the successful completion of code
integration and testing.
Created attachment 156355 [details]
Patch to use upstream versions of mem_size_mb(), cpu_count(), and running_processes()
I actually created the patch against the 06/06/07 nightly, not as a direct
replacement for the earlier patch.
You may try:
This is fixed.
Just a short question to you guys. Will this work with a 10000 CPU machine as well?
For udev the answer is yes.
The kernel might hit the hugepagesize limit for /proc/cpuinfo and
/proc/stat. I believe in these cases that the machine will still
boot just fine but udev would "only" use 1500 or so of the 10000
processors. I have not tried it :).
I'm thinking especially about the LRZ, where more than 9700 cores have been
running since April 2007. Well, OK, only 1024 cores in one SSI. Can anyone
imagine 9216 cores in one SSI? Joking... But maybe we should discuss that
outside of bz :-)
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.