Bug 226997 - udevd read buffer too small.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: udev
Version: 5.0
Hardware: ia64 Linux
Priority: high Severity: high
Assigned To: Harald Hoyer
Keywords: OtherQA
Depends On:
Blocks: 253733
Reported: 2007-02-02 10:40 EST by George Beshers
Modified: 2009-06-19 10:20 EDT

Fixed In Version: RHBA-2007-0404
Doc Type: Bug Fix
Last Closed: 2007-11-07 13:08:09 EST


Attachments
Patch sent to linux-hotplug-devel@lists.sourceforge.net (1.94 KB, patch)
2007-03-05 17:27 EST, George Beshers
Patch modifies how /proc/stat is processed. (2.29 KB, patch)
2007-05-01 19:49 EDT, George Beshers
Patch against udev-110 accepted upstream. (2.33 KB, patch)
2007-05-07 17:50 EDT, George Beshers
Patch to use upstream versions of mem_size_mb(), cpu_count(), and running_processes() (2.74 KB, patch)
2007-06-06 10:41 EDT, George Beshers

Description George Beshers 2007-02-02 10:40:45 EST
Description of problem:
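During boot, udevd gets started and tries to throttle the number of worker
threads.  This throttling is based upon the number of processes running on
the system, which is read from /proc/stat.  With a large number of processors,
the contents of /proc/stat no longer fit in udevd's 4096-byte read buffer.
Enlarging the buffer addresses this: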

--- udevd.c.orig        2006-07-14 10:42:37.740751746 -0500
+++ udevd.c     2006-07-14 10:43:10.397527171 -0500
@@ -306,7 +306,7 @@
 static int running_processes(void)
 {
        int f;
-       static char buf[4096];
+       static char buf[32768];
        int len;
        int running;
        const char *pos;
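
For context, here is a minimal sketch of how a running-process count could be pulled out of a /proc/stat buffer. The helper name parse_procs_running is hypothetical and this is not the code from udevd.c. It only illustrates why a truncated read matters: the procs_running line comes after the per-CPU lines in /proc/stat, so a buffer that fills up on a large machine never reaches it.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from udevd): extract the value that
 * follows "procs_running " in a NUL-terminated copy of /proc/stat.
 * Returns -1 when the field is absent, e.g. because the read was
 * truncated before that line.
 */
static int parse_procs_running(const char *buf)
{
    const char *key = "procs_running ";
    const char *pos = strstr(buf, key);

    if (pos == NULL)
        return -1;
    return (int)strtol(pos + strlen(key), NULL, 10);
}
```

A caller would have to treat -1 as "count unknown". How udevd itself reacts in that case is not shown in this report.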


Additionally, the boot.udev script could be changed to raise the limit on
concurrent processes.  Currently, the limit is set to 64 processes with
16 of them running.  The proposed change to boot.udev is:

export UDEVD_MAX_CHILDS=4096
export UDEVD_MAX_CHILDS_RUNNING=256

The results below are on a 256 cpu machine with 2000 LUNs.  Note that
with both modifications in place the boot time drops by a *factor* of 125.


Version-Release number of selected component (if applicable):
  -- I want to test this with RC1 on the 64p Altix.


How reproducible:
   every boot on large systems


Steps to Reproduce:
1.  Just boot
  
Actual results:
 Without these changes, a 256 cpu machine booting with 2000 LUNs
 attached took 64:53. 

Expected results:
 With the buffer size change, that came down to 11:31.
 With the change to boot.udev that time came down to 0:31.
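
The speedup figures can be checked with a little arithmetic, assuming the timings above are minutes:seconds (the report does not state the unit). The function names below are illustrative only.

```c
/* Convert a minutes:seconds reading into seconds. */
static int to_seconds(int minutes, int seconds)
{
    return minutes * 60 + seconds;
}

/* Ratio of the old duration to the new one. */
static double speedup(int before_s, int after_s)
{
    return (double)before_s / (double)after_s;
}
```

Under that assumption, 64:53 is 3893 seconds, 11:31 is 691 seconds, and 0:31 is 31 seconds. The buffer change alone gives roughly a 5.6x speedup (3893 / 691), and both changes together give roughly 125.6x (3893 / 31), which matches the factor of 125 quoted in the description.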

Additional info:
 The max_childs can be set by either environment variable or a udevcontrol
 command.  The max_childs_running can only be set by environment variable.
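
As a rough illustration of an environment-variable override, the sketch below reads a limit such as UDEVD_MAX_CHILDS or UDEVD_MAX_CHILDS_RUNNING and falls back to a default when the variable is unset or malformed. The helper names and the sanity bound are assumptions made for this example; the way udevd actually parses these variables is not shown in this report.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: parse a positive decimal limit from a string.
 * Returns `fallback` when the string is NULL, empty, not a clean
 * number, or outside an arbitrary sanity range chosen for this sketch.
 */
static int parse_limit(const char *val, int fallback)
{
    char *end;
    long n;

    if (val == NULL || *val == '\0')
        return fallback;
    n = strtol(val, &end, 10);
    if (*end != '\0' || n <= 0 || n > 1000000)
        return fallback;
    return (int)n;
}

/* Look the limit up in the environment, e.g. "UDEVD_MAX_CHILDS". */
static int limit_from_env(const char *name, int fallback)
{
    return parse_limit(getenv(name), fallback);
}
```

With the values from the description, limit_from_env("UDEVD_MAX_CHILDS", 64) would yield 4096 once boot.udev exports the variable, and 64 otherwise.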
Comment 2 Doug Chapman 2007-02-07 16:55:51 EST
I have tested the above patch on a 64p 1TB HP Superdome and it does indeed speed
up udev dramatically.

Comment 3 Robin Holt 2007-02-07 22:35:50 EST
I believe setting the buffer to 64k or, better yet, allocating the buffer with
malloc and growing it repeatedly whenever the read completely fills it, will
cover all the different sizes of Altix systems we are currently shipping.
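
The grow-on-full approach suggested here could look roughly like the sketch below. This is not the patch attached to this bug; read_whole_file is a hypothetical helper, the starting size of 4096 and the doubling policy are assumptions, and EINTR handling is left out for brevity.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: read an entire file into a heap buffer, doubling
 * the buffer whenever a read fills it. Returns a NUL-terminated buffer
 * that the caller must free(), or NULL on error. If len_out is not
 * NULL it receives the number of bytes read.
 */
static char *read_whole_file(const char *path, size_t *len_out)
{
    size_t cap = 4096;
    size_t len = 0;
    char *buf;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    buf = malloc(cap);
    if (buf == NULL) {
        close(fd);
        return NULL;
    }
    for (;;) {
        /* Leave one byte free for the terminating NUL. */
        ssize_t n = read(fd, buf + len, cap - len - 1);

        if (n < 0) {
            free(buf);
            close(fd);
            return NULL;
        }
        if (n == 0)
            break; /* end of file */
        len += (size_t)n;
        if (len == cap - 1) {
            /* Buffer is full: double it and keep reading. */
            char *tmp = realloc(buf, cap * 2);

            if (tmp == NULL) {
                free(buf);
                close(fd);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    close(fd);
    buf[len] = '\0';
    if (len_out != NULL)
        *len_out = len;
    return buf;
}
```

Because the buffer grows only when a read fills it, small systems keep the small allocation while large ones expand as far as /proc/stat requires, with no fixed size to outgrow.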
Comment 4 George Beshers 2007-03-05 17:27:26 EST
Created attachment 149299
Patch sent to linux-hotplug-devel@lists.sourceforge.net


The increase of the buffer to 32768 didn't work on a machine with 1024
apparent cpus.  This patch does dynamic allocation so that large systems
will work without further changes.
Comment 5 Marizol Martinez 2007-04-12 11:52:05 EDT
George will post.
Comment 6 RHEL Product and Program Management 2007-04-25 17:41:55 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 7 Marizol Martinez 2007-05-01 18:19:50 EDT
George -- Although you will be posting the fix, reassigning the bug to the
maintainer so he doesn't lose track of it.
Comment 8 George Beshers 2007-05-01 19:49:52 EDT
Created attachment 153906
Patch modifies how /proc/stat is processed.

The patch has been sent upstream to udev-devel-list.
Comment 9 Phil Knirsch 2007-05-02 08:58:16 EDT
Simple and clear fix, Devel-ACK.

Read ya, Phil
Comment 10 George Beshers 2007-05-07 17:50:25 EDT
Created attachment 154301
Patch against udev-110 accepted upstream.
Comment 12 Jose Plans 2007-05-10 05:09:41 EDT
No need to be in NEEDINFO for Comment #7.
Comment 14 Marizol Martinez 2007-06-01 16:57:45 EDT
Red Hat Product Management and Engineering have evaluated this request and
currently plan to include it in the next Red Hat Enterprise Linux minor release.
Please note that its inclusion depends upon the successful completion of code
integration and testing.
Comment 15 George Beshers 2007-06-06 10:41:44 EDT
Created attachment 156355
Patch to use upstream versions of mem_size_mb(), cpu_count(), and running_processes()


I actually created the patch against the 06/06/07 nightly,
not as a direct replacement for the earlier patch.
Comment 17 Harald Hoyer 2007-06-13 08:15:00 EDT
You may try:
http://people.redhat.com/harald/downloads/udev/udev-095-14.9.el5/
Comment 19 George Beshers 2007-08-21 15:10:50 EDT
This is fixed.
Comment 20 Oliver Falk 2007-10-11 05:40:32 EDT
Just a short question to you guys. Will this work with a 10000 CPU machine as well?
Comment 21 George Beshers 2007-10-11 08:40:13 EDT
For udev the answer is yes.

The kernel might hit the hugepagesize limit for /proc/cpuinfo and
/proc/stat.  I believe in these cases that the machine will still
boot just fine but udev would "only" use 1500 or so of the 10000
processors.  I have not tried it :).

Comment 22 Oliver Falk 2007-10-11 10:14:11 EDT
I'm thinking - especially - about the LRZ, where more than 9700 cores have been
running since April 2007. Well, OK, in one SSI, only 1024 cores. Can someone
think about 9216 cores in one SSI? Joking... But maybe we should discuss that
outside of bz :-)
Comment 24 errata-xmlrpc 2007-11-07 13:08:09 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0404.html
