Bug 226997

Summary: udevd read buffer too small.
Product: Red Hat Enterprise Linux 5
Reporter: George Beshers <gbeshers>
Component: udev
Assignee: Harald Hoyer <harald>
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 5.0
CC: bmr, dchapman, holt, jhrozek, jh, oliver
Target Milestone: ---
Keywords: OtherQA
Target Release: ---
Hardware: ia64
OS: Linux
Fixed In Version: RHBA-2007-0404
Doc Type: Bug Fix
Last Closed: 2007-11-07 18:08:09 UTC
Bug Blocks: 253733
Attachments (description, flags):
  Patch sent to linux-hotplug-devel@lists.sourceforge.net (flags: none)
  Patch modifies how /proc/stat is processed. (flags: none)
  Patch against udev-110 accepted upstream. (flags: none)
  Patch to use upstream versions of mem_size_mb(), cpu_count(), and running_processes() (flags: none)

Description George Beshers 2007-02-02 15:40:45 UTC
Description of problem:
During boot, udevd gets started and tries to throttle the number of worker
threads.  This throttling is based upon the number of processes running on
the system, which is read from /proc/stat.  With a large number of processors,
/proc/stat no longer fits in the 4096-byte read buffer.  The following patch
enlarges the buffer:

--- udevd.c.orig        2006-07-14 10:42:37.740751746 -0500
+++ udevd.c     2006-07-14 10:43:10.397527171 -0500
@@ -306,7 +306,7 @@
 static int running_processes(void)
 {
        int f;
-       static char buf[4096];
+       static char buf[32768];
        int len;
        int running;
        const char *pos;
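
To illustrate why the buffer size matters: /proc/stat lists one "cpuN" line
per processor ahead of the "procs_running" line, so a read that stops at a
fixed buffer size on a large machine never reaches that line.  The following
is only a minimal sketch of that lookup, not the udevd source;
find_procs_running() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the udevd source): return the value of the
 * "procs_running" line in a NUL-terminated copy of /proc/stat, or -1
 * if the line is missing, e.g. because the copy was truncated. */
static int find_procs_running(const char *text)
{
    static const char key[] = "procs_running ";
    const char *pos;

    assert(text != NULL);

    pos = strstr(text, key);
    if (pos == NULL)
        return -1;              /* line not present in this buffer */

    /* sizeof(key) - 1 skips the key without its terminating NUL */
    return (int)strtol(pos + sizeof(key) - 1, NULL, 10);
}
```

With a complete copy of the file the helper returns the running count; with a
copy cut off inside the per-cpu lines it returns -1, which is the situation
the 4096-byte buffer presumably produces on machines with many processors.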


Additionally, a change to the boot.udev script would raise the limit on
concurrent processes.  Currently, the limit is set to 64 processes with
16 of them running.  The proposed change to boot.udev is:

export UDEVD_MAX_CHILDS=4096
export UDEVD_MAX_CHILDS_RUNNING=256

The results below are on a 256 cpu machine with 2000 LUNs.  Note that
with both modifications in place the boot time drops by a *factor* of 125.


Version-Release number of selected component (if applicable):
  -- I want to test this with RC1 on the 64p Altix.


How reproducible:
   every boot on large systems


Steps to Reproduce:
1.  Just boot
2.
3.
  
Actual results:
 Without these changes, a 256 cpu machine booting with 2000 LUNs
 attached took 64:53. 

Expected results:
 With the buffer size change, that came down to 11:31.
 With the change to boot.udev that time came down to 0:31.

Additional info:
 The max_childs can be set by either environment variable or a udevcontrol
 command.  The max_childs_running can only be set by environment variable.
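
As a rough illustration of the environment-variable route, the sketch below
reads a positive integer limit from the environment and falls back to a
default when the variable is unset or malformed.  It is an assumption-laden
example, not the udevd implementation: env_int_or_default() is a hypothetical
helper, and the defaults of 64 and 16 are simply the limits quoted in this
report.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not the udevd source): return the positive integer
 * stored in environment variable `name`, or `fallback` if the variable is
 * unset, empty, not a number, or out of range. */
static int env_int_or_default(const char *name, int fallback)
{
    const char *val;
    char *end;
    long n;

    assert(name != NULL);

    val = getenv(name);
    if (val == NULL || *val == '\0')
        return fallback;

    errno = 0;
    n = strtol(val, &end, 10);
    if (errno != 0 || *end != '\0' || n <= 0 || n > INT_MAX)
        return fallback;        /* reject garbage and non-positive values */

    return (int)n;
}
```

A caller could then use, for example,
env_int_or_default("UDEVD_MAX_CHILDS", 64) and
env_int_or_default("UDEVD_MAX_CHILDS_RUNNING", 16).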

Comment 2 Doug Chapman 2007-02-07 21:55:51 UTC
I have tested the above patch on a 64p 1TB HP Superdome and it does indeed speed
up udev dramatically.



Comment 3 Robin Holt 2007-02-08 03:35:50 UTC
I believe setting the buffer to 64k or, better yet, making it malloc the buffer
and repeatedly growing the malloc if the read completely fills the buffer will
cover all the different sizes of Altix systems we are currently shipping.
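
A minimal sketch of the grow-on-full idea, for illustration only: it is not
the patch attached to this bug, and read_whole_file() is a hypothetical
helper.  It starts with a 4096-byte heap buffer and doubles it whenever the
data read so far fills the buffer, so no fixed size limit is built in.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not the attached patch): read all of `path` into a
 * NUL-terminated heap buffer that the caller must free().  Returns NULL on
 * error.  If len_out is non-NULL it receives the number of bytes read. */
static char *read_whole_file(const char *path, size_t *len_out)
{
    size_t cap = 4096, len = 0;
    char *buf, *tmp;
    ssize_t n;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    buf = malloc(cap);
    if (buf == NULL) {
        close(fd);
        return NULL;
    }

    for (;;) {
        /* keep one byte free for the terminating NUL */
        n = read(fd, buf + len, cap - len - 1);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the read */
            free(buf);
            close(fd);
            return NULL;
        }
        if (n == 0)
            break;              /* end of file */
        len += (size_t)n;

        if (len == cap - 1) {
            /* buffer is full: double it and keep reading */
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                close(fd);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    close(fd);
    buf[len] = '\0';
    if (len_out != NULL)
        *len_out = len;
    return buf;
}
```

Doubling keeps the number of reallocations logarithmic in the file size; a
fixed 64k buffer would be simpler but would bring the same problem back on a
sufficiently large machine.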

Comment 4 George Beshers 2007-03-05 22:27:26 UTC
Created attachment 149299 [details]
Patch sent to linux-hotplug-devel@lists.sourceforge.net


The increase of the buffer to 32768 didn't work on a machine with 1024
apparent cpus.  This patch does dynamic allocation so that large systems
will work without further changes.

Comment 5 Marizol Martinez 2007-04-12 15:52:05 UTC
George will post.

Comment 6 RHEL Program Management 2007-04-25 21:41:55 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Marizol Martinez 2007-05-01 22:19:50 UTC
George -- Although you will be posting the fix, I am reassigning the bug to the
maintainer so he doesn't lose track of it.

Comment 8 George Beshers 2007-05-01 23:49:52 UTC
Created attachment 153906 [details]
Patch modifies how /proc/stat is processed.

The patch has been sent upstream to udev-devel-list.

Comment 9 Phil Knirsch 2007-05-02 12:58:16 UTC
Simple and clear fix, Devel-ACK.

Read ya, Phil

Comment 10 George Beshers 2007-05-07 21:50:25 UTC
Created attachment 154301 [details]
Patch against udev-110 accepted upstream.

Comment 12 Jose Plans 2007-05-10 09:09:41 UTC
No need to be in NEEDINFO for Comment #7.

Comment 14 Marizol Martinez 2007-06-01 20:57:45 UTC
Red Hat Product Management and Engineering have evaluated this request and
currently plan to include it in the next Red Hat Enterprise Linux minor release.
Please note that its inclusion depends upon the successful completion of code
integration and testing.

Comment 15 George Beshers 2007-06-06 14:41:44 UTC
Created attachment 156355 [details]
Patch to use upstream versions of mem_size_mb(), cpu_count(), and running_processes()


I actually created the patch against the 06/06/07 nightly,
not as a direct replacement for the earlier patch.

Comment 17 Harald Hoyer 2007-06-13 12:15:00 UTC
You may try:
http://people.redhat.com/harald/downloads/udev/udev-095-14.9.el5/

Comment 19 George Beshers 2007-08-21 19:10:50 UTC
This is fixed.


Comment 20 Oliver Falk 2007-10-11 09:40:32 UTC
Just a short question to you guys. Will this work with a 10000 CPU machine as well?

Comment 21 George Beshers 2007-10-11 12:40:13 UTC
For udev the answer is yes.

The kernel might hit the hugepagesize limit for /proc/cpuinfo and
/proc/stat.  I believe in these cases that the machine will still
boot just fine but udev would "only" use 1500 or so of the 10000
processors.  I have not tried it :).



Comment 22 Oliver Falk 2007-10-11 14:14:11 UTC
I'm thinking especially about the LRZ, where more than 9700 cores have been
running since April 2007. Well, OK, only 1024 cores in one SSI. Can anyone
imagine 9216 cores in one SSI? Joking... But maybe we should discuss that
outside of bz :-)

Comment 24 errata-xmlrpc 2007-11-07 18:08:09 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0404.html