Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 698481

Summary: x3950 X5 hangs at udev for 3 hours
Product: Red Hat Enterprise MRG
Reporter: IBM Bug Proxy <bugproxy>
Component: realtime-kernel
Assignee: Clark Williams <williams>
Status: CLOSED ERRATA
QA Contact: David Sommerseth <davids>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 1.3
CC: bhu, lgoncalv, ovasik, williams
Target Milestone: 2.1.5
Target Release: ---
Hardware: x86_64
OS: All
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
* Cause: some udev rules added by MRG Realtime were being called at a much higher frequency than anticipated
* Consequence: inordinately long boot times for multi-core (>=16 cores) systems
* Fix: remove fork+execs from realtime udev rules
* Result: multi-core systems boot times are in line with RHEL boot times.
Last Closed: 2012-04-18 19:33:31 UTC
Attachments:
kernel boot log with udev debug output (flags: none)

Description IBM Bug Proxy 2011-04-21 00:41:06 UTC
When booting an x3950 with the MRG 1.3 kernel (2.6.33.7-rt29.55.el5rt), it gets stuck at 
'Starting udev' for about 3 hours.  The RHEL 5.6 kernel that I installed the system with 
boots the system fine (spending perhaps less than a minute at udev).  

The x3950 X5 is a 2-node x3850 X5 system.  When booting as a single-node setup, the MRG 1.3 
kernel (2.6.33.7-rt29.55.el5rt) boots fine and udev starts up after a long delay, though one 
nowhere near 3 hours.

Because of the two-node/one-node discrepancy, I tried booting with maxcpus=32 (this is a 64-cpu 
system with HT disabled), but it still hung for longer than I was willing to wait.

Next I tried hacking the /etc/rc.sysinit script where it kicks off udev_start.  I launched 
udev_start via taskset, pinning it to CPU1.  The system booted quickly, but this didn't go well: 
udev had insufficient resources to correctly process the hotplug requests from the bnx2 driver 
and failed to load the firmware.  I then changed this to 'taskset -c 0,1,2,3,4,5,6,7' to bind 
it to the first 8 CPUs.  This seemed to do the trick.  The system booted quickly and udev 
seemed to have what it needed to load the bnx2 firmware.
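
For reference, the CPU-pinning workaround above could be sketched roughly as follows. This is 
only an illustration: cpu_list is a hypothetical helper that is not part of rc.sysinit or udev, 
and the command that starts udev is written as a placeholder because its exact name and path 
in /etc/rc.sysinit are assumed here rather than verified.

```shell
# Hypothetical helper: print the comma-separated CPU list "0,1,...,N-1"
# in the form accepted by 'taskset -c'.
cpu_list() {
    n=$1
    list=""
    i=0
    while [ "$i" -lt "$n" ]; do
        # Append a comma only when the list already has an entry.
        list="${list:+$list,}$i"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Assumed usage inside /etc/rc.sysinit (placeholder command name, left as a
# comment so nothing is pinned or started when this sketch is sourced):
#   taskset -c "$(cpu_list 8)" <udev start command>
```

With 8 as the argument this produces the same CPU list as the 'taskset -c 0,1,2,3,4,5,6,7' 
invocation described above.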

This is obviously a hack, but I wanted to get the information out there.  I have also tried booting 
with udevdebug on the kernel command line, but it produces a huge amount of output, nearly all of 
it incomprehensible to me.  If someone could help me decipher that output, I could post it here.

Comment 1 IBM Bug Proxy 2011-05-09 16:10:23 UTC
------- Comment From vernux.com 2011-05-09 12:01 EDT-------
Some other things that I have tested:

1. Booting the MRG 1.3 kernel in a RHEL6 installation.  This did not get hung up on udev, but it did print sysfs deprecation warnings.

2. Reducing the number of udev rules that udev had to parse.  This had no effect at all.

Comment 2 IBM Bug Proxy 2011-05-11 20:20:31 UTC
Created attachment 498391
kernel boot log with udev debug output


------- Comment on attachment From vernux.com 2011-05-11 16:18 EDT-------


I booted with udevdebug on the kernel command line and this is what I got.

Comment 3 Clark Williams 2011-05-11 22:29:50 UTC
Luis suggested that the MRG firmware download rules were too heavyweight (2 * fork+exec for each rule invocation). Stripping out the "detect RT" logic and assuming that it was ok to download firmware shortened the boot delay from 3 hours to 2 minutes. 

We'll update rt-setup with new udev rules for RHEL5, which will look like this:

#
# MRG Realtime firmware download rules
# This set of udev rules will override the stock RHEL5 udev
# rules for loading firmware
SUBSYSTEM=="firmware", ACTION=="add", RUN+="/sbin/mrg-rt-firmware.sh", \
    OPTIONS="last_rule"
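
As an illustration of what a lightweight firmware helper invoked by such a rule might do, here 
is a minimal sketch. It is NOT the actual /sbin/mrg-rt-firmware.sh, whose contents are not shown 
in this report. It assumes the usual kernel firmware-loading interface (the "loading" and "data" 
files in the device's sysfs directory) and the DEVPATH and FIRMWARE variables that udev exports 
for firmware events. SYSFS_ROOT and FIRMWARE_DIR are hypothetical knobs added only so the 
function can be exercised outside a real boot.

```shell
# Sketch only: a firmware helper with no "detect RT" logic.
# DEVPATH and FIRMWARE are expected in the environment, as udev provides
# them for firmware events. SYSFS_ROOT and FIRMWARE_DIR are hypothetical
# overrides for testing; they default to the usual system locations.
load_firmware() {
    sysfs_root=${SYSFS_ROOT:-/sys}
    fw_dir=${FIRMWARE_DIR:-/lib/firmware}
    dev="$sysfs_root$DEVPATH"

    if [ -e "$fw_dir/$FIRMWARE" ]; then
        echo 1 > "$dev/loading"                  # tell the kernel a load is starting
        cat "$fw_dir/$FIRMWARE" > "$dev/data"    # copy the firmware image
        echo 0 > "$dev/loading"                  # signal successful completion
        return 0
    fi

    echo -1 > "$dev/loading"                     # firmware not found: abort the request
    return 1
}
```

In this sketch the only external program run per event is cat; echo is a shell builtin.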

Comment 4 IBM Bug Proxy 2011-05-17 22:30:33 UTC
------- Comment From vernux.com 2011-05-17 18:22 EDT-------
Clark,

When can we expect a new rt-setup package with this fix included?

Comment 8 Clark Williams 2012-04-09 21:15:22 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
* Cause: some udev rules added by MRG Realtime were being called at a much higher frequency than anticipated
* Consequence: inordinately long boot times for multi-core (>=16 cores) systems
* Fix: remove fork+execs from realtime udev rules
* Result: multi-core systems boot times are in line with RHEL boot times.

Comment 9 David Sommerseth 2012-04-11 10:32:32 UTC
Booting kernel-rt-3.0.25-rt44.57.el6rt.x86_64 (MRG 2.1) seems to work
fine, without any long udev delays.

-> VERIFIED

Comment 10 errata-xmlrpc 2012-04-18 19:33:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0495.html