Bug 145660 - nash _must_ check pid returned by wait*()
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: mkinitrd
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Peter Jones
QA Contact: David Lawrence
URL:
Whiteboard:
Depends On:
Blocks: 145719
Reported: 2005-01-20 15:22 UTC by David Howells
Modified: 2007-11-30 22:07 UTC

Fixed In Version: 4.2.1.6-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-11-03 16:36:36 UTC
Target Upstream Version:
Embargoed:


Attachments
printk-instrumented module loading (2.82 KB, text/plain)
2005-01-20 15:22 UTC, David Howells
Fix kallsyms vs insmod/rmmod race (3.33 KB, patch)
2005-01-20 16:01 UTC, David Howells
Fix nash to handle wait4() returning other pids (958 bytes, patch)
2005-01-20 16:15 UTC, David Howells

Description David Howells 2005-01-20 15:22:58 UTC
Description of problem: 
 
nash doesn't check the pid returned by wait4() in otherCommand(). It _must_ do 
this - it runs as process #1 (init) during boot and so will gather exit codes 
for kernel threads and other unparented processes that start up and exit 
whilst it is running. 
 
Version-Release number of selected component (if applicable): 
 
 
How reproducible: 
 
Easily. 
 
Steps to Reproduce: 
1. Check out kernel-2_6_9-5_EL_patchtest_20 from the RHEL-4 kernel CVS. 
2. Re-enable patch 99909 
3. Build kernel 
4. Install and boot. I see the problem on ppc64 pSeries and Power5, but it will 
probably happen on other platforms too. 
   
Actual results: 
 
Because wait4() can return for an unrelated process, nash moves on before the 
insmod it started has finished, so insmods effectively run asynchronously: 
sd_mod.ko, which depends on scsi_mod.ko, attempts to load whilst scsi_mod.ko 
is still running its initialisation function. This results in the kernel 
complaining about missing symbols and the insmod failing: 
 
Loading scsi_mod.ko module 
Loading sd_mod.ko module 
Loading ibmvscsic.ko module 
Loading dm-mod.ko module 
Loading jbd.ko module 
Loading ext3.ko module 
Loading dm-mirror.ko module 
Loading dm-zero.ko module 
sd_mod: Unknown symbol scsi_device_get 
sd_mod: Unknown symbol __scsi_mode_sense 
sd_mod: Unknown symbol scsi_release_request 
sd_mod: Unknown symbol scsi_set_medium_removal 
sd_mod: Unknown symbol scsicam_bios_param 
sd_mod: Unknown symbol scsi_print_req_sense 
sd_mod: Unknown symbol scsi_allocate_request 
sd_mod: Unknown symbol scsi_print_sense 
sd_mod: Unknown symbol scsi_register_driver 
sd_mod: Unknown symbol scsi_device_put 
sd_mod: Unknown symbol scsi_logging_level 
sd_mod: Unknown symbol scsi_nonblockable_ioctl 
sd_mod: Unknown symbol scsi_test_unit_ready 
sd_mod: Unknown symbol scsi_ioctl 
sd_mod: Unknown symbol scsi_io_completion 
sd_mod: Unknown symbol scsi_block_when_processing_errors 
sd_mod: Unknown symbol scsi_wait_req 
insmod: error inserting '/lib/sd_mod.ko': -1 Unknown symbol in module 
ERROR: /bin/insmod exited abnormally! 
Loading dm-snapshot.ko module 
SCSI subsystem initialized 
 
Note that the "SCSI subsystem initialized" message occurs after the attempt to 
load sd_mod.ko... this message indicates the return of the scsi_mod init 
function to sys_init_module(). Only after that has happened will scsi_mod be 
marked live, and only then will it be possible to load a module dependent on 
it. 
 
Expected results: 
 
Something like: 
 
Loading scsi_mod.ko module 
SCSI subsystem initialized 
Loading sd_mod.ko module 
Loading ibmvscsic.ko module 
scsi0 : IBM POWER Virtual SCSI Adapter 1.5.3 
 
With no errors from missing symbols. 
 
Additional info: 
 
The bug can be seen in the code: 
 
-->     wait4(-1, &status, 0, NULL); 
        if (!WIFEXITED(status) || WEXITSTATUS(status)) { 
                printf("ERROR: %s exited abnormally!\n", args[0]); 
                return 1; 
        } 
 
Should be something like: 
 
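        for (;;) { 
                int wpid = wait4(-1, &status, 0, NULL); 
                if (wpid == pid)        /* the child we started */ 
                         break; 
                if (wpid == -1)         /* wait4() failed, e.g. no children left */ 
                         goto no_child_error; 
                /* otherwise some other process was reaped; keep waiting */ 
        } 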
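
For illustration only, here is a self-contained sketch of that loop wrapped around a fork/exec. It is not the nash source or the attached patch. The names wait_for_pid() and run_command() are made up for this example, and it uses waitpid(-1, ...) in place of wait4(-1, ..., NULL), which should behave the same for this purpose.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from nash): block until the child `pid` has
 * exited, discarding the wait results of any other process reaped on the
 * way. Returns 0 with *status filled in, or -1 if waitpid() fails
 * (for example ECHILD when there are no children left).
 */
int wait_for_pid(pid_t pid, int *status)
{
    assert(status != NULL);
    for (;;) {
        pid_t wpid = waitpid(-1, status, 0);
        if (wpid == pid)
            return 0;               /* the child we were waiting for */
        if (wpid == -1) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;              /* real failure, e.g. ECHILD */
        }
        /* Some other process was reaped: ignore it and keep waiting. */
    }
}

/*
 * Hypothetical stand-in for the relevant part of otherCommand():
 * run argv[0] with arguments argv and return 0 only if it exited with
 * status 0, otherwise 1.
 */
int run_command(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return 1;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);                 /* only reached if execv() failed */
    }
    if (wait_for_pid(pid, &status) == -1) {
        printf("ERROR: could not wait for %s\n", argv[0]);
        return 1;
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status)) {
        printf("ERROR: %s exited abnormally!\n", argv[0]);
        return 1;
    }
    return 0;
}
```

With this shape, an unrelated process that exits while the command is running is reaped and ignored, and only the status of the process that was actually started decides whether the command is reported as failed.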
 
Also, nash should probably check occasionally for such zombies in between 
executing commands.

Comment 1 David Howells 2005-01-20 15:22:59 UTC
Created attachment 110013
printk-instrumented module loading

Comment 2 David Howells 2005-01-20 15:25:37 UTC
I need this fixed to be able to fix bug 142604 for RHEL4. Although that is 
only marked for RHEL3, it can occur in RHEL4 too. 

Comment 3 David Howells 2005-01-20 16:01:04 UTC
Created attachment 110017
Fix kallsyms vs insmod/rmmod race

patch99909 - fix the race between insmod/rmmod modifying the module list whilst
kallsyms_lookup() is walking it by stopping everything when the list is
modified.

Comment 4 David Howells 2005-01-20 16:15:30 UTC
Created attachment 110018
Fix nash to handle wait4() returning other pids

The attached patch fixes nash to discard wait results for processes other than
the one it's interested in.

Comment 5 Peter Jones 2005-01-21 23:28:31 UTC
Thanks for the patch; it's in rawhide now.  Does this need to go to a RHEL
Update release?

Comment 6 David Howells 2005-01-24 12:50:12 UTC
Definitely; hence why I logged it against RHEL4. I need a fixed mkinitrd rpm 
to be able to give IBM a fixed kernel for bug 142604. 

Comment 7 David Howells 2005-02-11 11:42:44 UTC
Make that bug 145719 for RHEL4; bug 142604 is the RHEL3 version. 

Comment 9 Peter Jones 2005-11-03 16:36:36 UTC
Fixed in U2.

