Bug 145660 - nash _must_ check pid returned by wait*()
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: mkinitrd
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: high
Assigned To: Peter Jones
QA Contact: David Lawrence
Depends On:
Blocks: 145719
Reported: 2005-01-20 10:22 EST by David Howells
Modified: 2007-11-30 17:07 EST
CC List: 2 users

Fixed In Version: 4.2.1.6-1
Doc Type: Bug Fix
Last Closed: 2005-11-03 11:36:36 EST


Attachments
printk-instrumented module loading (2.82 KB, text/plain)
2005-01-20 10:22 EST, David Howells
Fix kallsyms vs insmod/rmmod race (3.33 KB, patch)
2005-01-20 11:01 EST, David Howells
Fix nash to handle wait4() returning other pids (958 bytes, patch)
2005-01-20 11:15 EST, David Howells

Description David Howells 2005-01-20 10:22:58 EST
Description of problem: 
 
nash doesn't check the pid returned by wait4() in otherCommand(). It _must_ do
this: it runs as process #1 (init) during boot, so it will also gather exit
codes for kernel threads and other unparented processes that start up and exit
whilst it is running.
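
The effect is easy to reproduce outside nash. The sketch below is illustrative
only (the helper first_reaped() and its two children are hypothetical): an
ordinary process plays the role of init, an unrelated child has already exited
when the parent starts waiting for its real command, and a bare
wait4(-1, ...) hands back the unrelated child's pid instead. waitid() with
WNOWAIT is used only to make the ordering deterministic.

```c
#define _DEFAULT_SOURCE 1          /* exposes wait4() in glibc headers */
#include <assert.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/resource.h>
#include <sys/wait.h>

/* Returns the pid that a bare wait4(-1, ...) hands back while we are
 * "waiting for" our command; *cmd_out and *decoy_out get the child pids. */
static pid_t first_reaped(pid_t *cmd_out, pid_t *decoy_out)
{
    /* An unrelated child that exits at once, standing in for a kernel
     * thread or orphan whose exit status init ends up collecting. */
    pid_t decoy = fork();
    assert(decoy >= 0);
    if (decoy == 0)
        _exit(0);

    /* Block until the decoy is a zombie, but leave it unreaped (WNOWAIT),
     * so the result below does not depend on scheduling. */
    siginfo_t info;
    waitid(P_PID, (id_t)decoy, &info, WEXITED | WNOWAIT);

    /* The command we actually started and want to wait for. */
    pid_t cmd = fork();
    assert(cmd >= 0);
    if (cmd == 0) {
        sleep(1);
        _exit(0);
    }

    int status;
    pid_t got = wait4(-1, &status, 0, NULL);   /* the unchecked pattern */

    /* Reap whichever child is still outstanding. */
    waitpid(got == cmd ? decoy : cmd, &status, 0);

    *cmd_out = cmd;
    *decoy_out = decoy;
    return got;
}
```

Called once, it returns the decoy's pid rather than the command's, even though
the command is still running at that point.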
 
Version-Release number of selected component (if applicable): 
 
 
How reproducible: 
 
Easily. 
 
Steps to Reproduce: 
1. Check out kernel-2_6_9-5_EL_patchtest_20 from the RHEL-4 kernel CVS. 
2. Re-enable patch 99909 
3. Build kernel 
4. Install and boot. I see the problem on ppc64 pSeries and Power5, but it will
probably happen on other platforms too.
   
Actual results: 
 
Because wait4() returns as soon as any child exits, nash can move on before
the insmod it started has finished, so the insmods effectively run
asynchronously: sd_mod.ko, which depends on scsi_mod.ko, attempts to load
whilst scsi_mod.ko is still running its initialisation function. This results
in the kernel complaining about missing symbols and the insmod failing:
 
Loading scsi_mod.ko module 
Loading sd_mod.ko module 
Loading ibmvscsic.ko module 
Loading dm-mod.ko module 
Loading jbd.ko module 
Loading ext3.ko module 
Loading dm-mirror.ko module 
Loading dm-zero.ko module 
sd_mod: Unknown symbol scsi_device_get 
sd_mod: Unknown symbol __scsi_mode_sense 
sd_mod: Unknown symbol scsi_release_request 
sd_mod: Unknown symbol scsi_set_medium_removal 
sd_mod: Unknown symbol scsicam_bios_param 
sd_mod: Unknown symbol scsi_print_req_sense 
sd_mod: Unknown symbol scsi_allocate_request 
sd_mod: Unknown symbol scsi_print_sense 
sd_mod: Unknown symbol scsi_register_driver 
sd_mod: Unknown symbol scsi_device_put 
sd_mod: Unknown symbol scsi_logging_level 
sd_mod: Unknown symbol scsi_nonblockable_ioctl 
sd_mod: Unknown symbol scsi_test_unit_ready 
sd_mod: Unknown symbol scsi_ioctl 
sd_mod: Unknown symbol scsi_io_completion 
sd_mod: Unknown symbol scsi_block_when_processing_errors 
sd_mod: Unknown symbol scsi_wait_req 
insmod: error inserting '/lib/sd_mod.ko': -1 Unknown symbol in module 
ERROR: /bin/insmod exited abnormally! 
Loading dm-snapshot.ko module 
SCSI subsystem initialized 
 
Note that the "SCSI subsystem initialized" message appears after the attempt to
load sd_mod.ko. This message indicates the return of the scsi_mod init
function to sys_init_module(). Only after that has happened will scsi_mod be 
marked live, and only then will it be possible to load a module dependent on 
it. 
 
Expected results: 
 
Something like: 
 
Loading scsi_mod.ko module 
SCSI subsystem initialized 
Loading sd_mod.ko module 
Loading ibmvscsic.ko module 
scsi0 : IBM POWER Virtual SCSI Adapter 1.5.3 
 
With no errors from missing symbols. 
 
Additional info: 
 
The bug can be seen in the code: 
 
-->     wait4(-1, &status, 0, NULL); 
        if (!WIFEXITED(status) || WEXITSTATUS(status)) { 
                printf("ERROR: %s exited abnormally!\n", args[0]); 
                return 1; 
        } 
 
Should be something like: 
 
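        for (;;) {
                int wpid = wait4(-1, &status, 0, NULL);
                if (wpid == pid)        /* the child we forked */
                        break;
                if (wpid == -1 && errno != EINTR)
                        goto no_child_error;    /* e.g. ECHILD */
                /* otherwise another process was reaped (or a signal
                   interrupted us): keep waiting */
        }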
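
Packaged as a helper, the pid-checking loop might look like the following.
This is a sketch under assumptions, not the attached patch: wait_for_pid() and
demo_exit_code() are hypothetical names, and retrying on EINTR is an addition
beyond what the report describes.

```c
#define _DEFAULT_SOURCE 1          /* exposes wait4() in glibc headers */
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/resource.h>
#include <sys/wait.h>

/* Wait until child `pid` exits, store its status in *status, return 0.
 * Statuses of other children reaped along the way are discarded.
 * Returns -1 if wait4() fails for a reason other than EINTR
 * (e.g. ECHILD: there are no children left to wait for). */
static int wait_for_pid(pid_t pid, int *status)
{
    for (;;) {
        pid_t wpid = wait4(-1, status, 0, NULL);
        if (wpid == pid)
            return 0;                /* our command: *status is valid */
        if (wpid == -1) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return -1;
        }
        /* Some other process was reaped: ignore its status. */
    }
}

/* Hypothetical demo: an unrelated child exits first, then the command
 * exits with code 3. Returns the exit code wait_for_pid() reports. */
static int demo_exit_code(void)
{
    pid_t decoy = fork();
    assert(decoy >= 0);
    if (decoy == 0)
        _exit(0);                    /* unrelated child: exits at once */

    pid_t cmd = fork();
    assert(cmd >= 0);
    if (cmd == 0) {
        sleep(1);                    /* the "command" runs a bit longer */
        _exit(3);
    }

    int status;
    if (wait_for_pid(cmd, &status) != 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

demo_exit_code() should return 3: the unrelated child's status is discarded and
only the command's exit code is reported.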
 
Also, nash should probably check occasionally for such zombies in between 
executing commands.
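
For the periodic check suggested above, a non-blocking sweep with
waitpid(-1, &status, WNOHANG) collects every child that has already exited and
returns as soon as none are left. A minimal sketch; reap_zombies() and the
demo helper spawn_zombies() are hypothetical, and calling the sweep between
commands is an assumption about where nash would use it.

```c
#define _DEFAULT_SOURCE 1          /* exposes waitid()/WNOWAIT used by the demo */
#include <assert.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Reap every child that has already exited, without blocking.
 * Returns the number of zombies collected. */
static int reap_zombies(void)
{
    int reaped = 0;
    int status;
    /* waitpid() returns 0 when children exist but none has exited yet,
     * and -1 (ECHILD) when there are no children at all. */
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    return reaped;
}

/* Demo helper: start n children that exit immediately and wait, without
 * reaping them (WNOWAIT), until each one is a zombie. */
static void spawn_zombies(int n)
{
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        assert(pid >= 0);
        if (pid == 0)
            _exit(0);
        siginfo_t info;
        waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT);
    }
}
```

Because WNOHANG never blocks, a sweep like this is cheap enough to run before
every command.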
Comment 1 David Howells 2005-01-20 10:22:59 EST
Created attachment 110013 [details]
printk-instrumented module loading
Comment 2 David Howells 2005-01-20 10:25:37 EST
I need this fixed to be able to fix bug 142604 for RHEL4. Although that bug is
only filed against RHEL3, the problem can occur in RHEL4 too.
Comment 3 David Howells 2005-01-20 11:01:04 EST
Created attachment 110017 [details]
Fix kallsyms vs insmod/rmmod race

patch99909 - fix the race between insmod/rmmod modifying the module list whilst
kallsyms_lookup() is walking it by stopping everything when the list is
modified.
Comment 4 David Howells 2005-01-20 11:15:30 EST
Created attachment 110018 [details]
Fix nash to handle wait4() returning other pids

The attached patch fixes nash to discard wait results for processes other than
the one it's interested in.
Comment 5 Peter Jones 2005-01-21 18:28:31 EST
Thanks for the patch; it's in rawhide now.  Does this need to go to a RHEL
Update release?
Comment 6 David Howells 2005-01-24 07:50:12 EST
Definitely; hence why I logged it against RHEL4. I need a fixed mkinitrd rpm 
to be able to give IBM a fixed kernel for bug 142604. 
Comment 7 David Howells 2005-02-11 06:42:44 EST
Make that bug 145719 for RHEL4; bug 142604 is the RHEL3 version. 
Comment 9 Peter Jones 2005-11-03 11:36:36 EST
Fixed in U2.
