Description of problem:
nash doesn't check the pid returned by wait4() in otherCommand(). It _must_ do this. It runs as process #1 (init) during boot, so it also reaps the exit statuses of kernel threads and other unparented processes that start up and exit whilst it is running. If nash doesn't check the pid, wait4() can return the status of one of those processes instead of the command nash just started, and nash carries on as though that command had finished.

Version-Release number of selected component (if applicable):

How reproducible:
Easily.

Steps to Reproduce:
1. Check out kernel-2_6_9-5_EL_patchtest_20 from the RHEL-4 kernel CVS.
2. Re-enable patch 99909.
3. Build the kernel.
4. Install and boot.

I see the problem on ppc64 pSeries and Power5, but it will probably happen on other platforms too.

Actual results:
The insmods effectively run asynchronously: nash starts the next insmod before the previous one has finished. So sd_mod.ko, which depends on scsi_mod.ko, attempts to load whilst scsi_mod.ko is still running its initialisation function. The kernel then complains about missing symbols and the insmod fails:

  Loading scsi_mod.ko module
  Loading sd_mod.ko module
  Loading ibmvscsic.ko module
  Loading dm-mod.ko module
  Loading jbd.ko module
  Loading ext3.ko module
  Loading dm-mirror.ko module
  Loading dm-zero.ko module
  sd_mod: Unknown symbol scsi_device_get
  sd_mod: Unknown symbol __scsi_mode_sense
  sd_mod: Unknown symbol scsi_release_request
  sd_mod: Unknown symbol scsi_set_medium_removal
  sd_mod: Unknown symbol scsicam_bios_param
  sd_mod: Unknown symbol scsi_print_req_sense
  sd_mod: Unknown symbol scsi_allocate_request
  sd_mod: Unknown symbol scsi_print_sense
  sd_mod: Unknown symbol scsi_register_driver
  sd_mod: Unknown symbol scsi_device_put
  sd_mod: Unknown symbol scsi_logging_level
  sd_mod: Unknown symbol scsi_nonblockable_ioctl
  sd_mod: Unknown symbol scsi_test_unit_ready
  sd_mod: Unknown symbol scsi_ioctl
  sd_mod: Unknown symbol scsi_io_completion
  sd_mod: Unknown symbol scsi_block_when_processing_errors
  sd_mod: Unknown symbol scsi_wait_req
  insmod: error inserting '/lib/sd_mod.ko': -1 Unknown symbol in module
  ERROR: /bin/insmod exited abnormally!
  Loading dm-snapshot.ko module
  SCSI subsystem initialized

Note that the "SCSI subsystem initialized" message occurs after the attempt to load sd_mod.ko. This message indicates the return of the scsi_mod init function to sys_init_module(). Only after that has happened will scsi_mod be marked live, and only then will it be possible to load a module that depends on it.

Expected results:
Something like:

  Loading scsi_mod.ko module
  SCSI subsystem initialized
  Loading sd_mod.ko module
  Loading ibmvscsic.ko module
  scsi0 : IBM POWER Virtual SCSI Adapter 1.5.3

with no errors from missing symbols.

Additional info:
The bug can be seen in the code:

  --> wait4(-1, &status, 0, NULL);
      if (!WIFEXITED(status) || WEXITSTATUS(status)) {
          printf("ERROR: %s exited abnormally!\n", args[0]);
          return 1;
      }

It should be something like:

  for (;;) {
      pid_t wpid = wait4(-1, &status, 0, NULL);
      if (wpid == pid)
          break;
      if (wpid == -1)
          goto no_child_error;
  }

followed by the existing WIFEXITED()/WEXITSTATUS() check on status.

Also, nash should probably check occasionally for such zombies in between executing commands.
Created attachment 110013: printk-instrumented module loading
I need this fixed to be able to fix bug 142604 for RHEL4. Although that is only marked for RHEL3, it can occur in RHEL4 too.
Created attachment 110017: Fix kallsyms vs insmod/rmmod race

patch99909: fix the race between insmod/rmmod modifying the module list whilst kallsyms_lookup() is walking it, by stopping everything when the list is modified.
Created attachment 110018: Fix nash to handle wait4() returning other pids

The attached patch fixes nash to discard wait results for processes other than the one it's interested in.
Thanks for the patch; it's in rawhide now. Does this need to go to a RHEL Update release?
Definitely; hence why I logged it against RHEL4. I need a fixed mkinitrd rpm to be able to give IBM a fixed kernel for bug 142604.
Make that bug 145719 for RHEL4; bug 142604 is the RHEL3 version.
Fixed in U2.