Description of problem:
The tool condor_gridmanager stops with a segmentation fault when it is run manually from the command line.

Version-Release number of selected component (if applicable):
condor-debuginfo-7.4.4-0.9
python-condorutils-1.4-5
condor-wallaby-base-db-1.3-2
condor-wallaby-client-3.4-1
condor-7.4.4-0.9
condor-wallaby-tools-3.4-1

How reproducible:
100%

Steps to Reproduce:
1. run 'condor_gridmanager (-h)'

Actual results:
Segmentation fault

Expected results:
No segmentation fault

Additional info:
Analysis of core file:

Core was generated by `condor_gridmanager'.
Program terminated with signal 11, Segmentation fault.
#0  DC_Exit (status=1, shutdown_program=0x0) at daemon_core_main.cpp:270
270         delete daemonCore;
(gdb) info threads
* 1 Thread 12984  DC_Exit (status=1, shutdown_program=0x0) at daemon_core_main.cpp:270
(gdb) bt
#0  DC_Exit (status=1, shutdown_program=0x0) at daemon_core_main.cpp:270
#1  0x00000000004f7430 in main (argc=1, argv=0x7fffec31c778) at daemon_core_main.cpp:1574
condor_gridmanager should probably be in libexec.

This happens when run by root, or when run as a user and passed only -o.

Program received signal SIGSEGV, Segmentation fault.
DC_Exit (status=1, shutdown_program=0x0) at daemon_core_main.cpp:270
270         delete daemonCore;
(gdb) where
#0  DC_Exit (status=1, shutdown_program=0x0) at daemon_core_main.cpp:270
#1  0x00000000004f7430 in main (argc=1, argv=0x7fff9ed1f258) at daemon_core_main.cpp:1574

gridmanager_main.cpp:

void
main_pre_dc_init( int argc, char* argv[] )
{
...
    } else if ( is_root() ) {
        dprintf( D_ALWAYS, "Don't know what user to run as!\n" );
        DC_Exit( 1 );
...

The problem is that DC_Exit, when called from pre_dc_init, tries to delete dc (daemonCore), which is NULL. Either DC_Exit should be more careful, or calling DC_Exit from pre_dc_init should be illegal.

The case where -o is passed comes from pre_dc_init calling usage, which in turn calls DC_Exit.
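To make the ordering concrete, here is a minimal C++ sketch of the sequence described in this comment. It is only a model: `Core`, `g_core`, `pre_init_hook` and `startup` are hypothetical stand-ins for DaemonCore, the global daemonCore pointer, main_pre_dc_init and the daemon core startup path, and none of it is Condor's actual code.

```cpp
#include <cassert>

// Hypothetical stand-in for the DaemonCore class; not the real Condor type.
struct Core {
    unsigned long getpid() const { return 42; }
};

// Hypothetical stand-in for the global daemonCore pointer.
static Core *g_core = nullptr;

// Records whether the global existed at the moment the pre-init hook ran.
static bool g_core_seen_in_pre_init = true;

// Stand-in for main_pre_dc_init(): it runs before the global is allocated,
// so anything it calls must not assume g_core is valid.
void pre_init_hook() {
    g_core_seen_in_pre_init = (g_core != nullptr);
}

// Stand-in for the startup sequence in the daemon core main().
void startup() {
    assert(g_core == nullptr && "startup() must run only once");
    pre_init_hook();      // step 1: hook runs while g_core is still null
    g_core = new Core();  // step 2: only now is the global allocated
}
```

After `startup()` returns, `g_core_seen_in_pre_init` is false, which is the situation an exit routine called from the hook has to cope with.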
Created attachment 445965
strace

strace log from the run
Fixed upstream for 7.5.6

--

Author: Matthew Farrellee <matt@redhat>

Added NULL detection around "delete daemonCore" in DC_Exit

The issue was discovered when running condor_gridmanager from the command line. The gridmanager can call DC_Exit from within main_pre_dc_init, which is by definition before the global daemonCore instance is allocated. DC_Exit would blindly attempt to delete a NULL daemonCore. An alternative fix was to prevent the gridmanager from calling DC_Exit within main_pre_dc_init, but code already in DC_Exit tested for daemonCore == NULL, making it appear that it should handle all cases where daemonCore may be null.

diff --git a/src/condor_daemon_core.V6/daemon_core_main.cpp b/src/condor_daemon_core.V6/daemon_core_main.cpp
index 1301cbc..1821d6e 100644
--- a/src/condor_daemon_core.V6/daemon_core_main.cpp
+++ b/src/condor_daemon_core.V6/daemon_core_main.cpp
@@ -280,9 +280,12 @@ DC_Exit( int status, const char *shutdown_program )
 #endif /* ! WIN32 */
 
        // Now, delete the daemonCore object, since we allocated it.
-   unsigned long pid = daemonCore->getpid( );
-   delete daemonCore;
-   daemonCore = NULL;
+   unsigned long pid = 0;
+   if (daemonCore) {
+       pid = daemonCore->getpid( );
+       delete daemonCore;
+       daemonCore = NULL;
+   }
 
        // Free up the memory from the config hash table, too.
    clear_config();
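For readers who want to try the guarded pattern in isolation, the following is a small self-contained C++ sketch of it. It assumes a simplified `Core` class and a `teardown_core` helper, both hypothetical names introduced here for illustration; only the shape of the guarded block is taken from the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for DaemonCore; the real class is far larger.
class Core {
public:
    explicit Core(unsigned long pid) : pid_(pid) {}
    unsigned long getpid() const { return pid_; }
private:
    unsigned long pid_;
};

// Stand-in for the global daemonCore pointer; null until initialization.
static Core *g_core = nullptr;

// Mirrors the guarded block from the patch: read the pid and delete the
// object only when it exists. Returns 0 when there was nothing to tear down.
unsigned long teardown_core() {
    unsigned long pid = 0;
    if (g_core) {
        pid = g_core->getpid();
        delete g_core;
        g_core = nullptr;
    }
    assert(g_core == nullptr);  // postcondition holds on both paths
    return pid;
}
```

Calling `teardown_core()` before the object is allocated, or a second time after it has been freed, simply returns 0. In standard C++ a `delete` on a null pointer is a no-op, so in this sketch the guard is what protects the `getpid()` call.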
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
C: condor_gridmanager attempted to delete a NULL daemonCore object when run as root or when passed -o. This is not a user concern, because condor_gridmanager is not intended to be run directly, as root or otherwise, and runs properly when invoked from Condor.
C: No significant consequence, because condor_gridmanager is invoked properly by Condor itself.
F: Checks were put in place to avoid the improper delete.
R: condor_gridmanager no longer crashes when run directly.
Reproduced on RHEL5/x86_64 with:
$CondorVersion: 7.4.5 Feb 4 2011 BuildID: RH-7.4.5-0.8.el5 PRE-RELEASE $
$CondorPlatform: X86_64-LINUX_RHEL5 $

# condor_gridmanager
Segmentation fault
Retested on the supported platforms (x86 and x86_64, RHEL5 and RHEL6) with:
condor-7.6.1-0.4

# condor_gridmanager
# echo $?
1

No core file created. No crash found.

>>> VERIFIED
An advisory has been issued which should help resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0889.html