Bug 631782 - condor_gridmanager segmentation fault when run from command line
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: 2.0
Target Release: ---
Assignee: Matthew Farrellee
QA Contact: Tomas Rusnak
URL:
Whiteboard:
Depends On:
Blocks: 693778
Reported: 2010-09-08 11:36 UTC by Lubos Trilety
Modified: 2011-06-23 15:41 UTC

Fixed In Version: condor-7.5.6-0.1
Doc Type: Bug Fix
Doc Text:
C: condor_gridmanager deleted uninitialized memory when run as root or when passed -o. This is not a user concern, because condor_gridmanager is not intended to be run directly, as root or otherwise, and runs properly when invoked by Condor.
C: No significant consequence, because condor_gridmanager is invoked properly by Condor itself.
F: Checks were put in place to avoid the improper delete.
R: All is well.
Clone Of:
Environment:
Last Closed: 2011-06-23 15:41:13 UTC
Target Upstream Version:
Embargoed:


Attachments
strace (29.32 KB, application/octet-stream)
2010-09-08 12:11 UTC, Lubos Trilety


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2011:0889 | 0 | normal | SHIPPED_LIVE | Red Hat Enterprise MRG Grid 2.0 Release | 2011-06-23 15:35:53 UTC

Description Lubos Trilety 2010-09-08 11:36:14 UTC
Description of problem:
The condor_gridmanager tool crashes with a segmentation fault when it is run manually from the command line.

Version-Release number of selected component (if applicable):
condor-debuginfo-7.4.4-0.9
python-condorutils-1.4-5
condor-wallaby-base-db-1.3-2
condor-wallaby-client-3.4-1
condor-7.4.4-0.9
condor-wallaby-tools-3.4-1

How reproducible:
100%

Steps to Reproduce:
1. Run 'condor_gridmanager' (with or without -h)

  
Actual results:
Segmentation fault

Expected results:
No segmentation fault

Additional info:

Comment 1 Lubos Trilety 2010-09-08 12:03:56 UTC
Analysis of the core file:
Core was generated by `condor_gridmanager'.
Program terminated with signal 11, Segmentation fault.
#0  DC_Exit (status=1, shutdown_program=0x0) at daemon_core_main.cpp:270
270		delete daemonCore;
(gdb) info threads
* 1 Thread 12984  DC_Exit (status=1, shutdown_program=0x0) at daemon_core_main.cpp:270
(gdb) bt
#0  DC_Exit (status=1, shutdown_program=0x0) at daemon_core_main.cpp:270
#1  0x00000000004f7430 in main (argc=1, argv=0x7fffec31c778) at daemon_core_main.cpp:1574

Comment 2 Matthew Farrellee 2010-09-08 12:05:57 UTC
condor_gridmanager should probably be in libexec.

This happens when it is run as root, or as a regular user when passed only -o.

Program received signal SIGSEGV, Segmentation fault.
DC_Exit (status=1, shutdown_program=0x0) at daemon_core_main.cpp:270
270  delete daemonCore;
(gdb) where
#0  DC_Exit (status=1, shutdown_program=0x0) at daemon_core_main.cpp:270
#1  0x00000000004f7430 in main (argc=1, argv=0x7fff9ed1f258)
     at daemon_core_main.cpp:1574

gridmanager_main.cpp:
void
main_pre_dc_init( int argc, char* argv[] )
{
...
   } else if ( is_root() ) {
      dprintf( D_ALWAYS, "Don't know what user to run as!\n" );
      DC_Exit( 1 );
...

The problem is that DC_Exit, when called from main_pre_dc_init, tries to delete daemonCore, which is still NULL at that point.

Either DC_Exit should be more careful, or calling DC_Exit from main_pre_dc_init should be illegal. The -o case comes from main_pre_dc_init calling usage, which in turn calls DC_Exit.
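
The first option (a more careful exit routine) can be sketched in isolation as follows. This is only an illustration under stated assumptions, not Condor code: Core, g_core, release_core and pre_init_then_exit are hypothetical stand-ins for DaemonCore, the global daemonCore pointer, the teardown step of DC_Exit, and main_pre_dc_init.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for the DaemonCore class; not Condor's real type.
struct Core {
    unsigned long pid;
    explicit Core(unsigned long p) : pid(p) {}
    unsigned long getpid() const { return pid; }
};

// Global instance, NULL until the main initialization step allocates it.
static Core *g_core = NULL;

// Guarded teardown: returns the pid of the instance it released,
// or 0 when nothing had been allocated yet.
unsigned long release_core() {
    unsigned long pid = 0;
    if (g_core) {
        // Only touch the object when it exists. Note that "delete" on a
        // null pointer is a no-op in C++; calling a member that reads the
        // object's data through a null pointer is what is undefined.
        pid = g_core->getpid();
        delete g_core;
        g_core = NULL;
    }
    assert(g_core == NULL);  // postcondition: nothing left allocated
    return pid;
}

// Hypothetical pre-init hook: it runs before g_core is allocated and
// returns the exit status the process would end with.
int pre_init_then_exit(bool is_root) {
    if (is_root) {
        std::fprintf(stderr, "Don't know what user to run as!\n");
        release_core();  // safe even though g_core is still NULL
        return 1;
    }
    return 0;
}
```

With the guard in place, the early-exit path ends with status 1 instead of faulting, and the normal path still releases the instance once it has been allocated.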

Comment 3 Lubos Trilety 2010-09-08 12:11:07 UTC
Created attachment 445965
strace

strace log from the run

Comment 4 Matthew Farrellee 2011-01-31 21:48:24 UTC
Fixed upstream for 7.5.6

--

Author: Matthew Farrellee <matt@redhat>

    Added NULL detection around "delete daemonCore" in DC_Exit

    The issue was discovered when running condor_gridmanager from the
    command line. The gridmanager can call DC_Exit from within
    main_pre_dc_init, which is by definition before the global daemonCore
    instance is allocated. DC_Exit would blindly attempt to delete a NULL
    daemonCore. An alternative fix was to prevent the gridmanager from
    calling DC_Exit within main_pre_dc_init, but code already in DC_Exit
    tested for daemonCore == NULL, making it appear that it should handle
    all cases where daemonCore may be null.

diff --git a/src/condor_daemon_core.V6/daemon_core_main.cpp b/src/condor_daemon_core.V6/daemon_core_main.cpp
index 1301cbc..1821d6e 100644
--- a/src/condor_daemon_core.V6/daemon_core_main.cpp
+++ b/src/condor_daemon_core.V6/daemon_core_main.cpp
@@ -280,9 +280,12 @@ DC_Exit( int status, const char *shutdown_program )
 #endif /* ! WIN32 */

      // Now, delete the daemonCore object, since we allocated it.
-  unsigned long  pid = daemonCore->getpid( );
-  delete daemonCore;
-  daemonCore = NULL;
+  unsigned long  pid = 0;
+  if (daemonCore) {
+     pid = daemonCore->getpid( );
+     delete daemonCore;
+     daemonCore = NULL;
+  }

      // Free up the memory from the config hash table, too.
   clear_config();
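
For comparison, the alternative mentioned in the commit message (keeping the pre-init hook from calling the exit routine at all) might look roughly like the sketch below. It was not the fix that was applied, and every name in it (PreInitResult, pre_init, early_exit_status) is hypothetical rather than taken from the Condor sources.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical result codes for a pre-init hook; not part of Condor.
enum PreInitResult {
    PRE_INIT_OK,       // continue with normal startup
    PRE_INIT_USAGE,    // bad command line: print usage and stop
    PRE_INIT_NO_USER   // running as root with no user to run as
};

// Pre-init hook that reports a problem instead of exiting on its own.
PreInitResult pre_init(bool bad_arguments, bool root_without_user) {
    if (bad_arguments) {
        return PRE_INIT_USAGE;
    }
    if (root_without_user) {
        return PRE_INIT_NO_USER;
    }
    return PRE_INIT_OK;
}

// Maps the hook's result to a process exit status, or -1 to keep going.
// The caller would return this status before allocating any global
// state, so no teardown routine is involved on the early-exit path.
int early_exit_status(PreInitResult result) {
    switch (result) {
    case PRE_INIT_USAGE:
        std::fprintf(stderr, "usage error\n");
        return 1;
    case PRE_INIT_NO_USER:
        std::fprintf(stderr, "Don't know what user to run as!\n");
        return 1;
    case PRE_INIT_OK:
        break;
    }
    assert(result == PRE_INIT_OK);  // only the OK case reaches this point
    return -1;
}
```

The trade-off is that every pre-init hook would need to follow this convention, whereas the NULL check in DC_Exit covers all callers in one place.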

Comment 5 Matthew Farrellee 2011-02-14 17:14:31 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: condor_gridmanager deleted uninitialized memory when run as root or when passed -o. This is not a user concern, because condor_gridmanager is not intended to be run directly, as root or otherwise, and runs properly when invoked by Condor.
C: No significant consequence, because condor_gridmanager is invoked properly by Condor itself.
F: Checks were put in place to avoid the improper delete.
R: All is well.

Comment 7 Tomas Rusnak 2011-05-04 09:13:53 UTC
Reproduced on RHEL5/x86_64 with:

$CondorVersion: 7.4.5 Feb  4 2011 BuildID: RH-7.4.5-0.8.el5 PRE-RELEASE $
$CondorPlatform: X86_64-LINUX_RHEL5 $

# condor_gridmanager
Segmentation fault

Comment 8 Tomas Rusnak 2011-05-04 09:20:18 UTC
Retested on the supported platforms (x86 and x86_64, RHEL5 and RHEL6) with:

condor-7.6.1-0.4

# condor_gridmanager 
# echo $?
1

No core file created. No crash found.

>>> VERIFIED

Comment 9 errata-xmlrpc 2011-06-23 15:41:13 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0889.html

