Bug 783464

Summary: caroniad dumps corefile
Product: Red Hat Enterprise MRG
Reporter: Martin Kudlej <mkudlej>
Component: condor-ec2-enhanced
Assignee: grid-maint-list <grid-maint-list>
Status: CLOSED WONTFIX
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: unspecified
Priority: low
Version: Development
CC: esammons, ltoscano, matt, rrati, tstclair
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2016-05-26 20:14:31 UTC

Description Martin Kudlej 2012-01-20 14:29:34 UTC
Description of problem:
I see this during automated testing:
/var/log/condor/core.23028: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/bin/python /usr/sbin/caroniad'
  GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
  Copyright (C) 2010 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "x86_64-redhat-linux-gnu".
  For bug reporting instructions, please see:
  warning: core file may not match specified executable file.
  [New Thread 23028]
  [Thread debugging using libthread_db enabled]
  Core was generated by `/usr/bin/python /usr/sbin/caroniad'.
  Program terminated with signal 3, Quit.
  #0  0x00000036ca6de2d3 in __select_nocancel () from /lib64/libc.so.6
  Missing separate debuginfos, use: debuginfo-install python-2.6.6-29.el6.x86_64
  (gdb) rax            0xfffffffffffffdfe	-514
  rbx            0x18890a0	25727136
  rcx            0xffffffffffffffff	-1
  rdx            0x0	0
  rsi            0x0	0
  rdi            0x0	0
  rbp            0x1dc6db0	0x1dc6db0
  rsp            0x7fff5ffab4e8	0x7fff5ffab4e8
  r8             0x7fff5ffab520	140734803653920
  r9             0x0	0
  r10            0x0	0
  r11            0x246	582
  r12            0x18890a0	25727136
  r13            0x18a32a8	25834152
  r14            0x1c5479c	29706140
  r15            0xffffffff	4294967295
  rip            0x36ca6de2d3	0x36ca6de2d3 <__select_nocancel+10>
  eflags         0x246	[ PF ZF IF ]
  cs             0x33	51
  ss             0x2b	43
  ds             0x0	0
  es             0x0	0
  fs             0x0	0
  gs             0x0	0
  (gdb) Using memory regions provided by the target.
  There are no memory regions defined.
  (gdb) 33   AT_SYSINFO_EHDR      System-supplied DSO's ELF header 0x7fff5ffff000
  16   AT_HWCAP             Machine-dependent CPU capability hints 0xbfebfbff
  6    AT_PAGESZ            System page size               4096
  17   AT_CLKTCK            Frequency of times()           100
  3    AT_PHDR              Program headers for program    0x400040
  4    AT_PHENT             Size of program header entry   56
  5    AT_PHNUM             Number of program headers      8
  7    AT_BASE              Base address of interpreter    0x0
  8    AT_FLAGS             Flags                          0x0
  9    AT_ENTRY             Entry point of program         0x400620
  11   AT_UID               Real user ID                   0
  12   AT_EUID              Effective user ID              0
  13   AT_GID               Real group ID                  0
  14   AT_EGID              Effective group ID             0
  23   AT_SECURE            Boolean, was exec setuid-like? 0
  25   AT_RANDOM            Address of 16 random bytes     0x7fff5ffac209
  31   AT_EXECFN            File name of executable        0x7fff5ffacfe5 "/usr/sbin/caroniad"
  15   AT_PLATFORM          String identifying platform    0x7fff5ffac219 "x86_64"
  0    AT_NULL              End of vector                  0x0
  (gdb) Stack level 0, frame at 0x7fff5ffab4f0:
   rip = 0x36ca6de2d3 in __select_nocancel; saved rip 0x7f0bf217d219
   called by frame at 0x7fff5ffab550
   Arglist at 0x7fff5ffab4e0, args: 
   Locals at 0x7fff5ffab4e0, Previous frame's sp is 0x7fff5ffab4f0
   Saved registers:
    rip at 0x7fff5ffab4e8
  (gdb) From                To                  Syms Read   Shared Object Library
  0x00000036cda3c680  0x00000036cdb1f0f8  Yes (*)     /usr/lib64/libpython2.6.so.1.0
  0x00000036caa05640  0x00000036caa10e58  Yes (*)     /lib64/libpthread.so.0
  0x00000036ca200de0  0x00000036ca201998  Yes (*)     /lib64/libdl.so.2
  0x00000036cce00e10  0x00000036cce01688  Yes (*)     /lib64/libutil.so.1
  0x00000036cae03ea0  0x00000036cae43fe8  Yes (*)     /lib64/libm.so.6
  0x00000036ca61ea20  0x00000036ca74c39c  Yes (*)     /lib64/libc.so.6
  0x00000036c9e00b00  0x00000036c9e197ab  Yes (*)     /lib64/ld-linux-x86-64.so.2
  0x00007f0bf905b1f0  0x00007f0bf9063648  Yes (*)     /lib64/libnss_files.so.2
  0x00007f0bf8e4d110  0x00007f0bf8e528c8  Yes (*)     /usr/lib64/python2.6/lib-dynload/_socketmodule.so
  0x00007f0bf8c45080  0x00007f0bf8c47c38  Yes (*)     /usr/lib64/python2.6/lib-dynload/_ssl.so
  0x00000036cee14540  0x00000036cee45fb8  Yes (*)     /usr/lib64/libssl.so.10
  0x00000036cde5ca00  0x00000036cdf238e8  Yes (*)     /usr/lib64/libcrypto.so.10
  0x00000036cea09d80  0x00000036cea36698  Yes (*)     /lib64/libgssapi_krb5.so.2
  0x00000036cd61a610  0x00000036cd68f4c8  Yes (*)     /lib64/libkrb5.so.3
  0x00007f0bf8a303f0  0x00007f0bf8a30fc8  Yes (*)     /lib64/libcom_err.so.2
  0x00000036ce2047c0  0x00000036ce21e468  Yes (*)     /lib64/libk5crypto.so.3
  0x00000036cb201f30  0x00000036cb20d1b8  Yes (*)     /lib64/libz.so.1
  0x00007f0bf8826840  0x00007f0bf882b9f8  Yes (*)     /lib64/libkrb5support.so.0
  0x00000036ce600bf0  0x00000036ce6011d8  Yes (*)     /lib64/libkeyutils.so.1
  0x00000036cca03930  0x00000036cca128d8  Yes (*)     /lib64/libresolv.so.2
  0x00000036cbe05850  0x00000036cbe15c88  Yes (*)     /lib64/libselinux.so.1
  0x00007f0bf8620aa0  0x00007f0bf8621bd8  Yes (*)     /usr/lib64/python2.6/lib-dynload/cStringIO.so
  0x00007f0bf2587820  0x00007f0bf258a7e8  Yes (*)     /usr/lib64/python2.6/lib-dynload/_struct.so
  0x00007f0bf23810c0  0x00007f0bf2382f78  Yes (*)     /usr/lib64/python2.6/lib-dynload/binascii.so
  0x00007f0bf217c8d0  0x00007f0bf217d898  Yes (*)     /usr/lib64/python2.6/lib-dynload/timemodule.so
  0x00007f0bf1f76600  0x00007f0bf1f78de8  Yes (*)     /usr/lib64/python2.6/lib-dynload/stropmodule.so
  0x00007f0bf1d72ec0  0x00007f0bf1d73938  Yes (*)     /usr/lib64/python2.6/lib-dynload/_functoolsmodule.so
  0x00007f0bf1b6d140  0x00007f0bf1b6fab8  Yes (*)     /usr/lib64/python2.6/lib-dynload/_collectionsmodule.so
  0x00007f0bf1964eb0  0x00007f0bf1966a28  Yes (*)     /usr/lib64/python2.6/lib-dynload/operator.so
  0x00007f0bf175fc80  0x00007f0bf1760198  Yes (*)     /usr/lib64/python2.6/lib-dynload/grpmodule.so
  0x00007f0bf155abe0  0x00007f0bf155c638  Yes (*)     /usr/lib64/python2.6/lib-dynload/selectmodule.so
  0x00007f0bf1355d10  0x00007f0bf1356a18  Yes (*)     /usr/lib64/python2.6/lib-dynload/fcntlmodule.so
  0x00007f0bf1145880  0x00007f0bf1150658  Yes (*)     /usr/lib64/python2.6/lib-dynload/cPickle.so
  0x00007f0bf0f3d3b0  0x00007f0bf0f3f7d8  Yes (*)     /usr/lib64/python2.6/lib-dynload/zlibmodule.so
  0x00007f0bf0c73a30  0x00007f0bf0c773e8  Yes (*)     /usr/lib64/python2.6/lib-dynload/arraymodule.so
  0x00007f0bf0a6c080  0x00007f0bf0a6e208  Yes (*)     /usr/lib64/python2.6/lib-dynload/mathmodule.so
  0x00007f0bf08681c0  0x00007f0bf0868ee8  Yes (*)     /usr/lib64/python2.6/lib-dynload/_randommodule.so
  0x00007f0bf0664540  0x00007f0bf0665078  Yes (*)     /usr/lib64/python2.6/lib-dynload/_hashlib.so
  0x00007f0bf0461b70  0x00007f0bf04621c8  Yes (*)     /usr/lib64/python2.6/lib-dynload/_bisectmodule.so
  0x00007f0bf0251750  0x00007f0bf0259b88  Yes (*)     /usr/lib64/python2.6/lib-dynload/datetime.so
  (*): Shared library is missing debugging information.
  (gdb) * 1 Thread 0x7f0bf935d700 (LWP 23028)  0x00000036ca6de2d3 in __select_nocancel () from /lib64/libc.so.6
  Thread 1 (Thread 0x7f0bf935d700 (LWP 23028)):
  #0  0x00000036ca6de2d3 in __select_nocancel () from /lib64/libc.so.6
  #1  0x00007f0bf217d219 in ?? () from /usr/lib64/python2.6/lib-dynload/timemodule.so
  #2  0x00000036cdade7f4 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
  #3  0x00000036cdadf99f in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
  #4  0x00000036cdae0467 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
  #5  0x00000036cdade8b4 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
  #6  0x00000036cdae0467 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
  #7  0x00000036cdade8b4 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
  #8  0x00000036cdae0467 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
  #9  0x00000036cdae0542 in PyEval_EvalCode () from /usr/lib64/libpython2.6.so.1.0
  #10 0x00000036cdafb88c in ?? () from /usr/lib64/libpython2.6.so.1.0
  #11 0x00000036cdafb960 in PyRun_FileExFlags () from /usr/lib64/libpython2.6.so.1.0
  #12 0x00000036cdafce4c in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.6.so.1.0
  #13 0x00000036cdb094cf in Py_Main () from /usr/lib64/libpython2.6.so.1.0
  #14 0x00000036ca61ecdd in __libc_start_main () from /lib64/libc.so.6
  #15 0x0000000000400649 in _start ()
  (gdb) quit

The test installed stable MRG plus all packages from all grid errata that have not shipped yet.
The test didn't change any EC2 settings.

Version-Release number of selected component (if applicable):
condor-7.6.5-0.11.el6.x86_64
condor-aviary-7.6.5-0.11.el6.x86_64
condor-classads-7.6.5-0.11.el6.x86_64
condor-debuginfo-7.6.5-0.11.el6.x86_64
condor-ec2-enhanced-1.3.0-1.el6.noarch
condor-ec2-enhanced-hooks-1.3.0-1.el6.noarch
condor-job-hooks-1.5-4.el6.noarch
condor-kbdd-7.6.5-0.11.el6.x86_64
condor-plumage-7.6.5-0.11.el6.x86_64
condor-qmf-7.6.5-0.11.el6.x86_64
condor-vm-gahp-7.6.5-0.11.el6.x86_64
condor-wallaby-base-db-1.19-1.el6.noarch
condor-wallaby-client-4.1.2-1.el6.noarch
condor-wallaby-tools-4.1.2-1.el6.noarch
python-condorec2e-1.3.0-1.el6.noarch
python-condorutils-1.5-4.el6.noarch
python-qpid-0.12-1.el6.noarch
python-qpid-qmf-0.12-6.el6.x86_64
qpid-cpp-client-0.12-6.el6.x86_64
qpid-cpp-client-devel-0.12-6.el6.x86_64
qpid-cpp-client-devel-docs-0.12-6.el6.noarch
qpid-cpp-server-0.12-6.el6.x86_64
qpid-cpp-server-cluster-0.12-6.el6.x86_64
qpid-cpp-server-devel-0.12-6.el6.x86_64
qpid-cpp-server-store-0.12-6.el6.x86_64
qpid-java-client-0.10-9.el6.noarch
qpid-java-common-0.10-9.el6.noarch
qpid-java-example-0.10-9.el6.noarch
qpid-qmf-0.12-6.el6.x86_64
qpid-qmf-debuginfo-0.12-6.el6.x86_64
qpid-qmf-devel-0.12-6.el6.x86_64
qpid-tools-0.12-2.el6.noarch

How reproducible:
60%

Comment 1 Robert Rati 2012-04-05 17:52:32 UTC
I've been unable to reproduce with the latest grid bits (including the latest boto).  It's possible the crash is caused somewhere in boto, as one of the first things caroniad does is try to get the AMI user_data.  Can you provide a reproduction scenario?
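
For reference, the user_data lookup mentioned above can be illustrated without boto. The following is a minimal sketch in modern Python that reads the standard EC2 metadata address with a short timeout, so that it fails fast on a machine that is not an EC2 instance. The function name, the timeout value and the injectable `opener` parameter are assumptions made for illustration; this is not how caroniad or boto implement the lookup, and instances that require IMDSv2 session tokens are not handled.

```python
import urllib.request

# Standard EC2 instance metadata address for user data.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"


def fetch_user_data(url=USER_DATA_URL, timeout=2.0,
                    opener=urllib.request.urlopen):
    """Return the user data as bytes, or None if it cannot be retrieved.

    `opener` is a hypothetical hook so the network call can be replaced
    in tests; by default it is urllib.request.urlopen.
    """
    try:
        response = opener(url, timeout=timeout)
        try:
            return response.read()
        finally:
            response.close()
    except OSError:
        # URLError and socket timeouts are both OSError subclasses, so an
        # unreachable metadata service ends up here instead of hanging.
        return None
```

On a normal (non-EC2) system the call is expected to return None after at most roughly `timeout` seconds, which lets the caller decide to exit or idle instead of blocking.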

Comment 2 Robert Rati 2012-04-10 14:59:24 UTC
I've been unable to reproduce with latest packages.

Comment 5 Luigi Toscano 2013-02-18 11:11:50 UTC
The reproduction scenario is missing. Sorry for that. I was unable to find a stable reproducer for a long time, but here is a more stable one.

This is not about caroniad installed on an AMI. It happens when caroniad is installed on a normal system. Caroniad is now enabled upon installation; usually caroniad exits after a while and is restarted again.

Sometimes caroniad is not killed immediately but stays running.
When condor is shutting down and caroniad is still running, the following message is seen in the logs:
02/18/13 11:37:12 The CARONIAD (pid 15282) died due to signal 3 (Quit)
and a core dump is generated.

So maybe there is a different issue (caroniad does not exit immediately when no data is available).

Generation of core dumps has been enabled in the system and in condor.

This behavior is still visible with the latest packages.

Comment 6 Anne-Louise Tangring 2016-05-26 20:14:31 UTC
MRG-Grid is in maintenance and only customer escalations will be considered. This issue can be reopened if a customer escalation associated with it occurs.