Description of problem:
I see this during automated testing:

/var/log/condor/core.23028: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/bin/python /usr/sbin/caroniad'

GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
warning: core file may not match specified executable file.
[New Thread 23028]
[Thread debugging using libthread_db enabled]
Core was generated by `/usr/bin/python /usr/sbin/caroniad'.
Program terminated with signal 3, Quit.
#0  0x00000036ca6de2d3 in __select_nocancel () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install python-2.6.6-29.el6.x86_64

(gdb)
rax            0xfffffffffffffdfe  -514
rbx            0x18890a0           25727136
rcx            0xffffffffffffffff  -1
rdx            0x0                 0
rsi            0x0                 0
rdi            0x0                 0
rbp            0x1dc6db0           0x1dc6db0
rsp            0x7fff5ffab4e8      0x7fff5ffab4e8
r8             0x7fff5ffab520      140734803653920
r9             0x0                 0
r10            0x0                 0
r11            0x246               582
r12            0x18890a0           25727136
r13            0x18a32a8           25834152
r14            0x1c5479c           29706140
r15            0xffffffff          4294967295
rip            0x36ca6de2d3        0x36ca6de2d3 <__select_nocancel+10>
eflags         0x246               [ PF ZF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0

(gdb)
Using memory regions provided by the target.
There are no memory regions defined.

(gdb)
33   AT_SYSINFO_EHDR      System-supplied DSO's ELF header 0x7fff5ffff000
16   AT_HWCAP             Machine-dependent CPU capability hints 0xbfebfbff
6    AT_PAGESZ            System page size               4096
17   AT_CLKTCK            Frequency of times()           100
3    AT_PHDR              Program headers for program    0x400040
4    AT_PHENT             Size of program header entry   56
5    AT_PHNUM             Number of program headers      8
7    AT_BASE              Base address of interpreter    0x0
8    AT_FLAGS             Flags                          0x0
9    AT_ENTRY             Entry point of program         0x400620
11   AT_UID               Real user ID                   0
12   AT_EUID              Effective user ID              0
13   AT_GID               Real group ID                  0
14   AT_EGID              Effective group ID             0
23   AT_SECURE            Boolean, was exec setuid-like? 0
25   AT_RANDOM            Address of 16 random bytes     0x7fff5ffac209
31   AT_EXECFN            File name of executable        0x7fff5ffacfe5 "/usr/sbin/caroniad"
15   AT_PLATFORM          String identifying platform    0x7fff5ffac219 "x86_64"
0    AT_NULL              End of vector                  0x0

(gdb)
Stack level 0, frame at 0x7fff5ffab4f0:
 rip = 0x36ca6de2d3 in __select_nocancel; saved rip 0x7f0bf217d219
 called by frame at 0x7fff5ffab550
 Arglist at 0x7fff5ffab4e0, args:
 Locals at 0x7fff5ffab4e0, Previous frame's sp is 0x7fff5ffab4f0
 Saved registers:
  rip at 0x7fff5ffab4e8

(gdb)
From                To                  Syms Read   Shared Object Library
0x00000036cda3c680  0x00000036cdb1f0f8  Yes (*)     /usr/lib64/libpython2.6.so.1.0
0x00000036caa05640  0x00000036caa10e58  Yes (*)     /lib64/libpthread.so.0
0x00000036ca200de0  0x00000036ca201998  Yes (*)     /lib64/libdl.so.2
0x00000036cce00e10  0x00000036cce01688  Yes (*)     /lib64/libutil.so.1
0x00000036cae03ea0  0x00000036cae43fe8  Yes (*)     /lib64/libm.so.6
0x00000036ca61ea20  0x00000036ca74c39c  Yes (*)     /lib64/libc.so.6
0x00000036c9e00b00  0x00000036c9e197ab  Yes (*)     /lib64/ld-linux-x86-64.so.2
0x00007f0bf905b1f0  0x00007f0bf9063648  Yes (*)     /lib64/libnss_files.so.2
0x00007f0bf8e4d110  0x00007f0bf8e528c8  Yes (*)     /usr/lib64/python2.6/lib-dynload/_socketmodule.so
0x00007f0bf8c45080  0x00007f0bf8c47c38  Yes (*)     /usr/lib64/python2.6/lib-dynload/_ssl.so
0x00000036cee14540  0x00000036cee45fb8  Yes (*)     /usr/lib64/libssl.so.10
0x00000036cde5ca00  0x00000036cdf238e8  Yes (*)     /usr/lib64/libcrypto.so.10
0x00000036cea09d80  0x00000036cea36698  Yes (*)     /lib64/libgssapi_krb5.so.2
0x00000036cd61a610  0x00000036cd68f4c8  Yes (*)     /lib64/libkrb5.so.3
0x00007f0bf8a303f0  0x00007f0bf8a30fc8  Yes (*)     /lib64/libcom_err.so.2
0x00000036ce2047c0  0x00000036ce21e468  Yes (*)     /lib64/libk5crypto.so.3
0x00000036cb201f30  0x00000036cb20d1b8  Yes (*)     /lib64/libz.so.1
0x00007f0bf8826840  0x00007f0bf882b9f8  Yes (*)     /lib64/libkrb5support.so.0
0x00000036ce600bf0  0x00000036ce6011d8  Yes (*)     /lib64/libkeyutils.so.1
0x00000036cca03930  0x00000036cca128d8  Yes (*)     /lib64/libresolv.so.2
0x00000036cbe05850  0x00000036cbe15c88  Yes (*)     /lib64/libselinux.so.1
0x00007f0bf8620aa0  0x00007f0bf8621bd8  Yes (*)     /usr/lib64/python2.6/lib-dynload/cStringIO.so
0x00007f0bf2587820  0x00007f0bf258a7e8  Yes (*)     /usr/lib64/python2.6/lib-dynload/_struct.so
0x00007f0bf23810c0  0x00007f0bf2382f78  Yes (*)     /usr/lib64/python2.6/lib-dynload/binascii.so
0x00007f0bf217c8d0  0x00007f0bf217d898  Yes (*)     /usr/lib64/python2.6/lib-dynload/timemodule.so
0x00007f0bf1f76600  0x00007f0bf1f78de8  Yes (*)     /usr/lib64/python2.6/lib-dynload/stropmodule.so
0x00007f0bf1d72ec0  0x00007f0bf1d73938  Yes (*)     /usr/lib64/python2.6/lib-dynload/_functoolsmodule.so
0x00007f0bf1b6d140  0x00007f0bf1b6fab8  Yes (*)     /usr/lib64/python2.6/lib-dynload/_collectionsmodule.so
0x00007f0bf1964eb0  0x00007f0bf1966a28  Yes (*)     /usr/lib64/python2.6/lib-dynload/operator.so
0x00007f0bf175fc80  0x00007f0bf1760198  Yes (*)     /usr/lib64/python2.6/lib-dynload/grpmodule.so
0x00007f0bf155abe0  0x00007f0bf155c638  Yes (*)     /usr/lib64/python2.6/lib-dynload/selectmodule.so
0x00007f0bf1355d10  0x00007f0bf1356a18  Yes (*)     /usr/lib64/python2.6/lib-dynload/fcntlmodule.so
0x00007f0bf1145880  0x00007f0bf1150658  Yes (*)     /usr/lib64/python2.6/lib-dynload/cPickle.so
0x00007f0bf0f3d3b0  0x00007f0bf0f3f7d8  Yes (*)     /usr/lib64/python2.6/lib-dynload/zlibmodule.so
0x00007f0bf0c73a30  0x00007f0bf0c773e8  Yes (*)     /usr/lib64/python2.6/lib-dynload/arraymodule.so
0x00007f0bf0a6c080  0x00007f0bf0a6e208  Yes (*)     /usr/lib64/python2.6/lib-dynload/mathmodule.so
0x00007f0bf08681c0  0x00007f0bf0868ee8  Yes (*)     /usr/lib64/python2.6/lib-dynload/_randommodule.so
0x00007f0bf0664540  0x00007f0bf0665078  Yes (*)     /usr/lib64/python2.6/lib-dynload/_hashlib.so
0x00007f0bf0461b70  0x00007f0bf04621c8  Yes (*)     /usr/lib64/python2.6/lib-dynload/_bisectmodule.so
0x00007f0bf0251750  0x00007f0bf0259b88  Yes (*)     /usr/lib64/python2.6/lib-dynload/datetime.so
(*): Shared library is missing debugging information.

(gdb)
* 1 Thread 0x7f0bf935d700 (LWP 23028)  0x00000036ca6de2d3 in __select_nocancel () from /lib64/libc.so.6

Thread 1 (Thread 0x7f0bf935d700 (LWP 23028)):
#0  0x00000036ca6de2d3 in __select_nocancel () from /lib64/libc.so.6
#1  0x00007f0bf217d219 in ?? () from /usr/lib64/python2.6/lib-dynload/timemodule.so
#2  0x00000036cdade7f4 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#3  0x00000036cdadf99f in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#4  0x00000036cdae0467 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#5  0x00000036cdade8b4 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#6  0x00000036cdae0467 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#7  0x00000036cdade8b4 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#8  0x00000036cdae0467 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#9  0x00000036cdae0542 in PyEval_EvalCode () from /usr/lib64/libpython2.6.so.1.0
#10 0x00000036cdafb88c in ?? () from /usr/lib64/libpython2.6.so.1.0
#11 0x00000036cdafb960 in PyRun_FileExFlags () from /usr/lib64/libpython2.6.so.1.0
#12 0x00000036cdafce4c in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.6.so.1.0
#13 0x00000036cdb094cf in Py_Main () from /usr/lib64/libpython2.6.so.1.0
#14 0x00000036ca61ecdd in __libc_start_main () from /lib64/libc.so.6
#15 0x0000000000400649 in _start ()
(gdb) quit

The test installed stable MRG plus all packages from all grid errata that have not shipped yet. The test did not change any EC2 settings.
Version-Release number of selected component (if applicable):
condor-7.6.5-0.11.el6.x86_64
condor-aviary-7.6.5-0.11.el6.x86_64
condor-classads-7.6.5-0.11.el6.x86_64
condor-debuginfo-7.6.5-0.11.el6.x86_64
condor-ec2-enhanced-1.3.0-1.el6.noarch
condor-ec2-enhanced-hooks-1.3.0-1.el6.noarch
condor-job-hooks-1.5-4.el6.noarch
condor-kbdd-7.6.5-0.11.el6.x86_64
condor-plumage-7.6.5-0.11.el6.x86_64
condor-qmf-7.6.5-0.11.el6.x86_64
condor-vm-gahp-7.6.5-0.11.el6.x86_64
condor-wallaby-base-db-1.19-1.el6.noarch
condor-wallaby-client-4.1.2-1.el6.noarch
condor-wallaby-tools-4.1.2-1.el6.noarch
python-condorec2e-1.3.0-1.el6.noarch
python-condorutils-1.5-4.el6.noarch
python-qpid-0.12-1.el6.noarch
python-qpid-qmf-0.12-6.el6.x86_64
qpid-cpp-client-0.12-6.el6.x86_64
qpid-cpp-client-devel-0.12-6.el6.x86_64
qpid-cpp-client-devel-docs-0.12-6.el6.noarch
qpid-cpp-server-0.12-6.el6.x86_64
qpid-cpp-server-cluster-0.12-6.el6.x86_64
qpid-cpp-server-devel-0.12-6.el6.x86_64
qpid-cpp-server-store-0.12-6.el6.x86_64
qpid-java-client-0.10-9.el6.noarch
qpid-java-common-0.10-9.el6.noarch
qpid-java-example-0.10-9.el6.noarch
qpid-qmf-0.12-6.el6.x86_64
qpid-qmf-debuginfo-0.12-6.el6.x86_64
qpid-qmf-devel-0.12-6.el6.x86_64
qpid-tools-0.12-2.el6.noarch

How reproducible: 60%
I've been unable to reproduce this with the latest grid bits (including the latest boto). It's possible the crash is caused somewhere in boto, as one of the first things caroniad does is try to get the AMI user_data. Can you provide a reproduction scenario?
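For reference, a user-data lookup that cannot hang on a non-EC2 host might look roughly like the sketch below. This is not caroniad's or boto's actual code. The function name `fetch_user_data`, the 2-second timeout, and the injectable `opener` parameter are all assumptions made for illustration, and the sketch uses the plain, unauthenticated metadata path on the standard EC2 link-local address. It is written for Python 3, whereas the report is about Python 2.6.

```python
import urllib.request

# Standard EC2 instance metadata address; only reachable from inside an instance.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"


def fetch_user_data(url=USER_DATA_URL, timeout=2.0, opener=urllib.request.urlopen):
    """Return the user-data bytes, or None if the lookup fails or times out.

    `opener` is a hypothetical injection point so the function can be
    exercised without network access; it defaults to urllib's urlopen.
    """
    try:
        response = opener(url, timeout=timeout)
        try:
            return response.read()
        finally:
            response.close()
    except OSError:
        # URLError and socket timeouts are both OSError subclasses in Python 3.
        return None
```

On a normal (non-AMI) system the call would return None after at most the timeout, so the caller can decide to exit instead of waiting indefinitely.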
I've been unable to reproduce with latest packages.
The reproduction scenario was missing, sorry for that. I was unable to find a stable reproducer for a long time, but here is a more stable one.

This is not about caroniad installed on an AMI. It happens when caroniad is installed on a normal system. caroniad is now enabled upon installation; usually caroniad exits after a while and is restarted again. Sometimes caroniad does not exit immediately but stays running. When condor is shutting down and caroniad is still running, the following message is seen in the logs:

02/18/13 11:37:12 The CARONIAD (pid 15282) died due to signal 3 (Quit)

and a core dump is generated. So maybe there is a different issue (caroniad does not exit immediately when no data are available). Generation of core dumps has been enabled in the system and in condor. This behavior is still visible with the latest packages.
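The backtrace in the description shows select() being called from timemodule.so, which is consistent with the process sitting in time.sleep() when signal 3 arrived (CPython 2.x implements time.sleep() with select() on Linux). SIGQUIT's default action is to terminate the process and dump core, so any Python process without a SIGQUIT handler would behave as the log line describes. The sketch below illustrates that under stated assumptions: the child script, the `run_and_quit` helper, and the "default"/"handle" mode names are hypothetical, and whether caroniad should handle SIGQUIT this way is not something this report establishes. It is written for Python 3.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a daemon main loop that polls with time.sleep().
CHILD = """
import resource, signal, sys, time
# Keep the demo from writing a core file to disk.
resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
stop = []
if sys.argv[1] == "handle":
    # Record the signal so the loop can exit cleanly.
    signal.signal(signal.SIGQUIT, lambda signum, frame: stop.append(signum))
else:
    # Make sure the default action (terminate, dump core) is in effect.
    signal.signal(signal.SIGQUIT, signal.SIG_DFL)
print("ready")
sys.stdout.flush()
while not stop:
    time.sleep(0.1)
sys.exit(0)
"""


def run_and_quit(mode):
    """Start the child in the given mode, send SIGQUIT, return its exit status."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD, mode],
                            stdout=subprocess.PIPE)
    try:
        proc.stdout.readline()            # wait until signal setup is done
        proc.send_signal(signal.SIGQUIT)
        return proc.wait(timeout=10)
    finally:
        proc.stdout.close()
        if proc.poll() is None:
            proc.kill()


if __name__ == "__main__":
    print("default action:", run_and_quit("default"))
    print("with handler:  ", run_and_quit("handle"))
```

A negative return value from `run_and_quit` means the child was killed by that signal number, matching "died due to signal 3 (Quit)"; with the handler installed the child leaves its loop and exits with status 0.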
MRG-Grid is in maintenance and only customer escalations will be considered. This issue can be reopened if a customer escalation associated with it occurs.