Bug 486462
| Summary: | query workers during log rotation crash schedd and collector | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Matthew Farrellee <matt> |
| Component: | condor | Assignee: | Matthew Farrellee <matt> |
| Status: | CLOSED WORKSFORME | QA Contact: | MRG Quality Engineering <mrgqe-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | ||
| Version: | 1.1 | ||
| Target Milestone: | 1.3 | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2010-01-25 18:26:57 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Matthew Farrellee
2009-02-19 21:08:54 UTC
SchedLog.old spans 2/19 11:00:04 to 2/19 11:00:07, so it was rotating rather quickly setting MAX_SCHEDD_LOG low and SCHEDD_DEBUG high may assist in reproducing this This exists in at least condor-7.2.2-0.1.el5 I believe this is related to query workers and log rotation. For instance, the log being rotated by a query worker at the same time the schedd tries to rotate the log. To reproduce, run while true; do condor_q > /dev/null 2>&1 ; done against a condor_schedd configured with MAX_SCHEDD_LOG = 1024 SCHEDD_DEBUG = D_FULLDEBUG SCHEDD_QUERY_WORKERS = 1 Set D_ALL reproduces faster. Set SCHEDD_QUERY_WORKERS = 0 to workaround the issue. This appears to also be an issue for the Collector. MAX_COLLECTOR_LOG = 1024 COLLECTOR_DEBUG = D_ALL COLLECTOR_QUERY_WORKERS = 1 In the 7.4 series, schedd defaults to 3 workers and collector defaults to 2. The latest failures were on F12 built binaries. *** glibc detected *** condor_schedd: double free or corruption (!prev): 0x08d66190 *** ======= Backtrace: ========= /lib/libc.so.6(+0x6e751)[0x76e751] /lib/libc.so.6(fclose+0x14a)[0x75e5ea] condor_schedd(fclose_wrapper+0x6a)[0x81a4ad4] condor_schedd(preserve_log_file+0xc3)[0x81a40fe] condor_schedd(debug_lock+0x399)[0x81a3e13] condor_schedd(_condor_dprintf_va+0x210)[0x81a362f] condor_schedd(dprintf+0x25)[0x81a4fe9] condor_schedd(_mark_thread_safe+0x12c)[0x81aa80c] condor_schedd(_Z12condor_writePKciPciii+0x5d0)[0x826d0b3] condor_schedd(_ZN3Buf5writeEPKciii+0x85)[0x8264923] condor_schedd(_ZN3Buf5flushEPKciPvii+0x87)[0x82649e9] condor_schedd(_ZN8ReliSock6SndMsg10snd_packetEPKciii+0xf8)[0x82618ee] condor_schedd(_ZN8ReliSock14end_of_messageEv+0xab)[0x8260da7] condor_schedd(_Z12do_Q_requestP8ReliSockRb+0x1170)[0x813a9ae] condor_schedd(_Z8handle_qP7ServiceiP6Stream+0xe0)[0x8130bb4] condor_schedd(_ZN10DaemonCore18CallCommandHandlerEiP6Streamb+0x187)[0x818a971] condor_schedd(_ZN10DaemonCore9HandleReqEP6StreamS1_+0x2ec8)[0x818de7e] condor_schedd(_ZN10DaemonCore9HandleReqEiP6Stream+0x39)[0x818afb3] condor_schedd(_ZN10DaemonCore24CallSocketHandler_workerEibP6Stream+0x29f)[0x818a5bb] condor_schedd(_ZN10DaemonCore35CallSocketHandler_worker_demarshallEPv+0x3b)[0x818a30b] condor_schedd(_ZN13CondorThreads8pool_addEPFvPvES0_PiPKc+0x29)[0x8232b7b] condor_schedd(_ZN10DaemonCore17CallSocketHandlerERib+0x1bc)[0x818a2ce] condor_schedd(_ZN10DaemonCore6DriverEv+0x180e)[0x8189fdc] condor_schedd(main+0x1ce0)[0x819de00] /lib/libc.so.6(__libc_start_main+0xe6)[0x716bb6] condor_schedd[0x80fc181] ======= Memory map: ======== 00110000-0013d000 r-xp 00000000 fd:00 533893 /usr/lib/libgsoapssl++.so.0.0.0 0013d000-0013f000 rw-p 0002c000 fd:00 533893 /usr/lib/libgsoapssl++.so.0.0.0 0013f000-0015c000 r-xp 00000000 fd:00 221510 /lib/libgcc_s-4.4.2-20091222.so.1 0015c000-0015d000 rw-p 0001c000 fd:00 221510 /lib/libgcc_s-4.4.2-20091222.so.1 0015d000-00187000 r-xp 00000000 fd:00 221866 /lib/libk5crypto.so.3.1 00187000-00188000 rw-p 0002a000 fd:00 221866 /lib/libk5crypto.so.3.1 00188000-001cd000 r-xp 00000000 fd:00 533441 /usr/lib/libfreebl3.so 001cd000-001ce000 rw-p 00044000 fd:00 533441 /usr/lib/libfreebl3.so 001ce000-001d2000 rw-p 00000000 00:00 0 001d2000-001d6000 r-xp 00000000 fd:00 223762 /lib/libuuid.so.1.3.0 001d6000-001d7000 rw-p 00003000 fd:00 223762 /lib/libuuid.so.1.3.0 001d9000-001f7000 r-xp 00000000 fd:00 107254 /lib/ld-2.11.1.so 001f7000-001f8000 r--p 0001d000 fd:00 107254 /lib/ld-2.11.1.so 001f8000-001f9000 rw-p 0001e000 fd:00 107254 /lib/ld-2.11.1.so 001f9000-0022b000 r-xp 00000000 fd:01 8593760 /home/matt/Documents/Condor/CONDOR_SRC/src/release_dir/libexec/MgmtScheddPlugin-plugin.so 0022b000-0022d000 rw-p 00031000 fd:01 8593760 /home/matt/Documents/Condor/CONDOR_SRC/src/release_dir/libexec/MgmtScheddPlugin-plugin.so 0022d000-0033a000 r-xp 00000000 fd:00 536727 /usr/lib/libqpidclient.so.3.0.0 0033a000-0033f000 rw-p 0010c000 fd:00 536727 /usr/lib/libqpidclient.so.3.0.0 0033f000-00340000 rw-p 00000000 00:00 0 00340000-0035a000 r-xp 00000000 fd:00 537303 /usr/lib/libqmf.so.1.0.0 0035a000-0035b000 rw-p 0001a000 fd:00 537303 /usr/lib/libqmf.so.1.0.0 0035b000-0039c000 r-xp 00000000 fd:00 534306 /usr/lib/libboost_program_options.so.1.39.0 0039c000-0039e000 rw-p 00040000 fd:00 534306 /usr/lib/libboost_program_options.so.1.39.0 0039e000-003b7000 r-xp 00000000 fd:00 533370 /usr/lib/libsasl2.so.2.0.23 003b7000-003b8000 rw-p 00018000 fd:00 533370 /usr/lib/libsasl2.so.2.0.23 003b8000-003bc000 r-xp 00000000 fd:00 534504 /usr/lib/sasl2/libplain.so.2.0.23 003bc000-003bd000 rw-p 00003000 fd:00 534504 /usr/lib/sasl2/libplain.so.2.0.23 003bd000-003c1000 r-xp 00000000 fd:00 534501 /usr/lib/sasl2/liblogin.so.2.0.23 003c1000-003c2000 rw-p 00003000 fd:00 534501 /usr/lib/sasl2/liblogin.so.2.0.23 003c2000-003c6000 r-xp 00000000 fd:00 533374 /usr/lib/sasl2/libanonymous.so.2.0.23 003c6000-003c7000 rw-p 00003000 fd:00 533374 /usr/lib/sasl2/libanonymous.so.2.0.23 003c7000-003cb000 r-xp 00000000 fd:00 536745 /usr/lib/sasl2/libcrammd5.so.2.0.23 003cb000-003cc000 rw-p 00003000 fd:00 536745 /usr/lib/sasl2/libcrammd5.so.2.0.23 003d0000-003e2000 r-xp 00000000 fd:00 223842 /lib/libz.so.1.2.3 003e2000-003e3000 rw-p 00011000 fd:00 223842 /lib/libz.so.1.2.3 003e3000-00456000 r-xp 00000000 fd:00 537378 /usr/lib/libqmfengine.so.1.0.1 00456000-00458000 rw-p 00073000 fd:00 537378 /usr/lib/libqmfengine.so.1.0.1 004d0000-004ec000 r-xp 00000000 fd:00 223830 /lib/libselinux.so.1 004ec000-004ed000 r--p 0001b000 fd:00 223830 /lib/libselinux.so.1 004ed000-004ee000 rw-p 0001c000 fd:00 223830 /lib/libselinux.so.1 00531000-00547000 r-xp 00000000 fd:00 221481 /lib/libpthread-2.11.1.so 00547000-00548000 r--p 00015000 fd:00 221481 /lib/libpthread-2.11.1.so 00548000-00549000 rw-p 00016000 fd:00 221481 /lib/libpthread-2.11.1.so 00549000-0054b000 rw-p 00000000 00:00 0 0058d000-005b5000 r-xp 00000000 fd:00 221459 /lib/libm-2.11.1.so 005b5000-005b6000 r--p 00027000 fd:00 221459 /lib/libm-2.11.1.so 005b6000-005b7000 rw-p 00028000 fd:00 221459 /lib/libm-2.11.1.so 00653000-00656000 r-xp 00000000 fd:00 221455 /lib/libdl-2.11.1.so 00656000-00657000 r--p 00002000 fd:00 221455 /lib/libdl-2.11.1.so 00657000-00658000 rw-p 00003000 fd:00 221455 /lib/libdl-2.11.1.so 00700000-0086f000 r-xp 00000000 fd:00 221446 /lib/libc-2.11.1.so 0086f000-00871000 r--p 0016e000 fd:00 221446 /lib/libc-2.11.1.so 00871000-00872000 rw-p 00170000 fd:00 221446 /lib/libc-2.11.1.so 00872000-00875000 rw-p 00000000 00:00 0 008a8000-008b0000 r-xp 00000000 fd:00 221875 /lib/libkrb5support.so.0.1 008b0000-008b1000 rw-p 00008000 fd:00 221875 /lib/libkrb5support.so.0.1 00927000-00928000 r-xp 00000000 00:00 0 [vdso] 00a97000-00a9a000 r-xp 00000000 fd:00 172765 /usr/lib/libboost_system.so.1.39.0 00a9a000-00a9b000 rw-p 00002000 fd:00 172765 /usr/lib/libboost_system.so.1.39.0 00a9d000-00aae000 r-xp 00000000 fd:00 172770 /usr/lib/libboost_filesystem.so.1.39.0 00aae000-00aaf000 rw-p 00011000 fd:00 172770 /usr/lib/libboost_filesystem.so.1.39.0 00abb000-00ac7000 r-xp 00000000 fd:00 536748 /usr/lib/sasl2/libdigestmd5.so.2.0.23 00ac7000-00ac8000 rw-p 0000b000 fd:00 536748 /usr/lib/sasl2/libdigestmd5.so.2.0.23 00ad3000-00b86000 r-xp 00000000 fd:00 221873 /lib/libkrb5.so.3.3 00b86000-00b8c000 rw-p 000b3000 fd:00 221873 /lib/libkrb5.so.3.3 00c53000-00c5a000 r-xp 00000000 fd:00 221452 /lib/libcrypt-2.11.1.so 00c5a000-00c5b000 r--p 00007000 fd:00 221452 /lib/libcrypt-2.11.1.so 00c5b000-00c5c000 rw-p 00008000 fd:00 221452 /lib/libcrypt-2.11.1.so 00c5c000-00c83000 rw-p 00000000 00:00 0 00d5c000-00d63000 r-xp 00000000 fd:00 221485 /lib/librt-2.11.1.so 00d63000-00d64000 r--p 00006000 fd:00 221485 /lib/librt-2.11.1.so 00d64000-00d65000 rw-p 00007000 fd:00 221485 /lib/librt-2.11.1.so 00dd3000-00dd5000 r-xp 00000000 fd:00 223858 /lib/libcom_err.so.2.1 00dd5000-00dd6000 rw-p 00002000 fd:00 223858 /lib/libcom_err.so.2.1 00dd8000-00dda000 r-xp 00000000 fd:00 223855 /lib/libkeyutils-1.2.so 00dda000-00ddb000 rw-p 00001000 fd:00 223855 /lib/libkeyutils-1.2.so 00e0c000-00e11000 r-xp 00000000 fd:00 533377 /usr/lib/sasl2/libsasldb.so.2.0.23 00e11000-00e12000 rw-p 00004000 fd:00 533377 /usr/lib/sasl2/libsasldb.so.2.0.23 00e29000-00e56000 r-xp 00000000 fd:00 221728 /lib/libgssapi_krb5.so.2.2 00e56000-00e57000 rw-p 0002d000 fd:00 221728 /lib/libgssapi_krb5.so.2.2 00eed000-00f01000 r-xp 00000000 fd:00 221483 /lib/libresolv-2.11.1.so 00f01000-00f02000 ---p 00014000 fd:00 221483 /lib/libresolv-2.11.1.so 00f02000-00f03000 r--p 00014000 fd:00 221483 /lib/libresolv-2.11.1.so 00f03000-00f04000 rw-p 00015000 fd:00 221483 /lib/libresolv-2.11.1.sozsh: segmentation fault (core dumped) condor_schedd -f #0 0x0065526d in vfprintf () from /lib/libc.so.6
#1 0x0065f95f in fprintf () from /lib/libc.so.6
#2 0x081a40dc in preserve_log_file (debug_level=0) at dprintf.c:649
#3 0x081a3e13 in debug_lock (debug_level=0) at dprintf.c:565
#4 0x081a362f in _condor_dprintf_va (flags=4096,
fmt=0x82bc258 "%s is undefined, using default value of %s\n",
args=0xb61fe768 "\251\262*\bP\302+\b") at dprintf.c:310
#5 0x081a4fe9 in dprintf (flags=4096,
fmt=0x82bc258 "%s is undefined, using default value of %s\n")
at dprintf_common.c:76
#6 0x081d78af in param_boolean (name=0x82ab2a9 "GLEXEC_JOB",
default_value=false, do_log=true, me=0x0, target=0x0, use_param_table=true)
at condor_config.cpp:1857
#7 0x0818e91a in DaemonCore::Send_Signal (this=0x98a44c8, msg=...,
nonblocking=false) at daemon_core.cpp:5143
#8 0x0818e636 in DaemonCore::Send_Signal (this=0x98a44c8, pid=23370, sig=17)
at daemon_core.cpp:5072
#9 0x0819bda6 in unix_sigchld () at daemon_core_main.cpp:1283
#10 <signal handler called>
#11 0x00869424 in __kernel_vsyscall ()
#12 0x006f1646 in epoll_wait () from /lib/libc.so.6
#13 0x00ebc6c1 in qpid::sys::Poller::wait(qpid::sys::Duration) ()
from /usr/lib/libqpidcommon.so.3
#14 0x00ebcd18 in qpid::sys::Poller::run() () from /usr/lib/libqpidcommon.so.3
#15 0x00f83ec5 in qpid::sys::Dispatcher::run() ()
from /usr/lib/libqpidcommon.so.3
#16 0x00b5cd70 in qpid::client::TCPConnector::run() ()
from /usr/lib/libqpidclient.so.3
#17 0x00eb1dc2 in ?? () from /usr/lib/libqpidcommon.so.3
#18 0x00473ab5 in start_thread () from /lib/libpthread.so.0
#19 0x006f0dce in clone () from /lib/libc.so.6
#0 0x00d13308 in free () from /lib/libc.so.6
#1 0x00d0caf4 in _IO_free_backup_area_internal () from /lib/libc.so.6
#2 0x00d0a9f2 in _IO_new_file_overflow () from /lib/libc.so.6
#3 0x00d09ba9 in _IO_new_file_xsputn () from /lib/libc.so.6
#4 0x00cdf31e in vfprintf () from /lib/libc.so.6
#5 0x00ce995f in fprintf () from /lib/libc.so.6
#6 0x081a3692 in _condor_dprintf_va (flags=33554432,
fmt=0x82b3da4 "Leaving thread safe %s [%s] in %s:%d %s()\n",
args=0xbf8e9ca8 "O=+\b|\362-\bp\361-\b\236") at dprintf.c:323
#7 0x081a4fe9 in dprintf (flags=33554432,
fmt=0x82b3da4 "Leaving thread safe %s [%s] in %s:%d %s()\n")
at dprintf_common.c:76
#8 0x081aa80c in _mark_thread_safe (start_or_stop=2, dologging=1,
descrip=0x82df27c "recv", func=0x82df274 "unknown",
file=0x82df170 "condor_rw.cpp", line=158) at mark_thread.c:107
#9 0x0826c929 in condor_read (peer_description=0x8c5409f "<127.0.0.1:33741>",
fd=21, buf=0xbf8e9df7 "\001", sz=5, timeout=20, flags=0)
at condor_rw.cpp:158
#10 0x0826146b in ReliSock::RcvMsg::rcv_packet (this=0x8c54114,
peer_description=0x8c5409f "<127.0.0.1:33741>", _sock=21, _timeout=20)
at reli_sock.cpp:677
#11 0x08260ce0 in ReliSock::handle_incoming_packet (this=0x8c54010)
at reli_sock.cpp:449
#12 0x08261154 in ReliSock::get_bytes (this=0x8c54010, dta=0xbf8e9ea4,
max_sz=4) at reli_sock.cpp:595
#13 0x0825afe6 in Stream::get (this=0x8c54010, i=@0xbf8e9ff8)
at stream.cpp:1468
#14 0x0825897e in Stream::code (this=0x8c54010, i=@0xbf8e9ff8)
at stream.cpp:147
#15 0x0813986d in do_Q_request (qmgmt_sock=0x8c54010, may_fork=@0xbf8ea02e)
at qmgmt_receivers.cpp:56
#16 0x08130bb4 in handle_q (sock=0x8c54010) at qmgmt.cpp:1314
#17 0x0818a971 in DaemonCore::CallCommandHandler (this=0x8ba94c8, req=1111,
stream=0x8c54010, delete_stream=false) at daemon_core.cpp:3532
#18 0x0818de7e in DaemonCore::HandleReq (this=0x8ba94c8, insock=0x8bd5b70,
asock=0x8c54010) at daemon_core.cpp:4854
#19 0x0818afb3 in DaemonCore::HandleReq (this=0x8ba94c8, socki=3,
asock=0x8c54010) at daemon_core.cpp:3754
#20 0x0818a5bb in DaemonCore::CallSocketHandler_worker (this=0x8ba94c8, i=3,
default_to_HandleCommand=true, asock=0x8c54010) at daemon_core.cpp:3450
#21 0x0818a30b in DaemonCore::CallSocketHandler_worker_demarshall (
arg=0x8c48c58) at daemon_core.cpp:3404
#22 0x08232b7b in CondorThreads::pool_add (
routine=0x818a2d0 <DaemonCore::CallSocketHandler_worker_demarshall(void*)>, arg=0x8c48c58, tid=0x0, descrip=0x8bd5370 "DC Command Handler")
at condor_threads.cpp:1106
#23 0x0818a2ce in DaemonCore::CallSocketHandler (this=0x8ba94c8,
i=@0xbf8ea5e8, default_to_HandleCommand=true) at daemon_core.cpp:3394
#24 0x08189fdc in DaemonCore::Driver (this=0x8ba94c8) at daemon_core.cpp:3307
#25 0x0819de00 in main (argc=1, argv=0xbf8eac08) at daemon_core_main.cpp:2276
$ cat log/SchedLog* Now in new log file /home/matt/Documents/Condor/Installation/log/SchedLog 01/14 23:10:50 (fd:20) (pid:28029) Leaving thread safe stop [recv] in condor_rw.cpp:158 unknown() 01/14 23:10:50 (fd:20) (pid:28029) selector 0xbfa2699c resetting 01/14 23:10:50 (fd:20) (pid:28029) condor_read(fd=19 <127.0.0.1:44717>,,size=5,timeout=1,flags=0) 01/14 23:10:50 (fd:20) (pid:28029) selector 0xbfa2699c adding fd 19 (socket:[276861]) 01/14 23:10:50 (fd:20) (pid:28029) condor_read(): fd=19 01/14 23:10:50 (fd:20) (pid:28029) Entering thread safe start [select] in selector.cpp:311 unknown() 01/14 23:10:50 (fd:20) (pid:28029) Leaving thread safe start [select] in selector.cpp:311 unknown() 01/14 23:10:50 (fd:20) (pid:28029) Entering thread safe stop [select] in selector.cpp:317 unknown() 01/14 23:10:50 GLEXEC_JOB is undefined, using default value of False Stack dump for process 28029 at timestamp 1263528650 (28 frames) condor_schedd(dprintf_dump_stack+0xc7)[0x81a4f53] condor_schedd[0x81a512a] [0x8bc400] /lib/libc.so.6(_IO_free_backup_area+0x34)[0xb77af4] /lib/libc.so.6(_IO_file_overflow+0x1f2)[0xb759f2] /lib/libc.so.6(_IO_file_xsputn+0x69)[0xb74ba9] /lib/libc.so.6(_IO_vfprintf+0x10e)[0xb4a31e] /lib/libc.so.6(_IO_fprintf+0x1f)[0xb5495f] condor_schedd(_condor_dprintf_va+0x273)[0x81a3692] condor_schedd(dprintf+0x25)[0x81a4fe9] condor_schedd(_mark_thread_safe+0x12c)[0x81aa80c] condor_schedd(_ZN8Selector7executeEv+0x138)[0x81ab480] condor_schedd(_Z11condor_readPKciPciii+0x261)[0x826c74d] condor_schedd(_ZN8ReliSock6RcvMsg10rcv_packetEPKcii+0x67)[0x826146b] condor_schedd(_ZN8ReliSock22handle_incoming_packetEv+0x70)[0x8260ce0] condor_schedd(_ZN8ReliSock9get_bytesEPvi+0x2e)[0x8261154] condor_schedd(_ZN6Stream3getERi+0x90)[0x825afe6] condor_schedd(_ZN6Stream4codeERi+0x44)[0x825897e] condor_schedd(_ZN10DaemonCore9HandleReqEP6StreamS1_+0xcb6)[0x818bc6c] condor_schedd(_ZN10DaemonCore9HandleReqEiP6Stream+0x39)[0x818afb3] condor_schedd(_ZN10DaemonCore24CallSocketHandler_workerEibP6Stream+0x29f)[0x818a5bb] condor_schedd(_ZN10DaemonCore35CallSocketHandler_worker_demarshallEPv+0x3b)[0x818a30b] condor_schedd(_ZN13CondorThreads8pool_addEPFvPvES0_PiPKc+0x29)[0x8232b7b] condor_schedd(_ZN10DaemonCore17CallSocketHandlerERib+0x1bc)[0x818a2ce] condor_schedd(_ZN10DaemonCore6DriverEv+0x180e)[0x8189fdc] condor_schedd(main+0x1ce0)[0x819de00] /lib/libc.so.6(__libc_start_main+0xe6)[0xb21bb6] condor_schedd[0x80fc181] Now in new log file /home/matt/Documents/Condor/Installation/log/SchedLog 01/14 23:10:50 (fd:20) (pid:28029) Leaving thread safe start [select] in selector.cpp:311 unknown() Now in new log file /home/matt/Documents/Condor/Installation/log/SchedLog WARNING: Failed to rotate log into file /home/matt/Documents/Condor/Installation/log/SchedLog.old! Likely cause is that another Condor process rotated the file at the same time. 01/14 23:10:50 (fd:20) (pid:28059) ForkWork: Child 28059 done, status 0 01/14 23:10:50 (fd:20) (pid:28029) Entering thread safe stop [select] in selector.cpp:317 unknown() 01/14 23:10:50 (fd:20) (pid:28029) Leaving thread safe stop [select] in selector.cpp:317 unknown() 01/14 23:10:50 (fd:20) (pid:28029) condor_read(): select returned 1 01/14 23:10:50 (fd:20) (pid:28029) Entering thread safe start [recv] in condor_rw.cpp:156 unknown() 01/14 23:10:50 (fd:20) (pid:28029) Leaving thread safe start [recv] in condor_rw.cpp:156 unknown() 01/14 23:10:50 (fd:20) (pid:28029) Entering thread safe stop [recv] in condor_rw.cpp:158 unknown() MaxLog = 1024, length = 0 Saving log file to "/home/matt/Documents/Condor/Installation/log/SchedLog.old" #0 0x00b7e308 in free () from /lib/libc.so.6
#1 0x00b77af4 in _IO_free_backup_area_internal () from /lib/libc.so.6
#2 0x00b759f2 in _IO_new_file_overflow () from /lib/libc.so.6
#3 0x00b74ba9 in _IO_new_file_xsputn () from /lib/libc.so.6
#4 0x00b4a31e in vfprintf () from /lib/libc.so.6
#5 0x00b5495f in fprintf () from /lib/libc.so.6
#6 0x081a3692 in _condor_dprintf_va (flags=33554432,
fmt=0x82b3da4 "Leaving thread safe %s [%s] in %s:%d %s()\n",
args=0xbfa268e8 "O=+\b2B+\buA+\b=\001") at dprintf.c:323
#7 0x081a4fe9 in dprintf (flags=33554432,
fmt=0x82b3da4 "Leaving thread safe %s [%s] in %s:%d %s()\n")
at dprintf_common.c:76
#8 0x081aa80c in _mark_thread_safe (start_or_stop=2, dologging=1,
descrip=0x82b4232 "select", func=0x82b422a "unknown",
file=0x82b4175 "selector.cpp", line=317) at mark_thread.c:107
#9 0x081ab480 in Selector::execute (this=0xbfa2699c) at selector.cpp:317
#10 0x0826c74d in condor_read (peer_description=0xa2f4e2f "<127.0.0.1:44717>",
fd=19, buf=0xbfa26a77 "", sz=5, timeout=1, flags=0) at condor_rw.cpp:120
#11 0x0826146b in ReliSock::RcvMsg::rcv_packet (this=0xa2f4ea4,
peer_description=0xa2f4e2f "<127.0.0.1:44717>", _sock=19, _timeout=1)
at reli_sock.cpp:677
#12 0x08260ce0 in ReliSock::handle_incoming_packet (this=0xa2f4da0)
at reli_sock.cpp:449
#13 0x08261154 in ReliSock::get_bytes (this=0xa2f4da0, dta=0xbfa26b24,
max_sz=4) at reli_sock.cpp:595
#14 0x0825afe6 in Stream::get (this=0xa2f4da0, i=@0xbfa26db0)
at stream.cpp:1468
#15 0x0825897e in Stream::code (this=0xa2f4da0, i=@0xbfa26db0)
at stream.cpp:147
#16 0x0818bc6c in DaemonCore::HandleReq (this=0xa2c34c8, insock=0xa2c85f0,
asock=0xa2f4da0) at daemon_core.cpp:4124
#17 0x0818afb3 in DaemonCore::HandleReq (this=0xa2c34c8, socki=0,
asock=0xa2f4da0) at daemon_core.cpp:3754
#18 0x0818a5bb in DaemonCore::CallSocketHandler_worker (this=0xa2c34c8, i=0,
default_to_HandleCommand=true, asock=0xa2f4da0) at daemon_core.cpp:3450
#19 0x0818a30b in DaemonCore::CallSocketHandler_worker_demarshall (
arg=0xa3337a0) at daemon_core.cpp:3404
#20 0x08232b7b in CondorThreads::pool_add (
routine=0x818a2d0 <DaemonCore::CallSocketHandler_worker_demarshall(void*)>, arg=0xa3337a0, tid=0x0, descrip=0xa2c2d78 "DC Command Handler")
at condor_threads.cpp:1106
#21 0x0818a2ce in DaemonCore::CallSocketHandler (this=0xa2c34c8,
i=@0xbfa270c8, default_to_HandleCommand=true) at daemon_core.cpp:3394
#22 0x08189fdc in DaemonCore::Driver (this=0xa2c34c8) at daemon_core.cpp:3307
#23 0x0819de00 in main (argc=1, argv=0xbfa276e8) at daemon_core_main.cpp:2276
#0 0x0016d26d in vfprintf () from /lib/libc.so.6
#1 0x081a3766 in _condor_dprintf_va (flags=33554432,
fmt=0x82b3da4 "Leaving thread safe %s [%s] in %s:%d %s()\n",
args=0xbfa78658 "I=+\b|\362-\bp\361-\b\234") at dprintf.c:353
#2 0x081a4fe9 in dprintf (flags=33554432,
fmt=0x82b3da4 "Leaving thread safe %s [%s] in %s:%d %s()\n")
at dprintf_common.c:76
#3 0x081aa80c in _mark_thread_safe (start_or_stop=1, dologging=1,
descrip=0x82df27c "recv", func=0x82df274 "unknown",
file=0x82df170 "condor_rw.cpp", line=156) at mark_thread.c:107
#4 0x0826c8c1 in condor_read (peer_description=0x890664f "<127.0.0.1:39081>",
fd=19, buf=0xbfa788fb "", sz=4, timeout=1, flags=2) at condor_rw.cpp:156
#5 0x0818b995 in DaemonCore::HandleReq (this=0x88d74c8, insock=0x88dc5f0,
asock=0x89065c0) at daemon_core.cpp:4064
#6 0x0818afb3 in DaemonCore::HandleReq (this=0x88d74c8, socki=0,
asock=0x89065c0) at daemon_core.cpp:3754
#7 0x0818a5bb in DaemonCore::CallSocketHandler_worker (this=0x88d74c8, i=0,
default_to_HandleCommand=true, asock=0x89065c0) at daemon_core.cpp:3450
#8 0x0818a30b in DaemonCore::CallSocketHandler_worker_demarshall (
arg=0x8908c50) at daemon_core.cpp:3404
#9 0x08232b7b in CondorThreads::pool_add (
routine=0x818a2d0 <DaemonCore::CallSocketHandler_worker_demarshall(void*)>, arg=0x8908c50, tid=0x0, descrip=0x88d6d78 "DC Command Handler")
at condor_threads.cpp:1106
#10 0x0818a2ce in DaemonCore::CallSocketHandler (this=0x88d74c8,
i=@0xbfa78cb8, default_to_HandleCommand=true) at daemon_core.cpp:3394
#11 0x08189fdc in DaemonCore::Driver (this=0x88d74c8) at daemon_core.cpp:3307
#12 0x0819de00 in main (argc=1, argv=0xbfa792d8) at daemon_core_main.cpp:2276
$ condor_schedd -f
Now in new log file /home/matt/Documents/Condor/Installation/log/SchedLog
01/14 23:16:02 (fd:2) (pid:7683) Read {*** glibc detected *** condor_schedd: double free or corruption (!prev): 0x08664c30 ***
======= Backtrace: =========
/lib/libc.so.6(+0x6e751)[0x451751]
condor_schedd[0x824be2b]
condor_schedd(_Z7ScannerRPKcR5Token+0xa8)[0x824bbfb]
$ cat log/dprintf_failure.SCHEDD dprintf() had a fatal error in pid 12904 Error writing debug log errno: 9 (Bad file descriptor) euid: 500, ruid: 500 $ cat log/dprintf_failure.SCHEDD dprintf() had a fatal error in pid 16473 Error writing debug log errno: 9 (Bad file descriptor) euid: 500, ruid: 500 I'm closing this until it can be reproduced on EL. |