Bug 486462 - query workers during log rotation crash schedd and collector
query workers during log rotation crash schedd and collector
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor (Show other bugs)
1.1
All Linux
low Severity medium
: 1.3
: ---
Assigned To: Matthew Farrellee
MRG Quality Engineering
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2009-02-19 16:08 EST by Matthew Farrellee
Modified: 2010-01-25 13:26 EST (History)
0 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-01-25 13:26:57 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Matthew Farrellee 2009-02-19 16:08:54 EST
[root@node-14 ~]# ls -al /var/log/condor/SchedLog* /var/log/condor/dprintf_failure.SCHEDD 
-rw-r--r-- 1 condor condor     118 Feb 19 11:00 /var/log/condor/dprintf_failure.SCHEDD
-rw-r--r-- 1 condor condor 1000148 Feb 19 11:00 /var/log/condor/SchedLog.old

[root@node-14 ~]# cat /var/log/condor/dprintf_failure.SCHEDD 
dprintf() had a fatal error in pid 12280
Can't fclose debug log file
errno: 9 (Bad file descriptor)
euid: 64, ruid: 0

[root@node-14 condor]# tail -n 3 /var/log/condor/SchedLog.old
2/19 11:00:07 (pid:12280) .. 12725, 2228.61, F, <10.1.3.611:33648>, cur_hosts=1, status=2
MaxLog = 1000000, length = 1000063
Saving log file to "/var/log/condor/SchedLog.old"

[root@node-14 condor]# grep "status 44" MasterLog
2/19 10:39:09 The SCHEDD (pid 26359) exited with status 44
2/19 10:50:50 The SCHEDD (pid 12033) exited with status 44
2/19 11:00:07 The SCHEDD (pid 12280) exited with status 44
Comment 1 Matthew Farrellee 2009-02-19 16:26:33 EST
SchedLog.old spans 2/19 11:00:04 to 2/19 11:00:07, so it was rotating rather quickly

setting MAX_SCHEDD_LOG low and SCHEDD_DEBUG high may assist in reproducing this
Comment 2 Matthew Farrellee 2009-02-19 16:58:00 EST
This exists in at least condor-7.2.2-0.1.el5
Comment 3 Matthew Farrellee 2010-01-15 07:41:30 EST
I believe this is related to query workers and log rotation. For instance, the log being rotated by a query worker at the same time the schedd tries to rotate the log.

To reproduce, run

while true; do condor_q  > /dev/null 2>&1 ; done

against a condor_schedd configured with

MAX_SCHEDD_LOG = 1024
SCHEDD_DEBUG = D_FULLDEBUG
SCHEDD_QUERY_WORKERS = 1

Set D_ALL reproduces faster.

Set SCHEDD_QUERY_WORKERS = 0 to workaround the issue.
Comment 4 Matthew Farrellee 2010-01-15 07:46:57 EST
This appears to also be an issue for the Collector.

MAX_COLLECTOR_LOG = 1024
COLLECTOR_DEBUG = D_ALL
COLLECTOR_QUERY_WORKERS = 1
Comment 5 Matthew Farrellee 2010-01-15 07:47:32 EST
In the 7.4 series, schedd defaults to 3 workers and collector defaults to 2.
Comment 6 Matthew Farrellee 2010-01-23 08:22:48 EST
The latest failures were on F12 built binaries.
Comment 7 Matthew Farrellee 2010-01-25 12:06:46 EST
*** glibc detected *** condor_schedd: double free or corruption (!prev): 0x08d66190 ***
======= Backtrace: =========
/lib/libc.so.6(+0x6e751)[0x76e751]
/lib/libc.so.6(fclose+0x14a)[0x75e5ea]
condor_schedd(fclose_wrapper+0x6a)[0x81a4ad4]
condor_schedd(preserve_log_file+0xc3)[0x81a40fe]
condor_schedd(debug_lock+0x399)[0x81a3e13]
condor_schedd(_condor_dprintf_va+0x210)[0x81a362f]
condor_schedd(dprintf+0x25)[0x81a4fe9]
condor_schedd(_mark_thread_safe+0x12c)[0x81aa80c]
condor_schedd(_Z12condor_writePKciPciii+0x5d0)[0x826d0b3]
condor_schedd(_ZN3Buf5writeEPKciii+0x85)[0x8264923]
condor_schedd(_ZN3Buf5flushEPKciPvii+0x87)[0x82649e9]
condor_schedd(_ZN8ReliSock6SndMsg10snd_packetEPKciii+0xf8)[0x82618ee]
condor_schedd(_ZN8ReliSock14end_of_messageEv+0xab)[0x8260da7]
condor_schedd(_Z12do_Q_requestP8ReliSockRb+0x1170)[0x813a9ae]
condor_schedd(_Z8handle_qP7ServiceiP6Stream+0xe0)[0x8130bb4]
condor_schedd(_ZN10DaemonCore18CallCommandHandlerEiP6Streamb+0x187)[0x818a971]
condor_schedd(_ZN10DaemonCore9HandleReqEP6StreamS1_+0x2ec8)[0x818de7e]
condor_schedd(_ZN10DaemonCore9HandleReqEiP6Stream+0x39)[0x818afb3]
condor_schedd(_ZN10DaemonCore24CallSocketHandler_workerEibP6Stream+0x29f)[0x818a5bb]
condor_schedd(_ZN10DaemonCore35CallSocketHandler_worker_demarshallEPv+0x3b)[0x818a30b]
condor_schedd(_ZN13CondorThreads8pool_addEPFvPvES0_PiPKc+0x29)[0x8232b7b]
condor_schedd(_ZN10DaemonCore17CallSocketHandlerERib+0x1bc)[0x818a2ce]
condor_schedd(_ZN10DaemonCore6DriverEv+0x180e)[0x8189fdc]
condor_schedd(main+0x1ce0)[0x819de00]
/lib/libc.so.6(__libc_start_main+0xe6)[0x716bb6]
condor_schedd[0x80fc181]
======= Memory map: ========
00110000-0013d000 r-xp 00000000 fd:00 533893     /usr/lib/libgsoapssl++.so.0.0.0
0013d000-0013f000 rw-p 0002c000 fd:00 533893     /usr/lib/libgsoapssl++.so.0.0.0
0013f000-0015c000 r-xp 00000000 fd:00 221510     /lib/libgcc_s-4.4.2-20091222.so.1
0015c000-0015d000 rw-p 0001c000 fd:00 221510     /lib/libgcc_s-4.4.2-20091222.so.1
0015d000-00187000 r-xp 00000000 fd:00 221866     /lib/libk5crypto.so.3.1
00187000-00188000 rw-p 0002a000 fd:00 221866     /lib/libk5crypto.so.3.1
00188000-001cd000 r-xp 00000000 fd:00 533441     /usr/lib/libfreebl3.so
001cd000-001ce000 rw-p 00044000 fd:00 533441     /usr/lib/libfreebl3.so
001ce000-001d2000 rw-p 00000000 00:00 0 
001d2000-001d6000 r-xp 00000000 fd:00 223762     /lib/libuuid.so.1.3.0
001d6000-001d7000 rw-p 00003000 fd:00 223762     /lib/libuuid.so.1.3.0
001d9000-001f7000 r-xp 00000000 fd:00 107254     /lib/ld-2.11.1.so
001f7000-001f8000 r--p 0001d000 fd:00 107254     /lib/ld-2.11.1.so
001f8000-001f9000 rw-p 0001e000 fd:00 107254     /lib/ld-2.11.1.so
001f9000-0022b000 r-xp 00000000 fd:01 8593760    /home/matt/Documents/Condor/CONDOR_SRC/src/release_dir/libexec/MgmtScheddPlugin-plugin.so
0022b000-0022d000 rw-p 00031000 fd:01 8593760    /home/matt/Documents/Condor/CONDOR_SRC/src/release_dir/libexec/MgmtScheddPlugin-plugin.so
0022d000-0033a000 r-xp 00000000 fd:00 536727     /usr/lib/libqpidclient.so.3.0.0
0033a000-0033f000 rw-p 0010c000 fd:00 536727     /usr/lib/libqpidclient.so.3.0.0
0033f000-00340000 rw-p 00000000 00:00 0 
00340000-0035a000 r-xp 00000000 fd:00 537303     /usr/lib/libqmf.so.1.0.0
0035a000-0035b000 rw-p 0001a000 fd:00 537303     /usr/lib/libqmf.so.1.0.0
0035b000-0039c000 r-xp 00000000 fd:00 534306     /usr/lib/libboost_program_options.so.1.39.0
0039c000-0039e000 rw-p 00040000 fd:00 534306     /usr/lib/libboost_program_options.so.1.39.0
0039e000-003b7000 r-xp 00000000 fd:00 533370     /usr/lib/libsasl2.so.2.0.23
003b7000-003b8000 rw-p 00018000 fd:00 533370     /usr/lib/libsasl2.so.2.0.23
003b8000-003bc000 r-xp 00000000 fd:00 534504     /usr/lib/sasl2/libplain.so.2.0.23
003bc000-003bd000 rw-p 00003000 fd:00 534504     /usr/lib/sasl2/libplain.so.2.0.23
003bd000-003c1000 r-xp 00000000 fd:00 534501     /usr/lib/sasl2/liblogin.so.2.0.23
003c1000-003c2000 rw-p 00003000 fd:00 534501     /usr/lib/sasl2/liblogin.so.2.0.23
003c2000-003c6000 r-xp 00000000 fd:00 533374     /usr/lib/sasl2/libanonymous.so.2.0.23
003c6000-003c7000 rw-p 00003000 fd:00 533374     /usr/lib/sasl2/libanonymous.so.2.0.23
003c7000-003cb000 r-xp 00000000 fd:00 536745     /usr/lib/sasl2/libcrammd5.so.2.0.23
003cb000-003cc000 rw-p 00003000 fd:00 536745     /usr/lib/sasl2/libcrammd5.so.2.0.23
003d0000-003e2000 r-xp 00000000 fd:00 223842     /lib/libz.so.1.2.3
003e2000-003e3000 rw-p 00011000 fd:00 223842     /lib/libz.so.1.2.3
003e3000-00456000 r-xp 00000000 fd:00 537378     /usr/lib/libqmfengine.so.1.0.1
00456000-00458000 rw-p 00073000 fd:00 537378     /usr/lib/libqmfengine.so.1.0.1
004d0000-004ec000 r-xp 00000000 fd:00 223830     /lib/libselinux.so.1
004ec000-004ed000 r--p 0001b000 fd:00 223830     /lib/libselinux.so.1
004ed000-004ee000 rw-p 0001c000 fd:00 223830     /lib/libselinux.so.1
00531000-00547000 r-xp 00000000 fd:00 221481     /lib/libpthread-2.11.1.so
00547000-00548000 r--p 00015000 fd:00 221481     /lib/libpthread-2.11.1.so
00548000-00549000 rw-p 00016000 fd:00 221481     /lib/libpthread-2.11.1.so
00549000-0054b000 rw-p 00000000 00:00 0 
0058d000-005b5000 r-xp 00000000 fd:00 221459     /lib/libm-2.11.1.so
005b5000-005b6000 r--p 00027000 fd:00 221459     /lib/libm-2.11.1.so
005b6000-005b7000 rw-p 00028000 fd:00 221459     /lib/libm-2.11.1.so
00653000-00656000 r-xp 00000000 fd:00 221455     /lib/libdl-2.11.1.so
00656000-00657000 r--p 00002000 fd:00 221455     /lib/libdl-2.11.1.so
00657000-00658000 rw-p 00003000 fd:00 221455     /lib/libdl-2.11.1.so
00700000-0086f000 r-xp 00000000 fd:00 221446     /lib/libc-2.11.1.so
0086f000-00871000 r--p 0016e000 fd:00 221446     /lib/libc-2.11.1.so
00871000-00872000 rw-p 00170000 fd:00 221446     /lib/libc-2.11.1.so
00872000-00875000 rw-p 00000000 00:00 0 
008a8000-008b0000 r-xp 00000000 fd:00 221875     /lib/libkrb5support.so.0.1
008b0000-008b1000 rw-p 00008000 fd:00 221875     /lib/libkrb5support.so.0.1
00927000-00928000 r-xp 00000000 00:00 0          [vdso]
00a97000-00a9a000 r-xp 00000000 fd:00 172765     /usr/lib/libboost_system.so.1.39.0
00a9a000-00a9b000 rw-p 00002000 fd:00 172765     /usr/lib/libboost_system.so.1.39.0
00a9d000-00aae000 r-xp 00000000 fd:00 172770     /usr/lib/libboost_filesystem.so.1.39.0
00aae000-00aaf000 rw-p 00011000 fd:00 172770     /usr/lib/libboost_filesystem.so.1.39.0
00abb000-00ac7000 r-xp 00000000 fd:00 536748     /usr/lib/sasl2/libdigestmd5.so.2.0.23
00ac7000-00ac8000 rw-p 0000b000 fd:00 536748     /usr/lib/sasl2/libdigestmd5.so.2.0.23
00ad3000-00b86000 r-xp 00000000 fd:00 221873     /lib/libkrb5.so.3.3
00b86000-00b8c000 rw-p 000b3000 fd:00 221873     /lib/libkrb5.so.3.3
00c53000-00c5a000 r-xp 00000000 fd:00 221452     /lib/libcrypt-2.11.1.so
00c5a000-00c5b000 r--p 00007000 fd:00 221452     /lib/libcrypt-2.11.1.so
00c5b000-00c5c000 rw-p 00008000 fd:00 221452     /lib/libcrypt-2.11.1.so
00c5c000-00c83000 rw-p 00000000 00:00 0 
00d5c000-00d63000 r-xp 00000000 fd:00 221485     /lib/librt-2.11.1.so
00d63000-00d64000 r--p 00006000 fd:00 221485     /lib/librt-2.11.1.so
00d64000-00d65000 rw-p 00007000 fd:00 221485     /lib/librt-2.11.1.so
00dd3000-00dd5000 r-xp 00000000 fd:00 223858     /lib/libcom_err.so.2.1
00dd5000-00dd6000 rw-p 00002000 fd:00 223858     /lib/libcom_err.so.2.1
00dd8000-00dda000 r-xp 00000000 fd:00 223855     /lib/libkeyutils-1.2.so
00dda000-00ddb000 rw-p 00001000 fd:00 223855     /lib/libkeyutils-1.2.so
00e0c000-00e11000 r-xp 00000000 fd:00 533377     /usr/lib/sasl2/libsasldb.so.2.0.23
00e11000-00e12000 rw-p 00004000 fd:00 533377     /usr/lib/sasl2/libsasldb.so.2.0.23
00e29000-00e56000 r-xp 00000000 fd:00 221728     /lib/libgssapi_krb5.so.2.2
00e56000-00e57000 rw-p 0002d000 fd:00 221728     /lib/libgssapi_krb5.so.2.2
00eed000-00f01000 r-xp 00000000 fd:00 221483     /lib/libresolv-2.11.1.so
00f01000-00f02000 ---p 00014000 fd:00 221483     /lib/libresolv-2.11.1.so
00f02000-00f03000 r--p 00014000 fd:00 221483     /lib/libresolv-2.11.1.so
00f03000-00f04000 rw-p 00015000 fd:00 221483     /lib/libresolv-2.11.1.sozsh: segmentation fault (core dumped)  condor_schedd -f
Comment 8 Matthew Farrellee 2010-01-25 12:07:03 EST
#0  0x0065526d in vfprintf () from /lib/libc.so.6
#1  0x0065f95f in fprintf () from /lib/libc.so.6
#2  0x081a40dc in preserve_log_file (debug_level=0) at dprintf.c:649
#3  0x081a3e13 in debug_lock (debug_level=0) at dprintf.c:565
#4  0x081a362f in _condor_dprintf_va (flags=4096, 
    fmt=0x82bc258 "%s is undefined, using default value of %s\n", 
    args=0xb61fe768 "\251\262*\bP\302+\b") at dprintf.c:310
#5  0x081a4fe9 in dprintf (flags=4096, 
    fmt=0x82bc258 "%s is undefined, using default value of %s\n")
    at dprintf_common.c:76
#6  0x081d78af in param_boolean (name=0x82ab2a9 "GLEXEC_JOB", 
    default_value=false, do_log=true, me=0x0, target=0x0, use_param_table=true)
    at condor_config.cpp:1857
#7  0x0818e91a in DaemonCore::Send_Signal (this=0x98a44c8, msg=..., 
    nonblocking=false) at daemon_core.cpp:5143
#8  0x0818e636 in DaemonCore::Send_Signal (this=0x98a44c8, pid=23370, sig=17)
    at daemon_core.cpp:5072
#9  0x0819bda6 in unix_sigchld () at daemon_core_main.cpp:1283
#10 <signal handler called>
#11 0x00869424 in __kernel_vsyscall ()
#12 0x006f1646 in epoll_wait () from /lib/libc.so.6
#13 0x00ebc6c1 in qpid::sys::Poller::wait(qpid::sys::Duration) ()
   from /usr/lib/libqpidcommon.so.3
#14 0x00ebcd18 in qpid::sys::Poller::run() () from /usr/lib/libqpidcommon.so.3
#15 0x00f83ec5 in qpid::sys::Dispatcher::run() ()
   from /usr/lib/libqpidcommon.so.3
#16 0x00b5cd70 in qpid::client::TCPConnector::run() ()
   from /usr/lib/libqpidclient.so.3
#17 0x00eb1dc2 in ?? () from /usr/lib/libqpidcommon.so.3
#18 0x00473ab5 in start_thread () from /lib/libpthread.so.0
#19 0x006f0dce in clone () from /lib/libc.so.6
Comment 9 Matthew Farrellee 2010-01-25 12:07:16 EST
#0  0x00d13308 in free () from /lib/libc.so.6
#1  0x00d0caf4 in _IO_free_backup_area_internal () from /lib/libc.so.6
#2  0x00d0a9f2 in _IO_new_file_overflow () from /lib/libc.so.6
#3  0x00d09ba9 in _IO_new_file_xsputn () from /lib/libc.so.6
#4  0x00cdf31e in vfprintf () from /lib/libc.so.6
#5  0x00ce995f in fprintf () from /lib/libc.so.6
#6  0x081a3692 in _condor_dprintf_va (flags=33554432, 
    fmt=0x82b3da4 "Leaving thread safe %s [%s] in %s:%d %s()\n", 
    args=0xbf8e9ca8 "O=+\b|\362-\bp\361-\b\236") at dprintf.c:323
#7  0x081a4fe9 in dprintf (flags=33554432, 
    fmt=0x82b3da4 "Leaving thread safe %s [%s] in %s:%d %s()\n")
    at dprintf_common.c:76
#8  0x081aa80c in _mark_thread_safe (start_or_stop=2, dologging=1, 
    descrip=0x82df27c "recv", func=0x82df274 "unknown", 
    file=0x82df170 "condor_rw.cpp", line=158) at mark_thread.c:107
#9  0x0826c929 in condor_read (peer_description=0x8c5409f "<127.0.0.1:33741>", 
    fd=21, buf=0xbf8e9df7 "\001", sz=5, timeout=20, flags=0)
    at condor_rw.cpp:158
#10 0x0826146b in ReliSock::RcvMsg::rcv_packet (this=0x8c54114, 
    peer_description=0x8c5409f "<127.0.0.1:33741>", _sock=21, _timeout=20)
    at reli_sock.cpp:677
#11 0x08260ce0 in ReliSock::handle_incoming_packet (this=0x8c54010)
    at reli_sock.cpp:449
#12 0x08261154 in ReliSock::get_bytes (this=0x8c54010, dta=0xbf8e9ea4, 
    max_sz=4) at reli_sock.cpp:595
#13 0x0825afe6 in Stream::get (this=0x8c54010, i=@0xbf8e9ff8)
    at stream.cpp:1468
#14 0x0825897e in Stream::code (this=0x8c54010, i=@0xbf8e9ff8)
    at stream.cpp:147
#15 0x0813986d in do_Q_request (qmgmt_sock=0x8c54010, may_fork=@0xbf8ea02e)
    at qmgmt_receivers.cpp:56
#16 0x08130bb4 in handle_q (sock=0x8c54010) at qmgmt.cpp:1314
#17 0x0818a971 in DaemonCore::CallCommandHandler (this=0x8ba94c8, req=1111, 
    stream=0x8c54010, delete_stream=false) at daemon_core.cpp:3532
#18 0x0818de7e in DaemonCore::HandleReq (this=0x8ba94c8, insock=0x8bd5b70, 
    asock=0x8c54010) at daemon_core.cpp:4854
#19 0x0818afb3 in DaemonCore::HandleReq (this=0x8ba94c8, socki=3, 
    asock=0x8c54010) at daemon_core.cpp:3754
#20 0x0818a5bb in DaemonCore::CallSocketHandler_worker (this=0x8ba94c8, i=3, 
    default_to_HandleCommand=true, asock=0x8c54010) at daemon_core.cpp:3450
#21 0x0818a30b in DaemonCore::CallSocketHandler_worker_demarshall (
    arg=0x8c48c58) at daemon_core.cpp:3404
#22 0x08232b7b in CondorThreads::pool_add (
    routine=0x818a2d0 <DaemonCore::CallSocketHandler_worker_demarshall(void*)>, arg=0x8c48c58, tid=0x0, descrip=0x8bd5370 "DC Command Handler")
    at condor_threads.cpp:1106
#23 0x0818a2ce in DaemonCore::CallSocketHandler (this=0x8ba94c8, 
    i=@0xbf8ea5e8, default_to_HandleCommand=true) at daemon_core.cpp:3394
#24 0x08189fdc in DaemonCore::Driver (this=0x8ba94c8) at daemon_core.cpp:3307
#25 0x0819de00 in main (argc=1, argv=0xbf8eac08) at daemon_core_main.cpp:2276
Comment 10 Matthew Farrellee 2010-01-25 12:07:40 EST
$ cat log/SchedLog*
Now in new log file /home/matt/Documents/Condor/Installation/log/SchedLog
01/14 23:10:50 (fd:20) (pid:28029) Leaving thread safe stop [recv] in condor_rw.cpp:158 unknown()
01/14 23:10:50 (fd:20) (pid:28029) selector 0xbfa2699c resetting
01/14 23:10:50 (fd:20) (pid:28029) condor_read(fd=19 <127.0.0.1:44717>,,size=5,timeout=1,flags=0)
01/14 23:10:50 (fd:20) (pid:28029) selector 0xbfa2699c adding fd 19 (socket:[276861])
01/14 23:10:50 (fd:20) (pid:28029) condor_read(): fd=19
01/14 23:10:50 (fd:20) (pid:28029) Entering thread safe start [select] in selector.cpp:311 unknown()
01/14 23:10:50 (fd:20) (pid:28029) Leaving thread safe start [select] in selector.cpp:311 unknown()
01/14 23:10:50 (fd:20) (pid:28029) Entering thread safe stop [select] in selector.cpp:317 unknown()
01/14 23:10:50 GLEXEC_JOB is undefined, using default value of False
Stack dump for process 28029 at timestamp 1263528650 (28 frames)
condor_schedd(dprintf_dump_stack+0xc7)[0x81a4f53]
condor_schedd[0x81a512a]
[0x8bc400]
/lib/libc.so.6(_IO_free_backup_area+0x34)[0xb77af4]
/lib/libc.so.6(_IO_file_overflow+0x1f2)[0xb759f2]
/lib/libc.so.6(_IO_file_xsputn+0x69)[0xb74ba9]
/lib/libc.so.6(_IO_vfprintf+0x10e)[0xb4a31e]
/lib/libc.so.6(_IO_fprintf+0x1f)[0xb5495f]
condor_schedd(_condor_dprintf_va+0x273)[0x81a3692]
condor_schedd(dprintf+0x25)[0x81a4fe9]
condor_schedd(_mark_thread_safe+0x12c)[0x81aa80c]
condor_schedd(_ZN8Selector7executeEv+0x138)[0x81ab480]
condor_schedd(_Z11condor_readPKciPciii+0x261)[0x826c74d]
condor_schedd(_ZN8ReliSock6RcvMsg10rcv_packetEPKcii+0x67)[0x826146b]
condor_schedd(_ZN8ReliSock22handle_incoming_packetEv+0x70)[0x8260ce0]
condor_schedd(_ZN8ReliSock9get_bytesEPvi+0x2e)[0x8261154]
condor_schedd(_ZN6Stream3getERi+0x90)[0x825afe6]
condor_schedd(_ZN6Stream4codeERi+0x44)[0x825897e]
condor_schedd(_ZN10DaemonCore9HandleReqEP6StreamS1_+0xcb6)[0x818bc6c]
condor_schedd(_ZN10DaemonCore9HandleReqEiP6Stream+0x39)[0x818afb3]
condor_schedd(_ZN10DaemonCore24CallSocketHandler_workerEibP6Stream+0x29f)[0x818a5bb]
condor_schedd(_ZN10DaemonCore35CallSocketHandler_worker_demarshallEPv+0x3b)[0x818a30b]
condor_schedd(_ZN13CondorThreads8pool_addEPFvPvES0_PiPKc+0x29)[0x8232b7b]
condor_schedd(_ZN10DaemonCore17CallSocketHandlerERib+0x1bc)[0x818a2ce]
condor_schedd(_ZN10DaemonCore6DriverEv+0x180e)[0x8189fdc]
condor_schedd(main+0x1ce0)[0x819de00]
/lib/libc.so.6(__libc_start_main+0xe6)[0xb21bb6]
condor_schedd[0x80fc181]
Now in new log file /home/matt/Documents/Condor/Installation/log/SchedLog
01/14 23:10:50 (fd:20) (pid:28029) Leaving thread safe start [select] in selector.cpp:311 unknown()
Now in new log file /home/matt/Documents/Condor/Installation/log/SchedLog
WARNING: Failed to rotate log into file /home/matt/Documents/Condor/Installation/log/SchedLog.old!
Likely cause is that another Condor process rotated the file at the same time.
01/14 23:10:50 (fd:20) (pid:28059) ForkWork: Child 28059 done, status 0
01/14 23:10:50 (fd:20) (pid:28029) Entering thread safe stop [select] in selector.cpp:317 unknown()
01/14 23:10:50 (fd:20) (pid:28029) Leaving thread safe stop [select] in selector.cpp:317 unknown()
01/14 23:10:50 (fd:20) (pid:28029) condor_read(): select returned 1
01/14 23:10:50 (fd:20) (pid:28029) Entering thread safe start [recv] in condor_rw.cpp:156 unknown()
01/14 23:10:50 (fd:20) (pid:28029) Leaving thread safe start [recv] in condor_rw.cpp:156 unknown()
01/14 23:10:50 (fd:20) (pid:28029) Entering thread safe stop [recv] in condor_rw.cpp:158 unknown()
MaxLog = 1024, length = 0
Saving log file to "/home/matt/Documents/Condor/Installation/log/SchedLog.old"
Comment 11 Matthew Farrellee 2010-01-25 12:07:51 EST
#0  0x00b7e308 in free () from /lib/libc.so.6
#1  0x00b77af4 in _IO_free_backup_area_internal () from /lib/libc.so.6
#2  0x00b759f2 in _IO_new_file_overflow () from /lib/libc.so.6
#3  0x00b74ba9 in _IO_new_file_xsputn () from /lib/libc.so.6
#4  0x00b4a31e in vfprintf () from /lib/libc.so.6
#5  0x00b5495f in fprintf () from /lib/libc.so.6
#6  0x081a3692 in _condor_dprintf_va (flags=33554432, 
    fmt=0x82b3da4 "Leaving thread safe %s [%s] in %s:%d %s()\n", 
    args=0xbfa268e8 "O=+\b2B+\buA+\b=\001") at dprintf.c:323
#7  0x081a4fe9 in dprintf (flags=33554432, 
    fmt=0x82b3da4 "Leaving thread safe %s [%s] in %s:%d %s()\n")
    at dprintf_common.c:76
#8  0x081aa80c in _mark_thread_safe (start_or_stop=2, dologging=1, 
    descrip=0x82b4232 "select", func=0x82b422a "unknown", 
    file=0x82b4175 "selector.cpp", line=317) at mark_thread.c:107
#9  0x081ab480 in Selector::execute (this=0xbfa2699c) at selector.cpp:317
#10 0x0826c74d in condor_read (peer_description=0xa2f4e2f "<127.0.0.1:44717>", 
    fd=19, buf=0xbfa26a77 "", sz=5, timeout=1, flags=0) at condor_rw.cpp:120
#11 0x0826146b in ReliSock::RcvMsg::rcv_packet (this=0xa2f4ea4, 
    peer_description=0xa2f4e2f "<127.0.0.1:44717>", _sock=19, _timeout=1)
    at reli_sock.cpp:677
#12 0x08260ce0 in ReliSock::handle_incoming_packet (this=0xa2f4da0)
    at reli_sock.cpp:449
#13 0x08261154 in ReliSock::get_bytes (this=0xa2f4da0, dta=0xbfa26b24, 
    max_sz=4) at reli_sock.cpp:595
#14 0x0825afe6 in Stream::get (this=0xa2f4da0, i=@0xbfa26db0)
    at stream.cpp:1468
#15 0x0825897e in Stream::code (this=0xa2f4da0, i=@0xbfa26db0)
    at stream.cpp:147
#16 0x0818bc6c in DaemonCore::HandleReq (this=0xa2c34c8, insock=0xa2c85f0, 
    asock=0xa2f4da0) at daemon_core.cpp:4124
#17 0x0818afb3 in DaemonCore::HandleReq (this=0xa2c34c8, socki=0, 
    asock=0xa2f4da0) at daemon_core.cpp:3754
#18 0x0818a5bb in DaemonCore::CallSocketHandler_worker (this=0xa2c34c8, i=0, 
    default_to_HandleCommand=true, asock=0xa2f4da0) at daemon_core.cpp:3450
#19 0x0818a30b in DaemonCore::CallSocketHandler_worker_demarshall (
    arg=0xa3337a0) at daemon_core.cpp:3404
#20 0x08232b7b in CondorThreads::pool_add (
    routine=0x818a2d0 <DaemonCore::CallSocketHandler_worker_demarshall(void*)>, arg=0xa3337a0, tid=0x0, descrip=0xa2c2d78 "DC Command Handler")
    at condor_threads.cpp:1106
#21 0x0818a2ce in DaemonCore::CallSocketHandler (this=0xa2c34c8, 
    i=@0xbfa270c8, default_to_HandleCommand=true) at daemon_core.cpp:3394
#22 0x08189fdc in DaemonCore::Driver (this=0xa2c34c8) at daemon_core.cpp:3307
#23 0x0819de00 in main (argc=1, argv=0xbfa276e8) at daemon_core_main.cpp:2276
Comment 12 Matthew Farrellee 2010-01-25 12:08:07 EST
#0  0x0016d26d in vfprintf () from /lib/libc.so.6
#1  0x081a3766 in _condor_dprintf_va (flags=33554432, 
    fmt=0x82b3da4 "Leaving thread safe %s [%s] in %s:%d %s()\n", 
    args=0xbfa78658 "I=+\b|\362-\bp\361-\b\234") at dprintf.c:353
#2  0x081a4fe9 in dprintf (flags=33554432, 
    fmt=0x82b3da4 "Leaving thread safe %s [%s] in %s:%d %s()\n")
    at dprintf_common.c:76
#3  0x081aa80c in _mark_thread_safe (start_or_stop=1, dologging=1, 
    descrip=0x82df27c "recv", func=0x82df274 "unknown", 
    file=0x82df170 "condor_rw.cpp", line=156) at mark_thread.c:107
#4  0x0826c8c1 in condor_read (peer_description=0x890664f "<127.0.0.1:39081>", 
    fd=19, buf=0xbfa788fb "", sz=4, timeout=1, flags=2) at condor_rw.cpp:156
#5  0x0818b995 in DaemonCore::HandleReq (this=0x88d74c8, insock=0x88dc5f0, 
    asock=0x89065c0) at daemon_core.cpp:4064
#6  0x0818afb3 in DaemonCore::HandleReq (this=0x88d74c8, socki=0, 
    asock=0x89065c0) at daemon_core.cpp:3754
#7  0x0818a5bb in DaemonCore::CallSocketHandler_worker (this=0x88d74c8, i=0, 
    default_to_HandleCommand=true, asock=0x89065c0) at daemon_core.cpp:3450
#8  0x0818a30b in DaemonCore::CallSocketHandler_worker_demarshall (
    arg=0x8908c50) at daemon_core.cpp:3404
#9  0x08232b7b in CondorThreads::pool_add (
    routine=0x818a2d0 <DaemonCore::CallSocketHandler_worker_demarshall(void*)>, arg=0x8908c50, tid=0x0, descrip=0x88d6d78 "DC Command Handler")
    at condor_threads.cpp:1106
#10 0x0818a2ce in DaemonCore::CallSocketHandler (this=0x88d74c8, 
    i=@0xbfa78cb8, default_to_HandleCommand=true) at daemon_core.cpp:3394
#11 0x08189fdc in DaemonCore::Driver (this=0x88d74c8) at daemon_core.cpp:3307
#12 0x0819de00 in main (argc=1, argv=0xbfa792d8) at daemon_core_main.cpp:2276
Comment 13 Matthew Farrellee 2010-01-25 12:08:34 EST
$ condor_schedd -f
Now in new log file /home/matt/Documents/Condor/Installation/log/SchedLog
01/14 23:16:02 (fd:2) (pid:7683) 	Read {*** glibc detected *** condor_schedd: double free or corruption (!prev): 0x08664c30 ***
======= Backtrace: =========
/lib/libc.so.6(+0x6e751)[0x451751]
condor_schedd[0x824be2b]
condor_schedd(_Z7ScannerRPKcR5Token+0xa8)[0x824bbfb]
Comment 14 Matthew Farrellee 2010-01-25 12:08:51 EST
$ cat log/dprintf_failure.SCHEDD
dprintf() had a fatal error in pid 12904
Error writing debug log
errno: 9 (Bad file descriptor)
euid: 500, ruid: 500
Comment 15 Matthew Farrellee 2010-01-25 12:08:59 EST
$ cat log/dprintf_failure.SCHEDD
dprintf() had a fatal error in pid 16473
Error writing debug log
errno: 9 (Bad file descriptor)
euid: 500, ruid: 500
Comment 16 Matthew Farrellee 2010-01-25 13:26:57 EST
I'm closing this until it can be reproduced on EL.

Note You need to log in before you can comment on or make changes to this bug.