Bug 1575974

Summary: Condor shared port daemon crashes on startup on Fedora 28+
Product: [Fedora] Fedora
Reporter: Bert DeKnuydt <Bert.Deknuydt>
Component: condor
Assignee: Tim Theisen <tim>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 28
CC: bbockelm, bcotton, eerlands, matt, steve.traylen, tim, tstclair, valtri
Hardware: x86_64   
OS: Linux   
Fixed In Version: condor-8.6.11-1.fc27 condor-8.6.11-1.fc28
Last Closed: 2018-06-16 19:32:16 UTC
Type: Bug

Description Bert DeKnuydt 2018-05-08 12:59:36 UTC
Description of problem:

The condor_startd shipped with Fedora 28 (package
condor-8.6.10-1.fc28.x86_64) crashes shortly after
startup.  It is restarted periodically,
but it never really works.

Version-Release number of selected component (if applicable):

condor-8.6.10-1.fc28.x86_64

How reproducible:

Always

Steps to Reproduce:
1. dnf install condor
2. systemctl start condor
3. tail /var/log/condor/StartLog

Actual results:

05/08/18 14:41:13 slot1: Changing activity: Idle -> Benchmarking
05/08/18 14:41:13 BenchMgr:StartBenchmarks()
Caught signal 6: si_code=4294967290, si_pid=6101, si_uid=0, si_addr=0x17D5
Stack dump for process 6101 at timestamp 1525783273 (23 frames)
/lib64/libcondor_utils_8_6_10.so(dprintf_dump_stack+0x28)[0x7fe29035eb48]
/lib64/libcondor_utils_8_6_10.so(_Z17unix_sig_coredumpiP9siginfo_tPv+0x6d)[0x7fe2904b0efd]
/lib64/libpthread.so.0(+0x11fb0)[0x7fe28b934fb0]
/lib64/libc.so.6(gsignal+0x10b)[0x7fe28b59af4b]
/lib64/libc.so.6(abort+0x12b)[0x7fe28b585591]
/lib64/libcondor_utils_8_6_10.so(+0x2a09bf)[0x7fe29045a9bf]
/lib64/libcondor_utils_8_6_10.so(_ZN15SharedPortState6HandleEP6Stream+0x14b)[0x7fe29045ab0b]
/lib64/libcondor_utils_8_6_10.so(_ZN16SharedPortClient10PassSocketEP4SockPKcS3_b+0xbd)[0x7fe29045acbd]
/lib64/libcondor_utils_8_6_10.so(_ZN8ReliSock28do_shared_port_local_connectEPKcbS1_+0xfd)[0x7fe290437d8d]
/lib64/libcondor_utils_8_6_10.so(_ZN4Sock15special_connectEPKcib+0x3b0)[0x7fe2904377c0]
/lib64/libcondor_utils_8_6_10.so(_ZN4Sock10do_connectEPKcib+0x83)[0x7fe290465603]
/lib64/libcondor_utils_8_6_10.so(_ZN6Daemon11connectSockEP4SockiP11CondorErrorbb+0x66)[0x7fe29046bb66]
/lib64/libcondor_utils_8_6_10.so(_ZN6Daemon8reliSockEilP11CondorErrorbb+0x68)[0x7fe29046bc18]
/lib64/libcondor_utils_8_6_10.so(_ZN6Daemon12startCommandEiN6Stream11stream_typeEPP4SockiP11CondorErroriPFvbS3_S6_PvES7_bPKcbSB_+0xa9)[0x7fe29046be39]
/lib64/libcondor_utils_8_6_10.so(_ZN6Daemon12startCommandEiN6Stream11stream_typeEiP11CondorErrorPKcbS5_+0x3f)[0x7fe29046c08f]
/lib64/libcondor_utils_8_6_10.so(_ZN11DCMessenger15sendBlockingMsgE18classy_counted_ptrI5DCMsgE+0x65)[0x7fe290478c95]
/lib64/libcondor_utils_8_6_10.so(_ZN6Daemon15sendBlockingMsgE18classy_counted_ptrI5DCMsgE+0x6b)[0x7fe29046b50b]
/lib64/libcondor_utils_8_6_10.so(_ZN10DaemonCore17SendAliveToParentEv+0x1a7)[0x7fe2904a6517]
/lib64/libcondor_utils_8_6_10.so(_ZN12TimerManager7TimeoutEPiPd+0x16e)[0x7fe2904be94e]
/lib64/libcondor_utils_8_6_10.so(_ZN10DaemonCore6DriverEv+0x620)[0x7fe2904a0f90]
/lib64/libcondor_utils_8_6_10.so(_Z7dc_mainiPPc+0x160e)[0x7fe2904b46ee]
/lib64/libc.so.6(__libc_start_main+0xeb)[0x7fe28b5871bb]
condor_startd(_start+0x2a)[0x557dcc2d415a]
05/08/18 14:41:23 ******************************************************
05/08/18 14:41:23 ** condor_startd (CONDOR_STARTD) STARTING UP
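
The frames in the stack dump above are Itanium-ABI mangled C++ names, which makes the call chain hard to read. On the command line they can be piped through c++filt; the same can be done programmatically. The following is a minimal sketch, assuming a GCC or Clang toolchain that provides abi::__cxa_demangle; the helper name demangle() is hypothetical and is not part of HTCondor.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <memory>
#include <string>

// Hypothetical helper: return the demangled form of a symbol, or the
// input unchanged if it is not a valid mangled name.
std::string demangle(const char* mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc()ed buffer that the caller must free.
    std::unique_ptr<char, decltype(&std::free)> out(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), &std::free);
    return (status == 0 && out) ? std::string(out.get())
                                : std::string(mangled);
}
```

Applied to the frame just below abort() in the trace, demangle("_ZN15SharedPortState6HandleEP6Stream") yields SharedPortState::Handle(Stream*), and the timer-driven entry point _ZN10DaemonCore17SendAliveToParentEv becomes DaemonCore::SendAliveToParent().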

Expected results:

No crash :P

Additional info:

--> 'Upgrading' to 8.6.10-2.fc29.x86_64 from Fedora 29
    (on a F28 machine), results in exactly the same problem.

--> 'Downgrading' to condor-8.6.10-1.el7.x86_64 from RHEL
    (actually directly from HTCondor), results in a functioning
    system.

--> 'Downgrading' to condor-8.6.10-1.fc27.x86_64 is not possible
    because of dependency conflicts.  That version does work properly
    on Fedora 27, though.

--> When run interactively, condor_startd crashes and prints:

# terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid

   Don't know if this is helpful.
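
For context: "basic_string::_M_construct null not valid" is the message libstdc++ attaches to the std::logic_error it throws when a std::string is constructed from a null const char*. This report does not say which pointer was null or how 8.6.11 fixed it, so the following is only a sketch of the usual guard against this class of failure. The helper name to_string_or_empty() is hypothetical and is not HTCondor code.

```cpp
#include <string>

// Hypothetical helper: convert a possibly-null C string to std::string.
// Constructing std::string directly from a null pointer is undefined
// behaviour in standard C++; libstdc++ reports it by throwing
// std::logic_error, which aborts the process if nothing catches it.
std::string to_string_or_empty(const char* s) {
    return s ? std::string(s) : std::string();
}
```

With this guard, to_string_or_empty(nullptr) returns an empty string instead of throwing. Whether an empty string is an acceptable fallback depends on the caller.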

Comment 1 Tim Theisen 2018-06-05 17:57:57 UTC
I have reproduced the problem. The root cause is that the shared port daemon crashes on startup. You may be able to work around the problem by not using the shared port daemon. I expect to have a fix shortly.

Comment 2 Fedora Update System 2018-06-07 19:43:18 UTC
condor-8.6.11-1.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-cef46734da

Comment 3 Fedora Update System 2018-06-07 21:28:56 UTC
condor-8.6.11-1.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-46f1610313

Comment 4 Fedora Update System 2018-06-08 12:59:50 UTC
condor-8.6.11-1.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-46f1610313

Comment 5 Fedora Update System 2018-06-08 19:48:14 UTC
condor-8.6.11-1.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-cef46734da

Comment 6 Fedora Update System 2018-06-16 19:32:16 UTC
condor-8.6.11-1.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 7 Fedora Update System 2018-06-16 20:16:08 UTC
condor-8.6.11-1.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.