Bug 1575974 - Condor shared port daemon crashes on startup on Fedora 28+
Summary: Condor shared port daemon crashes on startup on Fedora 28+
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: condor
Version: 28
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Tim Theisen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-05-08 12:59 UTC by Bert DeKnuydt
Modified: 2018-06-16 20:16 UTC
CC List: 8 users

Fixed In Version: condor-8.6.11-1.fc27 condor-8.6.11-1.fc28
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-16 19:32:16 UTC
Type: Bug



Description Bert DeKnuydt 2018-05-08 12:59:36 UTC
Description of problem:

The condor_startd shipped with Fedora 28 (package
condor-8.6.10-1.fc28.x86_64) crashes soon after
startup.  It gets restarted periodically,
but it never actually works.

Version-Release number of selected component (if applicable):

condor-8.6.10-1.fc28.x86_64

How reproducible:

Always

Steps to Reproduce:
1. dnf install condor
2. systemctl start condor
3. tail /var/log/condor/StartLog

Actual results:

05/08/18 14:41:13 slot1: Changing activity: Idle -> Benchmarking
05/08/18 14:41:13 BenchMgr:StartBenchmarks()
Caught signal 6: si_code=4294967290, si_pid=6101, si_uid=0, si_addr=0x17D5
Stack dump for process 6101 at timestamp 1525783273 (23 frames)
/lib64/libcondor_utils_8_6_10.so(dprintf_dump_stack+0x28)[0x7fe29035eb48]
/lib64/libcondor_utils_8_6_10.so(_Z17unix_sig_coredumpiP9siginfo_tPv+0x6d)[0x7fe2904b0efd]
/lib64/libpthread.so.0(+0x11fb0)[0x7fe28b934fb0]
/lib64/libc.so.6(gsignal+0x10b)[0x7fe28b59af4b]
/lib64/libc.so.6(abort+0x12b)[0x7fe28b585591]
/lib64/libcondor_utils_8_6_10.so(+0x2a09bf)[0x7fe29045a9bf]
/lib64/libcondor_utils_8_6_10.so(_ZN15SharedPortState6HandleEP6Stream+0x14b)[0x7fe29045ab0b]
/lib64/libcondor_utils_8_6_10.so(_ZN16SharedPortClient10PassSocketEP4SockPKcS3_b+0xbd)[0x7fe29045acbd]
/lib64/libcondor_utils_8_6_10.so(_ZN8ReliSock28do_shared_port_local_connectEPKcbS1_+0xfd)[0x7fe290437d8d]
/lib64/libcondor_utils_8_6_10.so(_ZN4Sock15special_connectEPKcib+0x3b0)[0x7fe2904377c0]
/lib64/libcondor_utils_8_6_10.so(_ZN4Sock10do_connectEPKcib+0x83)[0x7fe290465603]
/lib64/libcondor_utils_8_6_10.so(_ZN6Daemon11connectSockEP4SockiP11CondorErrorbb+0x66)[0x7fe29046bb66]
/lib64/libcondor_utils_8_6_10.so(_ZN6Daemon8reliSockEilP11CondorErrorbb+0x68)[0x7fe29046bc18]
/lib64/libcondor_utils_8_6_10.so(_ZN6Daemon12startCommandEiN6Stream11stream_typeEPP4SockiP11CondorErroriPFvbS3_S6_PvES7_bPKcbSB_+0xa9)[0x7fe29046be39]
/lib64/libcondor_utils_8_6_10.so(_ZN6Daemon12startCommandEiN6Stream11stream_typeEiP11CondorErrorPKcbS5_+0x3f)[0x7fe29046c08f]
/lib64/libcondor_utils_8_6_10.so(_ZN11DCMessenger15sendBlockingMsgE18classy_counted_ptrI5DCMsgE+0x65)[0x7fe290478c95]
/lib64/libcondor_utils_8_6_10.so(_ZN6Daemon15sendBlockingMsgE18classy_counted_ptrI5DCMsgE+0x6b)[0x7fe29046b50b]
/lib64/libcondor_utils_8_6_10.so(_ZN10DaemonCore17SendAliveToParentEv+0x1a7)[0x7fe2904a6517]
/lib64/libcondor_utils_8_6_10.so(_ZN12TimerManager7TimeoutEPiPd+0x16e)[0x7fe2904be94e]
/lib64/libcondor_utils_8_6_10.so(_ZN10DaemonCore6DriverEv+0x620)[0x7fe2904a0f90]
/lib64/libcondor_utils_8_6_10.so(_Z7dc_mainiPPc+0x160e)[0x7fe2904b46ee]
/lib64/libc.so.6(__libc_start_main+0xeb)[0x7fe28b5871bb]
condor_startd(_start+0x2a)[0x557dcc2d415a]
05/08/18 14:41:23 ******************************************************
05/08/18 14:41:23 ** condor_startd (CONDOR_STARTD) STARTING UP
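
The frames in the stack dump above are C++-mangled symbol names. A minimal sketch for turning them into readable signatures follows, assuming a GCC or Clang toolchain that provides <cxxabi.h>. The helper name `demangle` is hypothetical and is not part of Condor.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang C++ ABI support)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI
// mangled symbol, or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result
    // with malloc; the caller must release it with free.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return std::string(mangled);
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

For example, demangle("_ZN15SharedPortState6HandleEP6Stream") yields "SharedPortState::Handle(Stream*)", the last named Condor frame before abort(), and demangle("_ZN10DaemonCore17SendAliveToParentEv") yields "DaemonCore::SendAliveToParent()". The c++filt command-line tool performs the same conversion.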

Expected results:

No crash :P

Additional info:

--> 'Upgrading' to 8.6.10-2.fc29.x86_64 from Fedora 29
    (on a F28 machine), results in exactly the same problem.

--> 'Downgrading' to condor-8.6.10-1.el7.x86_64 from RHEL
    (actually directly from HTCondor), results in a functioning
    system.

--> 'Downgrading' to condor-8.6.10-1.fc27.x86_64 does not work
    because of unresolvable dependencies.  That version does work
    properly on Fedora 27, though.

--> When run interactively, condor_startd crashed, printing

# terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid

   Don't know if this is helpful.
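
For context on that message: "basic_string::_M_construct null not valid" is the text libstdc++ puts in the std::logic_error it throws when a std::string is constructed from a null `const char*`. The report does not show which Condor call site passes the null pointer, so the sketch below only illustrates the failure mode and one common guard. Both function names are hypothetical and are not taken from the Condor sources.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical guard: map a null C string to an empty std::string
// instead of handing the null pointer to the std::string constructor.
std::string string_or_empty(const char* s) {
    return s ? std::string(s) : std::string();
}

// Illustration only. Constructing std::string from a null pointer is
// undefined behaviour per the C++ standard; libstdc++ happens to
// detect it and throw std::logic_error. Other standard libraries may
// crash instead, so do not rely on the exception being thrown.
bool construction_throws(const char* s) {
    try {
        std::string tmp(s);
        return false;
    } catch (const std::logic_error&) {
        return true;
    }
}
```

With libstdc++, construction_throws(nullptr) is expected to return true, and an uncaught exception of this kind ends in std::terminate and abort(), which is consistent with the signal 6 seen in StartLog. string_or_empty(nullptr) returns an empty string on any implementation.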

Comment 1 Tim Theisen 2018-06-05 17:57:57 UTC
I have reproduced the problem. The root cause is that the shared port daemon crashes on startup. You may be able to work around the problem by not using the shared port daemon. I expect to have a fix shortly.

Comment 2 Fedora Update System 2018-06-07 19:43:18 UTC
condor-8.6.11-1.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-cef46734da

Comment 3 Fedora Update System 2018-06-07 21:28:56 UTC
condor-8.6.11-1.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-46f1610313

Comment 4 Fedora Update System 2018-06-08 12:59:50 UTC
condor-8.6.11-1.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-46f1610313

Comment 5 Fedora Update System 2018-06-08 19:48:14 UTC
condor-8.6.11-1.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-cef46734da

Comment 6 Fedora Update System 2018-06-16 19:32:16 UTC
condor-8.6.11-1.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 7 Fedora Update System 2018-06-16 20:16:08 UTC
condor-8.6.11-1.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

