Description of problem:
The condor_startd allows a reconfig to change its slot configuration. If a job is running during that reconfiguration, the startd will crash.

Version-Release number of selected component (if applicable):
Likely all, definitely 7.6.7-0.8.

How reproducible:
100%

Steps to Reproduce:
0. Enable partitionable slots
     SLOT_TYPE_1=cpus=100%
     NUM_SLOTS=1
     NUM_SLOTS_TYPE_1 = 1
     SLOT_TYPE_1_PARTITIONABLE=true
1. service condor start
2. echo 'cmd=/bin/sleep\nargs=1d\nqueue' | condor_submit
3. Wait for job to start running
4. Disable partitionable slots
     #SLOT_TYPE_1=cpus=100%
     #NUM_SLOTS=1
     #NUM_SLOTS_TYPE_1 = 1
     #SLOT_TYPE_1_PARTITIONABLE=true
5. condor_reconfig

Actual results:
In MasterLog -
  The STARTD (pid 10060) died due to signal 11 (Segmentation fault)

In StartLog -
  Stack dump for process 10060 at timestamp 1333040167 (10 frames)
  condor_startd(dprintf_dump_stack+0x63)[0x532f93]
  condor_startd(linux_sig_coredump(int)+0x40)[0x4aeeb0]
  /lib64/libpthread.so.0[0x376300f4a0]
  condor_startd(ResMgr::reconfig_resources()+0x1fb)[0x4830eb]
  condor_startd(main_config()+0x17)[0x470437]
  condor_startd(handle_dc_sighup(Service*, int)+0x1a)[0x4b110a]
  condor_startd(DaemonCore::Driver()+0x231)[0x4a2e91]
  condor_startd(main+0x10fb)[0x4b08ab]
  /lib64/libc.so.6(__libc_start_main+0xfd)[0x3762c1ec5d]
  condor_startd[0x469b99]

Expected results:
No crash.
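
Possible pre-check (a sketch, not part of condor): since the crash is triggered by a slot-layout change arriving via condor_reconfig while a job is running, an admin-side script could compare the slot-related settings of the old and new configuration before issuing the reconfig. The helper names (slot_settings, slot_layout_changed) and the key pattern below are hypothetical. The pattern covers only the four settings used in the steps to reproduce and may miss other slot-related knobs.

```python
import re

# Assumption: only the settings named in the reproduction steps are treated
# as slot-layout settings. Other slot-related knobs are not covered.
SLOT_KEY = re.compile(
    r"^(NUM_SLOTS(_TYPE_\d+)?|SLOT_TYPE_\d+(_PARTITIONABLE)?)$",
    re.IGNORECASE,
)


def slot_settings(config_text):
    """Return the active (uncommented) slot-layout settings as a dict."""
    settings = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so 'cpus=100%' stays in the value.
        key, value = line.split("=", 1)
        key = key.strip().upper()
        if SLOT_KEY.match(key):
            settings[key] = value.strip()
    return settings


def slot_layout_changed(old_text, new_text):
    """True if a reconfig from old_text to new_text alters slot settings."""
    return slot_settings(old_text) != slot_settings(new_text)


# Configuration from step 0 (partitionable slots enabled).
before = """
SLOT_TYPE_1=cpus=100%
NUM_SLOTS=1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1_PARTITIONABLE=true
"""

# Configuration from step 4 (same lines commented out).
after = """
#SLOT_TYPE_1=cpus=100%
#NUM_SLOTS=1
#NUM_SLOTS_TYPE_1 = 1
#SLOT_TYPE_1_PARTITIONABLE=true
"""

print(slot_layout_changed(before, after))  # prints True
```

If the check reports a change, the safer course on affected versions would presumably be to drain running jobs and restart the startd rather than run condor_reconfig.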
MRG-Grid is in maintenance and only customer escalations will be considered. This issue can be reopened if a customer escalation associated with it occurs.