Bug 628034 - negotiator core on quota_dynamic =0
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.2
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: 1.3
Target Release: ---
Assigned To: Matthew Farrellee
QA Contact: Tomas Rusnak
Docs Contact:
Depends On:
Blocks: 528800
Reported: 2010-08-27 13:26 EDT by Jon Thomas
Modified: 2010-10-14 12:14 EDT
CC: 3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
An incorrect configuration of the negotiator resulted in a segmentation fault. This occurred when the 'quota' variable was set to 0 for a group that had subgroups. With this update, the segmentation fault no longer occurs in this situation.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-10-14 12:14:16 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
patch for zero quota (1.35 KB, patch)
2010-08-27 13:26 EDT, Jon Thomas


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2010:0773
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise MRG Messaging and Grid Version 1.3
Last Updated: 2010-10-14 11:56:44 EDT

Description Jon Thomas 2010-08-27 13:26:38 EDT
Created attachment 441579
patch for zero quota

The negotiator dumps core if the configuration sets a quota of 0 for a group that has subgroups, for example:

GROUP_QUOTA_DYNAMIC_a1.a3 = 0
GROUP_QUOTA_DYNAMIC_a1.a3.a3 = .2

This is because the a1.a3 group is not added to the array of groups.
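The failure mode described above can be modeled in a few lines. The following Python sketch is hypothetical: `build_group_array` and `parent_index` are illustrative names, not the negotiator's code (which is C++), and the rule that a zero-quota group is skipped when the array is built is an assumption drawn from this report. It shows that dropping a1.a3 leaves a1.a3.a3 with no parent index (-1). In C/C++, using such a value as an array index is undefined behavior, which is one plausible route to a segmentation fault, but that link is not confirmed here.

```python
def build_group_array(quotas, include_zero):
    """Hypothetical model of the group array; index 0 is the unnamed non-group entry."""
    groups = [""]
    for name in sorted(quotas):
        if quotas[name] == 0 and not include_zero:
            continue  # modeled pre-patch behavior: a zero-quota group is left out
        groups.append(name)
    return groups


def parent_index(groups, name):
    """Index of name's parent group, or -1 if the parent is not in the array."""
    parent = name.rpartition(".")[0]  # "a1.a3.a3" -> "a1.a3"; "a1" -> "" (root)
    return groups.index(parent) if parent in groups else -1


quotas = {"a1.a3": 0.0, "a1.a3.a3": 0.2}
before = build_group_array(quotas, include_zero=False)
after = build_group_array(quotas, include_zero=True)
print(before, parent_index(before, "a1.a3.a3"))  # ['', 'a1.a3.a3'] -1
print(after, parent_index(after, "a1.a3.a3"))    # ['', 'a1.a3', 'a1.a3.a3'] 1
```

Keeping the zero-quota group in the array, as the attached patch reportedly does, gives every subgroup a valid parent entry at the cost of not pruning the tree.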

This is partly a configuration issue: if the admin sets

GROUP_QUOTA_DYNAMIC_a1.a3 = 0

they should also set 

GROUP_QUOTA_DYNAMIC_a1.a3.a3 = 0

which would avoid the problem.
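The workaround above (zeroing every subgroup of a zero-quota group) can be checked mechanically. This Python sketch, with hypothetical helper names, reads `GROUP_QUOTA_DYNAMIC_<group> = <fraction>` lines and reports subgroups whose quota is non-zero while an ancestor is explicitly set to 0. It assumes one setting per line and case-sensitive names; real Condor configuration is more flexible (macros, case-insensitive names), so treat it as a lint sketch only.

```python
PREFIX = "GROUP_QUOTA_DYNAMIC_"


def parse_quotas(config_text):
    """Collect GROUP_QUOTA_DYNAMIC_<group> = <fraction> settings into a dict."""
    quotas = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line.startswith(PREFIX) or "=" not in line:
            continue  # ignore comments and unrelated settings
        key, _, value = line.partition("=")
        quotas[key.strip()[len(PREFIX):]] = float(value)  # float(" .2") == 0.2
    return quotas


def zero_quota_conflicts(quotas):
    """Return (group, ancestor) pairs where a group has a non-zero quota
    but one of its ancestors is explicitly set to 0."""
    conflicts = []
    for name, quota in sorted(quotas.items()):
        if quota == 0:
            continue
        parts = name.split(".")
        for i in range(1, len(parts)):
            ancestor = ".".join(parts[:i])
            if quotas.get(ancestor) == 0:
                conflicts.append((name, ancestor))
                break  # report the nearest-to-root zero ancestor once
    return conflicts


bad = parse_quotas("""
GROUP_QUOTA_DYNAMIC_a1.a3 = 0
GROUP_QUOTA_DYNAMIC_a1.a3.a3 = .2
""")
print(zero_quota_conflicts(bad))  # [('a1.a3.a3', 'a1.a3')]
```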

But an incorrect config shouldn't really core the negotiator.
Comment 1 Jon Thomas 2010-08-27 13:31:53 EDT
Added a patch.  

The patch changes behavior: submitters to a group with a configured quota of 0 will no longer fall into the bucket of non-group users. The downside is that the group tree is not pruned.
Comment 2 Matthew Farrellee 2010-08-29 19:51:53 EDT
The semantic change is that if a group is explicitly set to 0, then all submitters to that group or any subgroup will get 0 quota, instead of being dropped in with the non-group submissions? That seems like a reasonable behavior to have anyway. It would be unexpected if a group with 0 quota was allowed to run jobs.
Comment 3 Jon Thomas 2010-08-30 08:59:31 EDT
The semantics with this patch are slightly different from comment #2.

a) If the group was listed in GROUP_NAMES, the quota would be set to zero and the jobs would not run.

b) If the group was not listed in GROUP_NAMES, the jobs would fall into the non-group submission bucket.


The original behavior was designed to let "unofficial groups" run jobs in non-group space without having to change their submit description file (sdf) to unset AccountingGroup.

The patch causes "a" to happen, but not "b". However, enforcing "b" is also a trivial patch, since we can simply throw out any ClassAd with a "." when the ClassAds are attached to groupArray[0].
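Rules (a) and (b) can be written down as a small routing function. This is a hypothetical Python sketch, not Condor's code: it assumes submitter names of the form `group.user@domain`, where the text before the last "." in the user part is the accounting group, so it cannot distinguish a dotted user name from a group prefix. The hypothetical `drop_unlisted` flag models the alternative of throwing out ClassAds with a "." instead of attaching them to groupArray[0].

```python
def assign_submitter(submitter, group_names, drop_unlisted=False):
    """Return the group a submitter ad is attached to, "" for the non-group
    bucket (groupArray[0]), or None if the ad is discarded."""
    local = submitter.split("@", 1)[0]
    group, dot, _user = local.rpartition(".")
    if dot and group in group_names:
        return group  # (a) listed group: its configured quota applies, even 0
    if dot and drop_unlisted:
        return None  # alternative: dotted ad is not attached to groupArray[0]
    return ""  # (b) unlisted group or no group: non-group bucket


names = {"a1", "a1.a3", "a1.a3.a3"}
subs = ["a1.a3.alice@example.com", "x.bob@example.com", "carol@example.com"]
print([assign_submitter(s, names) for s in subs])  # ['a1.a3', '', '']
print([assign_submitter(s, names, drop_unlisted=True) for s in subs])  # ['a1.a3', None, '']
```

Whether a listed group with quota 0 actually runs nothing is decided later, when its quota is applied; the function only decides which bucket the ad lands in.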
Comment 4 Matthew Farrellee 2010-09-09 10:16:27 EDT
Built in 7.4.4-0.11
Comment 6 Tomas Rusnak 2010-09-16 05:32:32 EDT
Reproduced on:

$CondorVersion: 7.4.4 Aug  9 2010 BuildID: RH-7.4.4-0.9.el5 PRE-RELEASE $
$CondorPlatform: I386-LINUX_RHEL5 $

Config:

GROUP_NAMES = a1.a3.a3
GROUP_QUOTA_DYNAMIC_a1.a3 = 0
GROUP_QUOTA_DYNAMIC_a1.a3.a3 = .2


09/16/10 05:30:52 Phase 2:  Performing accounting ...
09/16/10 05:30:52 group a1.a3.a3 dynamic quota for 8 slots = 0.200
09/16/10 05:30:52 Group Table : group a1.a3.a3 quota 0.200 usage 0.000 prio 0.00
09/16/10 05:30:52 negotiationtime: slots 8 group a1.a3.a3 autoregroup false
09/16/10 05:30:52 negotiationtime:sorting
09/16/10 05:30:52 Sort : sorting group vector
09/16/10 05:30:52 Sort : stage two
09/16/10 05:30:52 midsort : grouparray group  parent -1 child -1  left -1 right -1 i 0
09/16/10 05:30:52 midsort : grouparray group a1.a3.a3 parent -1 child -1  left -1 right -1 i 1
09/16/10 05:30:52 Sorted : grouparray group  parent -1 child -1  left -1 right -1 i 0
09/16/10 05:30:52 Sorted : grouparray group a1.a3.a3 parent -1 child -1  left -1 right -1 i 1
09/16/10 05:30:52 Sort : leaving
09/16/10 05:30:52 negotiationtime: finished sort - slots 8 group  auto true quota 1.000000 maxAllowed 8.000000 numsubmits 0 parent -1 child -1  left -1 right -1 i 0
09/16/10 05:30:52 negotiationtime: finished sort - slots 8 group a1.a3.a3 auto false quota 0.200000 maxAllowed 0.000000 numsubmits 0 parent -1 child -1  left -1 right -1 i 1
09/16/10 05:30:52 negotiationtime: finished inserting submitters - slots 8 group  quota 1.000000 maxAllowed 8.000000 numsubmits 0  i 0
09/16/10 05:30:52 negotiationtime: finished inserting submitters - slots 8 group a1.a3.a3 quota 0.200000 maxAllowed 0.000000 numsubmits 0  i 1
Stack dump for process 3833 at timestamp 1284629452 (8 frames)
condor_negotiator(dprintf_dump_stack+0x44)[0x80dc924]
condor_negotiator[0x80de764]
[0x635420]
condor_negotiator(_ZN12TimerManager7TimeoutEv+0x14b)[0x80dbe8b]
condor_negotiator(_ZN10DaemonCore6DriverEv+0x244)[0x80c4824]
condor_negotiator(main+0xd80)[0x80d8280]
/lib/libc.so.6(__libc_start_main+0xdc)[0x819e9c]
condor_negotiator[0x80a31b1]
Comment 7 Tomas Rusnak 2010-09-16 05:49:50 EDT
MasterLog:

09/16/10 05:46:26 The NEGOTIATOR (pid 4147) died due to signal 11 (Segmentation fault)

# ps ax | grep condor
 4073 ?        Ss     0:00 condor_master -pidfile /var/run/condor/condor_master.pid
 4074 ?        Ss     0:00 condor_collector -f
 4077 ?        Ss     0:00 condor_schedd -f
 4078 ?        Ss     0:05 condor_startd -f
 4079 ?        S      0:00 condor_procd -A /var/run/condor/procd_pipe.SCHEDD -S 60 -C 64
 4142 pts/0    S+     0:00 grep condor

Config:

# cat /etc/condor/config.d/zzz_condor_config.test 
CREATE_CORE_FILES=True
#ABORT_ON_EXCEPTION=True
MAX_HISTORY_LOG=300*1024*1024
MAX_HISTORY_ROTATIONS=10
ALL_DEBUG = D_FULLDEBUG

GROUP_NAMES = a1.a3.a3

GROUP_QUOTA_DYNAMIC_a1.a3 = 0
GROUP_QUOTA_DYNAMIC_a1.a3.a3 = .2

# tailf /var/log/condor/NegotiatorLog
09/16/10 05:45:20 negotiationtime: finished inserting submitters - slots 8 group a1.a3.a3 quota 0.200000 maxAllowed 0.000000 numsubmits 0  i 1
Stack dump for process 4130 at timestamp 1284630320 (8 frames)
condor_negotiator(dprintf_dump_stack+0x44)[0x80dd1c4]
condor_negotiator[0x80df004]
[0xe71420]
condor_negotiator(_ZN12TimerManager7TimeoutEv+0x14b)[0x80dc72b]
condor_negotiator(_ZN10DaemonCore6DriverEv+0x244)[0x80c50c4]
condor_negotiator(main+0xd80)[0x80d8b20]
/lib/libc.so.6(__libc_start_main+0xdc)[0x819e9c]
condor_negotiator[0x80a31f1]

# condor -v
$CondorVersion: 7.4.4 Sep 14 2010 BuildID: RH-7.4.4-0.13.el5 PRE-RELEASE $
$CondorPlatform: I386-LINUX_RHEL5 $

The issue is not fixed in the current packages. Could you check whether the patch was really included in the build?
Comment 8 Jon Thomas 2010-09-16 09:54:32 EDT
It's there.

I'll take a look at why this is failing. Perhaps more recent code changes broke it.
Comment 9 Jon Thomas 2010-09-16 10:50:34 EDT
It looks like you hit a different failure, caused by:

GROUP_NAMES = a1.a3.a3

Try GROUP_NAMES = a1, a1.a3, a1.a3.a3

I'm looking at a way to fix this new issue.
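The suggested GROUP_NAMES value lists every ancestor of a1.a3.a3. A quick check for missing ancestors might look like the following hypothetical Python sketch; it assumes GROUP_NAMES entries are separated by commas and/or whitespace.

```python
import re


def parse_group_names(value):
    """Split a GROUP_NAMES value on commas and/or whitespace."""
    return [name for name in re.split(r"[,\s]+", value.strip()) if name]


def missing_ancestors(group_names):
    """Return ancestors of listed groups that are not listed themselves."""
    listed = set(group_names)
    missing = []
    for name in group_names:
        parts = name.split(".")
        for i in range(1, len(parts)):
            ancestor = ".".join(parts[:i])
            if ancestor not in listed and ancestor not in missing:
                missing.append(ancestor)
    return missing


print(missing_ancestors(parse_group_names("a1.a3.a3")))             # ['a1', 'a1.a3']
print(missing_ancestors(parse_group_names("a1, a1.a3, a1.a3.a3")))  # []
```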
Comment 10 Matthew Farrellee 2010-09-16 12:05:37 EDT
For a new issue, create a new BZ. If it blocks testing of this BZ, set the dependencies.
Comment 11 Tomas Rusnak 2010-09-17 05:57:17 EDT
Negotiator crash confirmed on:

$CondorVersion: 7.4.4 Aug  9 2010 BuildID: RH-7.4.4-0.9.el4 PRE-RELEASE $
$CondorPlatform: I386-LINUX_RHEL4 $

Stack dump for process 29714 at timestamp 1284716851 (8 frames)
condor_negotiator(dprintf_dump_stack+0x3f)[0x80d409f]
condor_negotiator[0x80d43fa]
/lib/tls/libpthread.so.0[0x131c98]
condor_negotiator(_ZN12TimerManager7TimeoutEv+0xf6)[0x80d2126]
condor_negotiator(_ZN10DaemonCore6DriverEv+0x17e)[0x80b779e]
condor_negotiator(main+0x133e)[0x80cdf0e]
/lib/tls/libc.so.6(__libc_start_main+0xd3)[0xce1e93]
condor_negotiator(__gxx_personality_v0+0x149)[0x8097711]

Retested on the current packages (condor-7.4.4-0.14.el5) on all supported platforms: x86 and x86_64 on RHEL4 and RHEL5.

NegotiatorLog:

09/17/10 04:52:24 ---------- Started Negotiation Cycle ----------
09/17/10 04:52:24 Phase 1:  Obtaining ads from collector ...
09/17/10 04:52:24   Getting all public ads ...
09/17/10 04:52:24 Trying to query collector <IP>
09/17/10 04:52:24   Sorting 12 ads ...
09/17/10 04:52:24   Getting startd private ads ...
09/17/10 04:52:24 Trying to query collector <IP>
09/17/10 04:52:24 Got ads: 12 public and 8 private
09/17/10 04:52:24 Public ads include 0 submitter, 8 startd
09/17/10 04:52:24 Phase 1: numDynGroupSlots 8  untrimmedSlotWeightTotal 8.000000
09/17/10 04:52:24 Entering compute_significant_attrs()
09/17/10 04:52:24 Leaving compute_significant_attrs() - result=JobUniverse,LastCheckpointPlatform,NumCkpts
09/17/10 04:52:24 Phase 2:  Performing accounting ...
09/17/10 04:52:24 group a1 dynamic quota for 8 slots = 0.000
09/17/10 04:52:24 Group Table : group a1 quota 0.000 usage 0.000 prio nan
09/17/10 04:52:24 negotiationtime: slots 8 group a1 autoregroup false
09/17/10 04:52:24 group a1.a3 dynamic quota for 8 slots = 0.000
09/17/10 04:52:24 Group Table : group a1.a3 quota 0.000 usage 0.000 prio nan
09/17/10 04:52:24 negotiationtime: slots 8 group a1.a3 autoregroup false
09/17/10 04:52:24 group a1.a3.a3 dynamic quota for 8 slots = 0.200
09/17/10 04:52:24 Group Table : group a1.a3.a3 quota 0.200 usage 0.000 prio 0.00  
09/17/10 04:52:24 negotiationtime: slots 8 group a1.a3.a3 autoregroup false
09/17/10 04:52:24 negotiationtime:sorting
09/17/10 04:52:24 Sort : sorting group vector
09/17/10 04:52:24 Sorting : grouparray group a1.a3 parent -1 child -1  left -1 right -1 i 0
09/17/10 04:52:24 Sorting : grouparray group a1.a3.a3 parent -1 child -1  left -1 right -1 i 1
09/17/10 04:52:24 Sorting : grouparray group a1.a3.a3 parent -1 child -1  left -1 right -1 i 0
09/17/10 04:52:24 Sort : stage two
09/17/10 04:52:24 midsort : grouparray group  parent -1 child 1  left -1 right -1 i 0
09/17/10 04:52:24 midsort : grouparray group a1 parent 0 child 2  left -1 right -1 i 1
09/17/10 04:52:24 midsort : grouparray group a1.a3 parent 1 child 3  left -1 right -1 i 2
09/17/10 04:52:24 midsort : grouparray group a1.a3.a3 parent 2 child -1  left -1 right -1 i 3
09/17/10 04:52:24 Sorted : grouparray group  parent -1 child 1  left -1 right -1 i 0
09/17/10 04:52:24 Sorted : grouparray group a1 parent 0 child 2  left -1 right -1 i 1
09/17/10 04:52:24 Sorted : grouparray group a1.a3 parent 1 child 3  left -1 right -1 i 2
09/17/10 04:52:24 Sorted : grouparray group a1.a3.a3 parent 2 child -1  left -1 right -1 i 3
09/17/10 04:52:24 Sort : leaving
....
09/17/10 04:52:24 Group  - skipping, no submitters
09/17/10 04:52:24 Group a1 - skipping, no submitters
09/17/10 04:52:24 Group a1.a3 - skipping, no submitters
09/17/10 04:52:24 Group a1.a3.a3 - skipping, no submitters
09/17/10 04:52:24 Failed to match 0.000000 slots on iteration 1.
09/17/10 04:52:24 negotiationtime: finished  - slots 8 group  auto true quota 1.000000 maxAllowed 8.000000 nodemaxAllowed 0.000000 numsubmits 0 usage 0.000000
09/17/10 04:52:24 negotiationtime: finished  - slots 8 group a1 auto false quota 0.000000 maxAllowed 0.000000 nodemaxAllowed 0.000000 numsubmits 0 usage 0.000000
09/17/10 04:52:24 negotiationtime: finished  - slots 8 group a1.a3 auto false quota 0.000000 maxAllowed 0.000000 nodemaxAllowed 0.000000 numsubmits 0 usage 0.000000
09/17/10 04:52:24 negotiationtime: finished  - slots 8 group a1.a3.a3 auto false quota 0.200000 maxAllowed 0.000000 nodemaxAllowed 0.000000 numsubmits 0 usage 0.000000
09/17/10 04:52:24 ---------- Finished Negotiation Cycle ----------
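The `Sorted :` lines in this log (and the failing ones in Comment 6) print, for each entry of the group array, a parent index, a child index, and left/right indices. The hypothetical Python sketch below rebuilds those indices from a list of group names. It assumes index 0 is the unnamed non-group root, a group's parent is its name with the last "."-separated component removed, `child` is the first child, and left/right link siblings; none of this is taken from the negotiator source. With a1, a1.a3, a1.a3.a3 it reproduces the `Sorted :` lines above, and with a1.a3.a3 alone it reproduces the detached a1.a3.a3 entry (parent -1) from Comment 6.

```python
def link_groups(names):
    """Hypothetical reconstruction of the parent/child/left/right indices."""
    order = [""] + sorted(n for n in names if n)  # index 0: unnamed root
    index = {name: i for i, name in enumerate(order)}
    nodes = [{"group": n, "parent": -1, "child": -1, "left": -1, "right": -1}
             for n in order]
    for i, name in enumerate(order[1:], start=1):
        p = index.get(name.rpartition(".")[0], -1)
        nodes[i]["parent"] = p
        if p == -1:
            continue  # parent not in the array: the entry stays detached
        c = nodes[p]["child"]
        if c == -1:
            nodes[p]["child"] = i  # first child of p
        else:
            while nodes[c]["right"] != -1:  # walk to the last sibling
                c = nodes[c]["right"]
            nodes[c]["right"] = i
            nodes[i]["left"] = c
    return nodes


def log_line(i, n):
    """Format an entry the way the negotiator log prints it."""
    return (f"grouparray group {n['group']} parent {n['parent']} "
            f"child {n['child']}  left {n['left']} right {n['right']} i {i}")


for i, n in enumerate(link_groups(["a1", "a1.a3", "a1.a3.a3"])):
    print(log_line(i, n))
```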

# ps ax | grep condor
 7587 ?        Ss     0:00 condor_master -pidfile /var/run/condor/condor_master.pid
 7588 ?        Ss     0:00 condor_collector -f
 7590 ?        Ss     0:00 condor_negotiator -f
 7591 ?        Ss     0:00 condor_schedd -f
 7592 ?        Ss     0:05 condor_startd -f
 7593 ?        S      0:00 condor_procd -A /var/run/condor/procd_pipe.SCHEDD -S 60 -C 64

No regression found on current packages.

>>> VERIFIED
Comment 12 Matthew Farrellee 2010-09-21 15:37:30 EDT
Comment 6 and Comment 7 -> Bug 636271
Comment 13 Martin Prpič 2010-10-07 12:14:18 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
An incorrect configuration of the negotiator resulted in a segmentation fault. This occurred when the 'quota' variable was set to 0 for a group that had supgroups. With this update, the segmentation fault no longer occurs in this situation.
Comment 14 Florian Nadge 2010-10-08 06:22:18 EDT
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-An incorrect configuration of the negotiator resulted in a segmentation fault. This occurred when the 'quota' variable was set to 0 for a group that had supgroups. With this update, the segmentation fault no longer occurs in this situation.
+An incorrect configuration of the negotiator resulted in a segmentation fault. This occurred when the 'quota' variable was set to 0 for a group that had subgroups. With this update, the segmentation fault no longer occurs in this situation.
Comment 16 errata-xmlrpc 2010-10-14 12:14:16 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html
