Bug 628034
Summary: | negotiator core on quota_dynamic =0 | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Jon Thomas <jthomas> | ||||
Component: | condor | Assignee: | Matthew Farrellee <matt> | ||||
Status: | CLOSED ERRATA | QA Contact: | Tomas Rusnak <trusnak> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 1.2 | CC: | fnadge, matt, trusnak | ||||
Target Milestone: | 1.3 | ||||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: |
An incorrect configuration of the negotiator resulted in a segmentation fault. This occurred when the 'quota' variable was set to 0 for a group that had subgroups. With this update, the segmentation fault no longer occurs in this situation.
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2010-10-14 16:14:16 UTC | Type: | --- | ||||
Bug Blocks: | 528800 | ||||||
Attachments: |
|
Added a patch. The patch will change behavior in that submitters to a group with a configured 0 quota will not fall into the bucket of non-group users. The flip side is that the tree isn't pruned.

The semantic change is that if a group is explicitly set to 0, then all submitters to that group or any subgroup will get 0 quota, instead of being dropped in with the non-group submissions? That seems like a reasonable behavior to have anyway. It would be unexpected if a group with 0 quota was allowed to run jobs.

The semantics with this patch are slightly different from comment #2.
a) If the group was listed in GROUP_NAMES, the quota would be set to zero and the jobs would not run.
b) If the group was not listed in GROUP_NAMES, the jobs would fall into the non-group submission bucket.
The original behavior was based upon allowing "unofficial groups" to run jobs in non-group space without having to change their sdf to unset AccountingGroup. The patch causes "a" to happen, but not "b". However enforcing "b" is also a trivial patch since we can just throw out any ClassAd with a "." when the ClassAds are attached to groupArray[0].

Built in 7.4.4-0.11

Reproduced on:
$CondorVersion: 7.4.4 Aug 9 2010 BuildID: RH-7.4.4-0.9.el5 PRE-RELEASE $
$CondorPlatform: I386-LINUX_RHEL5 $
Config:
GROUP_NAMES = a1.a3.a3
GROUP_QUOTA_DYNAMIC_a1.a3 = 0
GROUP_QUOTA_DYNAMIC_a1.a3.a3 = .2
09/16/10 05:30:52 Phase 2: Performing accounting ...
09/16/10 05:30:52 group a1.a3.a3 dynamic quota for 8 slots = 0.200
09/16/10 05:30:52 Group Table : group a1.a3.a3 quota 0.200 usage 0.000 prio 0.00
09/16/10 05:30:52 negotiationtime: slots 8 group a1.a3.a3 autoregroup false
09/16/10 05:30:52 negotiationtime:sorting
09/16/10 05:30:52 Sort : sorting group vector
09/16/10 05:30:52 Sort : stage two
09/16/10 05:30:52 midsort : grouparray group parent -1 child -1 left -1 right -1 i 0
09/16/10 05:30:52 midsort : grouparray group a1.a3.a3 parent -1 child -1 left -1 right -1 i 1
09/16/10 05:30:52 Sorted : grouparray group parent -1 child -1 left -1 right -1 i 0
09/16/10 05:30:52 Sorted : grouparray group a1.a3.a3 parent -1 child -1 left -1 right -1 i 1
09/16/10 05:30:52 Sort : leaving
09/16/10 05:30:52 negotiationtime: finished sort - slots 8 group auto true quota 1.000000 maxAllowed 8.000000 numsubmits 0 parent -1 child -1 left -1 right -1 i 0
09/16/10 05:30:52 negotiationtime: finished sort - slots 8 group a1.a3.a3 auto false quota 0.200000 maxAllowed 0.000000 numsubmits 0 parent -1 child -1 left -1 right -1 i 1
09/16/10 05:30:52 negotiationtime: finished inserting submitters - slots 8 group quota 1.000000 maxAllowed 8.000000 numsubmits 0 i 0
09/16/10 05:30:52 negotiationtime: finished inserting submitters - slots 8 group a1.a3.a3 quota 0.200000 maxAllowed 0.000000 numsubmits 0 i 1
Stack dump for process 3833 at timestamp 1284629452 (8 frames)
condor_negotiator(dprintf_dump_stack+0x44)[0x80dc924]
condor_negotiator[0x80de764]
[0x635420]
condor_negotiator(_ZN12TimerManager7TimeoutEv+0x14b)[0x80dbe8b]
condor_negotiator(_ZN10DaemonCore6DriverEv+0x244)[0x80c4824]
condor_negotiator(main+0xd80)[0x80d8280]
/lib/libc.so.6(__libc_start_main+0xdc)[0x819e9c]
condor_negotiator[0x80a31b1]
MasterLog:
09/16/10 05:46:26 The NEGOTIATOR (pid 4147) died due to signal 11 (Segmentation fault)
# ps ax | grep condor
4073 ? Ss 0:00 condor_master -pidfile /var/run/condor/condor_master.pid
4074 ? Ss 0:00 condor_collector -f
4077 ? Ss 0:00 condor_schedd -f
4078 ? Ss 0:05 condor_startd -f
4079 ? S 0:00 condor_procd -A /var/run/condor/procd_pipe.SCHEDD -S 60 -C 64
4142 pts/0 S+ 0:00 grep condor
Config:
# cat /etc/condor/config.d/zzz_condor_config.test
CREATE_CORE_FILES=True
#ABORT_ON_EXCEPTION=True
MAX_HISTORY_LOG=300*1024*1024
MAX_HISTORY_ROTATIONS=10
ALL_DEBUG = D_FULLDEBUG
GROUP_NAMES = a1.a3.a3
GROUP_QUOTA_DYNAMIC_a1.a3 = 0
GROUP_QUOTA_DYNAMIC_a1.a3.a3 = .2
# tailf /var/log/condor/NegotiatorLog
09/16/10 05:45:20 negotiationtime: finished inserting submitters - slots 8 group a1.a3.a3 quota 0.200000 maxAllowed 0.000000 numsubmits 0 i 1
Stack dump for process 4130 at timestamp 1284630320 (8 frames)
condor_negotiator(dprintf_dump_stack+0x44)[0x80dd1c4]
condor_negotiator[0x80df004]
[0xe71420]
condor_negotiator(_ZN12TimerManager7TimeoutEv+0x14b)[0x80dc72b]
condor_negotiator(_ZN10DaemonCore6DriverEv+0x244)[0x80c50c4]
condor_negotiator(main+0xd80)[0x80d8b20]
/lib/libc.so.6(__libc_start_main+0xdc)[0x819e9c]
condor_negotiator[0x80a31f1]
# condor -v
$CondorVersion: 7.4.4 Sep 14 2010 BuildID: RH-7.4.4-0.13.el5 PRE-RELEASE $
$CondorPlatform: I386-LINUX_RHEL5 $
Issue was not fixed in current packages. Could you check if patch was really included into build?

It's there. I'll take a look at why this is failing. Perhaps more recent code changes broke it.

It looks like you hit a different failure based on:
GROUP_NAMES = a1.a3.a3
Try
GROUP_NAMES = a1, a1.a3, a1.a3.a3
I'm looking at a way to fix this new issue.

For a new issue, create a new BZ. If it blocks testing of this BZ, set the dependencies.

Negotiator crash confirmed on:
$CondorVersion: 7.4.4 Aug 9 2010 BuildID: RH-7.4.4-0.9.el4 PRE-RELEASE $
$CondorPlatform: I386-LINUX_RHEL4 $
Stack dump for process 29714 at timestamp 1284716851 (8 frames)
condor_negotiator(dprintf_dump_stack+0x3f)[0x80d409f]
condor_negotiator[0x80d43fa]
/lib/tls/libpthread.so.0[0x131c98]
condor_negotiator(_ZN12TimerManager7TimeoutEv+0xf6)[0x80d2126]
condor_negotiator(_ZN10DaemonCore6DriverEv+0x17e)[0x80b779e]
condor_negotiator(main+0x133e)[0x80cdf0e]
/lib/tls/libc.so.6(__libc_start_main+0xd3)[0xce1e93]
condor_negotiator(__gxx_personality_v0+0x149)[0x8097711]
Retested over current packages (condor-7.4.4-0.14.el5) on all supported platforms x86,x86_64/RHEL4, RHEL5.
NegotiatorLog:
09/17/10 04:52:24 ---------- Started Negotiation Cycle ----------
09/17/10 04:52:24 Phase 1: Obtaining ads from collector ...
09/17/10 04:52:24 Getting all public ads ...
09/17/10 04:52:24 Trying to query collector <IP>
09/17/10 04:52:24 Sorting 12 ads ...
09/17/10 04:52:24 Getting startd private ads ...
09/17/10 04:52:24 Trying to query collector <IP>
09/17/10 04:52:24 Got ads: 12 public and 8 private
09/17/10 04:52:24 Public ads include 0 submitter, 8 startd
09/17/10 04:52:24 Phase 1: numDynGroupSlots 8 untrimmedSlotWeightTotal 8.000000
09/17/10 04:52:24 Entering compute_significant_attrs()
09/17/10 04:52:24 Leaving compute_significant_attrs() - result=JobUniverse,LastCheckpointPlatform,NumCkpts
09/17/10 04:52:24 Phase 2: Performing accounting ...
09/17/10 04:52:24 group a1 dynamic quota for 8 slots = 0.000
09/17/10 04:52:24 Group Table : group a1 quota 0.000 usage 0.000 prio nan
09/17/10 04:52:24 negotiationtime: slots 8 group a1 autoregroup false
09/17/10 04:52:24 group a1.a3 dynamic quota for 8 slots = 0.000
09/17/10 04:52:24 Group Table : group a1.a3 quota 0.000 usage 0.000 prio nan
09/17/10 04:52:24 negotiationtime: slots 8 group a1.a3 autoregroup false
09/17/10 04:52:24 group a1.a3.a3 dynamic quota for 8 slots = 0.200
09/17/10 04:52:24 Group Table : group a1.a3.a3 quota 0.200 usage 0.000 prio 0.00
09/17/10 04:52:24 negotiationtime: slots 8 group a1.a3.a3 autoregroup false
09/17/10 04:52:24 negotiationtime:sorting
09/17/10 04:52:24 Sort : sorting group vector
09/17/10 04:52:24 Sorting : grouparray group a1.a3 parent -1 child -1 left -1 right -1 i 0
09/17/10 04:52:24 Sorting : grouparray group a1.a3.a3 parent -1 child -1 left -1 right -1 i 1
09/17/10 04:52:24 Sorting : grouparray group a1.a3.a3 parent -1 child -1 left -1 right -1 i 0
09/17/10 04:52:24 Sort : stage two
09/17/10 04:52:24 midsort : grouparray group parent -1 child 1 left -1 right -1 i 0
09/17/10 04:52:24 midsort : grouparray group a1 parent 0 child 2 left -1 right -1 i 1
09/17/10 04:52:24 midsort : grouparray group a1.a3 parent 1 child 3 left -1 right -1 i 2
09/17/10 04:52:24 midsort : grouparray group a1.a3.a3 parent 2 child -1 left -1 right -1 i 3
09/17/10 04:52:24 Sorted : grouparray group parent -1 child 1 left -1 right -1 i 0
09/17/10 04:52:24 Sorted : grouparray group a1 parent 0 child 2 left -1 right -1 i 1
09/17/10 04:52:24 Sorted : grouparray group a1.a3 parent 1 child 3 left -1 right -1 i 2
09/17/10 04:52:24 Sorted : grouparray group a1.a3.a3 parent 2 child -1 left -1 right -1 i 3
09/17/10 04:52:24 Sort : leaving
....
09/17/10 04:52:24 Group - skipping, no submitters
09/17/10 04:52:24 Group a1 - skipping, no submitters
09/17/10 04:52:24 Group a1.a3 - skipping, no submitters
09/17/10 04:52:24 Group a1.a3.a3 - skipping, no submitters
09/17/10 04:52:24 Failed to match 0.000000 slots on iteration 1.
09/17/10 04:52:24 negotiationtime: finished - slots 8 group auto true quota 1.000000 maxAllowed 8.000000 nodemaxAllowed 0.000000 numsubmits 0 usage 0.000000
09/17/10 04:52:24 negotiationtime: finished - slots 8 group a1 auto false quota 0.000000 maxAllowed 0.000000 nodemaxAllowed 0.000000 numsubmits 0 usage 0.000000
09/17/10 04:52:24 negotiationtime: finished - slots 8 group a1.a3 auto false quota 0.000000 maxAllowed 0.000000 nodemaxAllowed 0.000000 numsubmits 0 usage 0.000000
09/17/10 04:52:24 negotiationtime: finished - slots 8 group a1.a3.a3 auto false quota 0.200000 maxAllowed 0.000000 nodemaxAllowed 0.000000 numsubmits 0 usage 0.000000
09/17/10 04:52:24 ---------- Finished Negotiation Cycle ----------
# ps ax | grep condor
7587 ? Ss 0:00 condor_master -pidfile /var/run/condor/condor_master.pid
7588 ? Ss 0:00 condor_collector -f
7590 ? Ss 0:00 condor_negotiator -f
7591 ? Ss 0:00 condor_schedd -f
7592 ? Ss 0:05 condor_startd -f
7593 ? S 0:00 condor_procd -A /var/run/condor/procd_pipe.SCHEDD -S 60 -C 64
No regression found on current packages.
>>> VERIFIED
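
For readers following the midsort and Sorted lines in the NegotiatorLog above, here is a minimal Python sketch of how a parent/child index table like that could be derived from GROUP_NAMES. It is an illustration only, not the negotiator's C++ code. The function name `build_group_table` is hypothetical, and reading `left`/`right` as sibling links is an assumption drawn from the column names in the log.

```python
def build_group_table(group_names):
    """Return one row per group, indexed like the 'i' column in the log.

    Index 0 is the unnamed root group that holds non-group submitters.
    """
    names = [""] + sorted(group_names)  # a parent name always sorts before its children
    index = {name: i for i, name in enumerate(names)}
    rows = [
        {"name": n, "parent": -1, "child": -1, "left": -1, "right": -1}
        for n in names
    ]
    last_child = {}  # parent index -> index of the most recently linked child

    for i, name in enumerate(names[1:], start=1):
        parent_name = name.rpartition(".")[0]  # "a1.a3.a3" -> "a1.a3", "a1" -> ""
        if parent_name not in index:
            # Assumption: this sketch rejects the config instead of leaving
            # the group unlinked (parent -1) as in the failing log.
            raise ValueError(f"group {name!r} has no parent {parent_name!r} in the table")
        p = index[parent_name]
        rows[i]["parent"] = p
        if rows[p]["child"] == -1:
            rows[p]["child"] = i          # first child of this parent
        else:
            prev = last_child[p]          # link siblings left/right
            rows[prev]["right"] = i
            rows[i]["left"] = prev
        last_child[p] = i
    return rows


if __name__ == "__main__":
    for i, row in enumerate(build_group_table(["a1", "a1.a3", "a1.a3.a3"])):
        print("grouparray group", row["name"], "parent", row["parent"],
              "child", row["child"], "left", row["left"],
              "right", row["right"], "i", i)
```

With GROUP_NAMES = a1, a1.a3, a1.a3.a3 the sketch yields the same parent and child indices as the Sorted lines above. With GROUP_NAMES = a1.a3.a3 alone it raises an error, because a1.a3 is missing from the table.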
Comment 6 and Comment 7 -> Bug 636271

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
An incorrect configuration of the negotiator resulted in a segmentation fault. This occurred when the 'quota' variable was set to 0 for a group that had supgroups. With this update, the segmentation fault no longer occurs in this situation.

Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1 +1 @@
-An incorrect configuration of the negotiator resulted in a segmentation fault. This occurred when the 'quota' variable was set to 0 for a group that had supgroups. With this update, the segmentation fault no longer occurs in this situation.
+An incorrect configuration of the negotiator resulted in a segmentation fault. This occurred when the 'quota' variable was set to 0 for a group that had subgroups. With this update, the segmentation fault no longer occurs in this situation.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2010-0773.html
Created attachment 441579: patch for zero quota

The negotiator will dump core if the config sets quota=0 for a group that has subgroups.
GROUP_QUOTA_DYNAMIC_a1.a3 = 0
GROUP_QUOTA_DYNAMIC_a1.a3.a3 = .2
This is due to the a1.a3 group not being added into the array of groups. This is somewhat a matter of config, in that if the admin sets
GROUP_QUOTA_DYNAMIC_a1.a3 = 0
they should also set
GROUP_QUOTA_DYNAMIC_a1.a3.a3 = 0
which would avoid the problem. But an incorrect config shouldn't really core the negotiator.
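
As a rough illustration of the config patterns discussed in this report, the following Python sketch flags them before the negotiator ever sees the config. It is not part of Condor. The function name `check_group_quotas` and the dictionary-based input are hypothetical, and the two checks are assumptions based only on the description above: every ancestor of a listed group should itself appear in GROUP_NAMES, and a subgroup should not carry a non-zero dynamic quota under an ancestor whose quota is 0.

```python
def check_group_quotas(group_names, dynamic_quotas):
    """Return a list of warning strings for suspicious group configs.

    group_names    -- names from GROUP_NAMES, e.g. ["a1", "a1.a3"]
    dynamic_quotas -- mapping of group name to its GROUP_QUOTA_DYNAMIC_<name> value
    """
    listed = set(group_names)
    warnings = []

    def ancestors(name):
        parts = name.split(".")
        # "a1.a3.a3" -> ["a1", "a1.a3"]
        return [".".join(parts[:k]) for k in range(1, len(parts))]

    # Check 1: every ancestor of a listed group is listed too.
    for name in sorted(listed):
        for anc in ancestors(name):
            if anc not in listed:
                warnings.append(f"{name}: ancestor {anc} is not in GROUP_NAMES")

    # Check 2: no non-zero quota below an ancestor explicitly set to 0.
    for name in sorted(dynamic_quotas):
        if dynamic_quotas[name] > 0:
            for anc in ancestors(name):
                if dynamic_quotas.get(anc) == 0:
                    warnings.append(
                        f"{name}: quota {dynamic_quotas[name]} under zero-quota group {anc}"
                    )
    return warnings


if __name__ == "__main__":
    # The config from this report.
    for line in check_group_quotas(["a1.a3.a3"], {"a1.a3": 0, "a1.a3.a3": 0.2}):
        print(line)
```

Setting GROUP_QUOTA_DYNAMIC_a1.a3.a3 = 0 and listing a1, a1.a3, a1.a3.a3 in GROUP_NAMES produces no warnings in this sketch.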