Bug 698791
Summary: | condor_q -better fails where condor_q works
---|---
Product: | [Retired] CloudForms Cloud Engine
Reporter: | wes hayutin <whayutin>
Component: | aeolus-conductor
Assignee: | Ian Main <imain>
Status: | CLOSED CURRENTRELEASE
QA Contact: | wes hayutin <whayutin>
Severity: | unspecified
Priority: | unspecified
Version: | 0.3.1
CC: | akarol, clalance, cpelland, dajohnso, deltacloud-maint, dgao, ssachdev, whayutin
Target Milestone: | rc
Hardware: | Unspecified
OS: | Unspecified
Doc Type: | Bug Fix
Last Closed: | 2011-06-14 16:14:41 UTC
Bug Blocks: | 684278, 697919
Description
wes hayutin
2011-04-21 19:43:39 UTC
Created attachment 493971
condorLog.txt
recreated on another machine
1 jobs; 1 idle, 0 running, 0 held
[root@hp-xw8600-01 ~]# condor_q
-- Submitter: hp-xw8600-01.rhts.eng.bos.redhat.com : <10.16.65.43:56162> : hp-xw8600-01.rhts.eng.bos.redhat.com
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
3.0 aeolus 4/26 15:24 0+00:00:00 I 0 0.0 job_test2_2
1 jobs; 1 idle, 0 running, 0 held
[root@hp-xw8600-01 ~]# /etc/init.d/condor restart
Stopping Condor daemons: [ OK ]
Starting Condor daemons: [ OK ]
[root@hp-xw8600-01 ~]# condor_q -better
-- Submitter: hp-xw8600-01.rhts.eng.bos.redhat.com : <10.16.65.43:47121> : hp-xw8600-01.rhts.eng.bos.redhat.com
---
003.000: Request has not yet been considered by the matchmaker.
[root@hp-xw8600-01 ~]# hostname
hp-xw8600-01.rhts.eng.bos.redhat.com
[root@hp-xw8600-01 ~]# cat /var/lib/condor/condor_config.local
ALLOW_WRITE = *
ALLOW_ADMINISTRATOR = *
ALLOW_NEGOTIATOR = *
ALLOW_NEGOTIATOR_SCHEDD = *
COLLECTOR_HOST = localhost
DAEMON_LIST = MASTER, SCHEDD, COLLECTOR, NEGOTIATOR
MAX_GRIDMANAGER_LOG = 500000000
GRIDMANAGER_JOB_PROBE_INTERVAL = 30
GRIDMANAGER_DEBUG = D_FULLDEBUG
NEGOTIATOR_DEBUG = D_FULLDEBUG
COLLECTOR_DEBUG = D_FULLDEBUG
DELTACLOUD_GAHP = $(SBIN)/deltacloud_server
CLASSAD_LIFETIME = 0
# for the event log parsing (i.e. dbomatic)
EVENT_LOG=$(LOG)/EventLog
EVENT_LOG_USE_XML=True
EVENT_LOG_JOB_AD_INFORMATION_ATTRS=Owner,GlobalJobId,Cmd,JobStartDate,JobCurrentStartDate,JobFinishedHookDone,DeltacloudProviderId,DeltacloudPublicNetworkAddresses,DeltacloudPrivateNetworkAddresses,DeltacloudAvailableActions,JobStatus,DeltacloudUsername
CLASSAD_USER_LIBS = /usr/share/aeolus-conductor/classad_plugin/conductor_classad_plugin.so

adding to ce-ami tracker per clalance's advice

It does seem that condor jobs that are submitted to start/stop instances do not always work when this issue occurs.

004.000: Request is held.
Hold reason: Create_Instance_Failure: InvalidAMIID.NotFound: The AMI ID 'ami-51693a14' does not exist
---
005.000: Request has not yet been considered by the matchmaker.
[root@hp-xw8600-01 ~]# condor_rm 04.0
Job 4.0 marked for removal
[root@hp-xw8600-01 ~]# condor_q -better
-- Submitter: hp-xw8600-01.rhts.eng.bos.redhat.com : <10.16.65.43:47121> : hp-xw8600-01.rhts.eng.bos.redhat.com
---
003.000: Request is being serviced
---
004.000: Request is removed.
error: bad form
error: problem with ExprToProfile
---
005.000: Run analysis summary. Of 4 machines,
4 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
0 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 match but are currently offline
0 are available to run your job
No successful match recorded.
Last failed match: Tue Apr 26 16:16:45 2011
Reason for last match failure: no match found
WARNING: Be advised: No resources matched request's constraints
The Requirements expression for your job is:
( target.front_end_hardware_profile_id == "1" && target.image == "3" && target.realm == "2" && conductor_quota_check(4,other.provider_account_id) )
[root@hp-xw8600-01 ~]# condor_q
-- Submitter: hp-xw8600-01.rhts.eng.bos.redhat.com : <10.16.65.43:47121> : hp-xw8600-01.rhts.eng.bos.redhat.com
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
3.0 aeolus 4/26 15:24 0+00:42:38 R 0 0.0 job_test2_2
4.0 aeolus 4/26 15:50 0+00:00:00 X 0 0.0 job_test03_3
5.0 aeolus 4/26 16:16 0+00:00:00 I 0 0.0 job_test05_4
2 jobs; 1 idle, 1 running, 0 held
[root@hp-xw8600-01 ~]# condor_q -better
-- Submitter: hp-xw8600-01.rhts.eng.bos.redhat.com : <10.16.65.43:47121> : hp-xw8600-01.rhts.eng.bos.redhat.com
---
003.000: Request is being serviced
---
004.000: Request is removed.
error: bad form
error: problem with ExprToProfile
---
005.000: Run analysis summary.
Of 8 machines,
8 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
0 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 match but are currently offline
0 are available to run your job
No successful match recorded.
Last failed match: Tue Apr 26 16:16:45 2011
Reason for last match failure: no match found
WARNING: Be advised: No resources matched request's constraints
The Requirements expression for your job is:
( target.front_end_hardware_profile_id == "1" && target.image == "3" && target.realm == "2" && conductor_quota_check(4,other.provider_account_id) )

4/26/11 16:17:45 submitterAbsShare = 1.000000
04/26/11 16:17:45 submitterLimit = 8.000000
04/26/11 16:17:45 submitterUsage = 0.000000
04/26/11 16:17:45 Socket to aeolus.eng.bos.redhat.com (<10.16.65.43:47121>) already in cache, reusing
04/26/11 16:17:45 Sending SEND_JOB_INFO/eom
04/26/11 16:17:45 Getting reply from schedd ...
04/26/11 16:17:45 Got JOB_INFO command; getting classad/eom
04/26/11 16:17:45 Request 00005.00000:
04/26/11 16:17:45 matchmakingAlgorithm: limit 8.000000 used 0.000000 pieLeft 8.000000
Stack dump for process 28352 at timestamp 1303849065 (25 frames)
condor_negotiator(dprintf_dump_stack+0x63)[0x5420c3]
condor_negotiator[0x53b392]
/lib64/libpthread.so.0[0x3069e0f520]
/lib64/libc.so.6(gsignal+0x35)[0x3069632a45]
/lib64/libc.so.6(abort+0x175)[0x3069634225]
/lib64/libglib-2.0.so.0(g_logv+0x53a)[0x7f3ea97e137a]
/lib64/libglib-2.0.so.0(g_log+0x83)[0x7f3ea97e1413]
/lib64/libgthread-2.0.so.0(g_thread_init+0x1db)[0x30702028ab]
/usr/share/aeolus-conductor/classad_plugin/conductor_classad_plugin.so(_Z21conductor_quota_checkPKcRKSt6vectorIP8ExprTreeSaIS3_EER9EvalStateR5Value+0x210)[0x7f3ea9eed890]
condor_negotiator(_ZNK7classad9Operation9_EvaluateERNS_9EvalStateERNS_5ValueE+0x117)[0x4f3d17]
condor_negotiator(_ZNK7classad9Operation9_EvaluateERNS_9EvalStateERNS_5ValueE+0x75)[0x4f3c75]
condor_negotiator(_ZNK7classad18AttributeReference9_EvaluateERNS_9EvalStateERNS_5ValueE+0xb3)[0x4fdb23]
condor_negotiator(_ZNK7classad9Operation9_EvaluateERNS_9EvalStateERNS_5ValueE+0x117)[0x4f3d17]
condor_negotiator(_ZNK7classad7ClassAd12EvaluateExprEPKNS_8ExprTreeERNS_5ValueE+0x42)[0x4dc312]
condor_negotiator(_ZN7classad12MatchClassAd13EvalMatchExprEPNS_8ExprTreeE+0x34)[0x4f0a34]
condor_negotiator(_Z8IsAMatchPN14compat_classad7ClassAdES1_+0xe)[0x53ea7e]
condor_negotiator(_ZN10Matchmaker20matchmakingAlgorithmEPKcS1_RN14compat_classad7ClassAdERNS2_27ClassAdListDoesNotDeleteAdsEdddddb+0x3a0)[0x472310]
condor_negotiator(_ZN10Matchmaker9negotiateEPKcPKN14compat_classad7ClassAdEdddRNS2_27ClassAdListDoesNotDeleteAdsER9HashTableI8MyStringS9_ERK17CondorVersionInfoblRiRdSG_+0x8c6)[0x477da6]
condor_negotiator(_ZN10Matchmaker18negotiateWithGroupEiddRN14compat_classad27ClassAdListDoesNotDeleteAdsER9HashTableI8MyStringS4_ES2_ffPKc+0xe5b)[0x4794bb]
condor_negotiator(_ZN10Matchmaker15negotiationTimeEv+0x1060)[0x47adf0]
condor_negotiator(_ZN12TimerManager7TimeoutEv+0x129)[0x49bad9]
condor_negotiator(_ZN10DaemonCore6DriverEv+0x277)[0x48c447]
condor_negotiator(main+0x10db)[0x49a5bb]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x306961ec9d]
condor_negotiator[0x462639]

perm close
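
For anyone triaging a similar negotiator stack dump, the frames are easier to scan once split into module, symbol, and offset. The following is a minimal sketch, not part of the original report. It assumes every frame has one of the two shapes seen in the dump above, `module(symbol+0xoffset)[0xaddress]` or `module[0xaddress]`. The helper names `parse_frames` and `first_frame_in` are hypothetical, and the sample holds only a few frames copied from the dump.

```python
import re

# Assumed frame shapes, both taken from the dump above:
#   module(symbol+0xoffset)[0xaddress]
#   module[0xaddress]            (no symbol information)
FRAME_RE = re.compile(
    r"(?P<module>[^\s()\[\]]+)"
    r"(?:\((?P<symbol>[^()+]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]"
)


def parse_frames(dump):
    """Return one dict per frame, in the order the frames appear."""
    return [match.groupdict() for match in FRAME_RE.finditer(dump)]


def first_frame_in(frames, needle):
    """Return (index, frame) for the first frame whose module contains needle."""
    for index, frame in enumerate(frames):
        if needle in frame["module"]:
            return index, frame
    return None


# A subset of frames copied from the dump, innermost first.
SAMPLE = (
    "condor_negotiator(dprintf_dump_stack+0x63)[0x5420c3] "
    "condor_negotiator[0x53b392] "
    "/lib64/libc.so.6(abort+0x175)[0x3069634225] "
    "/lib64/libgthread-2.0.so.0(g_thread_init+0x1db)[0x30702028ab] "
    "/usr/share/aeolus-conductor/classad_plugin/conductor_classad_plugin.so"
    "(_Z21conductor_quota_checkPKcRKSt6vectorIP8ExprTreeSaIS3_EER9EvalStateR5Value"
    "+0x210)[0x7f3ea9eed890] "
    "condor_negotiator(main+0x10db)[0x49a5bb]"
)

frames = parse_frames(SAMPLE)
hit = first_frame_in(frames, "conductor_classad_plugin.so")
if hit is not None:
    index, frame = hit
    # Frames are innermost first, so the previous entry is what this frame called.
    callee = frames[index - 1] if index > 0 else None
    print("plugin frame:", index, frame["symbol"])
    print("called into:", callee["symbol"] if callee else None)
```

Run against the full dump, the same lookup places the `conductor_classad_plugin.so` frame directly beneath `g_thread_init`, which matches the order of frames recorded in the report.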