Description of problem:
"condor_q -better-analyze" used to give a useful description of job matching:

2770.000:  Run analysis summary.  Of 12 machines,
      0 are rejected by your job's requirements
      4 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      8 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job

The following attributes are missing from the job ClassAd:

CheckpointPlatform

and now there is just:

Request 7.0 did not match any resource's constraints

The output of -analyze is the same as the output of -better-analyze, so I think the two options are now identical. But I think it should still be possible to get the job-matching information in the old format.

Version-Release number of selected component (if applicable):
condor-7.6.1-0.4.el6.i686

How reproducible:
100%

Steps to Reproduce:
1. install condor pool
2. submit job which doesn't match with any slot
3. condor_q -bet
Output of condor_q -l:

RECIPE = "SECRET_SAUCE"
LastJobStatus = 0
ImageSize_RAW = 0
Submission = "python_test_submit"
Cmd = "/bin/echo"
ImageSize = 0
PeriodicRemove = false
Iwd = "/tmp"
PeriodicHold = false
JobStatus = 1
ClusterId = 7
RemoteUserCpu = 0.0
MinHosts = 1
JobUniverse = 5
PeriodicRelease = false
ScheddBday = 1303994942
Requirements = ( FileSystemDomain =!= undefined && Arch =!= undefined )
ShouldTransferFiles = "NO"
GlobalJobId = "_hostname_#7.0#1303990674"
LastRejMatchReason = "no match found"
MaxHosts = 1
ServerTime = 1303995411
ProcId = 0
CurrentHosts = 0
OnExitRemove = true
AutoClusterAttrs = "ImageSize,JobUniverse,LastCheckpointPlatform,NumCkpts,JobStart,RequestCpus,RequestDisk,RequestMemory,LastPeriodicCheckpoint,Requirements,NiceUser,ConcurrencyLimits"
AutoClusterId = 2
TargetType = "Machine"
QDate = 1303990674
OnExitHold = false
JobPrio = 0
Args = "test_hu"
CurrentTime = time()
User = "root@_hostname_"
LastRejMatchTime = 1303995394
MyType = "Job"
Owner = "root"
For another job:

RECIPE = "SECRET_SAUCE"
LastJobStatus = 0
ImageSize_RAW = 0
Submission = "python_test_submit"
ImageSize = 0
Cmd = "/bin/sleep"
PeriodicRemove = false
Iwd = "/tmp"
PeriodicHold = false
JobStatus = 1
ClusterId = 8
RemoteUserCpu = 0.0
MinHosts = 1
JobUniverse = 5
PeriodicRelease = false
Requirements = ( TARGET.Arch =!= undefined ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= 0 ) && ( ( TARGET.Memory * 1024 ) >= 0 ) && ( TARGET.FileSystemDomain =!= undefined )
ShouldTransferFiles = "NO"
GlobalJobId = "_hostname_#8.0#1303995376"
LastRejMatchReason = "no match found"
MaxHosts = 1
ServerTime = 1303995695
ProcId = 0
CurrentHosts = 0
OnExitRemove = true
AutoClusterAttrs = "ImageSize,JobUniverse,LastCheckpointPlatform,NumCkpts,JobStart,RequestCpus,RequestDisk,RequestMemory,LastPeriodicCheckpoint,Requirements,NiceUser,ConcurrencyLimits"
AutoClusterId = 4
TargetType = "Machine"
QDate = 1303995376
OnExitHold = false
JobPrio = 0
Args = "120"
CurrentTime = time()
User = "root@_hostname_"
LastRejMatchTime = 1303995684
MyType = "Job"
Owner = "root"

the output is strange:

Reason for last match failure: no match found

The Requirements expression for your job is:

( TARGET.Arch isnt undefined ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= 0 ) && ( ( TARGET.Memory * 1024 ) >= 0 ) && ( TARGET.FileSystemDomain isnt undefined )

    Condition                                      Machines Matched    Suggestion
    ---------                                      ----------------    ----------
1   ( TARGET.OpSys == "LINUX" )                    2
2   ( TARGET.Arch isnt undefined )                 17
3   ( TARGET.Disk >= 0 )                           17
4   ( ( 1024 * TARGET.Memory ) >= 0 )              17
5   ( TARGET.FileSystemDomain isnt undefined )     17

Does this mean that 2 machines matched, or "no match found"?
I am getting what appears to be correct and reasonable output from condor_q -bet:

$ condor_q -bet

-- Submitter: localhost.localdomain : <192.168.1.2:59694> : localhost.localdomain
---
009.000:  Run analysis summary.  Of 20 machines,
     20 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job
    No successful match recorded.
    Last failed match: Tue May 3 11:36:02 2011
    Reason for last match failure: no match found

WARNING:  Be advised:
   No resources matched request's constraints

The Requirements expression for your job is:

( ( TARGET.Arch is "unobtainium" ) ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= DiskUsage ) && ( ( TARGET.Memory * 1024 ) >= ImageSize ) && ( ( RequestMemory * 1024 ) >= ImageSize ) && ( ( TARGET.HasFileTransfer ) || ( TARGET.FileSystemDomain == MY.FileSystemDomain ) )

    Condition                                 Machines Matched    Suggestion
    ---------                                 ----------------    ----------
1   ( ( TARGET.Arch is "unobtainium" ) )      0                   REMOVE
2   ( TARGET.OpSys == "LINUX" )               20
3   ( TARGET.Disk >= 30 )                     20
4   ( ( 1024 * TARGET.Memory ) >= 30 )        20
5   ( ( 1024 * ceiling(ifThenElse(JobVMMemory isnt undefined,JobVMMemory,2.929687500000000E-02)) ) >= 30 )
                                              20
6   ( ( TARGET.HasFileTransfer ) || ( TARGET.FileSystemDomain == "localhost.localdomain" ) )
                                              20
(In reply to comment #0) What I'm seeing looks correct. Can you attach your config and the job submission file you used so that I can try a more precise repro?
Another example with user condor:

Out = "/tmp/mrg_$(Cluster).$(Process).out"
LastJobStatus = 0
ImageSize_RAW = 0
Submission = "host1#30"
ImageSize = 0
cmd = "/bin/sleep"
PeriodicRemove = false
iwd = "/tmp"
PeriodicHold = false
JobStatus = 1
ClusterId = 30
RemoteUserCpu = 0.0
MinHosts = 1
JobUniverse = 5
PeriodicRelease = false
requirements = true
ShouldTransferFiles = "NO"
GlobalJobId = "host1#30.0#1304691827"
UserLog = "/tmp/mrg_$(Cluster).$(Process).log"
MaxHosts = 1
ServerTime = 1304691941
ProcId = 0
Err = "/tmp/mrg_$(Cluster).$(Process).err"
CurrentHosts = 0
OnExitRemove = true
AutoClusterAttrs = "ImageSize,JobUniverse,LastCheckpointPlatform,NumCkpts,JobStart,RequestCpus,RequestDisk,RequestMemory,LastPeriodicCheckpoint,Requirements,NiceUser,ConcurrencyLimits"
AutoClusterId = 1
TargetType = "Machine"
QDate = 1304691827
OnExitHold = false
JobPrio = 0
args = "1"
CurrentTime = time()
User = "condor@host2"
MyType = "Job"
owner = "condor"

QMF submit dictionary:

{'iwd': '/tmp', 'requirements': 'TRUE', '!!descriptors': {'requirements': 'com.redhat.grid.Expression'}, 'args': '1', 'cmd': '/bin/sleep', 'Err': '/tmp/mrg_$(Cluster).$(Process).err', 'UserLog': '/tmp/mrg_$(Cluster).$(Process).log', 'JobUniverse': 5, 'owner': 'condor', 'Out': '/tmp/mrg_$(Cluster).$(Process).out'}

And the output from condor_q -bet is just:

-- Submitter: host1 : <ip1:35215> : host1
Request 30.0 did not match any resource's constraints
It may help if you set NEGOTIATOR_DEBUG = D_FULLDEBUG | D_MATCH and attach the log output of a negotiation cycle that attempts to match one of these idle jobs. Also, set TOOL_DEBUG = D_FULLDEBUG and attach the output from condor_q -bet.
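For reference, the two settings requested here could be written as a configuration fragment along these lines. This is only a sketch: the file it goes in (for example a local config file such as condor_config.local) and the split across hosts are assumptions, and the values are copied as written in this comment.

```
# Sketch only; the file location and the host split are assumptions.

# On the host running the negotiator: full debug output plus match details.
NEGOTIATOR_DEBUG = D_FULLDEBUG | D_MATCH

# On the host where condor_q is run: full debug output from command-line tools.
TOOL_DEBUG = D_FULLDEBUG
```

The daemons have to re-read their configuration (condor_reconfig does this) before the next negotiation cycle is logged with the new level. Tool debug output is normally only printed when the tool is run with -debug, so the capture would probably be condor_q -debug -bet; that flag is not part of the request above and should be checked against the installed version.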
Created attachment 497756 [details] host1 configuration and logs
Created attachment 497757 [details] host2 configuration and logs