Created attachment 495760
condor config, condor log files, condor_q log file

Description of problem:
It is not possible to get all classads by "condor_q -l" for some jobs.

Version-Release number of selected component (if applicable):
condor-7.6.1-0.4.el6.i686

How reproducible:
100%

Steps to Reproduce:
1. install condor pool with aviary, QMF, Dynamic slots
2. submit simple job via QMF
3. condor_q -l _clusterid_

Actual results:
It is not possible to get "long" information by "condor_q -l".

Expected results:
It will be possible to get "long" information by "condor_q -l".

Simple job:
'universe' : 'vanilla',
'executable' : '/bin/sleep',
'arguments' : '1',
'error' : '/tmp/mrg_$(Cluster).$(Process).err',
'output' : '/tmp/mrg_$(Cluster).$(Process).out',
'log' : '/tmp/mrg_$(Cluster).$(Process).log',
'iwd' : '/tmp',
'requirements' : '(FileSystemDomain =!= UNDEFINED && Arch =!= UNDEFINED)',
'queue': '1',
What is the frequency of this
I suspect an issue with evaluating the Error attribute, possibly a reserved word in ClassAds.

Side note: Output includes $(Cluster).$(Process); those should be evaluated during submission. The job has neither a Cluster nor a Process attribute.

FYI, the stderr file attribute is Err.
I've done it via aviary/qmf. This is the QMF simple job:

ad = {"cmd": "/bin/sleep",
      "args": "1",
      "requirements": "(FileSystemDomain =!= UNDEFINED && Arch =!= UNDEFINED)",
      "iwd": "/tmp",
      "owner": "condor",
      'error' : '/tmp/mrg_$(Cluster).$(Process).err',
      'output' : '/tmp/mrg_$(Cluster).$(Process).out',
      'log' : '/tmp/mrg_$(Cluster).$(Process).log',
      "!!descriptors": {"requirements": "com.redhat.grid.Expression"}
     }

Is there a list anywhere of reserved ClassAd words that should not be submitted as attribute names because they can raise an error?
How can I process variables like $(Cluster) via QMF and Aviary? How does condor_submit process them?
(In reply to comment #4)
> I suspect an issue with evaluating the Error attribute, possible a reserved
> word in ClassAds.
>
> FYI, the stderr file attribute is Err.

In support of that theory, condor_submit appears to forbid including 'error' as a variable:

$ cjs -n 1 -dur 600 -a '+error = "/tmp/error.txt"'
using temp dir /tmp/cjs_jsub_ReyEv2 for jsub files
preparing 1 jobs in submission file: /tmp/cjs_jsub_ReyEv2/cjs.jsub
submitting 1 jobs via jsub file /tmp/cjs_jsub_ReyEv2/cjs.jsub
Submitting job(s)
ERROR: Parse error in expression:
error = "/tmp/error.txt"
^^^
Error in submit file
WARNING! submit failed with code 1

It works OK if you use something not exactly 'error':

[eje@rorschach hfs_func_tests]$ cjs -n 1 -dur 600 -a '+erro = "/tmp/error.txt"'
using temp dir /tmp/cjs_jsub_QNJXVR for jsub files
preparing 1 jobs in submission file: /tmp/cjs_jsub_QNJXVR/cjs.jsub
submitting 1 jobs via jsub file /tmp/cjs_jsub_QNJXVR/cjs.jsub
Submitting job(s).
1 job(s) submitted to cluster 15.
submit was successful

I assume the QMF/aviary connection is that they somehow by-pass this check on submission, until 'condor_q -l' hits it.
(In reply to comment #5)
> How can I process variables like $(Cluster) via QMF and Aviary? How does
> process them condor_submit?

I was under the impression that $(cluster) works correctly. There is also a '$$()' syntax that evaluates at execution time:

See http://www.cs.wisc.edu/condor/manual/v7.6/2_5Submitting_Job.html#SECTION00356100000000000000

A special-purpose Machine Ad substitution macro can be used in string attributes in the submit description file. The macro has the form

$$(MachineAdAttribute)

The $$() informs Condor to substitute the requested MachineAdAttribute from the machine where the job will be executed.
(In reply to comment #5)
> Is anywhere any list of condor reserver classads which should not be submitted
> because it can raise error?

I notice that the condor doc appears to be wrong:

http://www.cs.wisc.edu/condor/manual/v7.6/2_5Submitting_Job.html#SECTION00351200000000000000

####################
#
# Example 2: demonstrate use of multiple
# directories for data organization.
#
####################
Executable = mathematica
Universe = vanilla
input = test.data
output = loop.out
error = loop.error
Log = loop.log

Initialdir = run_1
Queue

Initialdir = run_2
Queue
FYI

$ echo 'cmd=/bin/sleep\nargs=1d\nqueue' | condor_submit
Submitting job(s).
1 job(s) submitted to cluster 14.
$ condor_q -l | grep ClusterId
ClusterId = 14
$ condor_qedit 14.0 error 1
Set attribute "error".
$ condor_q -l | grep ClusterId
-- Failed to fetch ads from: <127.0.0.1:54561> : eeyore.local
(In reply to comment #8)
> (In reply to comment #5)
>
> > Is anywhere any list of condor reserver classads which should not be submitted
> > because it can raise error?
>
> I notice that the condor doc appears to be wrong:
>
> http://www.cs.wisc.edu/condor/manual/v7.6/2_5Submitting_Job.html#SECTION00351200000000000000
>

It is not: you can specify error and output in the job submission file; they are remapped in the resulting classad:

....
output=/tmp/foobar.out
error=/tmp/foobar.err

TransferOutputRemaps = "_condor_stdout=/tmp/foobar.out;_condor_stderr=/tmp/foobar.err"
FYI, list of reserved words for classads according to classad 2.4 doc:

The following words are reserved, meaning that they may not be used as attribute names.

error false is isnt parent true undefined

Recognition of reserved words is independent of case. For example, false, FALSE, and False are all reserved words.

http://www.cs.wisc.edu/condor/classad/refman/node3.html#SECTION00031000000000000000
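Since the reserved-word list is short and matching is case-insensitive, a client could check an ad before handing it to QMF or Aviary. The following is only a minimal sketch: the set is copied from the ClassAd reference quoted in this comment, and the function name find_reserved_attribute_names is hypothetical, not part of any Condor, QMF, or Aviary API.

```python
# Reserved words copied from the ClassAd reference quoted in this comment.
CLASSAD_RESERVED_WORDS = frozenset(
    ["error", "false", "is", "isnt", "parent", "true", "undefined"]
)


def find_reserved_attribute_names(ad):
    """Return the keys of `ad` that collide with ClassAd reserved words.

    Hypothetical helper for illustration only. The comparison is
    case-insensitive, matching the rule quoted from the ClassAd reference.
    """
    return sorted(key for key in ad if key.lower() in CLASSAD_RESERVED_WORDS)


if __name__ == "__main__":
    ad = {
        "cmd": "/bin/sleep",
        "args": "1",
        "error": "/tmp/job.err",
        "output": "/tmp/job.out",
    }
    # Report offending attribute names before attempting a submission.
    print(find_reserved_attribute_names(ad))
```

Run against the ad from this bug, the helper would flag only the 'error' key; 'output' and 'log' are not in the reserved list.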
I've confirmed with Pete that neither qmf nor aviary filter 'error' in the same way that condor_submit does. So this construction should use 'err' instead of 'error':

ad = {"cmd": "/bin/sleep",
      "args": "1",
      "requirements": "(FileSystemDomain =!= UNDEFINED && Arch =!= UNDEFINED)",
      "iwd": "/tmp",
      "owner": "condor",
      'error' : '/tmp/mrg_$(Cluster).$(Process).err',
      'output' : '/tmp/mrg_$(Cluster).$(Process).out',
      'log' : '/tmp/mrg_$(Cluster).$(Process).log',
      "!!descriptors": {"requirements": "com.redhat.grid.Expression"}
     }

I posted an RFE to have aviary catch the use of classad keywords as attribute names: Bug 702489

Pete also mentioned that $() and $$() aren't currently handled, which also has an RFE: Bug 702492
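Until that filtering exists, the rename could be applied on the client side before the ad is submitted. This is a rough sketch under that assumption only: rename_error_attribute is a hypothetical helper, not a QMF or Aviary call, and it does nothing about the unexpanded $(Cluster) and $(Process) macros.

```python
def rename_error_attribute(ad):
    """Return a copy of `ad` with the 'error' key (any case) renamed to 'err'.

    Hypothetical helper for illustration; the input dict is not modified.
    Raises ValueError if the rename would collide with an existing key,
    comparing names case-insensitively as ClassAds do.
    """
    fixed = {}
    seen = set()  # lowercased names already placed in `fixed`
    for key, value in ad.items():
        new_key = "err" if key.lower() == "error" else key
        if new_key.lower() in seen:
            raise ValueError("duplicate attribute after rename: %r" % new_key)
        seen.add(new_key.lower())
        fixed[new_key] = value
    return fixed


if __name__ == "__main__":
    ad = {
        "cmd": "/bin/sleep",
        "args": "1",
        "iwd": "/tmp",
        "error": "/tmp/mrg_$(Cluster).$(Process).err",
        "output": "/tmp/mrg_$(Cluster).$(Process).out",
    }
    print(rename_error_attribute(ad))
```

Returning a copy rather than editing in place keeps the original ad available for logging or comparison.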
Because of comment #13, I am closing this bug as NOTABUG.