Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 522292

Summary: DAGMan/DAG submission version compatibility improvements
Product: Red Hat Enterprise MRG
Reporter: Pete MacKinnon <pmackinn>
Component: grid
Assignee: Pete MacKinnon <pmackinn>
Status: CLOSED ERRATA
QA Contact: Jan Sarenik <jsarenik>
Severity: medium
Docs Contact:
Priority: medium
Version: 1.1.6
CC: iboverma, jsarenik, jthomas, lbrindle, matt, tao
Target Milestone: 1.2
Target Release: ---
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Grid enhancement: DAGMan submission version compatibility checking was improved, reducing the need for the -allowVersionMismatch option on condor_dagman.
Last Closed: 2009-12-03 09:16:01 UTC
Bug Depends On:    
Bug Blocks: 527551    
Attachments:
- OO spreadsheet with mixed-version UW dag tests (flags: none)
- Patch to add compatibility checking (flags: none)
- Test scripts used for modifying args and testing UW dag tests (flags: none)
- Test script running unmodified dagman tests (flags: none)
- Test scripts (flags: none)
- shell script (flags: none)

Description Pete MacKinnon 2009-09-10 01:27:30 UTC
Currently, condor_dagman has an option (-allowVersionMismatch) that sits on top of a brute-force check of whether the condor_submit_dag and condor_dagman executables are the same. If they are not, the option simply logs a warning and lets the user submit anyway, on the assumption that they are fully aware a version mismatch is in play.

We should look for ways to:
a) identify where compatibility between the two components is specifically an issue
b) have smarter option(s) for compatibility checking
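
For illustration, the current behaviour described above can be sketched roughly as follows. This is a Python sketch written for this report, not the actual condor_dagman source; the function name check_versions and its parameters are made up for the example, and the version strings are treated as opaque values that either match or do not.

```python
import logging

log = logging.getLogger("dagman_sketch")


def check_versions(submit_version, dagman_version, allow_version_mismatch=False):
    """Return True if dagman should proceed, False if it should exit.

    Hypothetical helper: an all-or-nothing comparison, with the
    allow_version_mismatch flag standing in for -allowVersionMismatch.
    """
    if submit_version == dagman_version:
        return True
    message = "Version mismatch: condor_submit_dag (%s) vs. condor_dagman (%s)"
    if allow_version_mismatch:
        # The option does not make the check smarter; it only downgrades
        # the failure to a warning and carries on.
        log.warning(message, submit_version, dagman_version)
        return True
    log.error(message, submit_version, dagman_version)
    return False
```

The point of the sketch is that nothing in this check knows which version differences actually matter, which is what items a) and b) are asking for.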

Comment 1 Pete MacKinnon 2009-09-10 01:29:37 UTC
Our focus is from 7.2 onward

Comment 2 Pete MacKinnon 2009-09-10 16:24:36 UTC
By "same", I mean "same version level".

Comment 3 Pete MacKinnon 2009-09-24 22:03:15 UTC
Initial compatibility tests:

- 5 concurrent diamond dags
- 4 nodes each
- AllowVersionMismatch set
- dagman argument changed for different versions

Results:
7.4.0-0.4 dag submit to 7.2.4 dagman - no issues
7.2.4 dag submit to 7.4.0-0.4 dagman - no issues

Other tests:
- rescue compatibility (old & new)
- tweaking dag logging to scrutinize new lazy log behavior
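
For reference, a 4-node diamond DAG of the kind used above can be generated with something like the sketch below. The JOB and PARENT ... CHILD lines are standard DAGMan input syntax; the submit file name node.sub, the node names A to D, and the output file names are placeholders, since the actual test files are not attached to this comment.

```python
from pathlib import Path


def diamond_dag(submit_file="node.sub"):
    """Return the text of a 4-node diamond DAG: A -> (B, C) -> D.

    submit_file is a placeholder; every node reuses the same submit file.
    """
    lines = [f"JOB {name} {submit_file}" for name in "ABCD"]
    lines.append("PARENT A CHILD B C")
    lines.append("PARENT B C CHILD D")
    return "\n".join(lines) + "\n"


def write_diamonds(directory, count=5):
    """Write `count` identical diamond DAG files and return their paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for index in range(count):
        path = directory / f"diamond_{index}.dag"
        path.write_text(diamond_dag())
        paths.append(path)
    return paths
```

Each generated file would then be handed to condor_submit_dag; the exact arguments used in these tests are not reproduced here.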

Comment 4 Pete MacKinnon 2009-09-30 14:42:36 UTC
Will try another round using the UW DAG tests

Comment 5 Pete MacKinnon 2009-10-02 19:08:47 UTC
Created attachment 363516 [details]
OO spreadsheet with mixed-version UW dag tests

Ran the 42 UW dagman tests with baselines for 7.2.4 and 7.4.0-0.4, and submit/dagman mixes. The stork tests failed in all runs, but I wasn't set up for stork testing.
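
As a quick illustration of what "baselines and mixes" means here, the four submit/dagman combinations can be enumerated as below. This is only a sketch; build_matrix and the "baseline"/"mixed" labels are names chosen for the example, and the actual results are in the attached spreadsheet.

```python
from itertools import product

# The two version levels named in this comment.
VERSIONS = ["7.2.4", "7.4.0-0.4"]


def build_matrix(versions=VERSIONS):
    """Return (submit_version, dagman_version, kind) for every pairing."""
    rows = []
    for submit, dagman in product(versions, repeat=2):
        kind = "baseline" if submit == dagman else "mixed"
        rows.append((submit, dagman, kind))
    return rows
```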

In general, there are 2 cautionary areas:
1) gittrac #435: dagman core dump when dag has POST script and all submits fail
2) default node logs

We can call these out in our Release Notes. Also, I will provide a patch that essentially relaxes the hard -AllowVersionMismatch restriction. Possible solutions:
1) maintain a data structure that maps versions to feature compat (just for dagman). This would mean more accurate version compat checking at runtime.
2) simply log a warning if the versions are both > 7.2 and continue
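
To make the two options concrete, here is a rough Python sketch of each. It is not the patch itself. All names are hypothetical, the single entry in FEATURE_MIN_VERSION is a placeholder rather than a real feature/version pair, and the floor of 7.2.0 assumes that "from 7.2 onward" is meant to include the 7.2.x series.

```python
import logging

log = logging.getLogger("dagman_compat_sketch")

# Option 1: map each DAGMan feature to the first version that supports it.
# The entry below is a placeholder, not a real feature/version pair.
FEATURE_MIN_VERSION = {
    "hypothetical_feature": (7, 4, 0),
}

# Option 2: assumed floor, taking "from 7.2 onward" to include 7.2.x.
COMPAT_FLOOR = (7, 2, 0)


def parse_version(text):
    """Turn '7.4.0' or '7.4.0-0.4' into (7, 4, 0), ignoring the release suffix."""
    base = text.split("-")[0]
    parts = [int(part) for part in base.split(".")]
    parts += [0] * (3 - len(parts))  # pad '7.2' to (7, 2, 0)
    return tuple(parts)


def missing_features(submit_version, dagman_version):
    """Option 1: features the submit side has that this dagman lacks."""
    submit = parse_version(submit_version)
    dagman = parse_version(dagman_version)
    return sorted(
        name
        for name, minimum in FEATURE_MIN_VERSION.items()
        if submit >= minimum > dagman
    )


def warn_and_continue(submit_version, dagman_version):
    """Option 2: return True if dagman should go ahead despite a mismatch."""
    submit = parse_version(submit_version)
    dagman = parse_version(dagman_version)
    if submit == dagman:
        return True
    if submit >= COMPAT_FLOOR and dagman >= COMPAT_FLOOR:
        log.warning("Version mismatch: condor_submit_dag (%s) vs. condor_dagman (%s)",
                    submit_version, dagman_version)
        return True
    return False
```

Option 1 is more accurate but needs the table to be maintained with every release; option 2 is cheap but only as good as the testing behind the chosen floor.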

Comment 6 Pete MacKinnon 2009-10-08 14:43:18 UTC
Created attachment 364126 [details]
Patch to add compatibility checking

These code changes make use of the CondorVersionInfo class, which assumes backward compatibility. That is safe for now, but we haven't tested anything earlier than 7.2.4.

Comment 7 Matthew Farrellee 2009-10-08 16:03:07 UTC
Built into 7.4.0-0.6

Comment 8 Pete MacKinnon 2009-10-12 14:56:08 UTC
Created attachment 364478 [details]
Test scripts used for modifying args and testing UW dag tests

Comment 12 Irina Boverman 2009-10-28 17:33:00 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
DAGMan/DAG submission version compatibility checking was improved when using -allowVersionMismatch option on condor_dagman (522292)

Comment 13 Matthew Farrellee 2009-10-28 17:44:25 UTC
Release note updated.

Diffed Contents:
@@ -1 +1 @@
-DAGMan/DAG submission version compatibility checking was improved when using -allowVersionMismatch option on condor_dagman (522292)
+DAGMan/DAG submission version compatibility checking was improved, reducing the need for -allowVersionMismatch option on condor_dagman (522292)

Comment 14 Lana Brindley 2009-11-04 02:35:04 UTC
Release note updated.

Diffed Contents:
@@ -1 +1,3 @@
+Grid enhancement
+
 DAGMan/DAG submission version compatibility checking was improved, reducing the need for -allowVersionMismatch option on condor_dagman (522292)

Comment 15 Lana Brindley 2009-11-04 03:15:05 UTC
Release note updated.

Diffed Contents:
@@ -1,3 +1,3 @@
 Grid enhancement
 
-DAGMan/DAG submission version compatibility checking was improved, reducing the need for -allowVersionMismatch option on condor_dagman (522292)
+DAGMan submission version compatibility checking was improved, reducing the need for he -allowVersionMismatch option on condor_dagman.

Comment 16 Jan Sarenik 2009-11-06 13:32:18 UTC
Should all tests contained in list_dagman succeed?

I mean the original versions from condor-test-7.4.1-0.4.el5,
not the ones modified by the scripts attached to this BZ earlier.

Comment 17 Pete MacKinnon 2009-11-06 20:18:54 UTC
Jan,

Not sure what you mean by "list_dagman". All 42 of the upstream 7.4.0 dagman tests as of Oct 1 EXCEPT for the two stork tests (stork wasn't configured) passed for me when testing compatibility (CSD & DM at same version level). Please refer to the previously attached spreadsheet.

I see that a few more dagman tests have crept into 7.4.1-0.4. I can't speak to those since I haven't had a chance to test them yet. Let me know if your test results differ.

\Pete

Comment 18 Jan Sarenik 2009-11-09 08:48:51 UTC
Just to clarify "list_dagman", I meant:
/usr/libexec/condor/test/condor_tests/list_dagman
which belongs to package condor-test

Comment 19 Jan Sarenik 2009-11-09 13:52:33 UTC
Created attachment 368210 [details]
Test script running unmodified dagman tests

Is there anything wrong with this script, or is condor buggy?
job_dagman_large_dag.run always fails on condor-7.4.1-0.4.el5.

Comment 20 Pete MacKinnon 2009-11-11 17:45:15 UTC
job_dagman_large_dag.run ran fine for me on F11 with 7.4.1-0.5.

Where exactly is this list_dagman test stored?

Comment 22 Jan Sarenik 2009-11-12 09:55:18 UTC
I am running the tests together with modifications
made by add_dag_args.pl now. It seems they are running
fine. Expect this bug to be VERIFIED in a few hours.

Comment 23 Jan Sarenik 2009-11-12 13:17:51 UTC
Created attachment 369211 [details]
Test scripts

Comment 24 Jan Sarenik 2009-11-12 14:10:45 UTC
Now I know why job_dagman_large_dag.run is failing on RHEL:
the packages contain the file
/usr/libexec/condor/test/condor_tests/create_large_dag
which is called from that test and so should be executable,
but is packaged as non-executable.

I will ping MattF to rebuild packages if possible.

Comment 25 Jan Sarenik 2009-11-12 14:11:20 UTC
Quick fix:

chmod a+x /usr/libexec/condor/test/condor_tests/create_large_dag
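
The same fix could be written as a check-and-repair step in a test harness, roughly as below. This is a sketch only; ensure_executable is a hypothetical helper, not something shipped in condor-test, and it mirrors chmod a+x by setting the execute bit for user, group, and other.

```python
import os
import stat

# Execute bits for user, group, and other (what "chmod a+x" sets).
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def ensure_executable(path):
    """Set a+x on path if any execute bit is missing.

    Returns True if the mode was changed, False if it was already a+x.
    """
    mode = os.stat(path).st_mode
    if (mode & EXEC_BITS) == EXEC_BITS:
        return False
    os.chmod(path, mode | EXEC_BITS)
    return True
```

It would be called with the create_large_dag path above before running job_dagman_large_dag.run.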

Comment 26 Jan Sarenik 2009-11-12 14:15:57 UTC
Besides this one, I have not noticed any unexpected behavior
during my testing, including the tests with modified arguments.
I ran all the tests on all available platforms:
(RHEL4, RHEL5) x (i386, x86_64)

Comment 27 Matthew Farrellee 2009-11-12 21:15:04 UTC
The exec bit should be set after 7.4.1-0.5

Comment 28 Jan Sarenik 2009-11-16 08:13:08 UTC
I forgot to test 7.2 submit to 7.4 and the reverse.
Working on it now.

Comment 29 Jan Sarenik 2009-11-16 12:24:38 UTC
With old condor_submit_dag (from condor-7.2.2-0.9.el5.i386.rpm)
submitting to normally installed condor-7.4.1-0.5.el5.i386.rpm
tests are being submitted.

With new condor_submit_dag (from condor-7.4.1-0.5.el5.i386.rpm)
trying to submit dags to condor-7.2.2-0.9.el5.i386.rpm, I am getting
this error and assume this behavior is expected:

----------------------------------------------------------------------------
11/16 13:19:15 Version mismatch: condor_submit_dag ($CondorVersion: 7.4.1 Nov  9 2009 BuildID: RH-7.4.1-0.5.el5 PRE-RELEASE $) vs. condor_dagman ($CondorVersion: 7.2.1 Mar 25 2009 BuildID: RH-7.2.2-0.9.el5 $)
11/16 13:19:15 **** condor_scheduniv_exec.1.0 (condor_DAGMAN) pid 3575 EXITING WITH STATUS 1

Comment 30 Pete MacKinnon 2009-11-16 14:36:36 UTC
The newer CSD to older dagman behaviour is expected since it is enforced by the dagman executable (and whatever version mismatch logic it has).
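
As an illustration of the asymmetry, the sketch below pulls the two version numbers out of the log line in Comment 29 and applies a simple backward-compatibility rule (dagman must be at the same level as, or newer than, the submitter). That rule is an assumption made for this example; it is not taken from the CondorVersionInfo source or from the 7.2 dagman, whose own mismatch logic is what actually rejected the submission. The helper names are hypothetical.

```python
import re

# Log line copied from Comment 29.
SAMPLE_LINE = (
    "11/16 13:19:15 Version mismatch: condor_submit_dag "
    "($CondorVersion: 7.4.1 Nov  9 2009 BuildID: RH-7.4.1-0.5.el5 PRE-RELEASE $) "
    "vs. condor_dagman "
    "($CondorVersion: 7.2.1 Mar 25 2009 BuildID: RH-7.2.2-0.9.el5 $)"
)

VERSION_RE = re.compile(r"\$CondorVersion: (\d+)\.(\d+)\.(\d+)\b")


def versions_in(line):
    """Return every $CondorVersion found in line as (major, minor, patch)."""
    return [tuple(int(part) for part in match) for match in VERSION_RE.findall(line)]


def dagman_accepts(submit_version, dagman_version):
    """Assumed rule: a dagman can run DAGs from the same or an older submitter."""
    return dagman_version >= submit_version
```

Under that assumed rule, the 7.4.1 submit against 7.2.1 dagman is refused, while the reverse direction is allowed, matching what was observed in Comment 29.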

Comment 31 Jan Sarenik 2009-11-16 14:56:12 UTC
Created attachment 369709 [details]
shell script

This script was used to verify the bug on supported architectures.

Comment 32 errata-xmlrpc 2009-12-03 09:16:01 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html