Bug 709659 - condor doesn't delete temporary directories on Windows and jobs don't run on Windows 2003
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: Development
Hardware: Unspecified
OS: Windows
Priority: high
Severity: high
Target Milestone: 2.0
Target Release: ---
Assignee: Timothy St. Clair
QA Contact: Martin Kudlej
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-06-01 10:18 UTC by Martin Kudlej
Modified: 2011-06-27 14:21 UTC
3 users

Fixed In Version: condor-win-7.6.1-0.10
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-27 14:21:16 UTC
Target Upstream Version:



Description Martin Kudlej 2011-06-01 10:18:05 UTC
Description of problem:
I tried simple Windows jobs submitted from Linux:
universe = vanilla
executable = /root/wait.bat
arguments = 1
requirements = ( Arch=="Intel" || Arch=="x86_64" ) && ( OpSys=="WINNT51" || OpSys=="WINNT52" || OpSys=="WINNT60" || OpSys=="WINNT61" )
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
iwd = /tmp

queue 10

$ cat /root/wait.bat
@ping 127.0.0.1 -n %1 -w 1000 > nul

It works on all supported Windows versions except Windows 2003.
I think it fails there because Condor doesn't delete the temporary directories \execute\dir_*.
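To check an execute node for the symptom described above, a small script can list the leftover dir_* scratch directories and optionally try to delete them. This is a hypothetical helper, not part of Condor: the script name and command-line interface are assumptions, the execute directory is passed as an argument (no default path is assumed), and --delete should only be used while no jobs are running on the node, since it would otherwise remove live job sandboxes. A directory that cannot be deleted (for example, because another process still holds an open file handle inside it) is reported rather than aborting the run.

```python
"""List (and optionally delete) leftover Condor job scratch directories.

Hypothetical helper, not part of Condor. It looks for dir_* subdirectories
left behind in an execute directory. Only use --delete while the node has
no running jobs, or live job sandboxes will be removed.
"""
import shutil
import sys
from pathlib import Path


def find_leftover_scratch_dirs(execute_dir):
    """Return the dir_* subdirectories still present in execute_dir, sorted."""
    base = Path(execute_dir)
    if not base.is_dir():
        return []
    # Plain files that happen to match dir_* are ignored.
    return sorted(p for p in base.glob("dir_*") if p.is_dir())


def remove_leftovers(execute_dir):
    """Try to delete every leftover dir_* directory.

    Returns {directory name: error text} for each directory that could not
    be removed, e.g. because another process still has a file open in it.
    """
    failures = {}
    for d in find_leftover_scratch_dirs(execute_dir):
        try:
            shutil.rmtree(d)
        except OSError as exc:  # PermissionError on Windows for locked files
            failures[d.name] = str(exc)
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python leftover_dirs.py <execute_dir> [--delete]
    execute_dir = sys.argv[1]
    for d in find_leftover_scratch_dirs(execute_dir):
        print("leftover:", d)
    if "--delete" in sys.argv[2:]:
        for name, err in remove_leftovers(execute_dir).items():
            print("could not remove", name, "->", err)
```

For example, run it against the node's configured EXECUTE directory without --delete first; any directory that --delete then reports as "could not remove" is a candidate for inspection with procexp.exe to see which process holds its files open.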

Version-Release number of selected component (if applicable):
condor-win-7.6.1-0.8
Windows 2003 -> jobs don't run; Condor doesn't delete temporary directories
other Windows versions -> jobs run; Condor doesn't delete temporary directories

All machines use the same Condor configuration:
Group "Internal Default Group":
Group ID: 1
Name: Internal Default Group
Features (priority: name):
  0: ExecuteNode
  1: Master
  2: NodeAccess
  3: ConsoleMaster
  4: ConsoleExecuteNode
  5: QMF
Parameters:
  SEC_DEFAULT_AUTHENTICATION_METHODS = claimtobe
  QMF_BROKER_HOST = host1
  ALLOW_WRITE = *
  UID_DOMAIN = host1
  NUM_CPUS = 10
  START = true
  ABORT_ON_EXCEPTION = false
  CREATE_CORE_FILES = true
  ALLOW_READ = *
  SEC_CLIENT_AUTHENTICATION_METHODS = claimtobe
  CONDOR_HOST = host1


How reproducible:
100%

Steps to Reproduce:
1. Install Condor on Linux as the central manager (CM) and scheduler.
2. Install Condor on the Windows machines as execute nodes.
3. Submit the job above.
  
Actual results:
Condor doesn't delete the temporary directories, and because of that, jobs don't run on Windows 2003.

Expected results:
Condor deletes the temporary directories, and jobs run on all supported Windows versions.

Comment 4 Timothy St. Clair 2011-06-01 15:12:21 UTC
While investigating, I noticed that "something" had stale file handles open and would not let me remove the directories. If the condition is reproduced, use procexp.exe (Sysinternals Process Explorer) to determine what is keeping the file handles open; that is the root cause of why the directories are not removed.

This can happen with third-party applications that scan files, and anti-virus software is notoriously bad in this respect.

Comment 7 Timothy St. Clair 2011-06-02 16:37:44 UTC
So it seems the upstream introduction of:

WINDOWS_RMDIR = ...condor_rmdir.exe 

is failing outright, without reporting any error information. It appears to cause the very corner-case error it was originally introduced to solve. With the parameter removed, so that Condor falls back to the hard-coded system shell command rmdir /s /q, the directories appear to be removed correctly.

Comment 9 Martin Kudlej 2011-06-03 15:18:14 UTC
Tested on all supported Windows versions with condor-win-7.6.1-0.10; it works. --> VERIFIED

