Description of problem:

Submit file:

should_transfer_files=YES
executable='C:\mrg\ver.bat'
iwd='C:\users\admini~1.mku\appdata\local\temp'
log='c:\users\admini~1.mku\appdata\local\temp\mrg.log'
output='c:\users\admini~1.mku\appdata\local\temp\mrg.out'
universe=vanilla
arguments=1
error='c:\users\admini~1.mku\appdata\local\temp\mrg.err'
when_to_transfer_output=ON_EXIT
queue

Error message:

c:\mrg>C:\condor\bin\condor_submit.exe -name _scheduler_ _file_
Submitting job(s)
ERROR: No such directory: C:\mrg\'c:\users\admini~1.mku\appdata\local\temp\'

If I remove the single and double quotes, condor_submit accepts the submit file without error, but then I see a segmentation fault in the shadow:

06/09/11 04:04:09 Initializing a VANILLA shadow for job 359776.0
06/09/11 04:04:09 (359776.0) (18936): WriteUserLog::initialize: safe_open_wrapper("c:\users\admini~1.mku\appdata\local\temp/c:\users\admini~1.mku\appdata\local\temp\mrg_1.1.log") failed - errno 2 (No such file or directory)
06/09/11 04:04:09 (359776.0) (18936): WriteUserLog::initialize: failed to open file
06/09/11 04:04:09 (359776.0) (18936): Failed to initialize user log to c:\users\admini~1.mku\appdata\local\temp/c:\users\admini~1.mku\appdata\local\temp\mrg_1.1.log
06/09/11 04:04:09 (359776.0) (18936): Job 359776.0 going into Hold state (code 22,0): Failed to initialize user log to c:\users\admini~1.mku\appdata\local\temp/c:\users\admini~1.mku\appdata\local\temp\mrg_1.1.log
06/09/11 04:04:09 (359776.0) (18936): RemoteResource::killStarter(): DCStartd object NULL!
Stack dump for process 18936 at timestamp 1307606649 (13 frames)
condor_shadow(dprintf_dump_stack+0x4a)[0x817b1da]
condor_shadow[0x814e4f6]
[0xe7e420]
condor_shadow(_ZN10BaseShadow16updateJobInQueueE8update_t+0xb2)[0x80b0442]
condor_shadow(_ZN10BaseShadow7holdJobEPKcii+0x12b)[0x80b22cb]
condor_shadow(_ZN10BaseShadow11initUserLogEv+0x131)[0x80b2561]
condor_shadow(_ZN10BaseShadow8baseInitEPN14compat_classad7ClassAdEPKcS4_+0x2f9)[0x80b34c9]
condor_shadow(_ZN9UniShadow4initEPN14compat_classad7ClassAdEPKcS4_+0x41)[0x80a6711]
condor_shadow(_Z10initShadowPN14compat_classad7ClassAdE+0xd0)[0x80aab60]
condor_shadow(_Z11startShadowPN14compat_classad7ClassAdE+0x68)[0x80aacb8]
condor_shadow(main+0x1162)[0x80eae12]
/lib/libc.so.6(__libc_start_main+0xdc)[0x5f1e9c]
condor_shadow[0x80a5f31]

The same happens for similar QMF jobs (submitted from Windows):

$ condor_history -back -l 55159 | sort
args = "1"
AutoClusterAttrs = "ImageSize,JobUniverse,LastCheckpointPlatform,NumCkpts,JobStart,LastPeriodicCheckpoint,RequestCpus,RequestDisk,RequestMemory,Requirements,NiceUser,ConcurrencyLimits"
AutoClusterId = 15
ClusterId = 55159
cmd = "ver.bat"
CumulativeSlotTime = 7.000000
CurrentHosts = 0
CurrentTime = time()
EnteredCurrentStatus = 1307612747
Err = "c:\users\admini~1.mku\appdata\local\temp\mrg_1.1.err"
GlobalJobId = "_scheduler_#55159.0#1307612629"
ImageSize = 0
ImageSize_RAW = 0
iwd = "c:\"
JobCurrentStartDate = 1307612746
JobFinishedHookDone = 1307612748
JobLastStartDate = 1307612745
JobPrio = 0
JobRunCount = 69
JobStartDate = 1307612637
JobStatus = 3
JobUniverse = 5
LastJobLeaseRenewal = 1307612746
LastJobStatus = 1
LastMatchTime = 1307612746
LastPublicClaimId = "<_ip_:1068>#1307501598#424#..."
LastRemoteHost = "slot2@mkudlej_windows2003_64"
LastSuspensionTime = 0
MachineAttrCpus0 = 1
MachineAttrSlotWeight0 = 1
MaxHosts = 1
MinHosts = 1
MyType = "Job"
NumJobMatches = 69
NumShadowStarts = 69
OrigMaxHosts = 1
Out = "c:\users\admini~1.mku\appdata\local\temp\mrg_1.1.out"
owner = "condor"
ProcId = 0
QDate = 1307612629
RemoteUserCpu = 0.0
RemoteWallClockTime = 7.000000
RemoveReason = "via condor_rm (by user condor)"
requirements = ( FileSystemDomain =!= undefined ) && ( Arch =!= undefined ) && ( OpSys == "WINNT51" || OpSys == "WINNT52" || OpSys == "WINNT60" || OpSys == "WINNT61" )
ShouldTransferFiles = "NO"
should_transfer_files = "YES"
StartdPrincipal = "unauthenticated@unmapped/_ip_"
Submission = "_scheduler_#55159"
TargetType = "Machine"
User = "condor@_broker_"
UserLog = "c:\users\admini~1.mku\appdata\local\temp\mrg_1.1.log"
when_to_transfer_output = "ON_EXIT"

$ tail ShadowLog
06/09/11 05:45:48 Setting maximum accepts per cycle 4.
06/09/11 05:45:48 Initializing a VANILLA shadow for job 55148.0
06/09/11 05:45:48 (55148.0) (30261): WriteUserLog::initialize: safe_open_wrapper("c:\/c:\users\admini~1.mku\appdata\local\temp\mrg_1.1.log") failed - errno 2 (No such file or directory)
06/09/11 05:45:48 (55148.0) (30261): WriteUserLog::initialize: failed to open file
06/09/11 05:45:48 (55148.0) (30261): Failed to initialize user log to c:\/c:\users\admini~1.mku\appdata\local\temp\mrg_1.1.log
06/09/11 05:45:48 (55148.0) (30261): Job 55148.0 going into Hold state (code 22,0): Failed to initialize user log to c:\/c:\users\admini~1.mku\appdata\local\temp\mrg_1.1.log
06/09/11 05:45:48 (55148.0) (30261): RemoteResource::killStarter(): DCStartd object NULL!
Stack dump for process 30261 at timestamp 1307612748 (13 frames)
condor_shadow(dprintf_dump_stack+0x44)[0x81322d4]
condor_shadow[0x815ae87]
[0xed6400]
condor_shadow(_ZN10BaseShadow16updateJobInQueueE8update_t+0xb1)[0x80bac81]
condor_shadow(_ZN10BaseShadow7holdJobEPKcii+0x131)[0x80bc651]
condor_shadow(_ZN10BaseShadow11initUserLogEv+0x113)[0x80bc863]
condor_shadow(_ZN10BaseShadow8baseInitEPN14compat_classad7ClassAdEPKcS4_+0x2cb)[0x80bd56b]
condor_shadow(_ZN9UniShadow4initEPN14compat_classad7ClassAdEPKcS4_+0x31)[0x80ad831]
condor_shadow(_Z10initShadowPN14compat_classad7ClassAdE+0xcd)[0x80b2c1d]
condor_shadow(_Z11startShadowPN14compat_classad7ClassAdE+0x62)[0x80b2d92]
condor_shadow(main+0x13a5)[0x80e3045]
/lib/libc.so.6(__libc_start_main+0xe6)[0x585cc6]
condor_shadow[0x80a20f1]

I've read https://bugzilla.redhat.com/show_bug.cgi?id=610265#c6, and according to the table there the submitted ClassAds are correct and the starter should execute the job in "EXECUTE/exec". All paths are absolute, but the shadow doesn't recognize them as absolute, so it crashes:

$ gdb `which condor_shadow` core.30331
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/condor_shadow...Reading symbols from /usr/lib/debug/usr/sbin/condor_shadow.debug...done.
done.
[New Thread 30331]
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/23/e5a71140d5c8345a1c915447c466c23f43dc02
Reading symbols from /lib/libdl-2.12.so...Reading symbols from /usr/lib/debug/lib/libdl-2.12.so.debug...done.
done.
Loaded symbols for /lib/libnss_dns-2.12.so
Core was generated by `condor_shadow -f 55160.0 --schedd=<_ip_:43223> --xfer-queue=limit=upload'.
Program terminated with signal 11, Segmentation fault.
#0  0x00a73424 in __kernel_vsyscall ()
(gdb) thread apply all bt

Thread 1 (Thread 0xb7820750 (LWP 30331)):
#0  0x00a73424 in __kernel_vsyscall ()
#1  0x007167e0 in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#2  0x0815aed4 in sig_backtrace_handler (signum=11) at /usr/src/debug/condor-7.6.0/src/condor_utils/dprintf_config.cpp:75
#3  <signal handler called>
#4  0x080bac81 in BaseShadow::updateJobInQueue (this=0x85ab258, type=U_HOLD) at /usr/src/debug/condor-7.6.0/src/condor_shadow.V6.1/baseshadow.cpp:1177
#5  0x080bc651 in BaseShadow::holdJob (this=0x85ab258, reason=0x8596c98 "Failed to initialize user log to c:\\/c:\\users\\admini~1.mku\\appdata\\local\\temp\\mrg_1.1.log", hold_reason_code=22, hold_reason_subcode=0) at /usr/src/debug/condor-7.6.0/src/condor_shadow.V6.1/baseshadow.cpp:422
#6  0x080bc863 in BaseShadow::initUserLog (this=0x85ab258) at /usr/src/debug/condor-7.6.0/src/condor_shadow.V6.1/baseshadow.cpp:826
#7  0x080bd56b in BaseShadow::baseInit (this=0x85ab258, job_ad=0x85a8f08, schedd_addr=0xbfcb5ccf "<ip:49370>", xfer_queue_contact_info=0xbfcb5ca0 "limit=upload,download;addr=<ip:49370>") at /usr/src/debug/condor-7.6.0/src/condor_shadow.V6.1/baseshadow.cpp:159
#8  0x080ad831 in UniShadow::init (this=0x85ab258, job_ad=0x85a8f08, schedd_addr=0xbfcb5ccf "<ip:49370>", xfer_queue_contact_info=0xbfcb5ca0 "limit=upload,download;addr=<ip:49370>") at /usr/src/debug/condor-7.6.0/src/condor_shadow.V6.1/shadow.cpp:102
#9  0x080b2c1d in initShadow (ad=0x85a8f08) at /usr/src/debug/condor-7.6.0/src/condor_shadow.V6.1/shadow_v61_main.cpp:272
#10 0x080b2d92 in startShadow (ad=0x85a8f08) at /usr/src/debug/condor-7.6.0/src/condor_shadow.V6.1/shadow_v61_main.cpp:292
#11 0x080e3045 in main (argc=6, argv=0xbfcb5ad8) at /usr/src/debug/condor-7.6.0/src/condor_daemon_core.V6/daemon_core_main.cpp:2374

Version-Release number of selected component (if applicable):
condor-7.6.1-0.10
condor-win-7.6.1-0.11

How reproducible:
100%

Steps to Reproduce:
1. Set up a pool: Linux1 - CM, Sched, Exec; Linux2 - Sched, Exec; Windows - Exec.
2. Disable authentication with claimtobe.
3. Add all users who will submit from Windows to the Linux machine.
4. Submit a Windows job from Windows.

Actual results:
condor_shadow doesn't recognize absolute Windows paths: it prepends the job's iwd to them, fails to initialize the user log, and then crashes with a segmentation fault while putting the job on hold.

Expected results:
Condor recognizes absolute paths, and there are NO core files in the $(LOG) directory.
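The ShadowLog lines above point at the mechanism: the Linux shadow appears to treat `c:\users\...` as a relative path and join it onto the job's iwd with a `/`, which yields `c:\/c:\users\admini~1.mku\appdata\local\temp\mrg_1.1.log` when iwd is `c:\`. The sketch below models that behavior next to a Windows-aware alternative, using Python's standard `posixpath` and `ntpath` modules. It is an illustration under assumptions, not Condor's code: `naive_full_path`, `is_absolute_any` and `full_path` are hypothetical helpers, and the real shadow is written in C++.

```python
import ntpath
import posixpath


def naive_full_path(iwd: str, path: str) -> str:
    """Hypothetical model of the failing behavior: only a leading "/" marks a
    path as absolute, so a Windows path like "c:\\..." gets iwd + "/" prepended."""
    if posixpath.isabs(path):
        return path
    return iwd + "/" + path


def is_absolute_any(path: str) -> bool:
    """True for POSIX absolute paths and for Windows drive-letter or UNC paths."""
    return posixpath.isabs(path) or ntpath.isabs(path)


def full_path(iwd: str, path: str) -> str:
    """Windows-aware variant: keep absolute paths as they are; otherwise join
    with the separator style implied by iwd (drive letter => Windows)."""
    if is_absolute_any(path):
        return path
    if ntpath.splitdrive(iwd)[0]:
        return ntpath.join(iwd, path)
    return posixpath.join(iwd, path)


if __name__ == "__main__":
    log = r"c:\users\admini~1.mku\appdata\local\temp\mrg_1.1.log"
    print(naive_full_path("c:\\", log))  # reproduces the path from the ShadowLog
    print(full_path("c:\\", log))        # returns the log path unchanged
```

Under these assumptions, a quoted value such as `'c:\users\...'` is absolute under neither convention, which is consistent with condor_submit joining it onto the current directory `C:\mrg` in the first error message.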
I think this is because you are using Windows short names ("~"). I use full absolute paths all the time. Either way, we should probably support Windows short-name paths.
Short names should be avoided when possible because of conflicts with the ClassAd language. The correct method would be to quote them, but even then it's likely not the best solution. I think the best approach is to throw an error at submit time that disallows short names and forces the user to specify the full path.
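The proposal above, rejecting short names at submit time, could be sketched as follows. The detection is a heuristic assumption: any path component that looks like `NAME~N.EXT` is treated as an 8.3 short name, although a long name can look the same; a robust check on Windows would more likely ask the OS, for example via the Win32 `GetLongPathNameW` function. `SHORT_NAME`, `PATH_KEYS`, `short_name_components` and `check_no_short_names` are hypothetical names, not part of condor_submit.

```python
import ntpath
import re

# Heuristic pattern for an 8.3 short-name component such as "ADMINI~1" or
# "UUUUUU~1.FOO": up to 6 name characters, "~", digits, optional short extension.
# A long file name can match too, so this may produce false positives.
SHORT_NAME = re.compile(r"[^.~]{1,6}~\d{1,6}(\.[^.]{0,3})?")

# Hypothetical set of submit commands whose values are file or directory paths.
PATH_KEYS = {"executable", "iwd", "initialdir", "log", "output", "error", "input"}


def short_name_components(path: str) -> list:
    """Return the components of a Windows path that look like 8.3 short names."""
    _drive, rest = ntpath.splitdrive(path)
    return [part for part in re.split(r"[\\/]", rest) if SHORT_NAME.fullmatch(part)]


def check_no_short_names(submit: dict) -> None:
    """Raise ValueError for the first path-valued command that uses a short name.
    Submit command names are compared case-insensitively."""
    for key, value in submit.items():
        if key.lower() not in PATH_KEYS:
            continue
        bad = short_name_components(value)
        if bad:
            raise ValueError(
                f"{key} = {value}: short name component(s) {', '.join(bad)}; "
                "please specify the full (long) path"
            )


if __name__ == "__main__":
    check_no_short_names({"executable": r"C:\mrg\ver.bat"})  # passes silently
    try:
        check_no_short_names({"log": r"c:\users\admini~1.mku\appdata\local\temp\mrg.log"})
    except ValueError as err:
        print("rejected:", err)
```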
<retract last comment>

Cannot reproduce with the latest build (condor-7.6.3-0.1) using:

Error=C:\condor\tests\UUUUUU~1.FOO\mrg_$(Cluster).$(Process).err
Output=C:\condor\tests\UUUUUU~1.FOO\mrg_$(Cluster).$(Process).out
Log=C:\condor\tests\UUUUUU~1.FOO\mrg_$(Cluster).$(Process).log

condor_submit -remote my_schedd my.sub
---------------------------------------------------------------
Could you please provide reproduction info with the latest build?
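condor_submit expands `$(Cluster)` and `$(Process)` in these values per job, so with cluster 1 and process 1 the files follow the same `mrg_1.1.*` naming seen in the report's logs. The sketch below is a minimal model of that substitution, assuming only these two macros (matched case-insensitively) and ignoring the rest of condor_submit's macro syntax; `expand_macros` and `SUBMIT` are hypothetical names. The expanded values are still drive-letter paths, so a Windows-aware check such as Python's `ntpath.isabs` treats them as absolute.

```python
import ntpath
import re

# Only $(Cluster) and $(Process) are handled, case-insensitively; any other
# $(...) macro is left as-is. This is far simpler than condor_submit's real
# macro expansion.
MACRO = re.compile(r"\$\((cluster|process)\)", re.IGNORECASE)


def expand_macros(value: str, cluster: int, process: int) -> str:
    """Substitute $(Cluster) and $(Process) in a submit-file value."""
    values = {"cluster": str(cluster), "process": str(process)}
    return MACRO.sub(lambda m: values[m.group(1).lower()], value)


# The path-valued commands from the repro attempt above.
SUBMIT = {
    "Error": r"C:\condor\tests\UUUUUU~1.FOO\mrg_$(Cluster).$(Process).err",
    "Output": r"C:\condor\tests\UUUUUU~1.FOO\mrg_$(Cluster).$(Process).out",
    "Log": r"C:\condor\tests\UUUUUU~1.FOO\mrg_$(Cluster).$(Process).log",
}

if __name__ == "__main__":
    for key, value in SUBMIT.items():
        path = expand_macros(value, cluster=1, process=1)
        # ntpath.isabs applies Windows path rules even when run on Linux.
        print(f"{key} = {path} (absolute Windows path: {ntpath.isabs(path)})")
```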