Running /usr/libexec/condor/condor_chirp get_job_attr ResidentSetSize dumps core if ResidentSetSize isn't set yet.

Also seen:

ERROR getting rss: invalid literal for int(): chirp: couldn't get response from server: Illegal seek

Thread 1 (process 24257):
#0  0x00000030b0030265 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00000030b0031d10 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00000000004517cb in chirp_fatal_response () at chirp_client.c:561
No locals.
#3  0x0000000000451915 in convert_result (result=24257) at chirp_client.c:522
No locals.
#4  0x0000000000451acd in simple_command (c=0x942b10, fmt=0x4dc043 "get_job_attr %s\n") at chirp_client.c:678
        result = <value optimized out>
        command = "get_job_attr ResidentSetSize\n"...
        args = {{gp_offset = 24, fp_offset = 2054513515, overflow_arg_area = 0x7fffffffe000, reg_save_area = 0x7fffffffdf20}}
#5  0x0000000000451e9d in chirp_client_get_job_attr (c=0x5ec1, name=0x6 <Address 0x6 out of bounds>, expr=0x7fffffffe038) at chirp_client.c:374
        result = <value optimized out>
#6  0x00000000004510c1 in chirp_get_job_attr (argc=<value optimized out>, argv=0x7fffffffe128) at condor_chirp.cpp:306
        client = (struct chirp_client *) 0x0
        p = 0x0
#7  0x00000030b001d994 in __libc_start_main () from /lib64/libc.so.6
No symbol table info available.
#8  0x0000000000450d29 in _start ()
upstream https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=522,0
Ticket #522: condor_chirp fails when querying the value of a non-existing attribute

When querying the value of a non-existent attribute, the condor_chirp get_job_attr command aborts, returning "abnormal program termination" on Windows and a core from SIGABRT on non-Windows.

Remarks:

2010-Jul-15 00:05:24 by matt:
Here's the skinny...

On the shadow side,
A get_job_attr for an attribute that does not exist hits pseudo_ops.cpp:pseudo_get_job_attr(name, expr) and returns -1:

  e = ad->Lookup(name); if(e) { ... } else { ...; return -1; }

The -1 gets returned to the "case CONDOR_get_job_attr" in NTreceivers.cpp, which happily handles it by encoding a response of "code(-1); code(0);" -- the -1 is the return value and 0 is the default errno.

On the starter side,
The receiver in io_proxy_handler.cpp eventually calls IOProxyHandler::convert to translate the errno (remember it was 0) into a CHIRP_ERROR code to send to condor_chirp. However, errno 0 is not a known code, resulting in CHIRP_ERROR_UNKNOWN and a dprintf of "Starter ioproxy server got unknown unix errno:0"

On the condor_chirp side,
Result of CHIRP_ERROR_UNKNOWN is received, which triggers an unceremonious fprintf to stderr of "chirp: couldn't get response from server: Success" followed swiftly by abort(). The "Success" is from strerror(errno) and is meaningless.

This behavior is definitely broken.
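The failure chain described in the remarks can be modeled in a few lines. This is a toy sketch, not HTCondor code: KNOWN_ERRNOS and convert_errno are hypothetical stand-ins for the table inside IOProxyHandler::convert, and the single ENOENT entry is an assumption made for illustration. The only behavior taken from the ticket is that errno 0 has no entry and therefore falls through to CHIRP_ERROR_UNKNOWN.

```python
import errno

# Hypothetical stand-in for the errno table in IOProxyHandler::convert.
# The ENOENT entry is an illustrative assumption; what matters is that
# 0 has no entry at all.
KNOWN_ERRNOS = {
    errno.ENOENT: "CHIRP_ERROR_DOESNT_EXIST",
}

def convert_errno(unix_errno):
    """Map a unix errno to a chirp error name, defaulting to UNKNOWN."""
    return KNOWN_ERRNOS.get(unix_errno, "CHIRP_ERROR_UNKNOWN")

# The shadow answers a missing attribute with result -1 and errno 0.
shadow_result, shadow_errno = -1, 0
print(convert_errno(shadow_errno))  # errno 0 is not in the table
print(convert_errno(errno.ENOENT))  # what a shadow that set ENOENT would yield
```

In this model, the client aborts because the lookup for errno 0 falls through to the default, not because the attribute is missing as such.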
2010-Jul-15 00:28:05 by matt:
Options for resolving this broken behavior - condor_chirp/PROTOCOL equates get_job_attr with getenv, which returns NULL if the env name isn't present

* Stop aborting, return non-zero - however, abort() is actually triggered by a problem in the protocol
* Make unix_errno=0 known to IOProxyHandler::convert - however, requires picking an error code for 0, maybe CHIRP_ERROR_DOESNT_EXIST, changing all chirp client implementations to handle the new code, results in breaking wire protocol between new starter and old chirp clients
* Change pseudo_get_job_attr to set errno, maybe to ENOENT - better than converting errno=0 to CHIRP_ERROR_DOESNT_EXIST in IOProxyHandler::convert, but has all the same drawbacks
* Change pseudo_get_job_attr to return UNDEFINED - requires no protocol changes and no client changes, aligns well with ClassAd semantics and getenv("DOESNT_EXIST") -> NULL (Lookup("DOESNT_EXIST") -> UNDEFINED)

2010-Jul-15 00:29:29 by matt:
diff --git a/src/condor_shadow.V6.1/pseudo_ops.cpp b/src/condor_shadow.V6.1/pseudo_ops.cpp
index c71e1c2..c80230f 100644
--- a/src/condor_shadow.V6.1/pseudo_ops.cpp
+++ b/src/condor_shadow.V6.1/pseudo_ops.cpp
@@ -705,8 +705,9 @@ pseudo_get_job_attr( const char *name, MyString &expr )
 		dprintf(D_SYSCALLS,"pseudo_get_job_attr(%s) = %s\n",name,expr.Value());
 		return 0;
 	} else {
-		dprintf(D_SYSCALLS,"pseudo_get_job_attr(%s) failed\n",name);
-		return -1;
+		dprintf(D_SYSCALLS,"pseudo_get_job_attr(%s) is UNDEFINED\n",name);
+		expr = "UNDEFINED";
+		return 0;
 	}
 }
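With the patch above, a missing attribute comes back as the literal string UNDEFINED instead of an abort, so a caller that feeds the output straight into int() still has to handle that value. A minimal caller-side sketch, assuming the wrapper that produced the "invalid literal for int()" error is a Python script. parse_job_attr_int is a hypothetical helper, not part of condor_chirp:

```python
def parse_job_attr_int(raw):
    """Return the attribute as an int, or None if it is unset or unparsable.

    `raw` is assumed to be the text printed by
    `condor_chirp get_job_attr <name>`.
    """
    text = raw.strip()
    # A patched shadow answers UNDEFINED for an attribute that is not set.
    if text.upper() == "UNDEFINED":
        return None
    try:
        return int(text)
    except ValueError:
        # Anything else non-numeric (e.g. an error message from an
        # unpatched client) is treated as "no value" rather than raising.
        return None
```

Here a caller would pass in the captured output of the get_job_attr ResidentSetSize call and skip the sample when the helper returns None.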
Resolved upstream, will be built post 7.4.4-0.4
Tested with (version):
condor-7.4.4-0.8

Tested on:
RHEL5 x86_64 - passed
RHEL5 i386 - passed
RHEL4 x86_64 - passed
RHEL4 i386 - passed

>>> VERIFIED
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Previously, when querying the value of a non-existent attribute, the "condor_chirp get_job_attr" command aborted, returning "abnormal program termination" on a Windows system and a core from SIGABRT on a non-Windows system. With this update, these errors no longer occur and 'condor_chirp' works as expected.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html