Created attachment 489782 [details]
logs and dumped configuration

Description of problem:
I've set up condor-aviary according to bug 692861 and restarted condor.

$ cat SchedLog
...
04/04/11 08:23:33 (pid:3346) ScheddPlugin registration succeeded
04/04/11 08:23:33 (pid:3346) Successfully loaded plugin: /usr/lib/condor/plugins/AviaryScheddPlugin-plugin.so
04/04/11 08:23:33 (pid:3346) Successfully loaded plugin: /usr/lib/condor/plugins/MgmtScheddPlugin-plugin.so
04/04/11 08:23:33 (pid:3346) Failed in creating DLL
04/04/11 08:23:33 (pid:3346) ERROR "Failed to initialize Axis2SoapProvider" at line 76 in file /builddir/build/BUILD/condor-7.5.6/src/condor_contrib/aviary/src/AviaryScheddPlugin.cpp
Stack dump for process 3346 at timestamp 1301919813 (11 frames)
condor_schedd(dprintf_dump_stack+0x4a)[0x8250b0a]
condor_schedd[0x8255cb6]
[0x6b2420]
/lib/libc.so.6(abort+0x221)[0x1f7821]
condor_schedd(_EXCEPT_+0xa6)[0x822a426]
/usr/lib/condor/plugins/AviaryScheddPlugin-plugin.so(_ZN6aviary3job18AviaryScheddPlugin15earlyInitializeEv+0x199)[0x190df9]
condor_schedd(_ZN19ScheddPluginManager15EarlyInitializeEv+0x33)[0x8129b53]
condor_schedd(_Z9main_initiPPc+0xb5)[0x80c6f05]
condor_schedd(main+0x1162)[0x8131db2]
/lib/libc.so.6(__libc_start_main+0xdc)[0x1e2e9c]
condor_schedd[0x80c6921]

$ cat aviary_job.axis2.log
...
[Mon Apr 4 08:23:33 2011] [debug] phase.c(121) axis2_handler_t *request_uri_based_dispatcher added to the index 0 of the phase Transport
[Mon Apr 4 08:23:33 2011] [debug] phase.c(121) axis2_handler_t *addressing_based_dispatcher added to the index 1 of the phase Transport
[Mon Apr 4 08:23:33 2011] [debug] phase.c(121) axis2_handler_t *rest_dispatcher added to the index 0 of the phase Dispatch
[Mon Apr 4 08:23:33 2011] [debug] phase.c(121) axis2_handler_t *soap_message_body_based_dispatcher added to the index 1 of the phase Dispatch
[Mon Apr 4 08:23:33 2011] [debug] phase.c(121) axis2_handler_t *soap_action_based_dispatcher added to the index 2 of the phase Dispatch
[Mon Apr 4 08:23:33 2011] [debug] phase.c(121) axis2_handler_t *dispatch_post_conditions_evaluator added to the index 0 of the phase PostDispatch
[Mon Apr 4 08:23:33 2011] [debug] phase.c(121) axis2_handler_t *context_handler added to the index 1 of the phase PostDispatch
[Mon Apr 4 08:23:33 2011] [debug] conf_builder.c(227) No custom dispatching order found. Continue with the default dispatching order
[Mon Apr 4 08:23:33 2011] [error] class_loader.c(152) Loading shared library /usr//lib/libaxis2_http_sender.so Failed. DLERROR IS /usr//lib/libaxis2_http_sender.so: cannot open shared object file: No such file or directory
[Mon Apr 4 08:23:33 2011] [error] conf_builder.c(848) Transport sender is NULL for transport http, unable to continue
[Mon Apr 4 08:23:33 2011] [error] conf_builder.c(250) Processing transport senders failed, unable to continue
[Mon Apr 4 08:23:33 2011] [error] dep_engine.c(750) Populating Axis2 Configuration failed
[Mon Apr 4 08:23:33 2011] [error] conf_init.c(59) Loading deployment engine failed
[Mon Apr 4 08:23:33 2011] [error] conf_init.c(107) Loading configuration context failed for repository /usr/.
[Mon Apr 4 08:23:33 2011] [error] http_receiver.c(126) unable to create private configuration context for repo path /usr/
[Mon Apr 4 08:23:33 2011] [error] /builddir/build/BUILD/condor-7.5.6/src/condor_contrib/aviary/src/Axis2SoapProvider.cpp(129) HTTP server create failed: 103: Failed in creating DLL

Version-Release number of selected component (if applicable):
condor-aviary-7.6.0-0.4.el5
condor-7.6.0-0.4.el5
e2fsprogs-libs-1.39-23.el5_5.1
expat-1.95.8-8.3.el5_5.3
glibc-2.5-58
krb5-libs-1.6.1-55.el5_6.1
libgcc-4.1.2-50.el5
libstdc++-4.1.2-50.el5
openssl-0.9.8e-12.el5_5.7
pcre-6.6-6.el5_6.1
wso2-axis2-2.1.0-0.5.el5
wso2-wsf-cpp-2.1.0-0.5.el5

How reproducible:
100%

Steps to Reproduce:
1. install condor pool and condor-aviary
2. set up aviary according to bug 692861
3. restart condor

Actual results:
The scheduler raises an exception.

Expected results:
Tests from http://git.fedorahosted.org/git/?p=grid.git;a=tree;f=src/condor_contrib/aviary/test;h=d47b050bb84fea8a4e5e03ec1f4630ee95081609;hb=f88ee96c9b38739f5388648bc187397276cb07af should work.
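The SchedLog only reports "Failed in creating DLL"; the library that actually failed to load is named in the axis2 log. A minimal sketch of pulling those paths out of the log text, assuming the class_loader.c message format shown in this report (the function name `failed_libraries` is hypothetical, not part of condor or Axis2/C):

```python
import re

# Pattern for the class_loader.c error line as it appears in the axis2 log
# quoted in this report. Other Axis2/C versions may word it differently.
LOAD_FAILURE = re.compile(
    r"\[error\] class_loader\.c\(\d+\) Loading shared library (\S+) Failed\."
)


def failed_libraries(log_text):
    """Return the library paths Axis2/C failed to load, in order, without repeats."""
    seen = []
    for match in LOAD_FAILURE.finditer(log_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen
```

Feeding it the contents of aviary_job.axis2.log (or aviary_query.axis2.log) should list the paths the engine tried, which makes a wrong lib directory easy to spot.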
*** Bug 692944 has been marked as a duplicate of this bug. ***
From Bug 692944 - Matthew Farrellee 2011-04-01 14:47:49 EDT

$ rpm -q condor-aviary wso2-axis2
condor-aviary-7.6.0-0.4.el6.x86_64
wso2-axis2-2.1.0-0.4.el6.x86_64

$ CONDOR_CONFIG=only_env _CONDOR_LOG=$PWD _CONDOR_WSFCPP_HOME=/usr /usr/sbin/aviary_query_server -t -f
...
ERROR "Failed to initialize Axis2SoapProvider"

Find the real reason in aviary_query.axis2.log,

[error] class_loader.c(152) Loading shared library /usr//lib/libaxis2_http_sender.so Failed. DLERROR IS /usr//lib/libaxis2_http_sender.so: cannot open shared object file: No such file or directory

The issue appears to be axis2 looking only in /usr/lib, while the library is present in /usr/lib64,

$ rpm -qf /usr/lib64/libaxis2_http_sender.so.0.6.0
wso2-axis2-2.1.0-0.4.el6.x86_64

A temporary workaround is to symlink into /usr/lib, which must also be done for libaxis2_http_receiver and libwsf_cpp_msg_recv.
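For anyone who needs the temporary workaround, here is a rough sketch of the symlinking step. The three library name prefixes come from the comment above; the helper name `link_axis2_libs` and its arguments are made up for illustration, and it deliberately refuses to overwrite anything already present in the destination.

```python
import os

# Library name prefixes named in the workaround comment above.
LIB_PREFIXES = (
    "libaxis2_http_sender",
    "libaxis2_http_receiver",
    "libwsf_cpp_msg_recv",
)


def link_axis2_libs(src_dir, dst_dir, prefixes=LIB_PREFIXES):
    """Symlink matching files from src_dir into dst_dir; return the names created."""
    created = []
    for name in sorted(os.listdir(src_dir)):
        if not name.startswith(prefixes):
            continue
        target = os.path.join(dst_dir, name)
        if os.path.lexists(target):
            continue  # leave existing files and links alone
        os.symlink(os.path.join(src_dir, name), target)
        created.append(name)
    return created
```

On an affected x86_64 host the call would presumably be `link_axis2_libs("/usr/lib64", "/usr/lib")`, run as root. This only papers over the lookup problem and is not a fix.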
I have a plan to use a different init path in the axis2 code along with some under-documented config parameters in the xml config to remedy all this. Needs testing though. Steal this BZ and you will lose your hand...
I've tried it on an i386 system, which has no /usr/lib64, and it doesn't work there either:

export CONDOR_CONFIG=only_env; export _CONDOR_LOG=$PWD; export _CONDOR_WSFCPP_HOME=/usr; export _CONDOR_ALL_DEBUG=D_ALL; export _CONDOR_AXIS2_DEBUG_LEVEL=10; /usr/sbin/aviary_query_server -t -f
...
DaemonCore--> Timers
DaemonCore--> ~~~~~~
DaemonCore--> id = 1, when = 1302252550, period = 0, handler_descrip=<dc_touch_log_file>
DaemonCore--> id = 2, when = 1302252550, period = 0, handler_descrip=<dc_touch_lock_files>
DaemonCore--> id = 3, when = 1302252550, period = 300, handler_descrip=<check_session_cache>
DaemonCore--> id = 4, when = 1302252550, period = 1801, handler_descrip=<handle_cookie_refresh>
DaemonCore--> id = 6, when = 1302252550, period = 10, handler_descrip=<JobLogMirror::TimerHandler_JobLogPolling>
DaemonCore--> id = 0, when = 1302252565, period = 120, handler_descrip=<check_parent>
DaemonCore--> id = 5, when = 1302281350, period = 28800, handler_descrip=<DaemonCore::refreshDNS()>
leaving DaemonCore NewTimer, id=6
HTTP_PORT is undefined, using default value of 9091
Failed in creating DLL
ERROR "Failed to initialize Axis2SoapProvider" at line 94 in file /builddir/build/BUILD/condor-7.5.6/src/condor_contrib/aviary/src/aviary_query_server.cpp

packages:
condor-aviary-7.6.0-0.5.el5
condor-7.6.0-0.5.el5
wso2-axis2-2.1.0-0.6.el5
python-condorutils-1.5-2.el5
condor-wallaby-client-4.0-5.el5
condor-qmf-7.6.0-0.5.el5
condor-wallaby-tools-4.0-5.el5
So we can make use of a similar configuration model that gives us "almost" complete control over where Axis2/C goes looking for shared libraries (32-bit vs. 64-bit). However, the messageReceiver shared lib that is loaded on message receipt unfortunately uses a hard-coded path relative to an assumed repo directory structure.

This leaves the following options:

1) still use a repo structure of /usr/axis2.xml, /usr/lib{64}, /usr/services and do soft-linking fix-up of lib locations in package post-install to keep the Axis2/C engine happy
2) same as above but patch the hard-coded dir path for 64-bit builds so no soft linking is required post install
3) implement a more substantial patch than #2, where we either add a parameter to the service.xml file for the location of the msgRcvr lib, or (probably better) just have it use whatever lib parameter was specified in the axis2.xml file

#3 would give us a fully customizable repo layout like:

config: /var/lib/condor/aviary/axis2.xml
service impl libs: /var/lib/condor/aviary/services/*
wso2 & axis2/C libs: /usr/lib or /usr/lib64

Both options #2 and #3 would require us to develop a patch and carry it until WSO2 and/or Apache Axis2/C adopted it upstream.
The other argument for #3 is it avoids potential /usr/axis2.xml collisions (overwrites, incompatible parameters, etc.).
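To make the intent of option #3 concrete, here is an illustrative sketch of the lookup rule it proposes: use the lib directory configured in axis2.xml when one is given, and otherwise fall back to the repo-relative lib/ directory the engine assumes today. This is not Axis2/C code; `msg_receiver_path` and its parameters are invented names, and the real change would live in the C engine.

```python
import posixpath


def msg_receiver_path(repo_dir, lib_name, lib_dir_param=None):
    """Resolve where a message receiver library would be loaded from.

    lib_dir_param stands in for a lib directory parameter read from axis2.xml
    (hypothetical here); when it is unset, the repo-relative lib/ is used.
    """
    base = lib_dir_param if lib_dir_param else posixpath.join(repo_dir, "lib")
    return posixpath.join(base, lib_name)
```

With a repo of /var/lib/condor/aviary and the parameter set to /usr/lib64, the receiver lib would resolve under /usr/lib64 while the config and services stay under /var/lib/condor/aviary, which matches the layout described for #3.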
Created attachment 491085 [details]
Patch to provide full flexibility for deployment directories

Rob,

We'll need to apply and carry this patch to cure what ails us. Basically, Axis2/C doesn't cover all the bases of a deployment that is not based on a repo dir layout it expects. We can specify the lib, services, and modules directories in an axis2.xml but the engine still falls down when it needs to load message receiver libs.

So, with this patch we should have complete flexibility over deployment and be able to store our axis2.xml config and services in something like /var/lib/condor/aviary but still load the various packaged libs for wso2 and axis2c from /usr/lib or /usr/lib64.

I might need to revisit this if I venture into the module territories for rampart but am hopeful it will work there also.
The patch is incorporated into the 0.7 build of wso2
Created attachment 491350 [details] RHEL5 spec file patch for new aviary config dir
Created attachment 491351 [details] RHEL6 spec file patch for new aviary config dir
Also get this from FH V7_6-aviary-branch:

commit ee3c7981536808f5f3ed97a04c40299127822d32
Author: Peter MacKinnon <pmackinn>
Date: Mon Apr 11 15:31:02 2011 -0400

Various deployment improvements:
- now use axis2.xml as the repo config without a lib or services structure
- parameters in axis2.xml point to location of libs and services
- test scripts updated with new default WSDL file URI in /var/lib/condor...
- use cmake configuration to generate proper lib loc at build time /usr/lib or /usr/lib64
Included in condor-7.6.0-0.7
The error from comment #1 is still there:

Packages:
condor-7.6.1-0.1.el6.x86_64
condor-aviary-7.6.1-0.1.el6.x86_64
condor-classads-7.6.1-0.1.el6.x86_64
condor-wallaby-base-db-1.12-1.el6.noarch
condor-wallaby-client-4.0-5.el6.noarch
condor-wallaby-tools-4.0-5.el6.noarch
python-condorutils-1.5-2.el6.noarch
python-qpid-0.10-1.el6.noarch
python-qpid-qmf-0.10-6.el6.x86_64
python-wallabyclient-4.0-5.el6.noarch
qpid-cpp-client-0.10-3.el6.x86_64
qpid-cpp-server-0.10-3.el6.x86_64
qpid-qmf-0.10-6.el6.x86_64
ruby-qpid-qmf-0.10-6.el6.x86_64
ruby-wallaby-0.10.5-3.el6.noarch
wallaby-0.10.5-3.el6.noarch
wallaby-utils-0.10.5-3.el6.noarch
wso2-axis2-2.1.0-0.7.el6.x86_64
wso2-rampart-2.1.0-0.7.el6.x86_64
wso2-wsf-cpp-2.1.0-0.7.el6.x86_64

OSes: RHEL 5.6/6.1 x i386/x86_64

Steps to Reproduce:
1. install qpidd, condor, condor-aviary, remote configuration
2. set up condor by remote configuration:

Group Memberships: Internal Default Group
Features Applied: Master NodeAccess ExecuteNode QueryServer Axis2Home AviaryScheduler CentralManager Scheduler
Explicitly Set Parameters:
ALLOW_WRITE = *
CONDOR_HOST = 127.0.0.1
ALLOW_READ = *

Plus debug settings:
# Enable core dump generation
CREATE_CORE_FILES=True
ABORT_ON_EXCEPTION=True
# Increase the size of logs
MAX_HISTORY_LOG=300*1024*1024
MAX_HISTORY_ROTATIONS=10
MAX_C_GAHP_LOG=20000000
MAX_COLLECTOR_LOG=20000000
MAX_GRIDMANAGER_LOG=20000000
MAX_HAD_LOG=20000000
MAX_HDFS_LOG=20000000
MAX_JOB_ROUTER_LOG=20000000
MAX_KBDD_LOG=20000000
MAX_LEASEMANAGER_LOG=20000000
MAX_MASTER_LOG=20000000
MAX_NEGOTIATOR_LOG=20000000
MAX_NEGOTIATOR_MATCH_LOG=20000000
MAX_REPLICATION_LOG=20000000
MAX_ROOSTER_LOG=20000000
MAX_SCHEDD_LOG=20000000
MAX_SHADOW_LOG=20000000
MAX_STARTD_LOG=20000000
MAX_STARTER_LOG=20000000
MAX_TRANSFERER_LOG=20000000
MAX_TRIGGERD_LOG=20000000
MAX_VM_GAHP_LOG=20000000
QMF_BROKER_HOST = 127.0.0.1

$ cat ScheddLog
...
04/20/11 12:28:25 (pid:9339) DaemonCore: command socket at <_ip_:42170>
04/20/11 12:28:25 (pid:9339) DaemonCore: private command socket at <_ip_:42170>
04/20/11 12:28:25 (pid:9339) Setting maximum accepts per cycle 4.
04/20/11 12:28:25 (pid:9339) ClassAdLogPlugin registration succeeded
04/20/11 12:28:25 (pid:9339) ScheddPlugin registration succeeded
04/20/11 12:28:25 (pid:9339) Successfully loaded plugin: /usr/lib64/condor/plugins/AviaryScheddPlugin-plugin.so
04/20/11 12:28:25 (pid:9339) Failed in creating DLL
04/20/11 12:28:25 (pid:9339) ERROR "Failed to initialize Axis2SoapProvider" at line 76 in file /builddir/build/BUILD/condor-7.5.6/src/condor_contrib/aviary/src/AviaryScheddPlugin.cpp
Stack dump for process 9339 at timestamp 1303295305 (11 frames)
condor_schedd(dprintf_dump_stack+0x63)[0x565293]
condor_schedd[0x5cd322]
/lib64/libpthread.so.0(+0xf520)[0x7f0f26d76520]
/lib64/libc.so.6(abort+0xd4)[0x7f0f26a0a184]
condor_schedd(_EXCEPT_+0x12b)[0x5671ab]
/usr/lib64/condor/plugins/AviaryScheddPlugin-plugin.so(_ZN6aviary3job18AviaryScheddPlugin15earlyInitializeEv+0x2ea)[0x7f0f25b337ba]
condor_schedd(_ZN19ScheddPluginManager15EarlyInitializeEv+0x50)[0x4d2c90]
condor_schedd(_Z9main_initiPPc+0x7d)[0x47fedd]
condor_schedd(main+0x10df)[0x4dbc6f]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f0f269f4c9d]
condor_schedd[0x47d459]

$ cat MasterLog
...
04/20/11 12:23:32 Started process "/usr/sbin/aviary_query_server", pid and pgroup = 8980
04/20/11 12:23:32 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 8981
04/20/11 12:23:32 The QUERY_SERVER (pid 8980) died due to signal 6 (Aborted)
04/20/11 12:23:32 Sending obituary for "/usr/sbin/aviary_query_server"
04/20/11 12:23:32 restarting /usr/sbin/aviary_query_server in 17 seconds
04/20/11 12:23:32 The SCHEDD (pid 8981) died due to signal 11 (Segmentation fault)
04/20/11 12:23:32 Sending obituary for "/usr/sbin/condor_schedd"
04/20/11 12:23:32 restarting /usr/sbin/condor_schedd in 17 seconds
...

$ cat QueryServerLog
...
04/20/11 12:28:25 Setting maximum accepts per cycle 4.
04/20/11 12:28:25 ******************************************************
04/20/11 12:28:25 ** aviary_query_server (CONDOR_QUERY_SERVER) STARTING UP
04/20/11 12:28:25 ** /usr/sbin/aviary_query_server
04/20/11 12:28:25 ** SubsystemInfo: name=QUERY_SERVER type=DAEMON(12) class=DAEMON(1)
04/20/11 12:28:25 ** Configuration: subsystem:QUERY_SERVER local:<NONE> class:DAEMON
04/20/11 12:28:25 ** $CondorVersion: 7.6.1 Apr 13 2011 BuildID: RH-7.6.1-0.1.el6 $
04/20/11 12:28:25 ** $CondorPlatform: X86_64-RedHat_6.0 $
04/20/11 12:28:25 ** PID = 9338
04/20/11 12:28:25 ** Log last touched 4/20 12:26:08
04/20/11 12:28:25 ******************************************************
04/20/11 12:28:25 Using config source: /etc/condor/condor_config
04/20/11 12:28:25 Using local config sources:
04/20/11 12:28:25 /etc/condor/config.d/00personal_condor.config
04/20/11 12:28:25 /etc/condor/config.d/61aviary.config
04/20/11 12:28:25 /etc/condor/config.d/99configd.config
04/20/11 12:28:25 /etc/condor/config.d/zzz_condor_config.test
04/20/11 12:28:25 /var/lib/condor/wallaby_node.config
04/20/11 12:28:25 DaemonCore: command socket at <_ip_:56817>
04/20/11 12:28:25 DaemonCore: private command socket at <_ip_:56817>
04/20/11 12:28:25 Setting maximum accepts per cycle 4.
04/20/11 12:28:25 main_init() called
04/20/11 12:28:25 Failed in creating DLL
04/20/11 12:28:25 ERROR "Failed to initialize Axis2SoapProvider" at line 94 in file /builddir/build/BUILD/condor-7.5.6/src/condor_contrib/aviary/src/aviary_query_server.cpp
Stack dump for process 9338 at timestamp 1303295305 (10 frames)
aviary_query_server(dprintf_dump_stack+0x63)[0x4d1483]
aviary_query_server[0x515712]
/lib64/libpthread.so.0(+0xf520)[0x7f8eec2ce520]
/lib64/libc.so.6(gsignal+0x35)[0x7f8eebf60a45]
/lib64/libc.so.6(abort+0x175)[0x7f8eebf62225]
aviary_query_server(_EXCEPT_+0x12b)[0x4d339b]
aviary_query_server(_Z9main_initiPPc+0x1a2)[0x457532]
aviary_query_server(main+0x10df)[0x4662df]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f8eebf4cc9d]
aviary_query_server[0x456f09]
...

-->Assigned
Created attachment 493453 [details] configuration and log files for comment #14
comment #14 is a different bug, please file a new BZ
I've filed bug 698207.
Retested over all supported archs x86,x86_64/RHEL5,RHEL6 with:
condor-aviary-7.6.1-0.4
condor-7.6.1-0.4

ScheddLog:
05/02/11 14:53:10 (pid:4758) ******************************************************
05/02/11 14:53:10 (pid:4758) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
05/02/11 14:53:10 (pid:4758) ** /usr/sbin/condor_schedd
05/02/11 14:53:10 (pid:4758) ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
05/02/11 14:53:10 (pid:4758) ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
05/02/11 14:53:10 (pid:4758) ** $CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
05/02/11 14:53:10 (pid:4758) ** $CondorPlatform: X86_64-RedHat_6.0 $
05/02/11 14:53:10 (pid:4758) ** PID = 4758
05/02/11 14:53:10 (pid:4758) ** Log last touched 5/2 14:53:08
05/02/11 14:53:10 (pid:4758) ******************************************************
05/02/11 14:53:10 (pid:4758) Using config source: /etc/condor/condor_config
05/02/11 14:53:10 (pid:4758) Using local config sources:
05/02/11 14:53:10 (pid:4758) /etc/condor/config.d/00personal_condor.config
05/02/11 14:53:10 (pid:4758) /etc/condor/config.d/60condor-qmf.config
05/02/11 14:53:10 (pid:4758) /etc/condor/config.d/61aviary.config
05/02/11 14:53:10 (pid:4758) /etc/condor/config.d/zzz_condor_config.test
05/02/11 14:53:10 (pid:4758) DaemonCore: command socket at <IP:46525>
05/02/11 14:53:10 (pid:4758) DaemonCore: private command socket at <10.34.37.121:46525>
05/02/11 14:53:10 (pid:4758) Setting maximum accepts per cycle 4.
05/02/11 14:53:10 (pid:4758) ClassAdLogPlugin registration succeeded
05/02/11 14:53:10 (pid:4758) ScheddPlugin registration succeeded
05/02/11 14:53:10 (pid:4758) Successfully loaded plugin: /usr/lib64/condor/plugins/MgmtScheddPlugin-plugin.so
05/02/11 14:53:11 (pid:4758) ClassAdLogPlugin registration succeeded
05/02/11 14:53:11 (pid:4758) ScheddPlugin registration succeeded
05/02/11 14:53:11 (pid:4758) Successfully loaded plugin: /usr/lib64/condor/plugins/AviaryScheddPlugin-plugin.so
05/02/11 14:53:11 (pid:4758) Successfully loaded plugin: /usr/lib64/condor/plugins/AviaryScheddPlugin-plugin.so
05/02/11 14:53:11 (pid:4758) Axis2 listener on http port: 9090
05/02/11 14:53:11 (pid:4758) History file rotation is enabled.
05/02/11 14:53:11 (pid:4758) Maximum history file size is: 314572800 bytes
05/02/11 14:53:11 (pid:4758) Number of rotated history files is: 10
05/02/11 14:53:12 (pid:4758) "/usr/sbin/condor_shadow.std -classad" did not produce any output, ignoring
05/02/11 14:53:17 (pid:4758) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s

No dynamic library loading issues found.

>>> VERIFIED