| Summary: | scheduler aviary plugin and query_server fail loading dependent libraries, repo structure issue | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Martin Kudlej <mkudlej> |
| Component: | condor-aviary | Assignee: | Pete MacKinnon <pmackinn> |
| Status: | CLOSED ERRATA | QA Contact: | Tomas Rusnak <trusnak> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | Development | CC: | iboverma, jneedle, matt, rrati, trusnak |
| Target Milestone: | 2.0 | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | wso2-wsf-cpp-2.1.0-0.7 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-06-27 15:31:18 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | |||
| Bug Blocks: | 695771 | ||
| Attachments: | |||
|
Description
Martin Kudlej
2011-04-04 14:11:38 UTC
*** Bug 692944 has been marked as a duplicate of this bug. ***

From Bug 692944 - Matthew Farrellee 2011-04-01 14:47:49 EDT

$ rpm -q condor-aviary wso2-axis2
condor-aviary-7.6.0-0.4.el6.x86_64
wso2-axis2-2.1.0-0.4.el6.x86_64

$ CONDOR_CONFIG=only_env _CONDOR_LOG=$PWD _CONDOR_WSFCPP_HOME=/usr /usr/sbin/aviary_query_server -t -f
...
ERROR "Failed to initialize Axis2SoapProvider"

Find the real reason in aviary_query.axis2.log,

[error] class_loader.c(152) Loading shared library /usr//lib/libaxis2_http_sender.so Failed. DLERROR IS /usr//lib/libaxis2_http_sender.so: cannot open shared object file: No such file or directory

The issue appears to be axis2 looking only in /usr/lib, while the library is present in /usr/lib64,

$ rpm -qf /usr/lib64/libaxis2_http_sender.so.0.6.0
wso2-axis2-2.1.0-0.4.el6.x86_64

A temporary workaround is to symlink into /usr/lib, which must also be done for libaxis2_http_receiver and libwsf_cpp_msg_recv.

I have a plan to use a different init path in the axis2 code along with some under-documented config parameters in the xml config to remedy all this. Needs testing though. Steal this BZ and you will lose your hand...

I've tried it on an i386 system, where there is no /usr/lib64, and it doesn't work:

export CONDOR_CONFIG=only_env; export _CONDOR_LOG=$PWD; export _CONDOR_WSFCPP_HOME=/usr; export _CONDOR_ALL_DEBUG=D_ALL; export _CONDOR_AXIS2_DEBUG_LEVEL=10; /usr/sbin/aviary_query_server -t -f
...
DaemonCore--> Timers
DaemonCore--> ~~~~~~
DaemonCore--> id = 1, when = 1302252550, period = 0, handler_descrip=<dc_touch_log_file>
DaemonCore--> id = 2, when = 1302252550, period = 0, handler_descrip=<dc_touch_lock_files>
DaemonCore--> id = 3, when = 1302252550, period = 300, handler_descrip=<check_session_cache>
DaemonCore--> id = 4, when = 1302252550, period = 1801, handler_descrip=<handle_cookie_refresh>
DaemonCore--> id = 6, when = 1302252550, period = 10, handler_descrip=<JobLogMirror::TimerHandler_JobLogPolling>
DaemonCore--> id = 0, when = 1302252565, period = 120, handler_descrip=<check_parent>
DaemonCore--> id = 5, when = 1302281350, period = 28800, handler_descrip=<DaemonCore::refreshDNS()>
leaving DaemonCore NewTimer, id=6
HTTP_PORT is undefined, using default value of 9091
Failed in creating DLL
ERROR "Failed to initialize Axis2SoapProvider" at line 94 in file /builddir/build/BUILD/condor-7.5.6/src/condor_contrib/aviary/src/aviary_query_server.cpp

packages:
condor-aviary-7.6.0-0.5.el5
condor-7.6.0-0.5.el5
wso2-axis2-2.1.0-0.6.el5
python-condorutils-1.5-2.el5
condor-wallaby-client-4.0-5.el5
condor-qmf-7.6.0-0.5.el5
condor-wallaby-tools-4.0-5.el5

So we can make use of a similar configuration model that gives us "almost" entire control over where Axis2/C goes looking for shared libraries (32-bit v 64-bit). However, in the case of the messageReceiver shared lib that is loaded on message receipt, it unfortunately makes use of a hard coded path relative to an assumed repo directory structure. This leaves the following options:
1) still use a repo structure of /usr/axis2.xml, /usr/lib{64}, /usr/services and do softlinking fix up of lib locations in package post-install to keep the Axis2/C engine happy
2) same as above but patch the hardcoded dir path for 64-bit builds so no soft linking is required post install
3) implement a more substantial patch than #2, where we either add a parameter to the service.xml file for the location of the msgRcvr lib, or (probably better) just have it use whatever lib parameter was specified in the axis2.xml file
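For reference, the softlinking fix-up in option 1 could be sketched roughly as below. This is an illustration only, not the actual package scriptlet: the function name link_axis2_libs is made up here, and it assumes the unversioned .so names that Axis2/C tried to load in the error above are present in the source directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of any shipped package).
# Usage: link_axis2_libs SRC_DIR DEST_DIR
# Links the three libraries named in this bug from SRC_DIR into DEST_DIR,
# so an engine that only searches DEST_DIR can still load them.
link_axis2_libs() {
    src_dir=$1
    dest_dir=$2
    for name in libaxis2_http_sender libaxis2_http_receiver libwsf_cpp_msg_recv; do
        # Skip any library that is not present in the source directory.
        [ -e "$src_dir/$name.so" ] || continue
        # -s: symbolic link, -f: replace a stale link if one already exists.
        ln -sf "$src_dir/$name.so" "$dest_dir/$name.so"
    done
}
```

On an x86_64 box this would be called as link_axis2_libs /usr/lib64 /usr/lib. It only papers over the lookup problem, which is why options 2 and 3 are on the table.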
#3 would give us a fully customizable repo layout like:
config: /var/lib/condor/aviary/axis2.xml
service impl libs: /var/lib/condor/aviary/services/*
wso2 & axis2/C libs: /usr/lib or /usr/lib64
Both options #2 and #3 would require us to develop a patch and carry it until WSO2 and/or Apache Axis2/C adopted it upstream.
The other argument for #3 is it avoids potential /usr/axis2.xml collisions (overwrites, incompatible parameters, etc.).

Created attachment 491085 [details]
Patch to provide full flexibility for deployment directories
Rob,
We'll need to apply and carry this patch to cure what ails us. Basically, Axis2/C doesn't cover all the bases of a deployment that is not based on a repo dir layout it expects. We can specify the lib, services, and modules directories in an axis2.xml but the engine still falls down when it needs to load message receiver libs.
So, with this patch we should have complete flexibility over deployment and be able to store our axis2.xml config and services in something like /var/lib/condor/aviary but still load the various packaged libs for wso2 and axis2c from /usr/lib or /usr/lib64.
I might need to revisit this if I venture into the module territories for rampart but am hopeful it will work there also.

The patch is incorporated into the 0.7 build of wso2

Created attachment 491350 [details]
RHEL5 spec file patch for new aviary config dir
Created attachment 491351 [details]
RHEL6 spec file patch for new aviary config dir
Also get this from FH V7_6-aviary-branch:
commit ee3c7981536808f5f3ed97a04c40299127822d32
Author: Peter MacKinnon <pmackinn>
Date: Mon Apr 11 15:31:02 2011 -0400
Various deployment improvements:
- now use axis2.xml as the repo config without a lib or services structure
- parameters in axis2.xml point to location of libs and services
- test scripts updated with new default WSDL file URI in /var/lib/condor...
- use cmake configuration to generate proper lib loc at build time
/usr/lib or /usr/lib64
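As a rough illustration of the lib location choice described in the commit message: the real selection is done by cmake at build time, so the shell function below is only a sketch of the same mapping. The name libdir_for_arch is hypothetical, and only the i386 and x86_64 cases covered by this bug are considered.

```shell
#!/bin/sh
# Hypothetical helper: print the library directory for a given architecture.
# Only the two layouts discussed in this bug are distinguished; anything
# that is not x86_64 falls back to /usr/lib.
libdir_for_arch() {
    case "$1" in
        x86_64) echo /usr/lib64 ;;
        *)      echo /usr/lib ;;
    esac
}
```

Calling libdir_for_arch "$(uname -m)" on the machine in question gives the directory that the lib parameter in axis2.xml would be expected to point at.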
Included in condor-7.6.0-0.7

The error from comment #1 is still there:

Packages:
condor-7.6.1-0.1.el6.x86_64
condor-aviary-7.6.1-0.1.el6.x86_64
condor-classads-7.6.1-0.1.el6.x86_64
condor-wallaby-base-db-1.12-1.el6.noarch
condor-wallaby-client-4.0-5.el6.noarch
condor-wallaby-tools-4.0-5.el6.noarch
python-condorutils-1.5-2.el6.noarch
python-qpid-0.10-1.el6.noarch
python-qpid-qmf-0.10-6.el6.x86_64
python-wallabyclient-4.0-5.el6.noarch
qpid-cpp-client-0.10-3.el6.x86_64
qpid-cpp-server-0.10-3.el6.x86_64
qpid-qmf-0.10-6.el6.x86_64
ruby-qpid-qmf-0.10-6.el6.x86_64
ruby-wallaby-0.10.5-3.el6.noarch
wallaby-0.10.5-3.el6.noarch
wallaby-utils-0.10.5-3.el6.noarch
wso2-axis2-2.1.0-0.7.el6.x86_64
wso2-rampart-2.1.0-0.7.el6.x86_64
wso2-wsf-cpp-2.1.0-0.7.el6.x86_64

OSes: RHEL 5.6/6.1 x i386/x86_64

Steps to Reproduce:
1. install qpidd, condor, condor-aviary, remote configuration
2. set up condor by remote configuration:
Group Memberships: Internal Default Group
Features Applied: Master NodeAccess ExecuteNode QueryServer Axis2Home AviaryScheduler CentralManager Scheduler
Explicitly Set Parameters:
ALLOW_WRITE = *
CONDOR_HOST = 127.0.0.1
ALLOW_READ = *

Plus debug settings:
# Enable core dump generation
CREATE_CORE_FILES=True
ABORT_ON_EXCEPTION=True
# Increase the size of logs
MAX_HISTORY_LOG=300*1024*1024
MAX_HISTORY_ROTATIONS=10
MAX_C_GAHP_LOG=20000000
MAX_COLLECTOR_LOG=20000000
MAX_GRIDMANAGER_LOG=20000000
MAX_HAD_LOG=20000000
MAX_HDFS_LOG=20000000
MAX_JOB_ROUTER_LOG=20000000
MAX_KBDD_LOG=20000000
MAX_LEASEMANAGER_LOG=20000000
MAX_MASTER_LOG=20000000
MAX_NEGOTIATOR_LOG=20000000
MAX_NEGOTIATOR_MATCH_LOG=20000000
MAX_REPLICATION_LOG=20000000
MAX_ROOSTER_LOG=20000000
MAX_SCHEDD_LOG=20000000
MAX_SHADOW_LOG=20000000
MAX_STARTD_LOG=20000000
MAX_STARTER_LOG=20000000
MAX_TRANSFERER_LOG=20000000
MAX_TRIGGERD_LOG=20000000
MAX_VM_GAHP_LOG=20000000
QMF_BROKER_HOST = 127.0.0.1

$ cat ScheddLog
...
04/20/11 12:28:25 (pid:9339) DaemonCore: command socket at <_ip_:42170>
04/20/11 12:28:25 (pid:9339) DaemonCore: private command socket at <_ip_:42170>
04/20/11 12:28:25 (pid:9339) Setting maximum accepts per cycle 4.
04/20/11 12:28:25 (pid:9339) ClassAdLogPlugin registration succeeded
04/20/11 12:28:25 (pid:9339) ScheddPlugin registration succeeded
04/20/11 12:28:25 (pid:9339) Successfully loaded plugin: /usr/lib64/condor/plugins/AviaryScheddPlugin-plugin.so
04/20/11 12:28:25 (pid:9339) Failed in creating DLL
04/20/11 12:28:25 (pid:9339) ERROR "Failed to initialize Axis2SoapProvider" at line 76 in file /builddir/build/BUILD/condor-7.5.6/src/condor_contrib/aviary/src/AviaryScheddPlugin.cpp
Stack dump for process 9339 at timestamp 1303295305 (11 frames)
condor_schedd(dprintf_dump_stack+0x63)[0x565293]
condor_schedd[0x5cd322]
/lib64/libpthread.so.0(+0xf520)[0x7f0f26d76520]
/lib64/libc.so.6(abort+0xd4)[0x7f0f26a0a184]
condor_schedd(_EXCEPT_+0x12b)[0x5671ab]
/usr/lib64/condor/plugins/AviaryScheddPlugin-plugin.so(_ZN6aviary3job18AviaryScheddPlugin15earlyInitializeEv+0x2ea)[0x7f0f25b337ba]
condor_schedd(_ZN19ScheddPluginManager15EarlyInitializeEv+0x50)[0x4d2c90]
condor_schedd(_Z9main_initiPPc+0x7d)[0x47fedd]
condor_schedd(main+0x10df)[0x4dbc6f]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f0f269f4c9d]
condor_schedd[0x47d459]

$ cat MasterLog
...
04/20/11 12:23:32 Started process "/usr/sbin/aviary_query_server", pid and pgroup = 8980
04/20/11 12:23:32 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 8981
04/20/11 12:23:32 The QUERY_SERVER (pid 8980) died due to signal 6 (Aborted)
04/20/11 12:23:32 Sending obituary for "/usr/sbin/aviary_query_server"
04/20/11 12:23:32 restarting /usr/sbin/aviary_query_server in 17 seconds
04/20/11 12:23:32 The SCHEDD (pid 8981) died due to signal 11 (Segmentation fault)
04/20/11 12:23:32 Sending obituary for "/usr/sbin/condor_schedd"
04/20/11 12:23:32 restarting /usr/sbin/condor_schedd in 17 seconds
...
$ cat QueryServerLog
...
04/20/11 12:28:25 Setting maximum accepts per cycle 4.
04/20/11 12:28:25 ******************************************************
04/20/11 12:28:25 ** aviary_query_server (CONDOR_QUERY_SERVER) STARTING UP
04/20/11 12:28:25 ** /usr/sbin/aviary_query_server
04/20/11 12:28:25 ** SubsystemInfo: name=QUERY_SERVER type=DAEMON(12) class=DAEMON(1)
04/20/11 12:28:25 ** Configuration: subsystem:QUERY_SERVER local:<NONE> class:DAEMON
04/20/11 12:28:25 ** $CondorVersion: 7.6.1 Apr 13 2011 BuildID: RH-7.6.1-0.1.el6 $
04/20/11 12:28:25 ** $CondorPlatform: X86_64-RedHat_6.0 $
04/20/11 12:28:25 ** PID = 9338
04/20/11 12:28:25 ** Log last touched 4/20 12:26:08
04/20/11 12:28:25 ******************************************************
04/20/11 12:28:25 Using config source: /etc/condor/condor_config
04/20/11 12:28:25 Using local config sources:
04/20/11 12:28:25 /etc/condor/config.d/00personal_condor.config
04/20/11 12:28:25 /etc/condor/config.d/61aviary.config
04/20/11 12:28:25 /etc/condor/config.d/99configd.config
04/20/11 12:28:25 /etc/condor/config.d/zzz_condor_config.test
04/20/11 12:28:25 /var/lib/condor/wallaby_node.config
04/20/11 12:28:25 DaemonCore: command socket at <_ip_:56817>
04/20/11 12:28:25 DaemonCore: private command socket at <_ip_:56817>
04/20/11 12:28:25 Setting maximum accepts per cycle 4.
04/20/11 12:28:25 main_init() called
04/20/11 12:28:25 Failed in creating DLL
04/20/11 12:28:25 ERROR "Failed to initialize Axis2SoapProvider" at line 94 in file /builddir/build/BUILD/condor-7.5.6/src/condor_contrib/aviary/src/aviary_query_server.cpp
Stack dump for process 9338 at timestamp 1303295305 (10 frames)
aviary_query_server(dprintf_dump_stack+0x63)[0x4d1483]
aviary_query_server[0x515712]
/lib64/libpthread.so.0(+0xf520)[0x7f8eec2ce520]
/lib64/libc.so.6(gsignal+0x35)[0x7f8eebf60a45]
/lib64/libc.so.6(abort+0x175)[0x7f8eebf62225]
aviary_query_server(_EXCEPT_+0x12b)[0x4d339b]
aviary_query_server(_Z9main_initiPPc+0x1a2)[0x457532]
aviary_query_server(main+0x10df)[0x4662df]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f8eebf4cc9d]
aviary_query_server[0x456f09]
...

-->Assigned

Created attachment 493453 [details]
configuration and log files for comment #14

comment #14 is a different bug, please file a new BZ

I've filed bug 698207.

Retested over all supported archs x86,x86_64/RHEL5,RHEL6 with:
condor-aviary-7.6.1-0.4
condor-7.6.1-0.4
ScheddLog:
05/02/11 14:53:10 (pid:4758) ******************************************************
05/02/11 14:53:10 (pid:4758) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
05/02/11 14:53:10 (pid:4758) ** /usr/sbin/condor_schedd
05/02/11 14:53:10 (pid:4758) ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
05/02/11 14:53:10 (pid:4758) ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
05/02/11 14:53:10 (pid:4758) ** $CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
05/02/11 14:53:10 (pid:4758) ** $CondorPlatform: X86_64-RedHat_6.0 $
05/02/11 14:53:10 (pid:4758) ** PID = 4758
05/02/11 14:53:10 (pid:4758) ** Log last touched 5/2 14:53:08
05/02/11 14:53:10 (pid:4758) ******************************************************
05/02/11 14:53:10 (pid:4758) Using config source: /etc/condor/condor_config
05/02/11 14:53:10 (pid:4758) Using local config sources:
05/02/11 14:53:10 (pid:4758) /etc/condor/config.d/00personal_condor.config
05/02/11 14:53:10 (pid:4758) /etc/condor/config.d/60condor-qmf.config
05/02/11 14:53:10 (pid:4758) /etc/condor/config.d/61aviary.config
05/02/11 14:53:10 (pid:4758) /etc/condor/config.d/zzz_condor_config.test
05/02/11 14:53:10 (pid:4758) DaemonCore: command socket at <IP:46525>
05/02/11 14:53:10 (pid:4758) DaemonCore: private command socket at <10.34.37.121:46525>
05/02/11 14:53:10 (pid:4758) Setting maximum accepts per cycle 4.
05/02/11 14:53:10 (pid:4758) ClassAdLogPlugin registration succeeded
05/02/11 14:53:10 (pid:4758) ScheddPlugin registration succeeded
05/02/11 14:53:10 (pid:4758) Successfully loaded plugin: /usr/lib64/condor/plugins/MgmtScheddPlugin-plugin.so
05/02/11 14:53:11 (pid:4758) ClassAdLogPlugin registration succeeded
05/02/11 14:53:11 (pid:4758) ScheddPlugin registration succeeded
05/02/11 14:53:11 (pid:4758) Successfully loaded plugin: /usr/lib64/condor/plugins/AviaryScheddPlugin-plugin.so
05/02/11 14:53:11 (pid:4758) Successfully loaded plugin: /usr/lib64/condor/plugins/AviaryScheddPlugin-plugin.so
05/02/11 14:53:11 (pid:4758) Axis2 listener on http port: 9090
05/02/11 14:53:11 (pid:4758) History file rotation is enabled.
05/02/11 14:53:11 (pid:4758) Maximum history file size is: 314572800 bytes
05/02/11 14:53:11 (pid:4758) Number of rotated history files is: 10
05/02/11 14:53:12 (pid:4758) "/usr/sbin/condor_shadow.std -classad" did not produce any output, ignoring
05/02/11 14:53:17 (pid:4758) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
No dynamic library loading issues found.
>>> VERIFIED
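A check like the one done by hand above could be scripted along these lines. This is only a sketch: has_loader_errors is a made-up name, and the two strings it searches for are the failure messages quoted in the earlier comments of this bug.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if any of the given log files contains
# one of the library loading failure messages seen in this bug.
# Usage: has_loader_errors LOGFILE...
has_loader_errors() {
    grep -q -e 'Failed in creating DLL' \
            -e 'Failed to initialize Axis2SoapProvider' "$@"
}
```

Run from the condor log directory, for example: has_loader_errors ScheddLog QueryServerLog && echo "loading issue found".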