| Summary: | QueryServer crash on x86_64 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Martin Kudlej <mkudlej> | ||||
| Component: | condor-aviary | Assignee: | grid-maint-list <grid-maint-list> | ||||
| Status: | CLOSED WONTFIX | QA Contact: | MRG Quality Engineering <mrgqe-bugs> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | low | ||||||
| Version: | Development | CC: | esammons, jneedle, matt, tstclair | ||||
| Target Milestone: | --- | Keywords: | Reopened | ||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | condor-7.8.2-0.1 | Doc Type: | Bug Fix | ||||
| Doc Text: |
Cause: Logging a warning when a DestroyClassAd event occurs in the Query Server.
Consequence: Query Server crashes.
Fix: Fixed a bad string format in error logging.
Result: Query server doesn't crash.
|
Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2016-05-26 20:01:54 UTC | Type: | --- | ||||
A bad dprintf format string on one particular code path is the culprit (UW f3604d8).
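The exact faulty call is not shown in this report. The strlen frame directly under _IO_vfprintf in the stack dump is consistent with a %s conversion receiving an argument that is not a valid string pointer. The following is a minimal sketch of that class of bug and one way to guard against it, not the actual Condor code: format_log, destroy_warning, PRINTF_CHECK and the message text are hypothetical names made up for illustration. The format attribute is a GCC/Clang extension that makes the compiler check arguments against the format string when warnings such as -Wformat are enabled.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper macro: enables printf-style argument checking on
// GCC/Clang and expands to nothing on other compilers.
#if defined(__GNUC__)
#define PRINTF_CHECK(fmt_idx, arg_idx) __attribute__((format(printf, fmt_idx, arg_idx)))
#else
#define PRINTF_CHECK(fmt_idx, arg_idx)
#endif

// Hypothetical stand-in for a printf-style logger such as dprintf.
std::string format_log(const char* fmt, ...) PRINTF_CHECK(1, 2);

std::string format_log(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);

    // First pass: measure the formatted length without writing anything.
    va_list copy;
    va_copy(copy, args);
    int needed = std::vsnprintf(nullptr, 0, fmt, copy);
    va_end(copy);

    std::string out;
    if (needed > 0) {
        // Second pass: format into a buffer with room for the terminator,
        // then drop the terminator from the std::string.
        out.resize(static_cast<std::size_t>(needed) + 1);
        std::vsnprintf(&out[0], out.size(), fmt, args);
        out.resize(static_cast<std::size_t>(needed));
    }
    va_end(args);
    return out;
}

// Hypothetical warning for a DestroyClassAd event whose key is unknown.
std::string destroy_warning(const char* key, int cluster) {
    // Buggy pattern (assumed, for illustration only):
    //     format_log("no job found for key '%s'", cluster);
    // "%s" makes vsnprintf call strlen() on an int treated as a pointer,
    // which is undefined behaviour and typically a segfault.
    //
    // Correct pattern: every conversion matches its argument type, and a
    // null key is replaced before it reaches "%s".
    return format_log("DestroyClassAd: no job found for key '%s' (cluster %d)",
                      key ? key : "(null)", cluster);
}
```

With the attribute in place, a mismatched call such as the one in the comment is reported at compile time under -Wformat instead of crashing at run time.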
Created attachment 557440: logs and configuration

Description of problem:
I have 4 machines configured as follows.

Features (priority: name):
  0: Master
  1: NodeAccess
  2: ExecuteNode
  3: CentralManager
  4: Scheduler
  5: QMF
  6: JobHooks
  7: QueryServer
  8: JobServer

Parameters:
  QMF_BROKER_HOST = _host_1
  SUSPEND = false
  ALLOW_WRITE = *
  START = true
  CREATE_CORE_FILES = true
  CONTINUE = true
  ALLOW_READ = *
  CONDOR_HOST = _host_1
  SCHEDD_CLUSTER_MAXIMUM_VALUE = 3

Core file generation is enabled in the OS, but I don't see any core file.

I periodically submit a simple job and check the number of jobs in the queue. I also run condor_q every 2 seconds while submitting. I see this stack dump in QueryServerLog:

  01/25/12 06:00:47 HistoryFile::init:1:Failed to stat /var/lib/condor/spool//history: 2 (No such file or directory)
  Stack dump for process 21195 at timestamp 1327489367 (21 frames)
  aviary_query_server(dprintf_dump_stack+0x56)[0x4fd296]
  aviary_query_server[0x4ff192]
  /lib64/libpthread.so.0[0x3eee20eb70]
  /lib64/libc.so.6(strlen+0x10)[0x3eeda79b60]
  /lib64/libc.so.6(_IO_vfprintf+0x4479)[0x3eeda46cb9]
  /lib64/libc.so.6(vsnprintf+0x9a)[0x3eeda699da]
  aviary_query_server(vprintf_length+0x32)[0x502dc2]
  aviary_query_server(vsprintf_realloc+0x52)[0x502e22]
  aviary_query_server[0x4fdd23]
  aviary_query_server(_condor_dprintf_va+0x313)[0x4fedb3]
  aviary_query_server(dprintf+0x86)[0x4ea186]
  aviary_query_server(_ZN23JobServerJobLogConsumer14DestroyClassAdEPKc+0x69)[0x461369]
  aviary_query_server(_ZN16ClassAdLogReader15ProcessLogEntryEP15ClassAdLogEntryP16ClassAdLogParser+0xa2)[0x5265a2]
  aviary_query_server(_ZN16ClassAdLogReader15IncrementalLoadEv+0x36)[0x5265e6]
  aviary_query_server(_ZN16ClassAdLogReader4PollEv+0xbf)[0x52677f]
  aviary_query_server(_ZN12JobLogMirror26TimerHandler_JobLogPollingEv+0x21)[0x4fee71]
  aviary_query_server(_ZN12TimerManager7TimeoutEv+0x155)[0x48e005]
  aviary_query_server(_ZN10DaemonCore6DriverEv+0x248)[0x47ae78]
  aviary_query_server(main+0xed0)[0x471030]
  /lib64/libc.so.6(__libc_start_main+0xf4)[0x3eeda1d994]
  aviary_query_server[0x45d4f9]

I see this only on x86_64 systems.

Version-Release number of selected component (if applicable):
  condor-wallaby-client-4.1.2-1.el5
  qpid-qmf-devel-0.10-11.el5
  condor-low-latency-1.2-2.el5
  condor-ec2-enhanced-1.3.0-1.el5
  condor-wallaby-base-db-1.19-1.el5
  condor-kbdd-7.6.5-0.12.el5
  python-qpid-qmf-0.10-11.el5
  condor-job-hooks-1.5-4.el5
  python-qpid-0.10-1.el5
  qpid-cpp-client-0.10-9.el5
  python-wallabyclient-4.1.2-1.el5
  qpid-cpp-client-devel-0.10-9.el5
  ruby-qpid-qmf-0.10-11.el5
  condor-wallaby-tools-4.1.2-1.el5
  qpid-qmf-debuginfo-0.10-11.el5
  python-condorec2e-1.3.0-1.el5
  condor-ec2-enhanced-hooks-1.3.0-1.el5
  wallaby-utils-0.12.5-1.el5
  wallaby-0.12.5-1.el5
  condor-classads-7.6.5-0.12.el5
  condor-aviary-7.6.5-0.12.el5
  condor-debuginfo-7.6.5-0.12.el5
  condor-vm-gahp-7.6.5-0.12.el5
  python-condorutils-1.5-4.el5
  qpid-cpp-server-0.10-9.el5
  qpid-qmf-0.10-11.el5
  qpid-tools-0.10-6.el5
  ruby-wallaby-0.12.5-1.el5
  python-wallaby-0.12.5-1.el5
  condor-7.6.5-0.12.el5
  condor-qmf-7.6.5-0.12.el5

How reproducible: 100%

Steps to Reproduce:
1. Install condor, qmf and aviary support for condor.
2. Set it up as described above.
3. service condor stop
4. rm -f /var/log/condor/*
5. rm -f /var/lib/condor/spool/*
6. service condor start
7. Periodically submit a simple job.
8. Wait until the stack dump appears.

Actual results: The Aviary server crashes and the master should start it again.

Expected results: The Aviary server does not crash.