Bug 832331
Summary: | corrupted log file | |
---|---|---|---
Product: | Red Hat Enterprise MRG | Reporter: | Martin Kudlej <mkudlej>
Component: | condor | Assignee: | Erik Erlandson <eerlands>
Status: | CLOSED NOTABUG | QA Contact: | MRG Quality Engineering <mrgqe-bugs>
Severity: | urgent | Docs Contact: |
Priority: | urgent | |
Version: | Development | CC: | dahorak, eerlands, iboverma, ltoscano, matt, rrati, sgraf, tstclair
Target Milestone: | 2.2 | Keywords: | Regression
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2012-06-22 21:54:46 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Martin Kudlej
2012-06-15 08:00:58 UTC
Issue independently reproduced during triggerd testing (the message is in a different log, but it seems that the same code path is hit).

Configure a machine with triggerd, then execute the trigger test:

condor_trigger_config -s localhost

Then restart condor. Triggerd won't start again; /var/log/condor/triggerd.log shows:

06/19/12 06:27:05 main_init() called
06/19/12 06:27:05 WARNING: Encountered corrupt log record 12 (byte offset 335)
06/19/12 06:27:05 Lines following corrupt log record 12 (up to 3):
06/19/12 06:27:05 106
06/19/12 06:27:05 ERROR "Error: corrupt log record 12 (byte offset 335) occurred inside closed transaction, recovery failed" at line 1104 in file /builddir/build/BUILD/condor-7.6.4/src/condor_utils/classad_log.cpp

After removing the content of /var/lib/condor/spool/triggers.log, triggerd is able to restart again (temporarily). This is the content generated by the aforementioned steps:

---------------------------------------
107 1 CreationTimestamp 1340100986
105
101 1340101063 EventTrigger Trigger
103 1340101063 TriggerText "$(Machine) has a slot 1"
103 1340101063 TriggerName "TestTrigger"
103 1340101063 TriggerQuery "(SlotID == 1)"
103 1340101063 TargetType "Trigger"
103 1340101063 CurrentTime time()
103 1340101063 MyType "EventTrigger"
106
105
103 1340101063 TriggerName Changed Test Trigger
106
105
103 1340101063 TriggerQuery (SlotID > 0)
106
105
103 1340101063 TriggerText $(Machine) has a slot $(SlotID)
106
105
102 1340101063
106
---------------------------------------

Reproduced on RHEL5.8/i386 and RHEL6.3/x86_64, condor 7.6.5-0.15.

Raising the severity and priority of the bug.

Does this only happen when you kill the daemons @ once?

(In reply to comment #3)
> Does this only happen when you kill the daemons @ once?

At least in the triggerd case, I noticed the error after a service condor restart. But then I tried to kill triggerd only; when the master respawns it, it can't restart because of the error.

I've tried removing all temporary condor files, including locks, logs, address files and so on. It hasn't helped.

When I looked at Martin's system, I removed job_queue.log and the schedd was able to start back up.

Created attachment 593016
Corrupted job queue log
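
For anyone inspecting a triggers.log or job_queue.log like the ones in this report, here is a minimal Python sketch that numbers the records and checks that transactions are properly opened and closed. It is not part of condor. The op-code names are my reading of the ClassAd log format (an assumption, not taken from this report), the helper names `parse_records` and `transactions_balanced` are hypothetical, and numbering records from 1 is also an assumption about how the daemon counts them. The sample is an excerpt of the triggers.log content quoted above.

```python
# Op-code names as I understand the ClassAd log format (assumption).
OP_NAMES = {
    101: "NewClassAd",
    102: "DestroyClassAd",
    103: "SetAttribute",
    104: "DeleteAttribute",
    105: "BeginTransaction",
    106: "EndTransaction",
    107: "LogHistoricalSequenceNumber",
}


def parse_records(text):
    """Return a list of (op_code, body) tuples, one per non-empty line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        op, _, body = line.partition(" ")
        records.append((int(op), body))
    return records


def transactions_balanced(records):
    """True if every BeginTransaction (105) is closed by an EndTransaction (106)."""
    open_txn = False
    for op, _ in records:
        if op == 105:
            if open_txn:
                return False  # nested begin
            open_txn = True
        elif op == 106:
            if not open_txn:
                return False  # end without begin
            open_txn = False
    return not open_txn


# Excerpt of the log content from this report, one record per line.
SAMPLE = """\
107 1 CreationTimestamp 1340100986
105
101 1340101063 EventTrigger Trigger
103 1340101063 TriggerText "$(Machine) has a slot 1"
103 1340101063 TriggerName "TestTrigger"
106
105
103 1340101063 TriggerName Changed Test Trigger
106
"""

records = parse_records(SAMPLE)
for number, (op, body) in enumerate(records, start=1):
    print(number, OP_NAMES.get(op, "unknown"), body)
print("balanced:", transactions_balanced(records))
```

A listing like this makes it easier to line up the record number in the daemon's error message with a concrete line in the file before deciding whether to move the log aside.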