Bug 834669
Summary: | introduction of strict log queue log parsing rejects some legacy classad log exprs | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Erik Erlandson <eerlands> |
Component: | condor | Assignee: | Erik Erlandson <eerlands> |
Status: | CLOSED ERRATA | QA Contact: | Daniel Horák <dahorak> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 2.2 | CC: | dahorak, iboverma, ltoscano, matt, mkudlej, tstclair |
Target Milestone: | 2.2 | Keywords: | Regression |
Target Release: | --- | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | condor-7.6.5-0.16 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2012-09-19 18:26:21 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Erik Erlandson
2012-06-22 18:05:11 UTC
Downstream we are defaulting CLASSAD_LOG_STRICT_PARSING to false, to prevent any customer legacy classad log files from failing. TESTING: to test the fix, copy this content to job_queue.log 107 1 CreationTimestamp 1334245749 101 0.0 Job Machine 103 0.0 NextClusterNum 1 105 103 1.0 JobStatus 2 ZOMG! 103 1.0 EnteredCurrentStatus 1334245771 103 1.0 LastSuspensionTime 0 103 1.0 CurrentHosts 1 106 105 103 1.1 LastJobStatus 1 103 1.1 JobStatus 2 If you let the value of CLASSAD_LOG_STRICT_PARSING default, you should see the scheduler allow the incorrect line to pass with a warning, in the SchedLog: 06/22/12 13:24:43 (pid:32102) WARNING: strict classad parsing failed for expression: "2 ZOMG!" 06/22/12 13:24:43 (pid:32102) Detected unterminated transaction in ClassAd Log/home/eje/condor/local/spool/job_queue.log. Forcing rotation. 06/22/12 13:24:43 (pid:32102) About to rotate ClassAd log /home/eje/condor/local/spool/job_queue.log 06/22/12 13:24:43 (pid:32102) About to save historical log /home/eje/condor/local/spool/job_queue.log.1 To test functionality when flag is enabled, set: CLASSAD_LOG_STRICT_PARSING = True If you attempt to repeat the above test, it should now fail with an exception: 06/22/12 12:59:46 (pid:31585) WARNING: Encountered corrupt log record 5 (byte offset 89) 06/22/12 12:59:46 (pid:31585) Lines following corrupt log record 5 (up to 3): 06/22/12 12:59:46 (pid:31585) 103 1.0 EnteredCurrentStatus 1334245771 06/22/12 12:59:46 (pid:31585) 103 1.0 LastSuspensionTime 0 06/22/12 12:59:46 (pid:31585) 103 1.0 CurrentHosts 1 06/22/12 12:59:46 (pid:31585) ERROR "Error: corrupt log record 5 (byte offset 89) occurred inside closed transaction, recovery failed" at line 1141 in file /home/eje/git/grid/src/condor_utils/classad_log.cpp pushed fix to UPSTREAM-7.9.0-BZ834669-strict-parsing-flag (In reply to comment #2) > > 06/22/12 13:24:43 (pid:32102) WARNING: strict classad parsing failed for > expression: "2 ZOMG!" > 06/22/12 13:24:43 (pid:32102) Detected unterminated transaction in ClassAd > Log/home/eje/condor/local/spool/job_queue.log. Forcing rotation. > 06/22/12 13:24:43 (pid:32102) About to rotate ClassAd log > /home/eje/condor/local/spool/job_queue.log > 06/22/12 13:24:43 (pid:32102) About to save historical log > /home/eje/condor/local/spool/job_queue.log.1 Is it the rotation a new behavior? If yes, could it be a problem, if it happens everytime a bad classad is found? (In reply to comment #4) > Is it the rotation a new behavior? If yes, could it be a problem, if it > happens everytime a bad classad is found? The rotation is a standard response to detecting an unterminated transaction in the job queue log (which is the case in the test job_queue.log). When it happens, it only happens once, after it finishes reading the log. Tested on RHEL 5.8 and 6.3 - i386, x86_64 according to comment 2 (following output is from RHEL 6.3 i386, on other platforms is similar): # rpm -qa | grep condor condor-7.6.5-0.16.el6.i686 condor-classads-7.6.5-0.16.el6.i686 # cat /var/lib/condor/spool/job_queue.log 107 1 CreationTimestamp 1334245749 101 0.0 Job Machine 103 0.0 NextClusterNum 1 105 103 1.0 JobStatus 2 ZOMG! 103 1.0 EnteredCurrentStatus 1334245771 103 1.0 LastSuspensionTime 0 103 1.0 CurrentHosts 1 106 105 103 1.1 LastJobStatus 1 103 1.1 JobStatus 2 # condor_config_val SCHEDD_DEBUG D_FULLDEBUG # condor_config_val CLASSAD_LOG_STRICT_PARSING false # service condor start Starting Condor daemons: [ OK ] # tail -F /var/log/condor/SchedLog 06/28/12 12:36:30 WARNING: strict classad parsing failed for expression: "2 ZOMG!" 06/28/12 12:36:30 Detected unterminated transaction in ClassAd Log/var/lib/condor/spool/job_queue.log. Forcing rotation. 06/28/12 12:36:30 About to rotate ClassAd log /var/lib/condor/spool/job_queue.log 06/28/12 12:36:30 About to save historical log /var/lib/condor/spool/job_queue.log.1 # condor_config_val CLASSAD_LOG_STRICT_PARSING True # tail -F /var/log/condor/SchedLog 06/28/12 12:41:10 WARNING: Encountered corrupt log record 5 (byte offset 89) 06/28/12 12:41:10 Lines following corrupt log record 5 (up to 3): 06/28/12 12:41:10 103 1.0 EnteredCurrentStatus 1334245771 06/28/12 12:41:10 103 1.0 LastSuspensionTime 0 06/28/12 12:41:10 103 1.0 CurrentHosts 1 06/28/12 12:41:10 ERROR "Error: corrupt log record 5 (byte offset 89) occurred inside closed transaction, recovery failed" at line 1108 in file /builddir/build/BUILD/condor-7.6.4/src/condor_utils/classad_log.cpp Also retested all other scenarios from bug 784051 with expected behaviour. >>> VERIFIED |