Bug 834669 - introduction of strict log queue log parsing rejects some legacy classad log exprs
introduction of strict log queue log parsing rejects some legacy classad log ...
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor (Show other bugs)
2.2
All All
high Severity high
: 2.2
: ---
Assigned To: Erik Erlandson
Daniel Horák
: Regression
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2012-06-22 14:05 EDT by Erik Erlandson
Modified: 2012-09-24 07:18 EDT (History)
6 users (show)

See Also:
Fixed In Version: condor-7.6.5-0.16
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-09-19 14:26:21 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Condor 3069 None None None 2012-06-22 14:05:11 EDT

  None (edit)
Description Erik Erlandson 2012-06-22 14:05:11 EDT
Description of problem:
For a long time, classad logs did not enforce successful parsing of RHS expressions in 'set attribute' operation (103) when loading in a classad log file. This allowed some undefined set of instances where RHS expressions were created with improper classad syntax to creep in over time.

When I introduced strict parsing for Bug 784051, I kicked over some of these instances. Although it passed upstream regression testing, we have discovered a few more cases from external components.

As a transition measure, we are going to add a new flag, which can be used to turn on strict parsing.  For our purposes we will default this flag to FALSE, to maintain backward compatible behavior with legacy job queue logs.


How reproducible:
100%

Steps to Reproduce:
1. introduce an invalid RHS expression to a 'set attr' operation (op code 103), for example:

103 1340101063 TriggerName Changed Test Trigger


Actual results:
Exception will be thrown (or a log message showing read error on last classad transaction will occur, if last transaction was not closed)

Expected results:
improper RHS will be accepted in legacy fashion.
Comment 2 Erik Erlandson 2012-06-22 18:53:05 EDT
Downstream we are defaulting CLASSAD_LOG_STRICT_PARSING to false, to prevent any customer legacy classad log files from failing.

TESTING:

to test the fix, copy this content to job_queue.log

107 1 CreationTimestamp 1334245749
101 0.0 Job Machine
103 0.0 NextClusterNum 1
105 
103 1.0 JobStatus 2  ZOMG!
103 1.0 EnteredCurrentStatus 1334245771
103 1.0 LastSuspensionTime 0
103 1.0 CurrentHosts 1
106 
105 
103 1.1 LastJobStatus 1
103 1.1 JobStatus 2


If you let the value of CLASSAD_LOG_STRICT_PARSING default, you should see the scheduler allow the incorrect line to pass with a warning, in the SchedLog:

06/22/12 13:24:43 (pid:32102) WARNING: strict classad parsing failed for expression: "2  ZOMG!"
06/22/12 13:24:43 (pid:32102) Detected unterminated transaction in ClassAd Log/home/eje/condor/local/spool/job_queue.log. Forcing rotation.
06/22/12 13:24:43 (pid:32102) About to rotate ClassAd log /home/eje/condor/local/spool/job_queue.log
06/22/12 13:24:43 (pid:32102) About to save historical log /home/eje/condor/local/spool/job_queue.log.1


To test functionality when flag is enabled, set:
CLASSAD_LOG_STRICT_PARSING = True

If you attempt to repeat the above test, it should now fail with an exception:

06/22/12 12:59:46 (pid:31585) WARNING: Encountered corrupt log record 5 (byte offset 89)
06/22/12 12:59:46 (pid:31585) Lines following corrupt log record 5 (up to 3):
06/22/12 12:59:46 (pid:31585)     103 1.0 EnteredCurrentStatus 1334245771
06/22/12 12:59:46 (pid:31585)     103 1.0 LastSuspensionTime 0
06/22/12 12:59:46 (pid:31585)     103 1.0 CurrentHosts 1
06/22/12 12:59:46 (pid:31585) ERROR "Error: corrupt log record 5 (byte offset 89) occurred inside closed transaction, recovery failed" at line 1141 in file /home/eje/git/grid/src/condor_utils/classad_log.cpp
Comment 3 Erik Erlandson 2012-06-22 18:54:39 EDT
pushed fix to UPSTREAM-7.9.0-BZ834669-strict-parsing-flag
Comment 4 Luigi Toscano 2012-06-26 05:24:09 EDT
(In reply to comment #2)
> 
> 06/22/12 13:24:43 (pid:32102) WARNING: strict classad parsing failed for
> expression: "2  ZOMG!"
> 06/22/12 13:24:43 (pid:32102) Detected unterminated transaction in ClassAd
> Log/home/eje/condor/local/spool/job_queue.log. Forcing rotation.
> 06/22/12 13:24:43 (pid:32102) About to rotate ClassAd log
> /home/eje/condor/local/spool/job_queue.log
> 06/22/12 13:24:43 (pid:32102) About to save historical log
> /home/eje/condor/local/spool/job_queue.log.1

Is it the rotation a new behavior? If yes, could it be a problem, if it happens everytime a bad classad is found?
Comment 5 Erik Erlandson 2012-06-26 10:38:29 EDT
(In reply to comment #4)

> Is it the rotation a new behavior? If yes, could it be a problem, if it
> happens everytime a bad classad is found?

The rotation is a standard response to detecting an unterminated transaction in the job queue log (which is the case in the test job_queue.log).  When it happens, it only happens once, after it finishes reading the log.
Comment 6 Daniel Horák 2012-06-29 02:51:38 EDT
Tested on RHEL 5.8 and 6.3 - i386, x86_64 according to comment 2 (following output is from RHEL 6.3 i386, on other platforms is similar):

# rpm -qa | grep condor
condor-7.6.5-0.16.el6.i686
condor-classads-7.6.5-0.16.el6.i686

# cat /var/lib/condor/spool/job_queue.log 
107 1 CreationTimestamp 1334245749
101 0.0 Job Machine
103 0.0 NextClusterNum 1
105 
103 1.0 JobStatus 2  ZOMG!
103 1.0 EnteredCurrentStatus 1334245771
103 1.0 LastSuspensionTime 0
103 1.0 CurrentHosts 1
106 
105 
103 1.1 LastJobStatus 1
103 1.1 JobStatus 2

# condor_config_val SCHEDD_DEBUG
D_FULLDEBUG

# condor_config_val CLASSAD_LOG_STRICT_PARSING
false

# service condor start
Starting Condor daemons:                                   [  OK  ]

# tail -F /var/log/condor/SchedLog 
06/28/12 12:36:30 WARNING: strict classad parsing failed for expression: "2  ZOMG!"
06/28/12 12:36:30 Detected unterminated transaction in ClassAd Log/var/lib/condor/spool/job_queue.log. Forcing rotation.
06/28/12 12:36:30 About to rotate ClassAd log /var/lib/condor/spool/job_queue.log
06/28/12 12:36:30 About to save historical log /var/lib/condor/spool/job_queue.log.1


# condor_config_val CLASSAD_LOG_STRICT_PARSING
True

# tail -F /var/log/condor/SchedLog 
06/28/12 12:41:10 WARNING: Encountered corrupt log record 5 (byte offset 89)
06/28/12 12:41:10 Lines following corrupt log record 5 (up to 3):
06/28/12 12:41:10     103 1.0 EnteredCurrentStatus 1334245771
06/28/12 12:41:10     103 1.0 LastSuspensionTime 0
06/28/12 12:41:10     103 1.0 CurrentHosts 1
06/28/12 12:41:10 ERROR "Error: corrupt log record 5 (byte offset 89) occurred inside closed transaction, recovery failed" at line 1108 in file /builddir/build/BUILD/condor-7.6.4/src/condor_utils/classad_log.cpp

Also retested all other scenarios from bug 784051 with expected behaviour.

>>> VERIFIED

Note You need to log in before you can comment on or make changes to this bug.