Bug 834669 - introduction of strict log queue log parsing rejects some legacy classad log exprs
Summary: introduction of strict log queue log parsing rejects some legacy classad log ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 2.2
Hardware: All
OS: All
high
high
Target Milestone: 2.2
: ---
Assignee: Erik Erlandson
QA Contact: Daniel Horák
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-06-22 18:05 UTC by Erik Erlandson
Modified: 2012-09-24 11:18 UTC (History)
6 users (show)

Fixed In Version: condor-7.6.5-0.16
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-09-19 18:26:21 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Condor 3069 0 None None None 2012-06-22 18:05:11 UTC
Red Hat Bugzilla 834659 0 medium CLOSED triggerd injects improper classad expressions into classad log file 2021-02-22 00:41:40 UTC

Internal Links: 834659

Description Erik Erlandson 2012-06-22 18:05:11 UTC
Description of problem:
For a long time, classad logs did not enforce successful parsing of RHS expressions in 'set attribute' operation (103) when loading in a classad log file. This allowed some undefined set of instances where RHS expressions were created with improper classad syntax to creep in over time.

When I introduced strict parsing for Bug 784051, I kicked over some of these instances. Although it passed upstream regression testing, we have discovered a few more cases from external components.

As a transition measure, we are going to add a new flag, which can be used to turn on strict parsing.  For our purposes we will default this flag to FALSE, to maintain backward compatible behavior with legacy job queue logs.


How reproducible:
100%

Steps to Reproduce:
1. introduce an invalid RHS expression to a 'set attr' operation (op code 103), for example:

103 1340101063 TriggerName Changed Test Trigger


Actual results:
Exception will be thrown (or a log message showing read error on last classad transaction will occur, if last transaction was not closed)

Expected results:
improper RHS will be accepted in legacy fashion.

Comment 2 Erik Erlandson 2012-06-22 22:53:05 UTC
Downstream we are defaulting CLASSAD_LOG_STRICT_PARSING to false, to prevent any customer legacy classad log files from failing.

TESTING:

to test the fix, copy this content to job_queue.log

107 1 CreationTimestamp 1334245749
101 0.0 Job Machine
103 0.0 NextClusterNum 1
105 
103 1.0 JobStatus 2  ZOMG!
103 1.0 EnteredCurrentStatus 1334245771
103 1.0 LastSuspensionTime 0
103 1.0 CurrentHosts 1
106 
105 
103 1.1 LastJobStatus 1
103 1.1 JobStatus 2


If you let the value of CLASSAD_LOG_STRICT_PARSING default, you should see the scheduler allow the incorrect line to pass with a warning, in the SchedLog:

06/22/12 13:24:43 (pid:32102) WARNING: strict classad parsing failed for expression: "2  ZOMG!"
06/22/12 13:24:43 (pid:32102) Detected unterminated transaction in ClassAd Log/home/eje/condor/local/spool/job_queue.log. Forcing rotation.
06/22/12 13:24:43 (pid:32102) About to rotate ClassAd log /home/eje/condor/local/spool/job_queue.log
06/22/12 13:24:43 (pid:32102) About to save historical log /home/eje/condor/local/spool/job_queue.log.1


To test functionality when flag is enabled, set:
CLASSAD_LOG_STRICT_PARSING = True

If you attempt to repeat the above test, it should now fail with an exception:

06/22/12 12:59:46 (pid:31585) WARNING: Encountered corrupt log record 5 (byte offset 89)
06/22/12 12:59:46 (pid:31585) Lines following corrupt log record 5 (up to 3):
06/22/12 12:59:46 (pid:31585)     103 1.0 EnteredCurrentStatus 1334245771
06/22/12 12:59:46 (pid:31585)     103 1.0 LastSuspensionTime 0
06/22/12 12:59:46 (pid:31585)     103 1.0 CurrentHosts 1
06/22/12 12:59:46 (pid:31585) ERROR "Error: corrupt log record 5 (byte offset 89) occurred inside closed transaction, recovery failed" at line 1141 in file /home/eje/git/grid/src/condor_utils/classad_log.cpp

Comment 3 Erik Erlandson 2012-06-22 22:54:39 UTC
pushed fix to UPSTREAM-7.9.0-BZ834669-strict-parsing-flag

Comment 4 Luigi Toscano 2012-06-26 09:24:09 UTC
(In reply to comment #2)
> 
> 06/22/12 13:24:43 (pid:32102) WARNING: strict classad parsing failed for
> expression: "2  ZOMG!"
> 06/22/12 13:24:43 (pid:32102) Detected unterminated transaction in ClassAd
> Log/home/eje/condor/local/spool/job_queue.log. Forcing rotation.
> 06/22/12 13:24:43 (pid:32102) About to rotate ClassAd log
> /home/eje/condor/local/spool/job_queue.log
> 06/22/12 13:24:43 (pid:32102) About to save historical log
> /home/eje/condor/local/spool/job_queue.log.1

Is it the rotation a new behavior? If yes, could it be a problem, if it happens everytime a bad classad is found?

Comment 5 Erik Erlandson 2012-06-26 14:38:29 UTC
(In reply to comment #4)

> Is it the rotation a new behavior? If yes, could it be a problem, if it
> happens everytime a bad classad is found?

The rotation is a standard response to detecting an unterminated transaction in the job queue log (which is the case in the test job_queue.log).  When it happens, it only happens once, after it finishes reading the log.

Comment 6 Daniel Horák 2012-06-29 06:51:38 UTC
Tested on RHEL 5.8 and 6.3 - i386, x86_64 according to comment 2 (following output is from RHEL 6.3 i386, on other platforms is similar):

# rpm -qa | grep condor
condor-7.6.5-0.16.el6.i686
condor-classads-7.6.5-0.16.el6.i686

# cat /var/lib/condor/spool/job_queue.log 
107 1 CreationTimestamp 1334245749
101 0.0 Job Machine
103 0.0 NextClusterNum 1
105 
103 1.0 JobStatus 2  ZOMG!
103 1.0 EnteredCurrentStatus 1334245771
103 1.0 LastSuspensionTime 0
103 1.0 CurrentHosts 1
106 
105 
103 1.1 LastJobStatus 1
103 1.1 JobStatus 2

# condor_config_val SCHEDD_DEBUG
D_FULLDEBUG

# condor_config_val CLASSAD_LOG_STRICT_PARSING
false

# service condor start
Starting Condor daemons:                                   [  OK  ]

# tail -F /var/log/condor/SchedLog 
06/28/12 12:36:30 WARNING: strict classad parsing failed for expression: "2  ZOMG!"
06/28/12 12:36:30 Detected unterminated transaction in ClassAd Log/var/lib/condor/spool/job_queue.log. Forcing rotation.
06/28/12 12:36:30 About to rotate ClassAd log /var/lib/condor/spool/job_queue.log
06/28/12 12:36:30 About to save historical log /var/lib/condor/spool/job_queue.log.1


# condor_config_val CLASSAD_LOG_STRICT_PARSING
True

# tail -F /var/log/condor/SchedLog 
06/28/12 12:41:10 WARNING: Encountered corrupt log record 5 (byte offset 89)
06/28/12 12:41:10 Lines following corrupt log record 5 (up to 3):
06/28/12 12:41:10     103 1.0 EnteredCurrentStatus 1334245771
06/28/12 12:41:10     103 1.0 LastSuspensionTime 0
06/28/12 12:41:10     103 1.0 CurrentHosts 1
06/28/12 12:41:10 ERROR "Error: corrupt log record 5 (byte offset 89) occurred inside closed transaction, recovery failed" at line 1108 in file /builddir/build/BUILD/condor-7.6.4/src/condor_utils/classad_log.cpp

Also retested all other scenarios from bug 784051 with expected behaviour.

>>> VERIFIED


Note You need to log in before you can comment on or make changes to this bug.