Bug 569561 - errors in >=2GB log file handling
Summary: errors in >=2GB log file handling
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.2
Hardware: x86_64
OS: Linux
low
medium
Target Milestone: 2.1
: ---
Assignee: Erik Erlandson
QA Contact: Daniel Horák
URL:
Whiteboard:
Depends On:
Blocks: 743350
TreeView+ depends on / blocked
 
Reported: 2010-03-01 17:46 UTC by Jon Thomas
Modified: 2012-01-23 17:25 UTC (History)
4 users (show)

Fixed In Version: condor-7.6.5-0.2
Doc Type: Bug Fix
Doc Text:
The MAX_[subsys]_LOG values were read using the atoi() function instead of a 64-bit-capable lexical casting function, which rendered it impossible to set a maximum log size equal to, or greater than, 2GB. The lexical casting function lex_cast<T>() has been implemented to read MAX_[subsys]_LOG values, with the result that log files of size equal to, or greater than, 2GB, can now be configured.
Clone Of:
Environment:
Last Closed: 2012-01-23 17:25:07 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
patch to fix 2gb handling (1.51 KB, patch)
2010-03-01 17:46 UTC, Jon Thomas
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 752355 0 low CLOSED RFE: Evaluate MAX_*_LOG values 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHEA-2012:0045 0 normal SHIPPED_LIVE Red Hat Enterprise MRG Grid 2.1 bug fix and enhancement update 2012-01-23 22:22:58 UTC

Internal Links: 752355

Description Jon Thomas 2010-03-01 17:46:04 UTC
Created attachment 397139 [details]
patch to fix 2gb handling

Two problems:

-Atoi conversion limits file growth.
-Output error

In general, we need to use unsigned values instead of signed values.

Attaching a patch

This fixed problems with and applies on top of https://bugzilla.redhat.com/show_bug.cgi?id=527801.

Comment 1 Jon Thomas 2010-03-01 17:58:42 UTC
notes on testing

set max log to some value over 2Gb. (4000 more than 2gb in this case)

MAX_SCHEDD_LOG = 2147487648

Stop condor and create a file close to 2GB

dd if=/dev/zero of=SchedLog bs=1024 count=1 seek=2097150

This will make a 2147482624 byte SchedLog

restart condor a few times or run some jobs to generate log output.
Witness log growth and rotation

-rw-r--r-- 1 condor condor        482 Mar  1 12:48 SchedLog
-rw-r--r-- 1 condor condor 2147487764 Mar  1 12:48 SchedLog.old

tail SchedLog.old

03/01 12:48:07 (pid:14361) Using config source: /etc/condor/condor_config
03/01 12:48:07 (pid:14361) Using local config sources: 
03/01 12:48:07 (pid:14361)    /var/lib/condor/condor_config.local
03/01 12:48:07 (pid:14361) DaemonCore: Command Socket at <127.0.0.1:47336>
03/01 12:48:07 (pid:14361) History file rotation is enabled.
03/01 12:48:07 (pid:14361)   Maximum history file size is: 20971520 bytes
03/01 12:48:07 (pid:14361)   Number of rotated history files is: 2
03/01 12:48:07 (pid:14361) "/usr/sbin/condor_shadow.std -classad" did not produce any output, ignoring
MaxLog = 2147487648, length = 2147487673
Saving log file to "/var/log/condor/SchedLog.old"

Comment 2 Erik Erlandson 2011-09-16 23:18:17 UTC
upstream: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2471

Comment 3 Erik Erlandson 2011-09-16 23:19:08 UTC
REPRO/TEST:

Use the following config to set a value for MAX_SCHEDD_LOG that is slightly larger than 2GB:

# 2^31 + 1000
MAX_SCHEDD_LOG = 2147484648

Before fix: If you start up the schedd, you should almost immediately see the SchedLog roll over, and a goofy value for MaxLog:

$ tail SchedLog.old
09/16/11 10:09:25 (pid:16467) MaxLog = -2147482648, length = 167
09/16/11 10:09:25 (pid:16467) Saving log file to "/usr/local/condor/local/log/SchedLog.old"

To test the fix, pre-set SchedLog at just under 2GB:

[condor@rorschach log]$ rm SchedLog
[condor@rorschach log]$ dd if=/dev/zero of=SchedLog bs=1024 count=1 seek=2097150

After fix, you should (again, immediately) see the SchedLog roll over properly at a value slightly larger than MAX_SCHEDD_LOG:

$ tail SchedLog.old
09/16/11 16:00:29 (pid:3761) MaxLog = 2147484648, length = 2147484710
09/16/11 16:00:29 (pid:3761) Saving log file to "/usr/local/condor/local/log/SchedLog.old"

Comment 4 Erik Erlandson 2011-09-29 19:48:08 UTC
Fix committed upstream to 7.6

Comment 5 Erik Erlandson 2011-09-29 19:48:08 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
The MAX_<subsys>_LOG configurations were read using atoi(), instead of a 64-bit capable lexical casting function.

Consequence:
Maximum log sizes >= 2GB could not be configured.

Fix:
A lexical casting function lex_cast<T>() was implemented and used to read in MAX_<subsys>_LOG values.

Result:
Log file sizes of >= 2GB can now be configured, on systems that support 64-bit file sizes.

Comment 7 Daniel Horák 2011-10-19 12:31:55 UTC
On RHEL 5.7 i386 is problem with rotating logs. If MAX_SCHEDD_LOG is higher then 2147483647 (2^31-1), logs are not rotated.

Reproduction:
# rpm -q condor
  condor-7.6.4-0.8.el5

# condor_config_val MAX_SCHEDD_LOG
  2147483648

# rm -f /var/log/condor/SchedLog*

# ls -AlF | grep Sched
  -rw-r--r-- 1 condor condor 2147482624 Oct 19 13:53 SchedLog

# service condor start
  Starting Condor daemons:                                   [  OK  ]

# ls -AlF | grep Sched
  -rw-r--r-- 1 condor condor 2147484269 Oct 19 13:53 SchedLog

# runuser -s /bin/bash -l condor -c "condor_submit /tmp/test.job"
  Submitting job(s).
  1 job(s) submitted to cluster 1.

# ls -AlF | grep Sched
  -rw-r--r-- 1 condor condor 2147485988 Oct 19 13:54 SchedLog

# condor_rm -all
  All jobs marked for removal.

# ls -AlF | grep Sched
  -rw-r--r-- 1 condor condor 2147486664 Oct 19 13:54 SchedLog



On RHEL 6.1 i386 are logs rotated, but around bad size. If MAX_SCHEDD_LOG is higher then 2147483647, they are rotated around 2147483647.

# rpm -q condor
  condor-7.6.4-0.8.el6.i686

# condor_config_val MAX_SCHEDD_LOG
  4000000000

# ls -AlF | grep SchedLog
  -rw-r--r--. 1 condor condor 2147482624 Oct 19 14:04 SchedLog

# service condor start
  Starting Condor daemons:                                   [  OK  ]

# ls -AlF | grep SchedLog
  -rw-r--r--. 1 condor condor        695 Oct 19 14:04 SchedLog
  -rw-r--r--. 1 condor condor 2147483797 Oct 19 14:04 SchedLog.old

# tail -2 SchedLog.old 
  10/19/11 14:04:44 (pid:2288) MaxLog = 2147483647, length = 2147483648
  10/19/11 14:04:45 (pid:2288) Saving log file to "/var/log/condor/SchedLog.old"


On x86_64 (RHEL 5.7 and 6.1) looks everything OK.

>>>> ASSIGNED

Comment 12 Daniel Horák 2011-10-25 10:00:04 UTC
Problem is still here on 32-bit machines (on RHEL 5.7 and 6.1 the same). 
When I set some of MAX_*_LOG to value higher than 2147483647, particular daemon exit with status 4.
Tested with MAX_SCHEDD_LOG, MAX_STARTD_LOG, MAX_COLLECTOR_LOG and MAX_NEGOTIATOR_LOG.

Output in MasterLog is similar to:
  10/25/11 11:29:06 DaemonCore: No more children processes to reap.
  10/25/11 11:29:06 The STARTD (pid 6703) exited with status 4
  10/25/11 11:29:06 ProcAPI::buildFamily failed: parent 6703 not found on system.
  10/25/11 11:29:06 Sending obituary for "/usr/sbin/condor_startd"
  10/25/11 11:29:06 Forking Mailer process...
  10/25/11 11:29:06 restarting /usr/sbin/condor_startd in 10 seconds
  10/25/11 11:29:06 enter Daemons::UpdateCollector

With MAX_MASTER_LOG set to value higher than 2147483647, attempt to start condor failed with this error:
# service condor start
  Starting Condor daemons: ERROR "Invalid config: MAX_MASTER_LOG = 2147483648" at line 297 in file /builddir/build/BUILD/condor-7.6.3/src/condor_utils/dprintf_config.cpp
                                                           [FAILED]
# rpm -q condor
  condor-7.6.5-0.2.el5

Comment 13 Erik Erlandson 2011-10-25 13:01:12 UTC
(In reply to comment #12)
> Problem is still here on 32-bit machines (on RHEL 5.7 and 6.1 the same). 
> When I set some of MAX_*_LOG to value higher than 2147483647, particular daemon
> exit with status 4.

That is correct behavior: systems with 32-bit off_t will raise an exception while trying to parse a value > 2147483647.  normally an exception would be logged, however this exception is happening while the logging is being configured, and so no log output can happen.

Comment 16 Daniel Horák 2011-10-27 08:15:07 UTC
Reproduced on RHEL 6.1 x86_64:
# rpm -qa | grep condor
  condor-classads-7.6.3-0.3.el6.x86_64
  condor-7.6.3-0.3.el6.x86_64

Configuration:
  MAX_SCHEDD_LOG = 2147484648

# rm -f SchedLog*
# ls -AlF | grep SchedLog

# service condor restart
  Starting Condor daemons:                                   [  OK  ]

# ls -AlF | grep SchedLog
  -rw-r--r--. 1 condor condor  202 Oct 19 08:57 SchedLog
  -rw-r--r--. 1 condor condor  322 Oct 19 08:57 SchedLog.old

# cat SchedLog.old 
  10/19/11 08:57:29 (pid:2415) Now in new log file /var/log/condor/SchedLog
  10/19/11 08:57:29 (pid:2415) "/usr/sbin/condor_shadow.std -classad" did not produce any output, ignoring
  10/19/11 08:57:35 (pid:2415) MaxLog = -2147482648, length = 179
  10/19/11 08:57:35 (pid:2415) Saving log file to "/var/log/condor/SchedLog.old"



Verification on RHEL 6.1 x86_64:
# rpm -qa | grep condor
  condor-classads-7.6.5-0.2.el6.x86_64
  condor-7.6.5-0.2.el6.x86_64

# condor_config_val MAX_SCHEDD_LOG
  2147484648

# ls -AlF | grep SchedLog
  -rw-r--r--. 1 condor condor 2147482624 Oct 27 09:46 SchedLog

# service condor start
  Starting Condor daemons:                                   [  OK  ]

# ls -AlF | grep SchedLog
  -rw-r--r--. 1 condor condor 2147484259 Oct 27 09:47 SchedLog

# service condor restart
  Stopping Condor daemons:                                   [  OK  ]
  Starting Condor daemons:                                   [  OK  ]

# ls -AlF | grep SchedLog
  -rw-r--r--. 1 condor condor       1709 Oct 27 09:47 SchedLog
  -rw-r--r--. 1 condor condor 2147484820 Oct 27 09:47 SchedLog.old

# tail -2 SchedLog.old 
  10/27/11 09:47:30 (pid:3084) MaxLog = 2147484648, length = 2147484671
  10/27/11 09:47:30 (pid:3084) Saving log file to "/var/log/condor/SchedLog.old"


Verified on RHEL 5.7 x86_64 and RHEL 6.1 x86_64 with MAX_SCHEDD_LOG set to: 
2147484647 (2^31-1), 2147484648 (2^31), 1048576 (2^20) and 4294967296 (2^32).


>>> VERIFIED

Comment 17 Douglas Silas 2011-11-16 15:32:27 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,11 +1 @@
-Cause:
+The MAX_[subsys]_LOG values were read using the atoi() function instead of a 64-bit-capable lexical casting function, which rendered it impossible to set a maximum log size equal to, or greater than, 2GB. The lexical casting function lex_cast<T>() has been implemented to read MAX_[subsys]_LOG values, with the result that log files of size equal to, or greater than, 2GB, can now be configured.-The MAX_<subsys>_LOG configurations were read using atoi(), instead of a 64-bit capable lexical casting function.
-
-Consequence:
-Maximum log sizes >= 2GB could not be configured.
-
-Fix:
-A lexical casting function lex_cast<T>() was implemented and used to read in MAX_<subsys>_LOG values.
-
-Result:
-Log file sizes of >= 2GB can now be configured, on systems that support 64-bit file sizes.

Comment 18 errata-xmlrpc 2012-01-23 17:25:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0045.html


Note You need to log in before you can comment on or make changes to this bug.