Created attachment 397139 [details] patch to fix 2gb handling Two problems: -Atoi conversion limits file growth. -Output error In general, we need to use unsigned values instead of signed values. Attaching a patch This fixed problems with and applies on top of https://bugzilla.redhat.com/show_bug.cgi?id=527801.
notes on testing set max log to some value over 2Gb. (4000 more than 2gb in this case) MAX_SCHEDD_LOG = 2147487648 Stop condor and create a file close to 2GB dd if=/dev/zero of=SchedLog bs=1024 count=1 seek=2097150 This will make a 2147482624 byte SchedLog restart condor a few times or run some jobs to generate log output. Witness log growth and rotation -rw-r--r-- 1 condor condor 482 Mar 1 12:48 SchedLog -rw-r--r-- 1 condor condor 2147487764 Mar 1 12:48 SchedLog.old tail SchedLog.old 03/01 12:48:07 (pid:14361) Using config source: /etc/condor/condor_config 03/01 12:48:07 (pid:14361) Using local config sources: 03/01 12:48:07 (pid:14361) /var/lib/condor/condor_config.local 03/01 12:48:07 (pid:14361) DaemonCore: Command Socket at <127.0.0.1:47336> 03/01 12:48:07 (pid:14361) History file rotation is enabled. 03/01 12:48:07 (pid:14361) Maximum history file size is: 20971520 bytes 03/01 12:48:07 (pid:14361) Number of rotated history files is: 2 03/01 12:48:07 (pid:14361) "/usr/sbin/condor_shadow.std -classad" did not produce any output, ignoring MaxLog = 2147487648, length = 2147487673 Saving log file to "/var/log/condor/SchedLog.old"
upstream: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2471
REPRO/TEST: Use the following config to set a value for MAX_SCHEDD_LOG that is slightly larger than 2GB: # 2^31 + 1000 MAX_SCHEDD_LOG = 2147484648 Before fix: If you start up the schedd, you should almost immediately see the SchedLog roll over, and a goofy value for MaxLog: $ tail SchedLog.old 09/16/11 10:09:25 (pid:16467) MaxLog = -2147482648, length = 167 09/16/11 10:09:25 (pid:16467) Saving log file to "/usr/local/condor/local/log/SchedLog.old" To test the fix, pre-set SchedLog at just under 2GB: [condor@rorschach log]$ rm SchedLog [condor@rorschach log]$ dd if=/dev/zero of=SchedLog bs=1024 count=1 seek=2097150 After fix, you should (again, immediately) see the SchedLog roll over properly at a value slightly larger than MAX_SCHEDD_LOG: $ tail SchedLog.old 09/16/11 16:00:29 (pid:3761) MaxLog = 2147484648, length = 2147484710 09/16/11 16:00:29 (pid:3761) Saving log file to "/usr/local/condor/local/log/SchedLog.old"
Fix committed upstream to 7.6
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Cause: The MAX_<subsys>_LOG configurations were read using atoi(), instead of a 64-bit capable lexical casting function. Consequence: Maximum log sizes >= 2GB could not be configured. Fix: A lexical casting function lex_cast<T>() was implemented and used to read in MAX_<subsys>_LOG values. Result: Log file sizes of >= 2GB can now be configured, on systems that support 64-bit file sizes.
On RHEL 5.7 i386 is problem with rotating logs. If MAX_SCHEDD_LOG is higher then 2147483647 (2^31-1), logs are not rotated. Reproduction: # rpm -q condor condor-7.6.4-0.8.el5 # condor_config_val MAX_SCHEDD_LOG 2147483648 # rm -f /var/log/condor/SchedLog* # ls -AlF | grep Sched -rw-r--r-- 1 condor condor 2147482624 Oct 19 13:53 SchedLog # service condor start Starting Condor daemons: [ OK ] # ls -AlF | grep Sched -rw-r--r-- 1 condor condor 2147484269 Oct 19 13:53 SchedLog # runuser -s /bin/bash -l condor -c "condor_submit /tmp/test.job" Submitting job(s). 1 job(s) submitted to cluster 1. # ls -AlF | grep Sched -rw-r--r-- 1 condor condor 2147485988 Oct 19 13:54 SchedLog # condor_rm -all All jobs marked for removal. # ls -AlF | grep Sched -rw-r--r-- 1 condor condor 2147486664 Oct 19 13:54 SchedLog On RHEL 6.1 i386 are logs rotated, but around bad size. If MAX_SCHEDD_LOG is higher then 2147483647, they are rotated around 2147483647. # rpm -q condor condor-7.6.4-0.8.el6.i686 # condor_config_val MAX_SCHEDD_LOG 4000000000 # ls -AlF | grep SchedLog -rw-r--r--. 1 condor condor 2147482624 Oct 19 14:04 SchedLog # service condor start Starting Condor daemons: [ OK ] # ls -AlF | grep SchedLog -rw-r--r--. 1 condor condor 695 Oct 19 14:04 SchedLog -rw-r--r--. 1 condor condor 2147483797 Oct 19 14:04 SchedLog.old # tail -2 SchedLog.old 10/19/11 14:04:44 (pid:2288) MaxLog = 2147483647, length = 2147483648 10/19/11 14:04:45 (pid:2288) Saving log file to "/var/log/condor/SchedLog.old" On x86_64 (RHEL 5.7 and 6.1) looks everything OK. >>>> ASSIGNED
Problem is still here on 32-bit machines (on RHEL 5.7 and 6.1 the same). When I set some of MAX_*_LOG to value higher than 2147483647, particular daemon exit with status 4. Tested with MAX_SCHEDD_LOG, MAX_STARTD_LOG, MAX_COLLECTOR_LOG and MAX_NEGOTIATOR_LOG. Output in MasterLog is similar to: 10/25/11 11:29:06 DaemonCore: No more children processes to reap. 10/25/11 11:29:06 The STARTD (pid 6703) exited with status 4 10/25/11 11:29:06 ProcAPI::buildFamily failed: parent 6703 not found on system. 10/25/11 11:29:06 Sending obituary for "/usr/sbin/condor_startd" 10/25/11 11:29:06 Forking Mailer process... 10/25/11 11:29:06 restarting /usr/sbin/condor_startd in 10 seconds 10/25/11 11:29:06 enter Daemons::UpdateCollector With MAX_MASTER_LOG set to value higher than 2147483647, attempt to start condor failed with this error: # service condor start Starting Condor daemons: ERROR "Invalid config: MAX_MASTER_LOG = 2147483648" at line 297 in file /builddir/build/BUILD/condor-7.6.3/src/condor_utils/dprintf_config.cpp [FAILED] # rpm -q condor condor-7.6.5-0.2.el5
(In reply to comment #12) > Problem is still here on 32-bit machines (on RHEL 5.7 and 6.1 the same). > When I set some of MAX_*_LOG to value higher than 2147483647, particular daemon > exit with status 4. That is correct behavior: systems with 32-bit off_t will raise an exception while trying to parse a value > 2147483647. normally an exception would be logged, however this exception is happening while the logging is being configured, and so no log output can happen.
Reproduced on RHEL 6.1 x86_64: # rpm -qa | grep condor condor-classads-7.6.3-0.3.el6.x86_64 condor-7.6.3-0.3.el6.x86_64 Configuration: MAX_SCHEDD_LOG = 2147484648 # rm -f SchedLog* # ls -AlF | grep SchedLog # service condor restart Starting Condor daemons: [ OK ] # ls -AlF | grep SchedLog -rw-r--r--. 1 condor condor 202 Oct 19 08:57 SchedLog -rw-r--r--. 1 condor condor 322 Oct 19 08:57 SchedLog.old # cat SchedLog.old 10/19/11 08:57:29 (pid:2415) Now in new log file /var/log/condor/SchedLog 10/19/11 08:57:29 (pid:2415) "/usr/sbin/condor_shadow.std -classad" did not produce any output, ignoring 10/19/11 08:57:35 (pid:2415) MaxLog = -2147482648, length = 179 10/19/11 08:57:35 (pid:2415) Saving log file to "/var/log/condor/SchedLog.old" Verification on RHEL 6.1 x86_64: # rpm -qa | grep condor condor-classads-7.6.5-0.2.el6.x86_64 condor-7.6.5-0.2.el6.x86_64 # condor_config_val MAX_SCHEDD_LOG 2147484648 # ls -AlF | grep SchedLog -rw-r--r--. 1 condor condor 2147482624 Oct 27 09:46 SchedLog # service condor start Starting Condor daemons: [ OK ] # ls -AlF | grep SchedLog -rw-r--r--. 1 condor condor 2147484259 Oct 27 09:47 SchedLog # service condor restart Stopping Condor daemons: [ OK ] Starting Condor daemons: [ OK ] # ls -AlF | grep SchedLog -rw-r--r--. 1 condor condor 1709 Oct 27 09:47 SchedLog -rw-r--r--. 1 condor condor 2147484820 Oct 27 09:47 SchedLog.old # tail -2 SchedLog.old 10/27/11 09:47:30 (pid:3084) MaxLog = 2147484648, length = 2147484671 10/27/11 09:47:30 (pid:3084) Saving log file to "/var/log/condor/SchedLog.old" Verified on RHEL 5.7 x86_64 and RHEL 6.1 x86_64 with MAX_SCHEDD_LOG set to: 2147484647 (2^31-1), 2147484648 (2^31), 1048576 (2^20) and 4294967296 (2^32). >>> VERIFIED
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1,11 +1 @@ -Cause: +The MAX_[subsys]_LOG values were read using the atoi() function instead of a 64-bit-capable lexical casting function, which rendered it impossible to set a maximum log size equal to, or greater than, 2GB. The lexical casting function lex_cast<T>() has been implemented to read MAX_[subsys]_LOG values, with the result that log files of size equal to, or greater than, 2GB, can now be configured.-The MAX_<subsys>_LOG configurations were read using atoi(), instead of a 64-bit capable lexical casting function. - -Consequence: -Maximum log sizes >= 2GB could not be configured. - -Fix: -A lexical casting function lex_cast<T>() was implemented and used to read in MAX_<subsys>_LOG values. - -Result: -Log file sizes of >= 2GB can now be configured, on systems that support 64-bit file sizes.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2012-0045.html