Bug 748825

Summary: Load avg not detected on Linux 3.x kernel
Product: Red Hat Enterprise MRG
Reporter: Matthew Farrellee <matt>
Component: condor
Assignee: Matthew Farrellee <matt>
Status: CLOSED ERRATA
QA Contact: Tomas Rusnak <trusnak>
Severity: high
Priority: high
Version: 2.0
CC: jneedle, matt, trusnak, tstclair
Target Milestone: 2.1
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: condor-7.6.5-0.3
Doc Type: Bug Fix
Doc Text:
Load average detection is specific to each kernel major version and previously supported only version 1 and 2 kernels. Consequently, the load average appeared as "-1" under 3.x kernels. This update adds support for load average detection under 3.x kernels, implemented identically to the previous kernel versions because the /proc/loadavg format is unchanged.
Last Closed: 2012-01-23 17:29:35 UTC
Bug Blocks: 743350    

Description Matthew Farrellee 2011-10-25 13:15:13 UTC
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2579

Two issues:

    1. load_avg.cpp has a switch statement on the major version of the kernel. On the 3.1.0 kernel no branch matches, because "case 3" isn't handled, so load average detection fails.
    2. The generated error message doesn't end with a newline, so successive log entries pile up on a single line:

10/25/11 02:22:16 /proc format unknown for kernel version 3.1.010/25/11 02:22:21 /proc format unknown for kernel version 3.1.010/25/11 02:22:26 /proc format unknown for kernel version 3.1.010/25/11 02:22:31 /proc format unknown for kernel version 3.1.010/25/11 02:22:36 /proc format unknown for kernel version 3.1.010/25/11 02:22:41 slot1: State change: received RELEASE_CLAIM command
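
The two issues above can be illustrated with a minimal sketch. This is not the actual load_avg.cpp code: the function names `kernel_major` and `loadavg_format_known` are hypothetical, and the error text is simply copied from the log line above. It only shows the shape of the fix, assuming the dispatch is keyed on the major number parsed from the kernel release string: handle major version 3 alongside 1 and 2, and terminate the error message with a newline.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: extract the major number from a kernel release
// string such as "3.1.0" or "3.0.7-rt20.36.el6rt.x86_64".
// Returns -1 if the string does not start with an integer.
int kernel_major(const std::string& release) {
    int major = -1;
    if (std::sscanf(release.c_str(), "%d", &major) != 1) {
        return -1;
    }
    return major;
}

// Hypothetical dispatch: reports whether the /proc/loadavg format is
// treated as known for this kernel release. On failure, `error` holds a
// newline-terminated message so consecutive log entries cannot run together.
bool loadavg_format_known(const std::string& release, std::string& error) {
    switch (kernel_major(release)) {
        case 1:
        case 2:
        case 3:  // the previously missing case
            error.clear();
            return true;
        default:
            error = "/proc format unknown for kernel version " + release + "\n";
            return false;
    }
}
```

With this shape, a release string of "3.1.0" is accepted, and any release that is still unrecognized produces a message that ends in "\n".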

-

# uname -a
Linux v3node 3.0.7-rt20.36.el6rt.x86_64 #1 SMP PREEMPT RT Wed Oct 19 12:38:05 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

# condor_config_val NUM_CPUS
1

-

FAIL -

# condor_status -l | grep LoadAvg
TotalLoadAvg = -1.000000
CpuBusy = ( ( LoadAvg - CondorLoadAvg ) >= 0.500000 )
LoadAvg = -1.000000
TotalCondorLoadAvg = -1.000000
CondorLoadAvg = -1.000000

# condor_status -format "%d, " LoadAvg -format "%d, " CondorLoadAvg -format "%d, " TotalLoadAvg -format "%d\n" TotalCondorLoadAvg
-1, -1, -1, -1

-

After simple patch to add "case 3" -

SUCCESS (values not -1) -

# condor_status -l | grep LoadAvg
TotalLoadAvg = 0.290000
CpuBusy = ( ( LoadAvg - CondorLoadAvg ) >= 0.500000 )
LoadAvg = 0.290000
TotalCondorLoadAvg = 0.0
CondorLoadAvg = 0.0

# condor_status -format "%d, " LoadAvg -format "%d, " CondorLoadAvg -format "%d, " TotalLoadAvg -format "%d\n" TotalCondorLoadAvg
0, 0, 0, 0
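
Note on the "0, 0, 0, 0" output: the attributes are floating-point (LoadAvg = 0.290000), so printing them with "%d" appears to discard the fractional part. The sketch below reproduces that effect in plain C++ under the assumption that the value is converted to an integer before formatting; the report does not confirm how condor_status does this internally, and the helper names are hypothetical.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format a floating-point attribute through an
// integer conversion, which truncates toward zero.
std::string format_as_int(double value) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%d", static_cast<int>(value));
    return buf;
}

// Hypothetical helper: format the same attribute as a float, keeping
// the fractional part.
std::string format_as_float(double value) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%f", value);
    return buf;
}
```

Under that assumption, 0.29 prints as "0" and -1.0 prints as "-1", so the integer output still separates the failing case from the fixed one but hides the actual load value.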

Fix is upstream on V7_6-branch at 1204622c.

Comment 1 Matthew Farrellee 2011-10-25 13:18:37 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: Load average detection is kernel major version specific, only supporting v1 and v2
C: Load average would appear to be -1.
F: Added support for v3 (identical detection as v1 & v2, /proc/loadavg did not change).
R: Load average detection works on v3 kernels.

Comment 4 Jeff Needle 2011-10-31 21:23:04 UTC
    Technical note updated.
    
    Diffed Contents:
@@ -1,4 +1 @@
-C: Load average detection is kernel major version specific, only supporting v1 and v2
+C: Load average detection is kernel major version specific, only supporting v1 and v2
C: Load average would appear to be -1.
F: Added support for v3 (identical detection as v1 & v2, /proc/loadavg did not change).
R: Load average detection works on v3 kernels.
-C: Load average would appear to be -1.
-F: Added support for v3 (identical detection as v1 & v2, /proc/loadavg did not change).
-R: Load average detection works on v3 kernels.

Comment 6 Tomas Rusnak 2011-11-01 10:22:05 UTC
    Technical note updated.
    
    Diffed Contents:
@@ -1 +1,4 @@
-C: Load average detection is kernel major version specific, only supporting v1 and v2
C: Load average would appear to be -1.
F: Added support for v3 (identical detection as v1 & v2, /proc/loadavg did not change).
R: Load average detection works on v3 kernels.
+C: Load average detection is kernel major version specific, only supporting v1 and v2
+C: Load average would appear to be -1.
+F: Added support for v3 (identical detection as v1 & v2, /proc/loadavg did not change).
+R: Load average detection works on v3 kernels.

Comment 7 Tomas Rusnak 2011-11-03 13:25:52 UTC
Reproduced with:

$CondorVersion: 7.6.3 Jul 27 2011 BuildID: RH-7.6.3-0.3.el6 $
$CondorPlatform: X86_64-RedHat_6.1 $

11/03/11 09:21:13 fgets failed
11/03/11 09:21:13 /proc format unknown for kernel version 3.0.811/03/11 09:21:13 CronJobList: Adding job 'mips'


# uname -a
Linux localhost 3.0.8-rt20.38.el6rt.x86_64 #1 SMP PREEMPT RT Thu Oct 27 17:41:50 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

# condor_config_val NUM_CPUS
1

# condor_status -l | grep LoadAvg
TotalLoadAvg = -1.000000
CpuBusy = ( ( LoadAvg - CondorLoadAvg ) >= 0.500000 )
LoadAvg = -1.000000
TotalCondorLoadAvg = -1.000000
CondorLoadAvg = -1.000000

# condor_status -format "%d, " LoadAvg -format "%d, " CondorLoadAvg -format "%d, " TotalLoadAvg -format "%d\n" TotalCondorLoadAvg
-1, -1, -1, -1

Comment 8 Tomas Rusnak 2011-11-03 13:30:27 UTC
# condor -v
$CondorVersion: 7.6.5 Oct 31 2011 BuildID: RH-7.6.5-0.5.el6 $
$CondorPlatform: X86_64-RedHat_6.1 $

# uname -a
Linux localhost 3.0.8-rt20.38.el6rt.x86_64 #1 SMP PREEMPT RT Thu Oct 27 17:41:50 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

# condor_config_val NUM_CPUS
1

# condor_status -l | grep LoadAvg
TotalLoadAvg = 0.230000
CpuBusy = ( ( LoadAvg - CondorLoadAvg ) >= 0.500000 )
LoadAvg = 0.230000
TotalCondorLoadAvg = 0.0
CondorLoadAvg = 0.0

# condor_status -format "%d, " LoadAvg -format "%d, " CondorLoadAvg -format "%d, " TotalLoadAvg -format "%d\n" TotalCondorLoadAvg
0, 0, 0, 0

No such /proc compatibility error found in StartLog:
11/03/11 09:28:26 CronJobList: Adding job 'mips'
11/03/11 09:28:26 CronJobList: Adding job 'kflops'
11/03/11 09:28:26 CronJob: Initializing job 'mips' (/usr/libexec/condor/condor_mips)
11/03/11 09:28:26 CronJob: Initializing job 'kflops' (/usr/libexec/condor/condor_kflops)
...

>>> VERIFIED

Comment 9 Tomas Capek 2011-11-16 13:58:44 UTC
    Technical note updated.
    
    Diffed Contents:
@@ -1,4 +1,2 @@
-C: Load average detection is kernel major version specific, only supporting v1 and v2
+Load average detection is specific for each kernel major version and only supports kernels of version 1 or 2.
-C: Load average would appear to be -1.
+Consequently, the load average would appear to be "-1" under a kernel 3.x. This update adds support for load average detection under kernels 3.x, implemented identically as for previous kernel versions, with the /proc/loadavg utility unchanged.
-F: Added support for v3 (identical detection as v1 & v2, /proc/loadavg did not change).
-R: Load average detection works on v3 kernels.

Comment 10 errata-xmlrpc 2012-01-23 17:29:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0045.html