Bug 471648 - Cpuscaling failures on Dunnington Processors
Product: Red Hat Hardware Certification Program
Classification: Red Hat
Component: Test Suite (tests)
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Assigned To: Greg Nichols
QA Contact: Lawrence Lim
Reported: 2008-11-14 15:24 EST by Adam Sheltz
Modified: 2014-03-25 20:55 EDT

Doc Type: Bug Fix
Last Closed: 2010-05-03 12:15:40 EDT

Attachments
4 node x3950 m2 64 bit results.xml (327.42 KB, text/xml)
2008-11-14 15:24 EST, Adam Sheltz
1 node with 5.3 32 bit (3.06 KB, text/xml)
2008-11-14 15:26 EST, Adam Sheltz
5.3 snapshot 3 .123 PAE kernel (3.00 KB, text/xml)
2008-11-21 15:10 EST, Adam Sheltz
results of RC2 with 5.3-12 (737.78 KB, text/xml)
2009-01-12 15:56 EST, Adam Sheltz
results run with 5.3-12 on 5.2 (400.04 KB, application/x-rpm)
2009-01-14 13:25 EST, Adam Sheltz
1 Node 32 bit Cpuscaling (436.30 KB, application/x-rpm)
2009-01-14 18:43 EST, Adam Sheltz
4 node 64 bit (643.68 KB, application/x-rpm)
2009-01-15 09:22 EST, Adam Sheltz
5.3 rc 1 node (412.20 KB, application/x-rpm)
2009-01-15 13:14 EST, Adam Sheltz
4 node 64 bit 53RC2 64 bit (596.27 KB, application/x-rpm)
2009-01-15 16:41 EST, Adam Sheltz

Description Adam Sheltz 2008-11-14 15:24:18 EST
Created attachment 323641 [details]
4 node x3950 m2 64 bit results.xml 

Description of problem: cpuscaling test fails on Dunnington Processors.

Version-Release number of selected component (if applicable):
HTS 5.2-20

How reproducible: Every time

Steps to Reproduce:
1. Install RHEL 5.2 (32 or 64 bit) on an x3850M2/x3950M2 with Dunnington processors
2. Install HTS
3. Add the cpuscaling test: hts plan --add --test cpuscaling --device cpu0
4. Verify that cpuspeed is working: # service cpuspeed status
5. Run hts certify --test cpuscaling

Actual results:
See attached results files.

Expected results:

Additional info:
Comment 1 Adam Sheltz 2008-11-14 15:26:19 EST
Created attachment 323642 [details]
1 node with 5.3 32 bit

The failure differs slightly between 5.2 and 5.3.
Comment 5 Adam Sheltz 2008-11-19 09:42:26 EST
One of the systems we are trying to test is a 96 core system.  

Also, I was trying to find the point of failure within the script. It runs cpufreq-selector. I can run this command, but I notice that it does not return an error code even when given bogus input. I don't think this has any bearing on the current bug, but it could cause problems in the future.
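
Since the exit status cannot be trusted here, one option is to read the frequency back from sysfs after each change. A minimal sketch, assuming the standard cpufreq sysfs layout; freq_matches is a hypothetical helper name and is not part of HTS or cpufreq-selector:

```shell
# Sketch: confirm a requested frequency by reading it back from sysfs.
# freq_matches is a hypothetical helper (not part of HTS).
#   $1 = cpufreq directory, e.g. /sys/devices/system/cpu/cpu0/cpufreq
#   $2 = expected frequency in kHz
# Returns 0 on a match, 1 on a mismatch, 2 if nothing could be read.
freq_matches() {
    local dir=$1
    local want=$2
    local got
    if [ -r "$dir/cpuinfo_cur_freq" ]; then
        # Frequency as reported by the hardware (usually readable by root only).
        got=$(cat "$dir/cpuinfo_cur_freq")
    elif [ -r "$dir/scaling_cur_freq" ]; then
        # Fallback: frequency the cpufreq layer believes is set.
        got=$(cat "$dir/scaling_cur_freq")
    else
        echo "no readable frequency file in $dir" >&2
        return 2
    fi
    if [ "$got" = "$want" ]; then
        return 0
    fi
    echo "requested $want kHz, but sysfs reports $got kHz" >&2
    return 1
}

# Possible usage after a cpufreq-selector call:
#   cpufreq-selector --cpu 0 -f 2128000
#   freq_matches /sys/devices/system/cpu/cpu0/cpufreq 2128000 || echo "change did not take effect"
```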

Comment 7 Adam Sheltz 2008-11-19 17:54:03 EST
I just tried the test on a 1 node system with 5.3 snapshot 3, and the cpuscaling test passes.

Comment 8 Adam Sheltz 2008-11-21 15:10:25 EST
Created attachment 324341 [details]
5.3 snapshot 3 .123 PAE kernel
Comment 9 Adam Sheltz 2008-11-21 15:49:55 EST
I don't think the scaling ability is a limitation of the 5.2 kernel.  

I can run the following commands on my 4 node system (the second one from /sys/devices/system/cpu):

# for x in `seq 0 95`; do cpufreq-selector --cpu $x -f 2128000; done

# for x in `seq 0 95`; do echo $x `cat cpu"$x"/cpufreq/cpuinfo_cur_freq`; done

If I change the speed value in the first command (after -f) and rerun the second command, I get no errors and see the speeds change.
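
With 96 cores the per-CPU listing is long, so a summary can be easier to read. A rough sketch that counts how many CPUs report each frequency, assuming the standard sysfs layout; freq_summary is a hypothetical helper name, and reading cpuinfo_cur_freq typically requires root:

```shell
# Sketch: count how many CPUs report each current frequency.
# freq_summary is a hypothetical helper (not part of HTS).
#   $1 = sysfs CPU root (defaults to /sys/devices/system/cpu)
freq_summary() {
    local root=${1:-/sys/devices/system/cpu}
    local f
    for f in "$root"/cpu[0-9]*/cpufreq/cpuinfo_cur_freq; do
        # Skip unreadable files and the unexpanded glob when nothing matches.
        [ -r "$f" ] || continue
        cat "$f"
    done | sort -n | uniq -c | awk '{ print $2 " kHz: " $1 " CPUs" }'
}

# Possible usage (as root), after setting all CPUs to 2128000:
#   freq_summary
# A single output line covering all 96 CPUs would mean every core followed the request.
```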

Comment 11 Joseph Kachuck 2008-12-12 18:48:05 EST
Please run the following at the slowest setting, on-demand, and performance:
`time echo "scale=2^12;4*a(1)" | bc -l`
Please respond with the time of each run, and what the cpuinfo_cur_freq is set to.

This should show whether the issue is in HTS or in CPU scaling itself. It is the manual way to test cpuscaling.

Engineering, please correct this update if you disagree with it.
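
For collecting the timings in a consistent form, a small wrapper can help. This is only a sketch: elapsed_seconds is a hypothetical helper name, and it assumes a date command that supports %N (nanoseconds), as GNU date does:

```shell
# Sketch: print the wall-clock time of a command in seconds.
# elapsed_seconds is a hypothetical helper; it assumes date supports %N.
elapsed_seconds() {
    local start end
    start=$(date +%s%N)
    # Run the given command, discarding its output.
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    # Convert the nanosecond difference to seconds with two decimals.
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.2f\n", (e - s) / 1e9 }'
}

# Possible usage with the workload from this comment, once per setting:
#   elapsed_seconds sh -c 'echo "scale=2^12;4*a(1)" | bc -l'
#   cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
```

Recording the elapsed time next to cpuinfo_cur_freq for each setting gives the data requested above.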
Comment 12 Adam Sheltz 2009-01-12 10:06:48 EST
I am now experiencing a failure with RC1 and the 5.3-12 test suite.
Comment 13 Joseph Kachuck 2009-01-12 13:44:53 EST
Hello IBM,
Please confirm which kernel version you last used where it worked, or which kernel version you were testing with in snap 3. Then please confirm the uname -a of the current kernel where it is failing.
Comment 14 Adam Sheltz 2009-01-12 15:56:44 EST
Created attachment 328786 [details]
results of RC2 with 5.3-12

When I run hts plan, the cpuscaling test is left off. In order to run the scaling test I have to add it manually:

# hts plan --add --test cpuscaling --device 0

Do I only have to add it for cpu0, or do I have to add it for cpu0-X?

# lshal | grep throttle
     yields false, meaning it doesn't see the ability to scale the CPUs as enabled.
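
As a cross-check that does not depend on HAL, the kernel's own cpufreq view can be read from sysfs. A minimal sketch, assuming the standard cpufreq attribute names; cpufreq_status is a hypothetical helper name:

```shell
# Sketch: report what the kernel's cpufreq layer exposes for one CPU.
# cpufreq_status is a hypothetical helper (not part of HTS).
#   $1 = CPU sysfs directory (defaults to /sys/devices/system/cpu/cpu0)
# Returns 1 if the CPU has no cpufreq directory at all.
cpufreq_status() {
    local cpu_dir=${1:-/sys/devices/system/cpu/cpu0}
    local name
    if [ ! -d "$cpu_dir/cpufreq" ]; then
        echo "cpufreq: not available"
        return 1
    fi
    for name in scaling_driver scaling_governor scaling_available_frequencies; do
        # Print only the attributes this driver actually provides.
        if [ -r "$cpu_dir/cpufreq/$name" ]; then
            echo "$name: $(cat "$cpu_dir/cpufreq/$name")"
        fi
    done
}

# Possible usage:
#   cpufreq_status /sys/devices/system/cpu/cpu0
```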
Comment 16 Joseph Kachuck 2009-01-14 11:16:04 EST
Hello IBM,
Update from engineering:
The lshal output isn't directly a failure; however, in this case it may very well be related.  IMO this is going to be a kernel or BIOS issue that needs to get sorted.  If it used to work in an earlier 5.3 beta, then there's probably some code change in the kernel.  FWIW, gcase had discovered a cpuscaling patch that was introduced in the -119 kernel. In that situation it was the opposite: -119 onward worked in his case, and -118 and earlier failed (IIRC).

In the last result attachment the values that /sys sees as available for the
CPU speeds are...

    2660 MHz
    2394 MHz
    2128 MHz

...and the test does not appear to be complaining that the OS is failing to
report that speed changes occurred; however, the timed work is completed in
effectively the same length of time for all speed settings.  Based on the
values below, my guess is the system is stuck in its highest speed setting
and is not throttling down...

    Minimum:     16.02
    Maximum:     16.00
    On Demand:   16.07
    Performance: 16.00

...I suspect On Demand is showing a slower time than Minimum speed because of
the time spent calculating which setting to move to. What we should have seen
is that Minimum is the slowest, Maximum and Performance are all but equal, and
On Demand is ever so slightly slower than Maximum/Performance.
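
As a rough worked check of the numbers above: if the timed work scaled inversely with clock speed (an assumption; real workloads rarely scale perfectly), 16.00 s at 2660 MHz would become about 16.00 * 2660 / 2128 = 20.00 s at 2128 MHz, well above the 16.02 s measured. The sketch below does the same arithmetic. Both function names are hypothetical, and the default 1.10 ratio is an arbitrary illustration, not the threshold the HTS test uses:

```shell
# Sketch: estimate the expected runtime at the minimum frequency,
# assuming runtime scales inversely with clock speed.
#   $1 = time at maximum frequency, $2 = max MHz, $3 = min MHz
expected_min_time() {
    awk -v t="$1" -v fmax="$2" -v fmin="$3" \
        'BEGIN { printf "%.2f\n", t * fmax / fmin }'
}

# Sketch: succeed only if the minimum-speed run is noticeably slower
# than the maximum-speed run.
#   $1 = time at minimum, $2 = time at maximum,
#   $3 = required ratio (default 1.10, an arbitrary illustrative value)
scaling_effective() {
    awk -v tmin="$1" -v tmax="$2" -v ratio="${3:-1.10}" \
        'BEGIN { exit !(tmin >= tmax * ratio) }'
}

# Possible usage with the values from this comment:
#   expected_min_time 16.00 2660 2128
#   scaling_effective 16.02 16.00 || echo "no measurable slowdown at minimum speed"
```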
Comment 17 Adam Sheltz 2009-01-14 13:25:48 EST
Created attachment 329034 [details]
results run with 5.3-12 on 5.2
Comment 18 Adam Sheltz 2009-01-14 18:43:48 EST
Created attachment 329054 [details]
1 Node 32 bit Cpuscaling

5.2 kernel
hts 5.3-12
Hwcert Bugzilla = 470126
24 Core
4 X 6 Core Dunnington Processors
Comment 19 Adam Sheltz 2009-01-15 09:22:16 EST
Created attachment 329096 [details]
4 node 64 bit

64 bit 4 node 

5.2 kernel
hts 5.3-12

96 Core Scaling test Dunnington (4*4*6)
Comment 20 Adam Sheltz 2009-01-15 09:23:18 EST
Comment 19 is relevant to hwcert BZ 471091

Comment 21 Adam Sheltz 2009-01-15 13:14:22 EST
Created attachment 329118 [details]
5.3 rc 1 node
Comment 22 Adam Sheltz 2009-01-15 16:41:52 EST
Created attachment 329146 [details]
4 node 64 bit 53RC2 64 bit
Comment 24 Rob Landry 2009-07-16 12:22:02 EDT
Adam, IIRC the cert where this bug originated has been completed, right?  If so, can this bug be closed, or is there still some issue that needs sorting in the test suite?
Comment 25 Rob Landry 2009-07-16 12:26:12 EDT
Oops, forgot to say "Hi" and "Thanks" :)
Comment 27 Rob Landry 2010-05-03 12:15:40 EDT
v7-1.1 has also been released since this bug was opened.  AFAIK this item is resolved.
