Red Hat Bugzilla – Bug 471648
Cpuscaling failures on Dunnington Processors
Last modified: 2014-03-25 20:55:50 EDT
Created attachment 323641 [details]
4 node x3950 m2 64 bit results.xml
Description of problem: cpuscaling test fails on Dunnington Processors.
Version-Release number of selected component (if applicable):
How reproducible: Every time
Steps to Reproduce:
1. Install RHEL 5.2 on x3850M2/x3950M2 32 or 64 bit with Dunnington Processors
2. Install HTS
3. Add the cpuscaling test: hts plan --add --test cpuscaling --device cpu0
4. Verify cpuspeed is working: #service cpuspeed status
5. Run hts certify --test cpuscaling
See attached results files.
Created attachment 323642 [details]
1 node with 5.3 32 bit
The failure between 5.2 and 5.3 is slightly different.
One of the systems we are trying to test is a 96 core system.
Also, while trying to find the point of failure within the script, I saw that it runs cpufreq-selector. I am able to run this command, but I notice there is no return code even if you give it bogus info. I don't think this has any bearing on the current bug, but it could present problems in the future.
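Because cpufreq-selector gives no usable return code, one way to confirm that a requested change took effect is to read the frequency back from sysfs afterwards. The following is only a minimal sketch, assuming the usual /sys/devices/system/cpu layout; the freq_matches function and the SYSFS_CPU_ROOT variable are hypothetical names for illustration and are not part of HTS or cpufreq-selector.

```shell
# Hypothetical helper, not part of HTS or cpufreq-selector.
# Root of the per-CPU sysfs tree (assumed standard location); overridable.
SYSFS_CPU_ROOT="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"

# freq_matches CPU EXPECTED_KHZ
# Returns 0 if cpuinfo_cur_freq for CPU equals EXPECTED_KHZ,
# 1 if it differs, and 2 if the sysfs file cannot be read.
freq_matches() {
    cpu="$1"
    expected="$2"
    file="$SYSFS_CPU_ROOT/cpu$cpu/cpufreq/cpuinfo_cur_freq"
    [ -r "$file" ] || return 2
    actual=$(cat "$file")
    [ "$actual" = "$expected" ]
}
```

For example, after `cpufreq-selector --cpu 0 -f 2128000`, running `freq_matches 0 2128000 || echo "cpu0 is not at 2128000"` would surface a request that was silently ignored.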
I just tried the test on a 1-node system with 5.3 snapshot 3, and the cpuscaling test passes.
Created attachment 324341 [details]
5.3 snapshot 3 .123 PAE kernel
I don't think the scaling ability is a limitation of the 5.2 kernel.
I can run the following commands on my 4-node system:
#for x in `seq 0 95` ; do cpufreq-selector --cpu $x -f 2128000; done
#for x in `seq 0 95` ; do echo $x `cat cpu"$x"/cpufreq/cpuinfo_cur_freq`; done
I can change the speed value in the first command (after -f) and run the second command without getting any errors, and I see the speeds change.
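The second loop above hard-codes 96 CPUs and a relative path. A sketch of the same readout that discovers the CPUs instead is shown here; it assumes the per-CPU directories live under /sys/devices/system/cpu, and list_cur_freqs is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: print "<cpu number> <cpuinfo_cur_freq>" for every CPU
# that exposes a cpufreq directory, sorted by CPU number.
# The optional argument is the sysfs CPU root (assumed default shown below).
list_cur_freqs() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*; do
        file="$dir/cpufreq/cpuinfo_cur_freq"
        # Skip CPUs without a readable frequency file (also covers a glob
        # that matched nothing).
        [ -r "$file" ] || continue
        n="${dir##*/cpu}"
        echo "$n $(cat "$file")"
    done | sort -n
}
```

Running `list_cur_freqs` before and after the cpufreq-selector loop gives the same before/after comparison without depending on the CPU count or the current directory.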
Please run the following at slowest settings, on-demand, and performance:
`time echo "scale=2^12;4*a(1)" | bc -l`
Please respond with the time of each run, and what the cpuinfo_cur_freq is set to.
This should show whether the issue is in HTS or in cpuscaling itself. It is the manual way to test cpuscaling.
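The manual check requested above could be wrapped roughly as follows so that each run reports its time together with the current frequency. This is a sketch only: elapsed_ms, pi_workload and report_run are hypothetical names, the sysfs paths are assumed to be the standard ones, and `date +%s%N` assumes GNU date. Switching the governor or speed between runs is left to the operator.

```shell
# elapsed_ms CMD [ARGS...]
# Run CMD with its output discarded and print the wall-clock time in
# milliseconds. Relies on GNU date for nanosecond resolution (%N).
elapsed_ms() {
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# The workload from the request above: compute pi to 2^12 digits with bc.
pi_workload() {
    echo "scale=2^12;4*a(1)" | bc -l
}

# report_run CPU [SYSFS_ROOT]
# Time one workload run and print the governor and cpuinfo_cur_freq of CPU.
# The workload is not pinned to CPU, and the frequency is read just after
# the run, so under on-demand the value may already be dropping.
report_run() {
    cpu="$1"
    root="${2:-/sys/devices/system/cpu}"
    gov=$(cat "$root/cpu$cpu/cpufreq/scaling_governor" 2>/dev/null)
    ms=$(elapsed_ms pi_workload)
    freq=$(cat "$root/cpu$cpu/cpufreq/cpuinfo_cur_freq" 2>/dev/null)
    echo "governor=${gov:-unknown} cpuinfo_cur_freq=${freq:-unknown} elapsed_ms=$ms"
}
```

Calling `report_run 0` once at the slowest setting, once with on-demand, and once with performance yields the three lines asked for.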
Engineering please update if you disagree with this update.
I am now experiencing a failure with rc-1 and 5.3-12 test suite.
Please confirm which kernel version you last used where it worked, or which kernel version you were testing with in snap 3. Then please confirm the uname -a of the current kernel where it is failing.
Created attachment 328786 [details]
results of RC2 with 5.3-12
When I run #hts plan, the cpuscaling test is left off. In order to run the scaling test I have to add it manually:
#hts plan --add --test cpuscaling --device 0
Do I only have to add it for cpu0 or do I have to add it for cpu0-X?
#lshal | grep throttle
yields false, meaning it doesn't see the ability to scale the CPUs as enabled.
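Independently of what lshal reports, whether the kernel has bound a cpufreq driver to each CPU can be checked by looking for the per-CPU cpufreq directory in sysfs. A minimal sketch, assuming the standard /sys/devices/system/cpu layout; cpufreq_summary is a hypothetical name used for illustration.

```shell
# Hypothetical helper: count how many cpuN directories contain a cpufreq
# subdirectory. Prints a one-line summary and returns 0 only if at least one
# CPU was found and every CPU exposes cpufreq.
cpufreq_summary() {
    root="${1:-/sys/devices/system/cpu}"
    total=0
    scalable=0
    for dir in "$root"/cpu[0-9]*; do
        [ -d "$dir" ] || continue
        total=$((total + 1))
        if [ -d "$dir/cpufreq" ]; then
            scalable=$((scalable + 1))
        fi
    done
    echo "$scalable of $total CPUs expose cpufreq"
    [ "$total" -gt 0 ] && [ "$scalable" -eq "$total" ]
}
```

A result such as "0 of N" would point toward a kernel or BIOS problem rather than the test suite, since no driver is offering frequency control at all.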
Update from engineering:
The lshal output isn't directly a failure; however, in this case it may very well be related. IMO this is going to be a kernel or BIOS issue that needs to get sorted. If it used to work in an earlier 5.3 beta, then there's probably some code change in the kernel. FWIW, gcase had discovered a cpuscaling patch that was introduced in the -119 kernel, where the situation was the opposite: -119 and later worked in his case, and -118 and earlier failed (IIRC).
In the last result attachment the values that /sys sees as available for the
CPU speeds are...
...and the test does not appear to be complaining that the OS is failing to
report that speed changes occurred; however, the actual timed work is
completed in effectively the same length of time for all speed settings. Based
on the below values, my guess is the system is stuck in its highest speed
setting and is not throttling down...
On Demand: 16.07
...I suspect On Demand is showing a slower time than Minimum speed because it is
trying to calculate what setting to move to. What we should have seen, however,
is that Minimum is the slowest, Maximum and Performance are all but equal, and
On Demand is ever so slightly slower than Maximum/Performance.
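The expected ordering described above can be written down as a small check over the four measured times. This is a sketch, not what the cpuscaling test itself does: check_scaling_times is a hypothetical name, and the 5% tolerance for "all but equal" is an assumption chosen only for illustration.

```shell
# check_scaling_times MIN MAX PERF ONDEMAND [TOLERANCE]
# Arguments are elapsed times in seconds for each setting.
# Checks that the minimum-speed run is the slowest of the four and that the
# maximum and performance runs agree within TOLERANCE (a fraction of MAX,
# assumed default 0.05). Returns 0 if both checks hold.
check_scaling_times() {
    awk -v min="$1" -v max="$2" -v perf="$3" -v od="$4" -v tol="${5:-0.05}" '
    BEGIN {
        ok = 1
        if (!(min > max && min > perf && min > od)) {
            print "FAIL: the minimum-speed run is not the slowest"
            ok = 0
        }
        diff = max - perf
        if (diff < 0) diff = -diff
        if (diff > tol * max) {
            print "FAIL: maximum and performance differ by more than the tolerance"
            ok = 0
        }
        if (ok) print "OK: timings follow the expected ordering"
        exit (ok ? 0 : 1)
    }'
}
```

A system stuck at its highest speed produces nearly identical times for every setting, so the On Demand run can come out slower than the Minimum run and the first check fails.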
Created attachment 329034 [details]
results run with 5.3-12 on 5.2
Created attachment 329054 [details]
1 Node 32 bit Cpuscaling
Hwcert Bugzilla = 470126
4 X 6 Core Dunnington Processors
Created attachment 329096 [details]
4 node 64 bit
64 bit 4 node
96 Core Scaling test Dunnington (4*4*6)
Comment 19 is relevant to hwcert BZ 471091
Created attachment 329118 [details]
5.3 rc 1 node
Created attachment 329146 [details]
4 node 64 bit 53RC2 64 bit
Adam, IIRC the cert where this bug originated has been completed, right? If so, can this bug be closed, or is there still some issue that needs sorting in the test suite?
Oops, forgot to say "Hi" and "Thanks" :)
v7-1.1 has also been released since this bug was opened. AFAIK this item is resolved.