Bug 465366 - add multi-core support to cpufreq driver [NEEDINFO]
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.7
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Brian Maly
QA Contact: Martin Jenner
Keywords: ZStream
Duplicates: 268441
Depends On:
Blocks: 451119 RHEL4u8_relnotes 469647
Reported: 2008-10-02 17:39 EDT by Brian Maly
Modified: 2009-05-18 15:05 EDT (History)

Doc Type: Bug Fix
Last Closed: 2009-05-18 15:05:58 EDT
jarod: needinfo? (bmaly)


Attachments
/proc/cpuinfo (23.99 KB, text/plain), 2009-04-26 03:34 EDT, CAI Qian
/var/log/dmesg (34.90 KB, text/plain), 2009-04-26 03:35 EDT, CAI Qian

Description Brian Maly 2008-10-02 17:39:09 EDT
Description of problem: cpufreq driver needs multi-core support


Version-Release number of selected component (if applicable): <= 4.7


How reproducible: run cpuspeed (the userspace daemon); the CPUs do not scale properly


Steps to Reproduce:
1. select 'userspace' governor
2. start cpuspeed daemon
3. load machine, remove load, observe cpu freqs
  
Actual results: the CPU frequency gets stuck at a fixed value on some processors when there is no load


Expected results: the CPU frequency should scale up and down on all processors


Additional info:

affected hardware: IBM X3n50

This BZ provides the needed support for affected_cpus (/sys/devices/system/cpu/cpu*/cpufreq/affected_cpus), which the cpuspeed daemon needs in order to scale all cores properly.

The userspace portion of this is Bug 451119.
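
To illustrate what a userspace daemon gains from affected_cpus, here is a minimal Python sketch that reads each CPU's file and groups the CPUs into sets that have to be scaled together. Only the sysfs path comes from this report; the function names and the `sysfs_root` parameter are hypothetical, and this is not cpuspeed's actual code.

```python
import glob
import os


def read_affected_cpus(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_number: tuple of CPU numbers listed in its affected_cpus file}."""
    affected = {}
    # "cpu[0-9]*" skips sibling directories such as "cpufreq" or "cpuidle".
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "affected_cpus")
    for path in glob.glob(pattern):
        # Path looks like <root>/cpuN/cpufreq/affected_cpus; recover N from "cpuN".
        cpu_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))
        cpu = int(cpu_dir[len("cpu"):])
        with open(path) as handle:
            affected[cpu] = tuple(sorted(int(tok) for tok in handle.read().split()))
    return affected


def frequency_domains(affected):
    """Collapse the per-CPU view into the distinct groups that must scale together."""
    return sorted(set(affected.values()))
```

Calling `frequency_domains(read_affected_cpus())` on a machine where every file reads `0 1 2 3` would yield a single group, while a machine where each file lists only its own CPU would yield one group per CPU.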
Comment 4 RHEL Product and Program Management 2008-10-30 17:06:42 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 6 Vivek Goyal 2008-11-05 08:58:06 EST
Committed in 78.17.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Comment 11 Linda Wang 2009-04-07 14:13:48 EDT
*** Bug 268441 has been marked as a duplicate of this bug. ***
Comment 12 Han Pingtian 2009-04-24 04:50:22 EDT
While trying to verify this bug, we found a problem:
https://bugzilla.redhat.com/show_bug.cgi?id=497490
Comment 13 CAI Qian 2009-04-26 03:29:47 EDT
Apart from the above problem, which appears to be in the userspace tool, there appears to be another problem in the affected_cpus patch.

This AMD machine has 4 cores per package, but each affected_cpus file lists only a single CPU.

# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : Quad-Core AMD Opteron(tm) Processor 8356
stepping        : 3
cpu MHz         : 1400.000
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
bogomips        : 4598.26
TLB size        : 1104 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate <NULL>   <-- strange NULL here.
...

# cat /sys/devices/system/cpu/cpu*/cpufreq/affected_cpus 
0
10
11
12
13
14
15
16
17
18
19
1
20
21
22
23
24
25
26
27
28
29
2
30
31
3
4
5
6
7
8
9

# lsmod
...
powernow_k8            22753  1
...

# uname -ra
Linux sun-x4600m2-01.rhts.bos.redhat.com 2.6.9-88.ELlargesmp #1 SMP Mon Apr 13 19:33:40 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

I'll move this to ASSIGNED so that developers can look at the above two issues.
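
The affected_cpus listing above comes out in the shell's glob expansion order (0, 10, 11, ..., 19, 1, 20, ...) rather than numeric order, and the bare values do not show which file each one came from. A minimal Python sketch for putting such paths into numeric CPU order follows; `cpu_index` and `in_cpu_order` are hypothetical helper names.

```python
import re


def cpu_index(path):
    """Extract N from a sysfs path that contains a /cpuN/ component."""
    match = re.search(r"/cpu(\d+)/", path)
    if match is None:
        raise ValueError("no cpuN component in %r" % (path,))
    return int(match.group(1))


def in_cpu_order(paths):
    """Sort sysfs paths by numeric CPU index instead of by string order."""
    return sorted(paths, key=cpu_index)
```

Printing each path next to its content, in this order, makes it easy to see at a glance that every CPU lists only itself.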
Comment 14 CAI Qian 2009-04-26 03:34:25 EDT
Created attachment 341336 [details]
/proc/cpuinfo
Comment 15 CAI Qian 2009-04-26 03:35:05 EDT
Created attachment 341337 [details]
/var/log/dmesg
Comment 16 CAI Qian 2009-04-26 03:43:22 EDT
sun-x4600m2-01.rhts.bos.redhat.com was working until it was upgraded from Dual-Core to Quad-Core CPUs. For more details about how it was tested before, please see:
https://bugzilla.redhat.com/show_bug.cgi?id=469647#c7
Comment 17 CAI Qian 2009-04-27 11:00:45 EDT
It looks like affected_cpus never works on AMD Quad-Core systems: I have tested another two machines without luck.

amd-ma78gm-01.rhts.bos.redhat.com
model name      : AMD Phenom(tm) 9750 Quad-Core Processor

hp-xw9400-01.rhts.bos.redhat.com
model name      : Quad-Core AMD Opteron(tm) Processor 2382
Comment 18 CAI Qian 2009-04-27 11:05:27 EDT
An Intel Quad-Core system is working without problems.

# uname -ra
Linux ibm-defiant.rhts.bos.redhat.com 2.6.9-89.ELsmp #1 SMP Mon Apr 20 10:33:05 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

#  cat /sys/devices/system/cpu/*/cpufreq/affected_cpus
0 1 2 3
0 1 2 3
0 1 2 3
0 1 2 3

# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model  : 15
model name : Intel(R) Xeon(R) CPU           X5355  @ 2.66GHz
stepping : 7
cpu MHz  : 2331.000
cache size : 4096 KB
physical id : 0
siblings : 4
core id  : 0
cpu cores : 4
fpu  : yes
fpu_exception : yes
cpuid level : 10
wp  : yes
flags  : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm pni monitor
ds_cpl est tm2 cx16 xtpr lahf_lm
bogomips : 5323.70
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
...
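
The comparison made by eye in these comments (cores per package in /proc/cpuinfo versus the CPUs listed in affected_cpus) can be sketched in Python as below. The function names are hypothetical. A CPU flagged here is only a hint, not proof of a bug, since whether the cores of a package share one frequency domain depends on the hardware.

```python
def packages_from_cpuinfo(text):
    """Group logical CPU numbers by the 'physical id' field of /proc/cpuinfo."""
    packages = {}
    processor = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines and "..." have no key/value pair
        key, value = key.strip(), value.strip()
        if key == "processor":
            processor = int(value)
        elif key == "physical id" and processor is not None:
            packages.setdefault(int(value), []).append(processor)
    return packages


def cpus_listing_only_themselves(affected, packages):
    """Return CPUs in multi-core packages whose affected_cpus names only themselves.

    `affected` maps a CPU number to the tuple of CPU numbers in its affected_cpus file.
    """
    flagged = []
    for members in packages.values():
        if len(members) < 2:
            continue
        for cpu in members:
            if affected.get(cpu) == (cpu,):
                flagged.append(cpu)
    return sorted(flagged)
```

For the Intel output above, where every file reads `0 1 2 3`, nothing would be flagged; for the AMD output in comment 13, every CPU would be.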
Comment 19 Jarod Wilson 2009-04-27 11:15:49 EDT
(In reply to comment #17)
> It looks like affected_cpus never work on AMD Quad-Core systems that I have
> tested another two machines without luck.
> 
> amd-ma78gm-01.rhts.bos.redhat.com
> model name      : AMD Phenom(tm) 9750 Quad-Core Processor
> 
> hp-xw9400-01.rhts.bos.redhat.com
> model name      : Quad-Core AMD Opteron(tm) Processor 2382  

Hrm... Just looked at a quad-core AMD box here, which is running Fedora 10 (2.6.27.21-based kernel). affected_cpus for each core doesn't list anything but itself there either...
Comment 20 CAI Qian 2009-04-27 20:47:21 EDT
Thanks, Jarod! Given this, I'll open new bugs for this issue and move this back to ON_QA, because at least the backport work seems correct.

Bug 497938 - [RHEL4] Affected_cpus Not Working for AMD Quad-Core Systems
Bug 497939 - [RHEL5] Affected_cpus Not Working for AMD Quad-Core Systems
Comment 21 Han Pingtian 2009-04-28 07:32:53 EDT
On ibm-defiant.rhts.bos.redhat.com, there is a quad-core Intel CPU:
...
processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           X5355  @ 2.66GHz
stepping        : 7
cpu MHz         : 1998.000
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
constant_tsc pni monitor ds_cpl est tm2 xtpr
bogomips        : 5320.14

[root@ibm-defiant ~]# tail /sys/devices/system/cpu/cpu0/cpufreq/*
==> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq <==
1998000

==> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq <==
2664000

==> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq <==
1998000

==> /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies <==
2664000 2331000 2331000 1998000

==> /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors <==
powersave userspace performance

==> /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq <==
1998000

==> /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver <==
centrino

==> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor <==
userspace

==> /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq <==
2664000

==> /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq <==
1998000

==> /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed <==
1998000
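
The cpufreq sysfs values above are in kHz, and scaling_available_frequencies lists 2331000 twice. A minimal Python sketch for turning that line into a clean list of steps follows; the function names are hypothetical.

```python
def available_frequencies_khz(text):
    """Parse scaling_available_frequencies into unique values, highest first."""
    return sorted({int(tok) for tok in text.split()}, reverse=True)


def khz_to_mhz(khz):
    """Convert a cpufreq sysfs value (kHz) to MHz, as shown in /proc/cpuinfo."""
    return khz / 1000.0
```

With the line from this machine, the result has three distinct steps, and the lowest one (1998000 kHz) corresponds to the `cpu MHz : 1998.000` shown in /proc/cpuinfo.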

With the old kernel 2.6.9-78.ELsmp and kernel-utils-2.4-14.1.117, the procedure to reproduce is as follows.

First, all CPUs are idle:
[root@ibm-defiant ~]# mpstat -P ALL
Linux 2.6.9-78.ELsmp (ibm-defiant.rhts.bos.redhat.com)  04/28/2009

06:26:14 AM  CPU   %user   %nice %system %iowait    %irq   %soft   %idle intr/s
06:26:14 AM  all    0.11    0.38    0.24    0.25    0.00    0.00   99.01 1020.51
06:26:14 AM    0    0.23    0.22    0.27    0.16    0.00    0.00   99.13 256.53
06:26:14 AM    1    0.10    0.34    0.23    0.17    0.00    0.00   99.16 258.52
06:26:14 AM    2    0.05    0.53    0.23    0.36    0.00    0.00   98.82 253.04
06:26:14 AM    3    0.06    0.44    0.22    0.31    0.00    0.00   98.96 252.42

and the frequencies are at the minimum:
[root@ibm-defiant ~]# cat /sys/devices/system/cpu/*/cpufreq/scaling_cur_freq
1998000
1998000
1998000
1998000
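
The idle check above can also be done programmatically. This minimal Python sketch pulls the per-CPU %idle column out of `mpstat -P ALL` text in the layout shown here; the function names and the 95% threshold are assumptions chosen for illustration.

```python
def idle_by_cpu(mpstat_output):
    """Return {cpu_number: %idle} from mpstat text; the 'all' row is skipped."""
    cpu_col = idle_col = None
    idle = {}
    for line in mpstat_output.splitlines():
        fields = line.split()
        if "CPU" in fields and "%idle" in fields:
            # Header row: remember where the CPU and %idle columns sit.
            cpu_col, idle_col = fields.index("CPU"), fields.index("%idle")
            continue
        if cpu_col is None or len(fields) <= max(cpu_col, idle_col):
            continue  # banner line, blank line, or a row that is too short
        if fields[cpu_col].isdigit():
            idle[int(fields[cpu_col])] = float(fields[idle_col])
    return idle


def all_idle(idle, threshold=95.0):
    """True if at least one CPU was parsed and every CPU is idle above the threshold."""
    return bool(idle) and all(value >= threshold for value in idle.values())
```

Locating the columns from the header row, rather than hard-coding positions, keeps the sketch independent of the leading time and AM/PM fields.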

Then let's add some load on them by running two instances of

perl -e '$a = 1_000_000_000_000; while ($a--) {};'

on the machine:

[root@ibm-defiant ~]# mpstat -P ALL 1
Linux 2.6.9-78.ELsmp (ibm-defiant.rhts.bos.redhat.com)  04/28/2009

07:00:24 AM  CPU   %user   %nice %system %iowait    %irq   %soft   %idle intr/s
07:00:25 AM  all   50.00    0.00    0.00    0.00    0.00    0.00   50.00 1011.00
07:00:25 AM    0    0.00    0.00    0.00    0.00    0.00    0.00  100.00 9.00
07:00:25 AM    1  100.00    0.00    0.00    0.00    0.00    0.00    0.00 0.00
07:00:25 AM    2  100.00    0.00    0.00    0.00    0.00    0.00    0.00 0.00
07:00:25 AM    3    0.00    0.00    0.00    0.00    0.00    0.00  100.00 1001.00

[root@ibm-defiant ~]# cat /sys/devices/system/cpu/*/cpufreq/scaling_cur_freq
2664000
2664000
2664000
2664000

All CPUs are running at the maximum.

After killing a 'perl':

[root@ibm-defiant ~]# mpstat -P ALL 1
Linux 2.6.9-78.ELsmp (ibm-defiant.rhts.bos.redhat.com)  04/28/2009

07:03:14 AM  CPU   %user   %nice %system %iowait    %irq   %soft   %idle intr/s
07:03:15 AM  all   25.38    0.00    0.00    0.00    0.00    0.00   74.62 1074.23
07:03:15 AM    0    0.00    0.00    0.00    0.00    0.00    0.00  100.00 1055.67
07:03:15 AM    1    0.00    0.00    0.00    0.00    0.00    0.00   98.97 18.56
07:03:15 AM    2  104.12    0.00    0.00    0.00    0.00    0.00    0.00 0.00
07:03:15 AM    3    0.00    0.00    0.00    0.00    0.00    0.00  103.09 0.00

[root@ibm-defiant ~]# cat /sys/devices/system/cpu/*/cpufreq/scaling_cur_freq
2331000
2331000
2331000
2331000

All CPUs are running at a middle speed.

With 2.6.9-89.ELsmp and kernel-utils-2.4-17.el4:

CPUs run at the minimum speed when they are idle:
[root@ibm-defiant ~]# cat /sys/devices/system/cpu/*/cpufreq/scaling_cur_freq
1998000
1998000
1998000
1998000

[root@ibm-defiant ~]# mpstat -P ALL 1
Linux 2.6.9-89.ELsmp (ibm-defiant.rhts.bos.redhat.com)  04/28/2009

07:17:54 AM  CPU   %user   %nice %system %iowait    %irq   %soft   %idle intr/s
07:17:55 AM  all    0.00    0.00    0.00    0.00    0.00    0.00  100.00 1019.80
07:17:55 AM    0    0.00    0.00    0.00    0.00    0.00    0.00  100.00 1000.99
07:17:55 AM    1    0.00    0.00    0.00    0.00    0.00    0.00   99.01 17.82
07:17:55 AM    2    0.00    0.00    0.00    0.00    0.00    0.00  100.00 0.00
07:17:55 AM    3    0.00    0.00    0.00    0.00    0.00    0.00   99.01 0.00

Add some load by running two 'perl':

[root@ibm-defiant ~]# cat /sys/devices/system/cpu/*/cpufreq/scaling_cur_freq
2664000
2664000
2664000
2664000

[root@ibm-defiant ~]# mpstat -P ALL 1
Linux 2.6.9-89.ELsmp (ibm-defiant.rhts.bos.redhat.com)  04/28/2009

07:19:09 AM  CPU   %user   %nice %system %iowait    %irq   %soft   %idle intr/s
07:19:10 AM  all   50.00    0.00    0.00    0.00    0.00    0.00   50.00 1000.00
07:19:10 AM    0  100.00    0.00    0.00    0.00    0.00    0.00    0.00 6.93
07:19:10 AM    1    0.00    0.00    0.00    0.00    0.00    0.00   99.01 0.00
07:19:10 AM    2   99.01    0.00    0.00    0.00    0.00    0.00    0.00 0.00
07:19:10 AM    3    0.00    0.00    0.00    0.00    0.00    0.00   99.01 992.08

All run at the maximum speed.

Kill one 'perl':

[root@ibm-defiant ~]# cat /sys/devices/system/cpu/*/cpufreq/scaling_cur_freq
2664000
2664000
2664000
2664000
[root@ibm-defiant ~]# mpstat -P ALL 1
Linux 2.6.9-89.ELsmp (ibm-defiant.rhts.bos.redhat.com)  04/28/2009

07:22:20 AM  CPU   %user   %nice %system %iowait    %irq   %soft   %idle intr/s
07:22:21 AM  all   25.25    0.00    0.00    0.00    0.00    0.00   74.75 1051.02
07:22:21 AM    0    0.00    0.00    0.00    0.00    0.00    0.00  100.00 10.20
07:22:21 AM    1    0.00    0.00    0.00    0.00    0.00    0.00  100.00 18.37
07:22:21 AM    2  102.04    0.00    0.00    0.00    0.00    0.00    0.00 416.33
07:22:21 AM    3    0.00    0.00    0.00    0.00    0.00    0.00  102.04 605.10

Still running at the maximum speed. And the contents of 'affected_cpus' are
correct:
[root@ibm-defiant ~]# cat /sys/devices/system/cpu/*/cpufreq/affected_cpus
0 1 2 3
0 1 2 3
0 1 2 3
0 1 2 3

Kill all 'perl':
[root@ibm-defiant ~]# cat /sys/devices/system/cpu/*/cpufreq/scaling_cur_freq
1998000
1998000
1998000
1998000
[root@ibm-defiant ~]# mpstat -P ALL 1
Linux 2.6.9-89.ELsmp (ibm-defiant.rhts.bos.redhat.com)  04/28/2009

07:24:28 AM  CPU   %user   %nice %system %iowait    %irq   %soft   %idle intr/s
07:24:29 AM  all    0.00    0.00    0.00    0.00    0.00    0.00  100.00 999.01
07:24:29 AM    0    0.00    0.00    0.00    0.00    0.00    0.00  100.00 6.93
07:24:29 AM    1    0.00    0.00    0.00    0.00    0.00    0.00  100.00 0.00
07:24:29 AM    2    0.00    0.00    0.00    0.00    0.00    0.00   99.01 0.00
07:24:29 AM    3    0.00    0.00    0.00    0.00    0.00    0.00   99.01 991.09

All speeds drop to the minimum.
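
One reading of the 2.6.9-89.ELsmp results above: all four cores report the same affected_cpus set, so a single busy core (CPU 2 after one 'perl' is killed) is enough to keep the shared frequency at the maximum. The minimal Python sketch below shows such a per-domain decision. It is an assumption for illustration, not cpuspeed's actual algorithm; `domain_target_khz` and `busy_threshold` are hypothetical names.

```python
def domain_target_khz(idle_percent, domain, freqs_khz, busy_threshold=50.0):
    """Pick one frequency for every CPU in `domain`.

    idle_percent: {cpu_number: %idle}, e.g. taken from mpstat
    domain:       CPU numbers that share a frequency (one affected_cpus set)
    freqs_khz:    available frequencies in kHz
    The busiest CPU in the domain decides: if it is busy, use the highest
    frequency, otherwise the lowest.
    """
    busiest_idle = min(idle_percent[cpu] for cpu in domain)
    if 100.0 - busiest_idle >= busy_threshold:
        return max(freqs_khz)
    return min(freqs_khz)
```

With the mpstat figures taken after killing one 'perl' (CPU 2 at 0% idle, the rest idle), this returns 2664000 for the domain 0 1 2 3, which matches the observed scaling_cur_freq values.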
Comment 22 Jan Tluka 2009-04-28 13:11:29 EDT
Patch is in -89.EL kernel.
Comment 23 Han Pingtian 2009-04-28 23:07:21 EDT
Will change the status to VERIFIED.
Comment 25 errata-xmlrpc 2009-05-18 15:05:58 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html
