Bug 428895

Summary: System freezes with p4-clockmod and ondemand governor
Product: [Fedora] Fedora
Reporter: Peter Janakiev <malwkgad>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 8
CC: chris, christoph, ronny.fischer
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-03-10 21:45:30 UTC

Description Peter Janakiev 2008-01-15 21:43:03 UTC
Description of problem:
When using the ondemand CPU frequency scaling governor, the system hangs after
half an hour of use or less.

Version-Release number of selected component (if applicable):
2.6.23.9-85.fc8 #1 SMP Fri Dec 7 15:49:59 EST 2007 i686 i686 i386 GNU/Linux

How reproducible:
Always

Steps to Reproduce:
1. Using a Celeron M, load the p4-clockmod module and enable the ondemand
governor
2. Do the usual stuff, like browsing, viewing photos and so on (no need to play
big videos)
3. The system becomes unresponsive: first one app stops responding, but the
mouse is still active and you can switch to another already started app, then
everything freezes (including the mouse pointer)
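
For anyone trying to reproduce step 1, here is a minimal sketch of the setup
in Python. It assumes the standard cpufreq sysfs files for CPU 0 and must run
as root. The helper names are made up for illustration and are not part of
any Fedora tool.

```python
import subprocess
from pathlib import Path

# Standard cpufreq sysfs directory for CPU 0. It is a parameter so the
# helpers can be pointed at a scratch directory for a dry run.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def available_governors(cpufreq_dir=CPUFREQ_DIR):
    """Return the governors the running kernel offers for this CPU."""
    return (Path(cpufreq_dir) / "scaling_available_governors").read_text().split()


def set_governor(name, cpufreq_dir=CPUFREQ_DIR):
    """Select a governor by writing its name to scaling_governor."""
    (Path(cpufreq_dir) / "scaling_governor").write_text(name + "\n")


def reproduce_setup():
    """Step 1 of the report: load the driver, then switch to ondemand."""
    subprocess.run(["modprobe", "p4-clockmod"], check=True)
    subprocess.run(["modprobe", "cpufreq_ondemand"], check=True)
    if "ondemand" not in available_governors():
        raise RuntimeError("ondemand governor is not available")
    set_governor("ondemand")
```

Calling reproduce_setup() as root should leave the machine in the state
described above. The hang itself then needs ordinary desktop use to trigger.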
  
Actual results:
The system is completely unresponsive.


Expected results:
Normal CPU scaling 

Additional info:
cpu:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 9
model name      : Intel(R) Celeron(R) M processor         1400MHz
stepping        : 5
cpu MHz         : 1400.000
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr mce cx8 mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 tm pbe up bts
bogomips        : 2800.78
clflush size    : 64

modules loaded:
Module                  Size  Used by
vfat                   13249  0 
fat                    45277  1 vfat
usb_storage            73601  0 
cpufreq_stats           8545  0 
cpufreq_ondemand       10317  0 
p4_clockmod             8517  0 
gspca                 663888  0 
videodev               28097  1 gspca
v4l2_common            18625  1 videodev
v4l1_compat            15941  1 videodev
autofs4                20421  2 
sunrpc                140765  1 
dm_mirror              21697  0 
dm_multipath           18249  0 
dm_mod                 46465  2 dm_mirror,dm_multipath
ipv6                  245989  20 
snd_intel8x0m          16845  0 
snd_intel8x0           30429  3 
snd_ac97_codec         92389  2 snd_intel8x0m,snd_intel8x0
ac97_bus                6081  1 snd_ac97_codec
snd_seq_dummy           6725  0 
snd_seq_oss            29889  0 
snd_seq_midi_event      9793  1 snd_seq_oss
snd_seq                44849  5 snd_seq_dummy,snd_seq_oss,snd_seq_midi_event
snd_seq_device         10061  3 snd_seq_dummy,snd_seq_oss,snd_seq
snd_pcm_oss            37569  0 
via_rhine              23753  0 
snd_mixer_oss          16705  1 snd_pcm_oss
firewire_ohci          18497  0 
firewire_core          36097  1 firewire_ohci
video                  19921  0 
output                  6977  1 video
mii                     8385  1 via_rhine
crc_itu_t               6081  1 firewire_core
snd_pcm                63685  5 snd_intel8x0m,snd_intel8x0,snd_ac97_codec,snd_pcm_oss
iTCO_wdt               13797  0 
ac                      8133  0 
iTCO_vendor_support     7109  1 iTCO_wdt
button                 10448  0 
snd_timer              20549  3 snd_seq,snd_pcm
snd                    43461  14 snd_intel8x0m,snd_intel8x0,snd_ac97_codec,snd_seq_oss,snd_seq,snd_seq_device,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer
i2c_i801               12113  0 
pcspkr                  6593  0 
soundcore               9633  1 snd
snd_page_alloc         11337  3 snd_intel8x0m,snd_intel8x0,snd_pcm
i2c_core               21825  1 i2c_i801
joydev                 11649  0 
sg                     31965  0 
sr_mod                 17509  0 
cdrom                  33889  1 sr_mod
ata_piix               16709  4 
ata_generic             8901  0 
libata                100145  2 ata_piix,ata_generic
sd_mod                 27329  5 
scsi_mod              119757  5 usb_storage,sg,sr_mod,libata,sd_mod
ext3                  110665  3 
jbd                    52457  1 ext3
mbcache                10177  1 ext3
uhci_hcd               23633  0 
ohci_hcd               21445  0 
ehci_hcd               31821  0 

I have noticed that the disk LED stops flashing when the first signs of the
hang appear. I believe that is why the apps hang one after another, and not
all at the same time: eventually everything that requires disk I/O fails.
I have checked the disk and it is just fine. I can also reproduce this on my
work laptop (same Celeron M, same p4-clockmod module, same behaviour: disk
operations stop, apps hang one after another and finally everything freezes).

Nothing is recorded in /var/log/messages, just the kernel starting up after
the cold reboot.

The same happens if I use the userspace governor and scale the 1.4 GHz CPU
down to, let's say, 700 MHz.
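
The userspace case can be sketched the same way. The snippet below assumes
the driver exposes scaling_available_frequencies and that scaling_setspeed
takes a value in kHz, as the standard cpufreq sysfs interface does. The
function names are hypothetical, and it must run as root.

```python
from pathlib import Path

CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def nearest_frequency(target_khz, available_khz):
    """Pick the advertised frequency closest to the requested one."""
    return min(available_khz, key=lambda khz: abs(khz - target_khz))


def pin_frequency(target_khz, cpufreq_dir=CPUFREQ_DIR):
    """Switch to the userspace governor and request a fixed frequency in kHz."""
    cpufreq_dir = Path(cpufreq_dir)
    table = (cpufreq_dir / "scaling_available_frequencies").read_text().split()
    chosen = nearest_frequency(target_khz, [int(khz) for khz in table])
    (cpufreq_dir / "scaling_governor").write_text("userspace\n")
    (cpufreq_dir / "scaling_setspeed").write_text(f"{chosen}\n")
    return chosen
```

pin_frequency(700_000) would request 700 MHz, half of the 1.4 GHz nominal
clock, which is the setting mentioned above.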

This was happening with the previous default kernel for F8 also. 

I am not sure how to collect more info on this, any advice will be appreciated.

Thanks

Comment 1 Dave Jones 2008-01-22 18:15:10 UTC
p4-clockmod doesn't actually save any power, and has a number of unexplained
problems on some systems.  I'm leaning towards just disabling it.


Comment 2 Chuck Ebbert 2008-01-22 19:05:59 UTC
(In reply to comment #1)
> p4-clockmod doesn't actually save any power, and has a number of unexplained
> problems on some systems.  I'm leaning towards just disabling it.
> 

I don't think we have a choice given the problems with it.


Comment 3 Chuck Ebbert 2008-01-23 00:37:21 UTC
Driver disabled in rawhide to see if anyone complains.

Comment 4 Christoph Trassl 2008-03-10 16:38:57 UTC
Objection!

I first noticed the problem after updating F8 to 2.6.24.3-12.fc8 a few days
ago, hence this late complaint.

Asus EEE either has a fixed frequency setting of 630 MHz or (by loading
p4-clockmod) you get the full frequency range from 112 to 900 MHz and cpuspeed
support. 

I have been using cpuspeed + p4-clockmod for over a month now and have not had
a single hang or crash.

The best fix would be to re-enable p4-clockmod and blacklist it where it causes
problems, if possible.

This is my vote for reenabling p4-clockmod. Please!



Comment 5 Dave Jones 2008-03-10 16:57:53 UTC
Having knobs to turn is pointless if they are either causing crashes on some
systems or having no effect on others.  The appearance that the eee is scaling
its speed is misleading. All it's doing is making work take longer to
complete, which isn't saving you any power at all.
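
The argument can be made concrete with a toy model. This is only a sketch
under stated assumptions: clock modulation gates the clock without lowering
the voltage, so average power scales with the duty cycle while run time
scales with its inverse, and idle and leakage power are ignored. The 20 W
and 60 s figures are made up for illustration.

```python
def energy_joules(active_watts, full_speed_seconds, duty):
    """Energy for a fixed job when the clock is gated to a fraction `duty`.

    Assumed model: voltage is unchanged, so average power scales with duty
    while run time scales with 1/duty. Idle and leakage power are ignored.
    """
    average_watts = active_watts * duty
    seconds = full_speed_seconds / duty
    return average_watts * seconds


# Hypothetical 20 W job that takes 60 s at full speed.
for duty in (1.0, 0.75, 0.5, 0.125):
    print(duty, energy_joules(20.0, 60.0, duty))
```

In this model every duty cycle gives the same energy for the job, which is
the point being made: the work just takes longer.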


Comment 6 Christoph Trassl 2008-03-10 21:22:04 UTC
Perhaps my comment was not accurate enough.

I have no problem if the CPU cannot scale down below 630 MHz since, as Dave
stated, that does not save power and everything just feels sluggish. With
cpuspeed running, though, it seemed to scale up to the full 900 MHz. That was
the reason I objected.

Digging in deeper showed I was wrong. 

Wrote a simple calculating benchmark and ran it multiple times. 
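
The benchmark itself is not attached to this report. A comparable CPU-bound
test could look like the sketch below. The workload and function names are
assumptions for illustration, not the code that was actually run.

```python
import time


def busy_work(iterations):
    """CPU-bound integer arithmetic with no I/O, so run time tracks clock speed."""
    total = 0
    for i in range(iterations):
        total = (total + i * i) % 1_000_003
    return total


def average_runtime(iterations, runs=3):
    """Run the workload several times and return the mean wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        busy_work(iterations)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)
```

The iteration count would need tuning so that one run takes about a minute
on the machine under test.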

Using 'time' showed:

  With p4-clockmod not loaded:
  * /proc/cpuinfo says cpu MHz = 630
  * benchmark run time on average = 1m1s

  With p4-clockmod loaded and cpuspeed off:
  * /proc/cpuinfo says cpu MHz = 900
  * benchmark run time on average = 1m1s

  With p4-clockmod loaded and cpuspeed on with MIN_SPEED/MAX_SPEED = 900000:
  * /proc/cpuinfo says cpu MHz = 900
  * benchmark run time on average = 1m1s

  With p4-clockmod loaded and cpuspeed on with MIN_SPEED/MAX_SPEED = 675000:
  * /proc/cpuinfo says cpu MHz = 675
  * benchmark run time on average = 1m23s

So enabling p4-clockmod just raises the displayed MHz, but the cpu is not
running any faster when displaying 900 instead of 630 MHz. The difference equals
the underclocked/normal FSB ratio: 70 MHz underclocked/100 MHz normal.
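
The numbers above can be checked with a little arithmetic. This sketch
assumes the benchmark is purely CPU-bound, so run time scales inversely with
the clock. The function names are only for illustration.

```python
def real_mhz(displayed_mhz, fsb_actual_mhz=70, fsb_nominal_mhz=100):
    """Scale a displayed frequency by the actual/nominal FSB ratio."""
    return displayed_mhz * fsb_actual_mhz / fsb_nominal_mhz


def predicted_runtime(baseline_s, baseline_mhz, target_mhz):
    """For CPU-bound work, run time scales inversely with effective clock."""
    return baseline_s * baseline_mhz / target_mhz


print(real_mhz(900))                               # 630.0
print(real_mhz(675))                               # 472.5
print(round(predicted_runtime(61, 900, 675), 1))   # 81.3
```

A displayed 900 MHz therefore corresponds to the real 630 MHz, and the
predicted 81.3 s for the 675 MHz setting is close to the measured 1m23s.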

It is possible to overclock the FSB to 100 MHz instead of 70 MHz, in which case
having cpuspeed would be nice. But since running with a 100 MHz FSB is not
guaranteed to be harmless to the EEE, not having p4-clockmod is not a big issue.

Therefore, I am sorry and remove my objection.


Comment 7 Dave Jones 2008-03-20 15:50:34 UTC
*** Bug 438326 has been marked as a duplicate of this bug. ***

Comment 8 Chris Bagwell 2008-05-11 04:24:10 UTC
I'd like to add a minor complaint after I've exhausted my other easy paths with
my desktop PC.

I've got an HP with somewhat broken ACPI DSDT.  When it reaches some heat
threshold, I get an ACPI error message logged.  Example:

Feb  3 05:28:59 hostname kernel: ACPI Error (psargs-0355): [\_TZ_.THRM] Namespace lookup failure, AE_NOT_FOUND
Feb  3 05:28:59 hostname kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_GPE._L1C] (Node f7d02450), AE_NOT_FOUND
Feb  3 05:28:59 hostname kernel: ACPI Exception (evgpe-0576): AE_NOT_FOUND, while evaluating GPE method [_L1C] [20070126]

The errors map to a bug in the DSDT (essentially an undeclared variable) that
only triggers when the CPU crosses a temperature threshold.

Since I'm not up to ACPI debugging (I tried), I've been experimenting with the
onboard fan controller and running it at annoyingly high volume levels to reduce
the number of times this message is printed.  Even with these steps I get around
900 reports a day.  That's an improvement from tens of thousands.
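
Daily counts like these can be pulled from the log with a short script. The
sketch below assumes classic syslog lines in the format quoted above and
counts every "ACPI Error" or "ACPI Exception" line, so one event that logs
three lines counts three times. The names are made up for illustration.

```python
import re
from collections import Counter

# Matches classic syslog lines such as
# "Feb  3 05:28:59 hostname kernel: ACPI Error (psargs-0355): ..."
LINE_RE = re.compile(
    r"^\s*(\w{3})\s+(\d{1,2})\s+[\d:]{8}\s+\S+\s+kernel: ACPI (Error|Exception)"
)


def acpi_reports_per_day(lines):
    """Count ACPI Error/Exception lines per (month, day)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

It takes any iterable of lines, for example an open /var/log/messages file.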

In the past (up until Fedora 9 rawhide), I simply enabled p4-clockmod and,
regardless of the comments in the source code (saying it doesn't save energy),
it seemed to keep the CPU cool enough that the messages stayed below 100 per day.

At least for me, p4-clockmod seems to help reduce CPU heat on idle systems
(not busy systems, of course) and has never caused a system lockup.

I would like it back in the standard distribution, but will look into compiling
it as a loadable module without recompiling the whole kernel package.


Comment 9 Ronny Fischer 2008-05-11 08:59:34 UTC
From my point of view p4-clockmod does reduce power consumption. Maybe not
much, because of the high clock rates of the Pentium 4 architecture itself, but
enough to let the CPU produce less heat overall. Of course the work takes
longer if the speed is scaled down, but that is to be expected.

I'd like to point out that CPU frequency scaling is about exactly that:
reducing speed when the PC is idle or has less work to do.

Furthermore, it is pretty unusual to use p4-clockmod on an Eee PC, since it
is equipped with a Celeron CPU based on the Intel Pentium M architecture.
Therefore the integrated acpi_cpufreq driver should be used.
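
To see which driver a machine is actually using, the scaling_driver file in
sysfs can be read. A minimal sketch, assuming the standard cpufreq sysfs
layout (the helper names are hypothetical):

```python
from pathlib import Path

CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def active_driver(cpufreq_dir=CPUFREQ_DIR):
    """Return the cpufreq driver bound to this CPU, or None if there is none."""
    driver_file = Path(cpufreq_dir) / "scaling_driver"
    if not driver_file.exists():
        return None
    return driver_file.read_text().strip()


def uses_clock_modulation(driver):
    """True when the driver is p4-clockmod (hyphen or underscore spelling)."""
    return driver is not None and driver.replace("_", "-") == "p4-clockmod"
```

uses_clock_modulation(active_driver()) being True would mean the frequencies
on offer come from clock modulation rather than from acpi_cpufreq.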