Bug 164247 - powernow-k8 oops in query_current_values_with_pending_wait
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 4
Hardware: x86_64 Linux
Priority: medium
Severity: high
Assigned To: Dave Jones
QA Contact: Brian Brock
Reported: 2005-07-26 04:29 EDT by Andrew Stubbs
Modified: 2015-01-04 17:21 EST (History)
Last Closed: 2005-09-05 01:34:38 EDT


Attachments
Patch to provide data structures for each core in powernow-k8 (837 bytes, patch)
2005-07-28 15:58 EDT, Mark Langsdorf
Description Andrew Stubbs 2005-07-26 04:29:22 EDT
The kernel throws an Oops after running the latest SMP kernels for a while on a
dual-core Athlon 64. If I switch to a non-SMP kernel, it seems stable.

Dmesg
=====
Jul 23 05:25:54 server02 kernel: Unable to handle kernel NULL pointer
dereference at 0000000000000024 RIP:
Jul 23 05:25:54 server02 kernel:
<ffffffff8011dae1>{query_current_values_with_pending_wait+65}
Jul 23 05:25:54 server02 kernel: PGD 0
Jul 23 05:25:54 server02 kernel: Oops: 0002 [1] SMP
Jul 23 05:25:54 server02 kernel: CPU 1
Jul 23 05:25:54 server02 kernel: Modules linked in: vmnet(U) vmmon(U) nls_utf8
loop parport_pc lp parport autofs4 it87 eeprom i2c_sensor i2c_isa sunrpc
ipt_REJECT iptable_filter iptable_nat ipt_LOG ipt_state ip_conntrack
iptable_mangle ip_tables dm_mod video button battery ac md5 ipv6 ohci_hcd
ehci_hcd i2c_nforce2 i2c_core shpchp snd_intel8x0 snd_ac97_codec snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss
snd_pcm snd_timer snd soundcore snd_page_alloc forcedeth sk98lin ext3 jbd
sata_nv sbp2 ohci1394 ieee1394 sata_via sata_sil libata sd_mod scsi_mod
Jul 23 05:25:54 server02 kernel: Pid: 6, comm: events/0 Tainted: P     
2.6.12-1.1398_FC4smp
Jul 23 05:25:54 server02 kernel: RIP: 0010:[<ffffffff8011dae1>]
<ffffffff8011dae1>{query_current_values_with_pending_wait+65}
Jul 23 05:25:54 server02 kernel: RSP: 0000:ffff81007fdc7dc8  EFLAGS: 00010206
Jul 23 05:25:54 server02 kernel: RAX: 000000000000000e RBX: 0000000000000000
RCX: 00000000c0010042
Jul 23 05:25:54 server02 kernel: RDX: 000000000000000a RSI: 0000000000000001
RDI: 0000000000000000
Jul 23 05:25:54 server02 kernel: RBP: 0000000000000000 R08: ffff81007fdc6000
R09: 0000000000000002
Jul 23 05:25:54 server02 kernel: R10: 0000000000000000 R11: 0000000000000246
R12: 0000000000000001
Jul 23 05:25:54 server02 kernel: R13: 0000000000000000 R14: 0000000000000292
R15: ffffffff80112950
Jul 23 05:25:54 server02 kernel: FS:  00002aaaaadfd6e0(0000)
GS:ffffffff8050d800(0000) knlGS:00000000ee1b3bb0
Jul 23 05:25:54 server02 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Jul 23 05:25:54 server02 kernel: CR2: 0000000000000024 CR3: 0000000078f17000
CR4: 00000000000006e0
Jul 23 05:25:54 server02 kernel: Process events/0 (pid: 6, threadinfo
ffff81007fdc6000, task ffff81007fd86800)
Jul 23 05:25:54 server02 kernel: Stack: 0000000000000000 ffffffff8011e0b1
0000000000000001 ffff81007ff19e00
Jul 23 05:25:54 server02 kernel:        ffff81007ff19e30 ffffffff802e68a3
0000000000000000 0000000000000003
Jul 23 05:25:54 server02 kernel:        0000000000000001 0000000000000020
Jul 23 05:25:54 server02 kernel: Call
Trace:<ffffffff8011e0b1>{powernowk8_get+129} <ffffffff802e68a3>{cpufreq_get+115}
Jul 23 05:25:54 server02 kernel:       
<ffffffff8011298a>{handle_cpufreq_delayed_get+58}
<ffffffff8014b9ec>{worker_thread+476}
Jul 23 05:25:54 server02 kernel:       
<ffffffff80134720>{default_wake_function+0} <ffffffff801326b3>{__wake_up_common+67}
Jul 23 05:25:54 server02 kernel:        <ffffffff8014b810>{worker_thread+0}
<ffffffff80150489>{kthread+217}
Jul 23 05:25:54 server02 kernel:        <ffffffff80135ca0>{schedule_tail+64}
<ffffffff8010f76b>{child_rip+8}
Jul 23 05:25:54 server02 kernel:       
<ffffffff801f3100>{selinux_d_instantiate+0} <ffffffff801503b0>{kthread+0}
Jul 23 05:25:54 server02 kernel:        <ffffffff8010f763>{child_rip+0}
Jul 23 05:25:54 server02 kernel:
Jul 23 05:25:54 server02 kernel: Code: 89 47 24 89 57 20 31 c0 48 83 c4 08 c3 66
90 48 83 ec 28 f7
Jul 23 05:25:54 server02 kernel: RIP
<ffffffff8011dae1>{query_current_values_with_pending_wait+65} RSP <ffff81007fdc7dc8>
Jul 23 05:25:54 server02 kernel: CR2: 0000000000000024
Jul 23 05:25:54 server02 kernel:  <3>Debug: sleeping function called from
invalid context at include/linux/rwsem.h:43
Jul 23 05:25:54 server02 kernel: in_atomic():0, irqs_disabled():1
Jul 23 05:25:54 server02 kernel:
Jul 23 05:25:54 server02 kernel: Call
Trace:<ffffffff8013abd5>{profile_task_exit+21} <ffffffff8013bff2>{do_exit+34}
Jul 23 05:25:54 server02 kernel:       
<ffffffff80265f79>{do_unblank_screen+137} <ffffffff80124286>{do_page_fault+1926}
Jul 23 05:25:54 server02 kernel:        <ffffffff8035ac32>{thread_return+0}
<ffffffff8035ac84>{thread_return+82}
Jul 23 05:25:54 server02 kernel:        <ffffffff8013434d>{activate_task+141}
<ffffffff80112950>{handle_cpufreq_delayed_get+0}
Jul 23 05:25:54 server02 kernel:        <ffffffff8010f5b5>{error_exit+0}
<ffffffff80112950>{handle_cpufreq_delayed_get+0}
Jul 23 05:25:54 server02 kernel:       
<ffffffff8011dae1>{query_current_values_with_pending_wait+65}
Jul 23 05:25:54 server02 kernel:        <ffffffff8011e0b1>{powernowk8_get+129}
<ffffffff802e68a3>{cpufreq_get+115}
Jul 23 05:25:54 server02 kernel:       
<ffffffff8011298a>{handle_cpufreq_delayed_get+58}
<ffffffff8014b9ec>{worker_thread+476}
Jul 23 05:25:54 server02 kernel:       
<ffffffff80134720>{default_wake_function+0} <ffffffff801326b3>{__wake_up_common+67}
Jul 23 05:25:54 server02 kernel:        <ffffffff8014b810>{worker_thread+0}
<ffffffff80150489>{kthread+217}
Jul 23 05:25:54 server02 kernel:        <ffffffff80135ca0>{schedule_tail+64}
<ffffffff8010f76b>{child_rip+8}
Jul 23 05:25:54 server02 kernel:       
<ffffffff801f3100>{selinux_d_instantiate+0} <ffffffff801503b0>{kthread+0}
Jul 23 05:25:54 server02 kernel:        <ffffffff8010f763>{child_rip+0}
Comment 1 Mark Langsdorf 2005-07-28 15:58:07 EDT
Created attachment 117240 [details]
Patch to provide data structures for each core in powernow-k8

This is a known problem that was debugged by 6/14.

powernow-k8 requires that a data structure for each core be created in the
_cpu_init function call.  The cpufreq infrastructure doesn't call _cpu_init for
the second core in each processor.  Some systems crashed when _get was called
with an odd-numbered core because it tried to dereference a NULL pointer since
the data structure had not been created.

The attached patch solves the problem by initializing data structures for all
shared cores in the _cpu_init function.  It should apply to 2.6.12-rc6 and has
been tested by AMD and Sun.
Comment 2 Gene Czarcinski 2005-07-29 18:02:12 EDT
I am not sure that my problem is the same as this one but I will be trying the
patch.

Although it takes some time, I can reproduce my problem more or less when I want.
Comment 3 Gene Czarcinski 2005-07-30 12:38:15 EDT
OK, I applied the patch to the 2.6.12-1.1398_FC4 kernel and rebuilt the up and
smp rpms.  Running this updated kernel I have (for the first time) been able to
copy about 70GB from an NFS filesystem to a local filesystem on an Athlon 64 X2
4400+ running the smp kernel.

Previously, I could only do this with the up kernel or running an i386 installation.

This patch is definitely needed for any x86_64 smp kernels.
Comment 4 Gene Czarcinski 2005-07-31 06:57:43 EDT
After searching the kernel mailing list, I have found that this bug was
identified around mid-June.

The current 2.6.13-rc4-mm1.bz2 patch includes the fix for this problem.
However, it also includes a whole bunch of other fixes dealing with
multi-processors in the powernow-k8 code.
Comment 5 Gene Czarcinski 2005-07-31 07:13:39 EDT
Better still, I found that patch-2.6.13-rc4-git3.bz2 contains the needed patch
to powernow-k8.  The current kernel in development has patch-2.6.13-rc3-git9.bz2
so (hopefully) a kernel with the needed patch will appear soon.
Comment 6 Dave Jones 2005-08-04 02:14:19 EDT
fixed in kernel-2.6.12-1.1411_FC4 which just got pushed to updates-testing
Comment 7 Gene Czarcinski 2005-08-04 10:20:01 EDT
Looking at the patches in 2.6.12-1.1411_FC4, it includes a local (by me) patch
to powernow-k8 which does fix the problem.

However, in researching the fix, I found that there were a number of updates to
powernow-k8 queued for 2.6.13 in addition to this fix.  All of these fixes are
included in the kernel rpm in development. I know that development kernels are
just that: development.  However, is there any reason not to grab the development
src.rpm and rebuild for FC4?  I plan to do this on a system that is currently in
test mode and not planned for "production" until next month.

Specifically, is there anything in the FC5 development kernel which will
have/cause problems if run on an otherwise FC4 system?
Comment 8 Gene Czarcinski 2005-08-04 10:47:17 EDT
BTW, as far as I am concerned, this bug report can be closed with the
2.6.12-1.1411_FC4 update.
Comment 9 Dave Jones 2005-08-04 13:24:23 EDT
The 2.6.13rc kernel in development may cause random bits of userspace to stop
working. PCMCIA got overhauled dramatically for example.  Every time we rebase
we usually end up having to update a half dozen or so packages afterwards.
