Bug 443853 - RHEL 5.3 NULL pointer dereferenced in powernowk8_init
Summary: RHEL 5.3 NULL pointer dereferenced in powernowk8_init
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: i386
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Prarit Bhargava
QA Contact: Martin Jenner
URL:
Whiteboard: GSSApproved
Duplicates: 448937
Depends On:
Blocks: 441906 KernelPrio5.3 450866 RHEL5u3_relnotes
Reported: 2008-04-23 17:57 UTC by Russell Doty
Modified: 2009-06-20 03:44 UTC (History)
CC List: 24 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
(x86) The powernowk8 driver was not performing sufficient checks on the number of running CPUs. Consequently, when the driver was started, a kernel oops error message may have been reported. In this update the powernowk8 driver verifies that the number of supported CPUs (supported_cpus) equals the number of online CPUs (num_online_cpus), which resolves this issue.
Clone Of:
Environment:
Last Closed: 2009-01-20 19:40:26 UTC
Target Upstream Version:
Embargoed:


Attachments
RHEL5 fix for this issue [1/2] (1.25 KB, patch)
2008-04-23 18:07 UTC, Prarit Bhargava
RHEL5 fix for this issue [2/2] (2.59 KB, patch)
2008-04-23 18:25 UTC, Prarit Bhargava
dmidecode output from MS9620 MoBo (15.22 KB, text/plain)
2008-07-10 16:44 UTC, Karanbir Singh


Links
System: CentOS; ID: 2912; Private: 0; Priority: None; Status: None; Summary: None; Last Updated: Never
System: Red Hat Product Errata; ID: RHSA-2009:0225; Private: 0; Priority: normal; Status: SHIPPED_LIVE; Summary: Important: Red Hat Enterprise Linux 5.3 kernel security and bug fix update; Last Updated: 2009-01-20 16:06:24 UTC

Comment 1 Prarit Bhargava 2008-04-23 18:06:06 UTC
The issue here isn't the call to dmi_scan_*, as mentioned in jkarhune's
comment #5 in the original BZ.  The problem is that one of the args to strncmp
is NULL, and that leads to a NULL dereference.

P.

Comment 2 Prarit Bhargava 2008-04-23 18:07:34 UTC
Created attachment 303533
RHEL5 fix for this issue [1/2]

Comment 3 RHEL Program Management 2008-04-23 18:24:42 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 4 Prarit Bhargava 2008-04-23 18:25:04 UTC
Created attachment 303534
RHEL5 fix for this issue [2/2]

Comment 5 Prarit Bhargava 2008-05-29 17:40:42 UTC
*** Bug 448937 has been marked as a duplicate of this bug. ***

Comment 12 Prarit Bhargava 2008-06-25 12:15:54 UTC
Panic seen on systems that do not support DMI or have busted DMI tables.  Panic
occurs during powernowk8 driver load, and occurs on both AMD and Intel systems.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c041041c
*pde = 00000000
Oops: 0000 [#1]
SMP 
last sysfs file: 
Modules linked in:
CPU:    3
EIP:    0060:[<c041041c>]    Not tainted VLI
EFLAGS: 00010202   (2.6.18-88.el5 #1) 
EIP is at powernowk8_init+0x5e/0x1c2
eax: 00000000   ebx: 00000000   ecx: 0000000e   edx: 00000020
esi: 00000000   edi: c06242c3   ebp: 00000000   esp: dfaa0fa0
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 1, ti=dfaa0000 task=dfa9faa0 task.ti=dfaa0000)
Stack: 00000000 c071bb38 00000000 c06ec5a8 c06e7fd8 c0404dee 00000202 c06ec42b 
       00000000 00000000 00000000 00000000 00000000 00000000 c06ec42b 00000000 
       00000000 c0405c3b 00000000 00000000 00000000 00000000 00000000 00000000 
Call Trace:
 [<c06ec5a8>] init+0x17d/0x24a
 [<c0404dee>] ret_from_fork+0x6/0x1c
 [<c06ec42b>] init+0x0/0x24a
 [<c06ec42b>] init+0x0/0x24a
 [<c0405c3b>] kernel_thread_helper+0x7/0x10
 =======================
Code: 83 3d 20 41 67 c0 01 75 40 83 3d 84 d4 76 c0 00 75 37 b8 01 00 00 00 bf c3
42 62 c0 e8 96 11 19 00 b9 0f 00 00 00 89 c6 49 78 08 <ac> ae 75 08 84 c0 75 f5
31 c0 eb 04 19 c0 0c 01 85 c0 75 0a c7 
EIP: [<c041041c>] powernowk8_init+0x5e/0x1c2 SS:ESP 0068:dfaa0fa0
 <0>Kernel panic - not syncing: Fatal exception

The panic is due to a missing NULL check on the return value of a dmi_scan_*
call.  The original patch did not include this check.  Patch 1/2, attached to
this BZ, fixes this issue.
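
The missing check can be illustrated with a minimal userspace sketch. This is not the attached patch: fake_dmi_lookup() and dmi_string_has_prefix() are hypothetical names standing in for a DMI lookup that returns NULL on systems without usable DMI tables, and for the comparison that previously passed that NULL straight to strncmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a DMI lookup: returns NULL when the firmware
 * provides no DMI table or the requested field is missing. */
static const char *fake_dmi_lookup(int have_dmi)
{
    return have_dmi ? "ExampleVendor Board" : NULL;
}

/* Returns 1 if dmi_str starts with prefix, 0 otherwise.
 * A NULL dmi_str is treated as "no match" rather than being handed
 * to strncmp(), which would dereference it. */
static int dmi_string_has_prefix(const char *dmi_str, const char *prefix)
{
    assert(prefix != NULL);   /* the prefix is a compile-time constant in practice */

    if (dmi_str == NULL)      /* the check that was missing */
        return 0;

    return strncmp(dmi_str, prefix, strlen(prefix)) == 0;
}
```

With the guard in place, a system with no DMI data simply falls through to the "no match" path instead of faulting during driver load.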

Patch 2/2, also attached to this BZ, prevents Intel boxes from running any part
of the powernowk8 code other than the check to see if the box is an AMD box or
an Intel box.  This diverges significantly from upstream; however, we have
already diverged significantly from upstream in the init function and other
areas of this driver.
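
The idea behind patch 2/2 can be sketched as an early vendor check at the top of the init path. This is an illustrative userspace sketch only, not the patch itself: enum cpu_vendor, powernow_init_sketch() and the probed flag are hypothetical names, and the real driver obtains the vendor from the kernel's CPU data rather than from an argument.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical vendor tag; the kernel keeps this in its per-CPU data. */
enum cpu_vendor { VENDOR_AMD, VENDOR_INTEL, VENDOR_OTHER };

/* Sketch of an init routine that bails out before touching any
 * driver-specific code unless the CPU is from AMD.
 * *probed records whether the AMD-only part of init was reached. */
static int powernow_init_sketch(enum cpu_vendor vendor, int *probed)
{
    assert(probed != NULL);
    *probed = 0;

    if (vendor != VENDOR_AMD)
        return -ENODEV;       /* non-AMD boxes stop here */

    *probed = 1;              /* DMI checks, CPU probing, etc. would follow */
    return 0;
}
```

Placing the vendor check first means the DMI string comparison is never reached on Intel systems at all.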

Both patches have been POSTed for review and will likely be included in RHEL5.3.

P.

Comment 13 Tim Verhoeven 2008-06-26 09:12:51 UTC
We are seeing this error on a lot of older hardware (usually PIIs and PIIIs), as
reported here: "http://bugs.centos.org/view.php?id=2912".

Is there any chance this will get fixed in a kernel update during 5.2?

This bug prevents people from applying security updates of the kernel since they
will have to stay at 2.6.18-53.1.21 (last working kernel).

Comment 14 Robert Moskowitz 2008-06-26 14:31:01 UTC
This also affects fresh installs from the install CDs.

On my DecTOP with an AMD GEODE chip, I would have to install 5.1, upgrade to 5.2
and then fall back to the 2.6.18-53.1.21 kernel.

So not only do we need an updated kernel, we need new install CDs.

The message I get from the kernel panic is:

<7> spurious 8259A interrupt: IRQ7





Comment 15 Don Zickus 2008-07-09 21:11:44 UTC
in kernel-2.6.18-95.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 16 Patrick C. F. Ernzer 2008-07-10 11:43:50 UTC
I can confirm that on my affected hardware (cf. bug #439292) 2.6.18-95.el5
boots just fine.

Comment 17 Karanbir Singh 2008-07-10 16:01:03 UTC
Linux koala.lan 2.6.18-95.el5 #1 SMP Thu Jul 3 20:54:13 EDT 2008 x86_64 x86_64
x86_64 GNU/Linux

Boots and runs fine on this machine ( which was unable to boot the -92.1.6.el5 )

Comment 18 Karanbir Singh 2008-07-10 16:43:37 UTC
One side effect of this kernel (2.6.18-95.el5) is that my vmware-server setup is
shot to hell. A virtual machine that previously took under 2 min to boot is now
taking up to 15 minutes.

I've checked the physical drive I/O rate on the host and things seem fine.

I am running 2.6.18-95.el5 on the host machine, the VM is EL4 ( both x86_64 ). I
am running : 
[kbsingh@koala ~]$ rpm -q VMware-server
VMware-server-1.0.4-56528

[kbsingh@koala ~]$ cat /proc/cpuinfo 
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 37
model name      : AMD Opteron(tm) Processor 250   
stepping        : 1
cpu MHz         : 1000.000
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow
pni lahf_lm
bogomips        : 2009.84
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 37
model name      : AMD Opteron(tm) Processor 250   
stepping        : 1
cpu MHz         : 2411.146
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow
pni lahf_lm
bogomips        : 4821.51
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

[kbsingh@koala ~]$ free
             total       used       free     shared    buffers     cached
Mem:       6093108     879044    5214064          0      32488     495524
-/+ buffers/cache:     351032    5742076
Swap:      4192956          0    4192956

And, I am seeing things like this in top:
 3807 kbsingh    5 -10  240m 118m 110m S  178  2.0   3:28.29 vmware-vmx         

if it matters, this machine is built around a  MS-9620 MicroStar MotherBoard (
dmidecode dump attached with bug report )

Not sure if this is an issue with VMware itself, and something that needs to be
reported there, but if there is any other info required I'd be happy to provide
any feedback.

Comment 19 Karanbir Singh 2008-07-10 16:44:28 UTC
Created attachment 311493
dmidecode output from MS9620 MoBo

Comment 20 Prarit Bhargava 2008-07-10 17:12:03 UTC
Karanbir, could you open up a *new* bugzilla with that information please and
add me to the cc list?

That seems like a completely new issue.

Thanks,

P.

Comment 21 Robert Moskowitz 2008-07-10 20:53:02 UTC
This patch works on my DECtop AMD Geode based systems.

Thanks. Now, can we have an ISO for the 1st CD built with this kernel so I can
do straight 5.2 installs on problem hardware?

Comment 22 Prarit Bhargava 2008-07-10 22:39:50 UTC
(In reply to comment #21)
> This patch works on my DECtop AMD Geode based systems.
> 
> Thanks. Now, can we have an ISO for the 1st CD built with this kernel so I can
> do straight 5.2 installs on problem hardware?

Red Hat does not respin ISOs in-between releases.  You can install an earlier
version of RHEL5 and upgrade to the latest kernel.

P.

Comment 23 Johnny Hughes 2008-07-11 11:19:44 UTC
We have created a kernel that fixes this issue here for CentOS users:

http://people.centos.org/hughesjr/kernel/5/bz_pre53/

The goal of this kernel is to keep a kernel as close as possible to the
"released version" while fixing major issues for CentOS users that will be
rolled into the 5.3 kernel.  We will keep this version up to date with security
patches as they are released.

Comment 28 Ryan Lerch 2008-10-22 03:35:29 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
(x86)
The powernowk8 driver was not performing sufficient checks on the number of running CPUs. Consequently, when the driver was started, a kernel oops error message may have been reported. In this update the powernowk8 driver verifies that the number of supported CPUs (supported_cpus) equals the number of online CPUs (num_online_cpus), which resolves this issue.
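
The check described in this release note can be sketched as follows. This is a userspace illustration under stated assumptions, not the driver code: cpu_ok[] is a hypothetical array of per-CPU probe results, and n_online plays the role of num_online_cpus().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Count how many of the online CPUs passed the (hypothetical) per-CPU
 * probe; nonzero in cpu_ok[i] means CPU i is supported. */
static size_t count_supported_cpus(const int *cpu_ok, size_t n_online)
{
    size_t supported_cpus = 0;

    assert(cpu_ok != NULL || n_online == 0);

    for (size_t i = 0; i < n_online; i++)
        if (cpu_ok[i])
            supported_cpus++;

    return supported_cpus;
}

/* Proceed with driver setup only when every online CPU is supported. */
static int init_if_all_cpus_supported(const int *cpu_ok, size_t n_online)
{
    if (count_supported_cpus(cpu_ok, n_online) != n_online)
        return -ENODEV;

    return 0;   /* driver registration would happen here */
}
```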

Comment 30 Akemi Yagi 2008-11-30 17:30:23 UTC
I understand that the patch(es) fixing the bug reported here were added to kernel 2.6.18-92.1.7.el5.  That was not clear from reading this bugzilla thread alone.  Should the status "ON_QA" be updated to reflect the fact that the fix is already in 5.2?

Comment 35 errata-xmlrpc 2009-01-20 19:40:26 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html

