Bug 443853
| Summary: | RHEL 5.3 NULL pointer dereferenced in powernowk8_init |
|---|---|
| Product: | Red Hat Enterprise Linux 5 |
| Component: | kernel |
| Version: | 5.2 |
| Hardware: | i386 |
| OS: | Linux |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | urgent |
| Reporter: | Russell Doty <rdoty> |
| Assignee: | Prarit Bhargava <prarit> |
| QA Contact: | Martin Jenner <mjenner> |
| CC: | agk, akropel1, amyagi, bmaly, bnagendr, cward, herrold, jplans, kbsingh, mishu, notting, pasteur, pcfe, peterm, prarit, qcai, ralph, rdoty, rgm, rh-bugzilla, rlerch, sputhenp, tim.verhoeven.be, wmealing |
| Target Milestone: | rc |
| Target Release: | --- |
| Keywords: | ZStream |
| Whiteboard: | GSSApproved |
| Doc Type: | Bug Fix |
| Doc Text: | (x86) The powernowk8 driver was not performing sufficient checks on the number of running CPUs. Consequently, when the driver was started, a kernel oops error message may have been reported. In this update the powernowk8 driver verifies that the number of supported CPUs (supported_cpus) equals the number of online CPUs (num_online_cpus), which resolves this issue. |
| Last Closed: | 2009-01-20 19:40:26 UTC |
| Bug Blocks: | 441906, 448732, 450866, 454962 |
Comment 1
Prarit Bhargava
2008-04-23 18:06:06 UTC
Created attachment 303533 [details]
RHEL5 fix for this issue [1/2]

This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.

Created attachment 303534 [details]
RHEL5 fix for this issue [2/2]

*** Bug 448937 has been marked as a duplicate of this bug. ***

Panic seen on systems that do not support DMI or have busted DMI tables. Panic
occurs during powernowk8 driver load, and occurs on both AMD and Intel systems.
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
c041041c
*pde = 00000000
Oops: 0000 [#1]
SMP
last sysfs file:
Modules linked in:
CPU: 3
EIP: 0060:[<c041041c>] Not tainted VLI
EFLAGS: 00010202 (2.6.18-88.el5 #1)
EIP is at powernowk8_init+0x5e/0x1c2
eax: 00000000 ebx: 00000000 ecx: 0000000e edx: 00000020
esi: 00000000 edi: c06242c3 ebp: 00000000 esp: dfaa0fa0
ds: 007b es: 007b ss: 0068
Process swapper (pid: 1, ti=dfaa0000 task=dfa9faa0 task.ti=dfaa0000)
Stack: 00000000 c071bb38 00000000 c06ec5a8 c06e7fd8 c0404dee 00000202 c06ec42b
00000000 00000000 00000000 00000000 00000000 00000000 c06ec42b 00000000
00000000 c0405c3b 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace:
[<c06ec5a8>] init+0x17d/0x24a
[<c0404dee>] ret_from_fork+0x6/0x1c
[<c06ec42b>] init+0x0/0x24a
[<c06ec42b>] init+0x0/0x24a
[<c0405c3b>] kernel_thread_helper+0x7/0x10
=======================
Code: 83 3d 20 41 67 c0 01 75 40 83 3d 84 d4 76 c0 00 75 37 b8 01 00 00 00 bf c3
42 62 c0 e8 96 11 19 00 b9 0f 00 00 00 89 c6 49 78 08 <ac> ae 75 08 84 c0 75 f5
31 c0 eb 04 19 c0 0c 01 85 c0 75 0a c7
EIP: [<c041041c>] powernowk8_init+0x5e/0x1c2 SS:ESP 0068:dfaa0fa0
<0>Kernel panic - not syncing: Fatal exception
The panic is due to a missing NULL check on the return value of a dmi_scan_* call.
The original patch did not include this check. A fix for this issue is patch 1/2,
attached to this BZ.
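A minimal user-space sketch of the kind of NULL check described above, assuming a lookup that returns NULL when the DMI tables are missing or broken. The helper names `lookup_dmi_string` and `dmi_string_has_prefix` are hypothetical; they are not taken from the attached patches or from the kernel's DMI interface.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a DMI lookup helper. Passing NULL models a
 * system with no usable DMI table, as on the affected machines. */
static const char *lookup_dmi_string(const char *table_value)
{
    return table_value;
}

/* Returns 1 if the looked-up string starts with `prefix`, 0 otherwise.
 * The NULL check comes first: without it, strncmp() would dereference
 * a NULL pointer, which is the failure mode shown in the oops. */
static int dmi_string_has_prefix(const char *table_value, const char *prefix)
{
    const char *s = lookup_dmi_string(table_value);

    if (s == NULL)
        return 0;
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

With this shape, a missing DMI string is simply treated as "no match" and the caller carries on instead of crashing.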
Patch 2/2, also attached to this BZ, prevents Intel boxes from running any part
of the powernowk8 code other than the check to see whether the box is an AMD box
or an Intel box. This diverges us significantly from upstream; however, we have
already diverged significantly from upstream in the init function and other
areas of this driver.
Both patches have been POSTed for review and will likely be included in RHEL5.3.
P.
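
A rough user-space sketch of the vendor gating that patch 2/2 is described as adding, assuming the vendor is available as a string such as the `vendor_id` field of /proc/cpuinfo. The function names and return codes here are hypothetical and only illustrate the control flow; the real driver code is not shown in this report.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 only when the vendor string identifies
 * an AMD processor, 0 for anything else (including a NULL string). */
static int is_amd_vendor(const char *vendor_id)
{
    return vendor_id != NULL && strcmp(vendor_id, "AuthenticAMD") == 0;
}

/* Hypothetical init wrapper. Return codes are illustrative only:
 * 0 means the driver setup ran, -1 means it was skipped. */
static int powernow_init_sketch(const char *vendor_id, int *setup_ran)
{
    *setup_ran = 0;
    if (!is_amd_vendor(vendor_id))
        return -1;      /* non-AMD box: leave before running anything else */
    *setup_ran = 1;     /* stands in for the rest of the driver setup */
    return 0;
}
```

The point of checking the vendor first is that no later step, such as a DMI lookup, is ever reached on hardware the driver cannot support.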
We are seeing this error on a lot of older hardware (usually PIIs and PIIIs), as reported at http://bugs.centos.org/view.php?id=2912. Is there any chance this will get fixed in a kernel update during 5.2? This bug prevents people from applying kernel security updates, since they have to stay at 2.6.18-53.1.21 (the last working kernel).

This also affects fresh installs from the install CDs. On my DecTOP with an AMD GEODE chip, I would have to install 5.1, upgrade to 5.2 and then fall back to the 2.6.18-53.1.21 kernel. So not only do we need an updated kernel, we need new install CDs.

The message I get from the kernel panic is:
<7> spurious 8259A interrupt: IRQ7

in kernel-2.6.18-95.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

I can confirm that on my affected hardware (c.f. bug #439292) 2.6.18-95.el5 boots just fine.

Linux koala.lan 2.6.18-95.el5 #1 SMP Thu Jul 3 20:54:13 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
Boots and runs fine on this machine (which was unable to boot the -92.1.6.el5).

One side effect of this kernel (2.6.18-95.el5) is that my vmware-server setup is
shot to hell. A virtual machine that previously took under 2 min to boot is now
taking up to 15 minutes.
I've checked the physical drive I/O rate on the host and things seem fine.
I am running 2.6.18-95.el5 on the host machine; the VM is EL4 (both x86_64). I
am running:
[kbsingh@koala ~]$ rpm -q VMware-server
VMware-server-1.0.4-56528
[kbsingh@koala ~]$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 37
model name : AMD Opteron(tm) Processor 250
stepping : 1
cpu MHz : 1000.000
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow
pni lahf_lm
bogomips : 2009.84
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 37
model name : AMD Opteron(tm) Processor 250
stepping : 1
cpu MHz : 2411.146
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow
pni lahf_lm
bogomips : 4821.51
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
[kbsingh@koala ~]$ free
total used free shared buffers cached
Mem: 6093108 879044 5214064 0 32488 495524
-/+ buffers/cache: 351032 5742076
Swap: 4192956 0 4192956
And, I am seeing things like this in top:
3807 kbsingh 5 -10 240m 118m 110m S 178 2.0 3:28.29 vmware-vmx
If it matters, this machine is built around an MS-9620 MicroStar motherboard
(dmidecode dump attached to the bug report).
Not sure if this is an issue with VMware itself, and something that needs to be
reported there, but if there is any other info required I'd be happy to provide
any feedback.
Created attachment 311493 [details]
demidecode output from MS9620 MoBo
Karanbir, could you open up a *new* bugzilla with that information please and add me to the cc list? That seems like a completely new issue.
Thanks,
P.

This patch works on my DECtop AMD Geode based systems.
Thanks, now can we have an ISO for the 1st CD built with this kernel so I can do straight 5.2 installs on problem hardware?

(In reply to comment #21)
> This patch works on my DECtop AMD Geode based systems.
>
> Thanks, now can we have an ISO for the 1st CD built with this kernel so I can do
> straight 5.2 installs on problem hardware?
Red Hat does not respin ISOs in-between releases. You can install an earlier version of RHEL5 and upgrade to the latest kernel.
P.

We have created a kernel that fixes this issue here for CentOS users:
http://people.centos.org/hughesjr/kernel/5/bz_pre53/
The goal of this kernel is to keep a kernel as close as possible to the "released version" while fixing major issues for CentOS users that will be rolled into the 5.3 kernel. We will keep this version up to date with security patches as they are released.

Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
(x86)
The powernowk8 driver was not performing sufficient checks on the number of running CPUs. Consequently, when the driver was started, a kernel oops error message may have been reported. In this update the powernowk8 driver verifies that the number of supported CPUs (supported_cpus) equals the number of online CPUs (num_online_cpus), which resolves this issue.

I understand the patch(es) that fix the bug reported here were added to kernel 2.6.18-92.1.7.el5. That was not clear from just reading this bugzilla thread. Should the status "ON_QA" be updated to reflect the fact that the fix is already in 5.2?

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2009-0225.html