Bug 467941

Summary: Kernel BUG at drivers/cpufreq/cpufreq_userspace.c:136
Product: Red Hat Enterprise Linux 5
Reporter: Rafael Garabato <rafael.f.garabato>
Component: kernel
Assignee: John Villalovos <jvillalo>
Status: CLOSED DUPLICATE
QA Contact: Martin Jenner <mjenner>
Severity: high
Priority: medium
Version: 5.4
CC: andriusb, dzickus, jane.lv, jfeeney, jvillalo, keve.a.gabbert, Klaus+rhbz, len.brown, luyu, rafael.f.garabato, youquan.song
Keywords: Reopened
Target Release: 5.4
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-05-26 13:15:33 UTC
Bug Depends On: 500311
Bug Blocks: 480792

Attachments:

Description	Flags
Test kernel	none
kernel-2.6.18-120.el5dz_test boot log	none
Adding _PSS invalidation check	none
Updated patch for the 2.6.18-149 kernel	none

Description Rafael Garabato 2008-10-21 19:26:01 UTC

Description of problem:
I have downloaded and installed kernel-2.6.18-120.el5.gtest.59.x86_64.rpm and kernel-2.6.18-116.el5.gtest.57.x86_64.rpm from http://people.redhat.com/agospoda/#rhel5.

Both packages failed to work on Intel's Shoffner platform. The kernel panics under __cpufreq_governor: invalid opcode: 0000 [1] SMP



Version-Release number of selected component (if applicable):
kernel-2.6.18-116.el5.gtest.57.x86_64
kernel-2.6.18-120.el5.gtest.59.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Install the packages
2. Reboot the computer

Actual results:

Version 1.20.1093 Copyright (C) 2005-2007 American Megatrends, Inc.
Press <F2> to enter setup, <F12> Network Boot
Bios Version: S5400.86B.06.00.0026.040820080929
Platform ID:  S5400SF
4 GB system memory found
Current Memory Speed: 667 MT/s (333 MHz)
Intel(R) Xeon(R) CPU           X5355  @ 2.66GHz
Intel(R) Xeon(R) CPU           X5355  @ 2.66GHz
Booting from BIOS Partition 0
USB keyboard detected
USB mouse detected



Memory for crash kernel (0x0 to 0x0) notwithin permissible range
Red Hat nash version 5.1.19.6 starting
  Reading all physical volumes.  This may take a while...
  Found volume group "VolGroup00" using metadata type lvm2
  2 logical volume(s) in volume group "VolGroup00" now active
                Welcome to Red Hat Enterprise Linux Server
                Press 'I' to enter interactive startup.
Setting clock  (utc): Tue Oct 21 16:45:51 ARST 2008 [  OK  ]
Starting udev: [  OK  ]
Loading default keymap (us): [  OK  ]
Setting hostname rhhpcsf.intel.com:  [  OK  ]
Setting up Logical Volume Management:   2 logical volume(s) in volume group "VolGroup00" now active
[  OK  ]
Checking filesystems
Checking all file systems.
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/VolGroup00/LogVol00
/dev/VolGroup00/LogVol00: clean, 148762/60522496 files, 4355939/60514304 blocks
[/sbin/fsck.ext3 (1) -- /boot] fsck.ext3 -a /dev/sda1
/boot: recovering journal
/boot: clean, 39/26104 files, 21917/104388 blocks
[  OK  ]
Remounting root filesystem in read-write mode:  [  OK  ]
Mounting local filesystems:  [  OK  ]
Enabling local filesystem quotas:  [  OK  ]
Enabling /etc/fstab swaps:  [  OK  ]
INIT: Entering runlevel: 3
Entering non-interactive startup
Applying Intel CPU microcode update: [  OK  ]
Starting monitoring for VG VolGroup00:   /dev/hdb: open failed: Read-only file system
  2 logical volume(s) in volume group "VolGroup00" monitored
[  OK  ]
Starting background readahead: [  OK  ]
Checking for hardware changes [  OK  ]
Loading OpenIB kernel modules:[  OK  ]
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at drivers/cpufreq/cpufreq_userspace.c:136
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/class
CPU 0
Modules linked in: acpi_cpufreq ib_iser libiscsi scsi_transport_iscsi ib_srp ib_sdp ib_ipoib ipv6 xfrm_nalgo crypto_api rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa dm_multipath scsi_dh video backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev sr_mod ide_cd ib_mthca cdrom shpchp i2c_i801 ib_mad i2c_core ib_core sg serio_raw e1000e pcspkr dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 12673, comm: modprobe Not tainted 2.6.18-116.el5.gtest.57 #1
RIP: 0010:[<ffffffff80215a14>]  [<ffffffff80215a14>] cpufreq_governor_userspace+0x44/0x205
RSP: 0000:ffff810174c85ca8  EFLAGS: 00010246
RAX: 00000000ffffffff RBX: ffff81017e220400 RCX: 0000000000000000
RDX: 00000000ffffffea RSI: 0000000000000000 RDI: ffff81017e220400
RBP: ffff81017e220400 R08: 0000000000000001 R09: 0000000000000000
R10: ffff81017e220400 R11: 0000000000000058 R12: 0000000000000000
R13: 0000000000000000 R14: ffffffff80450688 R15: 0000000000000000
FS:  00002aaaaaac7240(0000) GS:ffffffff803b8000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000062e19f CR3: 0000000176504000 CR4: 00000000000006e0
Process modprobe (pid: 12673, threadinfo ffff810174c84000, task ffff81017c588860)
Stack:  ffff81017e220400 0000000000000001 0000000000000000 ffffffff802142c7
 ffff810174c85d48 ffff81017e220400 0000000000000000 ffffffff802144cd
 0000000000000000 ffff81017e220400 ffff810174c85d48 ffffffff8033d300
Call Trace:
 [<ffffffff802142c7>] __cpufreq_governor+0x6a/0xf6
 [<ffffffff802144cd>] __cpufreq_set_policy+0x17a/0x1f4
 [<ffffffff80214f40>] cpufreq_set_policy+0x33/0x7c
 [<ffffffff80215410>] cpufreq_add_dev+0x435/0x57f
 [<ffffffff80214ee5>] handle_update+0x0/0x28
 [<ffffffff801b66cc>] sysdev_driver_register+0x61/0xbd
 [<ffffffff80214182>] cpufreq_register_driver+0xb9/0x194
 [<ffffffff800a4d15>] sys_init_module+0xaf/0x1e8
 [<ffffffff8005d116>] system_call+0x7e/0x83


Code: 0f 0b 68 89 64 2c 80 c2 88 00 44 89 e3 48 c7 c7 c0 d7 33 80
RIP  [<ffffffff80215a14>] cpufreq_governor_userspace+0x44/0x205
 RSP <ffff810174c85ca8>
 <0>Kernel panic - not syncing: Fatal exception



Expected results:
Boot succeeds

Additional info:

I also have the Red Hat HPC bits installed. The bug occurs after the system has loaded the OpenIB kernel modules.
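
For triage, the useful fields in the trace above are the BUG location, the faulting symbol, and the process that triggered it. A minimal sketch of pulling them out of captured console text follows, assuming the oops layout shown in this report. The helper name `summarize_oops` and the regular expressions are illustrative only, not part of any existing tool.

```python
import re

def summarize_oops(text):
    """Pull the BUG location, faulting symbol, and process name out of oops text."""
    # Accepts "Kernel BUG at file.c:136" as well as "kernel BUG at file.c:122!".
    bug = re.search(r"[Kk]ernel BUG at (\S+?):(\d+)", text)
    # The "RIP:" line ends with "symbol+0xoffset/0xsize".
    rip = re.search(r"RIP:.*\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", text)
    comm = re.search(r"comm: (\S+)", text)
    return {
        "file": bug.group(1) if bug else None,
        "line": int(bug.group(2)) if bug else None,
        "symbol": rip.group(1) if rip else None,
        "offset": int(rip.group(2), 16) if rip else None,
        "comm": comm.group(1) if comm else None,
    }

# Three lines copied from the trace in this report.
sample = (
    "Kernel BUG at drivers/cpufreq/cpufreq_userspace.c:136\n"
    "Pid: 12673, comm: modprobe Not tainted 2.6.18-116.el5.gtest.57 #1\n"
    "RIP: 0010:[<ffffffff80215a14>]  [<ffffffff80215a14>] "
    "cpufreq_governor_userspace+0x44/0x205\n"
)
print(summarize_oops(sample))
```

Here the summary shows that the trap sits in cpufreq_governor_userspace and was reached from a modprobe process, which matches the call trace running through cpufreq_register_driver and sys_init_module.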

Comment 1 John Villalovos 2008-10-23 14:47:52 UTC

What are the results with the RHEL 5.3 Alpha?

Comment 2 Rafael Garabato 2008-10-23 14:56:08 UTC

That would be good to try. I will let you know once I have tried it. However, the problem could be caused by having the HPC solution installed together with the new kernel.

Now I have a clean install of 5.2 without Red Hat HPC so I will try the new kernel and see what happens.
Then I will try Red Hat 5.3 Alpha.

Rafael.

Comment 3 John Villalovos 2008-10-23 14:57:20 UTC

I'm curious why you are pulling from Gospo's kernel and not
http://people.redhat.com/dzickus/el5/ if you are pulling a development kernel.

Also, am I correct in saying that the Shoffner platform is the dual-socket High
Performance Compute platform?

Comment 4 Matthew Garrett 2008-10-23 15:48:27 UTC

I'm building a kernel with extra debugging to give some extra information on this.

Comment 5 Rafael Garabato 2008-10-23 16:26:03 UTC

It is the Dual Socket HPC platform.

I have installed the kernel on Red Hat EL 5.2 without Red Hat HPC software and it also hangs in the same way. 

I will now try Red Hat EL 5.3.

Comment 6 Rafael Garabato 2008-10-24 18:24:49 UTC

Red Hat EL 5.3 Alpha is also crashing in the same way.

Comment 7 John Villalovos 2008-10-24 18:29:36 UTC

Youquan,

Since you are our cpufreq expert, can you take a look at this and work with Rafael?

Rafael, Youquan is in Beijing, so it might be a couple of days before he responds.  I don't think we have any SDVs for that platform, so we will have to do some remote debugging.

Comment 8 Song, Youquan 2008-10-28 03:47:38 UTC

Sorry, I have just returned to the office from vacation.
Rafael, could you provide remote access to the machine, since I do not have such an SDV? Then I can try to do some remote debugging.

Comment 9 Rafael Garabato 2008-10-29 13:45:11 UTC

Where can I get the sources for Red Hat 5.3 alpha kernel? (2.6.18-118.el5)

I couldn't find them on the DVD.

Comment 10 Matthew Garrett 2008-10-29 18:37:12 UTC

Created attachment 321843 [details]
Test kernel

Can you provide dmesg output when booting with the attached kernel?

Comment 11 Rafael Garabato 2008-10-29 19:31:10 UTC

Created attachment 321858 [details]
 kernel-2.6.18-120.el5dz_test boot log

Comment 12 Rafael Garabato 2008-10-29 19:32:45 UTC

The kernel from comment #10 doesn't boot. It crashes during USB initialization, before it reaches the point of the cpufreq crash.

See logs: https://bugzilla.redhat.com/attachment.cgi?id=321858

Comment 13 Matthew Garrett 2008-10-29 19:44:49 UTC

Are you able to test without any USB input devices plugged in? This seems to be triggered by building RHEL kernels on Rawhide systems, for some reason.

Comment 14 Song, Youquan 2008-11-05 14:57:55 UTC

From Rafael's information, the issue can be solved by enabling EIST (Enhanced Intel SpeedStep Technology) in the BIOS.
Rafael, can you build an upstream kernel and check whether the issue occurs when EIST is disabled?

Comment 15 Rafael Garabato 2008-11-05 18:21:54 UTC

By upstream kernel, do you mean the latest kernel from kernel.org?
I thought there was a patch for this under development, isn't there?

Comment 16 Song, Youquan 2008-11-06 02:59:34 UTC

Yes, you can try it with the 2.6.27 kernel.  I want to find out whether the upstream kernel can handle this kind of BIOS issue.  If upstream can handle it, we can try to backport the patch to RHEL 5.3. If not, we can ask the upstream developers for help.

Comment 17 Rafael Garabato 2008-11-06 14:56:21 UTC

No luck with the upstream kernel. Seems to be the same issue although the behaviour is not the same.


Linux version 2.6.27.4 (root.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #2 SMP Thu Nov 6 10:08:15 ARST 2008
Command line: ro root=/dev/VolGroup00/LogVol00 console=ttyS1,115200n8
.
.
.
------------[ cut here ]------------
kernel BUG at drivers/cpufreq/cpufreq_userspace.c:122!
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in: acpi_cpufreq(+) dm_multipath scsi_dh sbs sbshc battery acpi_memhotplug ac parport_pc lp parport e1000e joydev sr_mod mlx4_core shpchp sg rtc_cmos button rtc_core pcspkr rtc_lib ide_cd_mod cdrom serio_raw i2c_i801 i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Pid: 3532, comm: modprobe Not tainted 2.6.27.4 #2
RIP: 0010:[<ffffffff803f6617>]  [<ffffffff803f6617>] cpufreq_governor_userspace+0x4a/0x31f
RSP: 0018:ffff880175d65ca8  EFLAGS: 00010246
RAX: 00000000ffffffff RBX: ffff88017d8de200 RCX: 0000000000000000
RDX: 00000000ffffffea RSI: 0000000000000000 RDI: ffff88017d8de200
RBP: ffff88017d8de200 R08: 0000000000000001 R09: 0000000000000000
R10: ffffffff805d41c0 R11: ffff880175d65c68 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f4949a7d6e0(0000) GS:ffffffff80717a80(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fff75a69130 CR3: 0000000175d75000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 3532, threadinfo ffff880175d64000, task ffff88017b6f6400)
Stack:  0000000000000000 0000000000000000 0000000000000292 ffff88017d8de200
 ffff88017d8de200 0000000000000001 0000000000000000 ffffffff803f4d35
 ffff880175d65d58 ffff88017d8de200 0000000000000000 ffffffff803f4eed
Call Trace:
 [<ffffffff803f4d35>] ? __cpufreq_governor+0x91/0xc8
 [<ffffffff803f4eed>] ? __cpufreq_set_policy+0x181/0x1fb
 [<ffffffff803f5e75>] ? cpufreq_add_dev+0x4c7/0x5f4
 [<ffffffff803f5934>] ? handle_update+0x0/0x28
 [<ffffffff8032432a>] ? __next_cpu_nr+0x1a/0x21
 [<ffffffff803a20a6>] ? sysdev_driver_register+0xa4/0x100
 [<ffffffff803f4bae>] ? cpufreq_register_driver+0xbb/0x1b1
 [<ffffffffa019b000>] ? acpi_cpufreq_init+0x0/0x90 [acpi_cpufreq]
 [<ffffffff80209041>] ? _stext+0x41/0x110
 [<ffffffff80257671>] ? sys_init_module+0x9e/0x1ad
 [<ffffffff8020be6b>] ? system_call_fastpath+0x16/0x1b


Code: f2 01 00 00 31 d2 ff ce 0f 85 e5 02 00 00 44 0f a3 25 1e 17 32 00 19 c0 85 c0 ba ea ff ff ff 0f 84 ce 02 00 00 83 7f 5c 00 75 04 <0f> 0b eb fe 48 c7 c7 d0 45 5d 80 e8 d0 6f 09 00 83 3d 02 aa 48
RIP  [<ffffffff803f6617>] cpufreq_governor_userspace+0x4a/0x31f
 RSP <ffff880175d65ca8>
---[ end trace 2b1230b4297c4a04 ]---
/etc/rc3.d/S06cpuspeed: line 112:  3532 Segmentation fault      /sbin/modprobe acpi-cpufreq 2> /dev/null
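
The `Code:` line in the 2.6.27.4 trace marks the byte at RIP with angle brackets. A minimal sketch that splits the dump at that marker and shows the bytes on either side is given below. `faulting_bytes` is an illustrative helper name, and the fallback for a dump with no marker is an assumption based on the 2.6.18 trace in the description, where the dump appears to start at RIP.

```python
def faulting_bytes(code_line):
    """Return (bytes_before_rip, bytes_from_rip) for an oops 'Code:' line."""
    tokens = code_line.replace("Code:", "").split()
    # The byte at RIP is wrapped in <..>; with no marker, assume the dump starts at RIP.
    idx = next((i for i, t in enumerate(tokens) if t.startswith("<")), 0)
    values = [int(t.strip("<>"), 16) for t in tokens]
    return bytes(values[:idx]), bytes(values[idx:])

# Code line copied from the 2.6.27.4 trace above.
code = ("Code: f2 01 00 00 31 d2 ff ce 0f 85 e5 02 00 00 44 0f a3 25 1e 17 32 00 "
        "19 c0 85 c0 ba ea ff ff ff 0f 84 ce 02 00 00 83 7f 5c 00 75 04 <0f> 0b "
        "eb fe 48 c7 c7 d0 45 5d 80 e8 d0 6f 09 00 83 3d 02 aa 48")
before, at_fault = faulting_bytes(code)
print(at_fault[:2].hex())  # 0f0b
print(before[-6:].hex())   # 837f5c007504
```

`0f 0b` is the x86 `ud2` instruction, which is what produces the "invalid opcode" report. The preceding bytes `83 7f 5c 00` encode `cmpl $0x0,0x5c(%rdi)` and `75 04` is a `jne` that skips the trap, so the trap fires when a 32-bit field at offset 0x5c of the structure in RDI is zero. That is consistent with an assertion on a zero value in the cpufreq policy structure, although the report does not quote the source line itself.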

Comment 18 Song, Youquan 2008-11-07 03:20:50 UTC

OK, understood. I will push upstream to fix this kind of BIOS issue.

Comment 19 Rafael Garabato 2008-11-19 12:29:51 UTC

Which version will contain the fixes?
I know there is a RHEL5.3-Snap3 available. Does this version contain the fix?

Comment 20 Matthew Garrett 2008-11-19 12:40:44 UTC

There's currently no patch or fix. Have you been able to test the kernel mentioned earlier?

Comment 21 Rafael Garabato 2008-11-19 12:51:28 UTC

I tested a fix provided by Youquan on November 11 (not logged here) with successful results. But I don't know the status of this fix in Red Hat.

Comment 22 Matthew Garrett 2008-11-19 13:05:10 UTC

The change doesn't appear to have been submitted for the Red Hat kernel.

Comment 23 Rafael Garabato 2008-11-19 13:53:06 UTC

Youquan, can you submit the fix?

Comment 24 Song, Youquan 2008-11-20 02:49:02 UTC

Created attachment 324133 [details]
Adding _PSS invalidation check

Comment 25 Song, Youquan 2008-12-05 05:11:04 UTC

What's the status of the bug?

Comment 26 Rafael Garabato 2008-12-05 14:34:01 UTC

Was the patch included in RH EL 5.3 GA Snapshot 4 or before?

Comment 27 Rafael Garabato 2008-12-16 14:41:30 UTC

What is the status of this bug? Was the fix included in any of the 5.3 snapshots? If not, will it be included later?

Comment 28 Song, Youquan 2008-12-16 15:40:09 UTC

No, RHEL 5.3 does not include the patch. We can only hope to include it in RHEL 5.4.
The patch is now included in the upstream -mm tree.

Comment 29 Rafael Garabato 2008-12-29 18:28:11 UTC

The bug still reproduces in RHEL 5.3 RC1.
This comment is just for the record and tracking purposes.

Comment 30 Ronald Pacheco 2009-01-05 18:08:15 UTC

Rafael,

Per comment 28, the code is not in Linus' kernel yet.  We will target RHEL 5.4, assuming the code is upstream by the time we code freeze.

Comment 31 Keve Gabbert 2009-01-05 23:52:53 UTC

please update subject with "RHEL 5.4"

Comment 32 John Villalovos 2009-01-06 18:12:24 UTC

Youquan,

Is it possible to work around this bug via a kernel command line argument?

Comment 33 Song, Youquan 2009-01-07 07:59:34 UTC

There is no kernel command line option to work around it in this situation.

Comment 34 John Villalovos 2009-01-21 02:09:45 UTC

Rafael,

The upstream maintainer of ACPI (Len Brown) rejected the patch because he believes that this issue is a BIOS bug.  Can you try to have the BIOS team investigate this issue and fix it?

Comment 35 Ronald Pacheco 2009-01-21 02:43:38 UTC

John and Rafael,

Based on comment 34, I am closing this as "not a bug".

Comment 36 Len Brown 2009-03-16 03:19:28 UTC

The patch in comment #24 fixes the problem at hand,
and will not break anything else.  So I'd not lose
any sleep about putting it as a workaround into a distro
until a better patch is available.

But the reason that I will not accept that patch upstream
is that it is checking for a random bit pattern to determine
that p-states are disabled by the BIOS.  This bit pattern
is completely arbitrary.

If the BIOS is going to throw random bit patterns at us,
then we need to either
1. be smarter about sanity checking them -- checking
   one bit and not others, why?

or better...
2. harden linux to handle total garbage in that field.
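
Option 2 above could look roughly like the following. This is a Python model of the validation logic only. It is not the patch from comment #24 and not kernel code. The function name, the dictionary layout, and the specific checks (non-zero frequency, a kHz value that fits in 32 bits, non-increasing order) are all assumptions chosen for illustration.

```python
U32_MAX = 0xFFFFFFFF

def usable_pstates(pss_entries):
    """Return the leading run of plausible P-state entries; an empty list means no P-states."""
    usable = []
    for entry in pss_entries:
        mhz = entry.get("core_frequency", 0)
        # Reject zero, and anything whose kHz value would overflow a 32-bit field.
        if mhz <= 0 or mhz * 1000 > U32_MAX:
            break
        # Assumption: entries run from highest to lowest frequency.
        if usable and mhz > usable[-1]["core_frequency"]:
            break
        usable.append(entry)
    return usable

# Hypothetical tables: one sane, one filled with garbage by the firmware.
good = [{"core_frequency": 2660}, {"core_frequency": 2000}]
garbage = [{"core_frequency": 0xFFFFFFFFFFFFFFFF}]
print(len(usable_pstates(good)), len(usable_pstates(garbage)))  # 2 0
```

The point of the design is that no particular bit pattern is treated as special. Any table that fails the generic checks is reported as having no P-states, so a driver built this way could decline to register instead of starting a governor with a zero frequency.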

Comment 37 Len Brown 2009-05-20 16:23:52 UTC

re-opening, as there is no indication that this issue has gone away,
either with a BIOS upgrade or a kernel patch.

Comment 38 Len Brown 2009-05-20 16:25:46 UTC

Rafael,
Please make sure that the board is running a production BIOS,
and then please attach the output from acpidump and dmidecode

Comment 39 John Villalovos 2009-05-20 18:48:32 UTC

Created attachment 344864 [details]
Updated patch for the 2.6.18-149 kernel

Comment 40 John Villalovos 2009-05-20 18:50:02 UTC

This bug seems to be the same as Bug 500311

Comment 41 Rafael Garabato 2009-05-22 13:23:36 UTC

(In reply to comment #38)
> Rafael,
> Please make sure that the board is running a production BIOS,
> and then please attach the output from acpidump and dmidecode  

The BIOS I was using was a production BIOS available at www.intel.com. I am not sure what your concern is.

Comment 42 John Villalovos 2009-05-22 13:31:45 UTC

Rafael,

Do you know what version of the BIOS you are running?

Could we get the output of dmidecode please?

Also is it possible to get a copy of the acpidump output?  You can get the pmtools RPM in Bug 500311 and acpidump is inside that package.  You can use that one if you like or:

The pmtools source can be found at:
http://www.lesswatts.org/projects/acpi/utilities.php

I was able to build the Fedora SRPM on my RHEL5 build root:
ftp://mirrors.kernel.org/fedora/releases/10/Everything/source/SRPMS/pmtools-20071116-1.fc9.src.rpm

Comment 43 Rafael Garabato 2009-05-22 13:33:22 UTC

(In reply to comment #37)
> re-opening, as there is no indication that this issue has gone away,
> either with a BIOS upgrade or a kernel patch.  

Update:
We have successfully installed Red Hat 5.3 on this board with the latest BIOS release. I don't know exactly which BIOS or kernel version resolved the issue, but the system works correctly with the latest versions.

Comment 44 John Villalovos 2009-05-22 13:41:35 UTC

Rafael,

Thanks for the info.  Glad to hear the BIOS update fixed it for you.

Comment 45 Ronald Pacheco 2009-05-22 20:06:40 UTC

John,

Based on comments #33 and #34, it appears that we should close this as notabug.  If you concur, please close.  If not, then please provide an update as to what the problem is and how to reproduce it.  Thanks!

Comment 46 John Villalovos 2009-05-22 20:24:01 UTC

Ron,

I believe this is a bug.  This and Bug 500311 seem to be the same bug.  So we could mark this as a duplicate of that bug if desired.

Customers in the field are seeing this issue.

Comment 47 Ronald Pacheco 2009-05-26 12:37:51 UTC

John,

If it's a dup, then please close this as a dup.

Comment 48 John Villalovos 2009-05-26 13:15:33 UTC


*** This bug has been marked as a duplicate of bug 500311 ***