Bug 715485 - intel_idle use of c3 provokes MCE panic - Shuttle SH55J2 + i7-870
Summary: intel_idle use of c3 provokes MCE panic - Shuttle SH55J2 + i7-870
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 18
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Matthew Garrett
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-06-23 03:07 UTC by J. Bruce Fields
Modified: 2012-11-13 01:42 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2012-11-13 01:42:27 UTC
Type: ---
Embargoed:


Attachments
screenshot of panic (145.12 KB, image/jpeg)
2011-06-23 03:14 UTC, J. Bruce Fields
dmesg including oops after ht turned off (97.55 KB, text/plain)
2011-06-23 17:00 UTC, J. Bruce Fields
Photo of the error (195.88 KB, image/jpeg)
2012-05-22 09:56 UTC, Psk
Photo of the error (160.52 KB, image/jpeg)
2012-05-22 09:57 UTC, Psk
acpidump output (209.15 KB, text/plain)
2012-10-11 22:02 UTC, J. Bruce Fields
turbostat output (777 bytes, application/octet-stream)
2012-10-11 22:03 UTC, J. Bruce Fields
rdmsr.c (7.76 KB, text/x-csrc)
2012-10-12 14:04 UTC, Len Brown
wrmsr.c (3.54 KB, text/x-csrc)
2012-10-12 14:05 UTC, Len Brown
acpidump output (new BIOS, max_cstate=0) (209.15 KB, application/octet-stream)
2012-10-14 15:43 UTC, J. Bruce Fields

Description J. Bruce Fields 2011-06-23 03:07:23 UTC
After using the machine a while (mainly yum upgrades and installs), I get a panic.  I've transcribed it by hand below, and will attach an image.

This is on a Fedora 15 newly installed and then fully updated this afternoon.

I also attempted to install Fedora 15 on this machine a few weeks ago and got similar results (but wasn't able to catch the panic that time), so I'd be reasonably confident of being able to reproduce this within an hour or two.

The machine has run Fedora 13 for months with no trouble.

[Hardware Error]: CPU 4: Machine Check Exception: 4 Bank 5: be00000000800400
Clocksource tsc unstable (delta = -8589933399 ns)
[Hardware Error]: TSC a3fe0652426 ADDR 3fff81080b5d MISC 7fff
[Hardware Error]: PROCESSOR 0:106e5 TIME 1308792167 SOCKET 0 APIC 1
[Hardware Error]: No human readable MCE decoding support on this CPU type.
[Hardware Error]: Run the message through 'mcelog --ascii' to decode.
[Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 5: be00000000800400
[Hardware Error]: TSC a3fe0633fb7 ADDR 3fff81080b5d MISC 7fff
[Hardware Error]: PROCESSOR 0:106e5 TIME 1308792167 SOCKET 0 APIC 0
[Hardware Error]: No human readable MCE decoding support on this CPU type.
[Hardware Error]: Run the message through 'mcelog --ascii' to decode.
[Hardware Error]: Machine check: Processor context corrupt
Kernel panic - not syncing: Fatal Machine check
Pid: 12496, comm: prelink Tainted: G   M        2.6.38.8-32.fc15.x86_64 #1
Call Trace:
 <#MC>  [<ffffffff8146c6e6>] panic+0x91/0x19c
 [<ffffffff8101b1bd>] mce_panic+0x191/0x1c7
 [<ffffffff8101b9b9>] do_machine_check+0x59a/0x741
 [<ffffffff8147622c>] machine_check+0x1c/0x30
 [<ffffffff81080b5d>] ? arch_local_irq_disable+0x4/0xd
 <<EOE>>  [<ffffffff814759a2>] _raw_spin_lock_irq+0x13/0x1e
 [<ffffffff810d8a0a>] add_to_page_cache_locked+0x93/0x118
 [<ffffffff8119864b>] ? ext4_get_block+0x0/0x18
 [<ffffffff810d8ab9>] add_to_page_cache_lru+0x2a/0x58
 [<ffffffff8114c14a>] mpage_readpages+0x99/0x104
 [<ffffffff8119864b>] ? ext4_get_block+0x0/0x18
 [<ffffffff8110875e>] ? alloc_pages_current+0xc7/0xd8
 [<ffffffff81194b9d>] ext4_readpages+0x1d/0x1f
 [<ffffffff810e0870>] __do_page_cache_readahead+0x100/0x177
 [<ffffffff810e0b4d>] ra_submit+0x21/0x25
 [<ffffffff810e0d1a>] ondemand_readahead+0x1c9/0x1d8
 [<ffffffff810e0da4>] page_cache_async_readahead+0x7b/0xa3
 [<ffffffff8122c8bc>] ? radix_tree_lookup_slot+0xe/0x10
 [<ffffffff810d7f42>] ? find_get_page+0x40/0x62
 [<ffffffff810d9708>] generic_file_aio_read+0x2bd/0x5e0
 [<ffffffff8112114a>] do_sync_read+0xbf/0xff
 [<ffffffff811e8102>] ? security_file_permission+0x2e/0x33
 [<ffffffff81121436>] ? rw_verify_area+0xb0/0xcd
 [<ffffffff811217b1>] vfs_read+0xa9/0xf0
 [<ffffffff8112192e>] sys_pread64+0x5a/0x76
 [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b
panic occurred, switching back to text console
Rebooting in 30 seconds..

Comment 1 J. Bruce Fields 2011-06-23 03:14:36 UTC
Created attachment 506108 [details]
screenshot of panic

Comment 2 Chuck Ebbert 2011-06-23 11:39:52 UTC
What kind of machine is it (vendor and model)? Does the problem go away if you disable hyperthreading in the BIOS?

Decoded MCE:

Wed Jun 22 21:22:47 2011
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 TSC a3fe0633fb7
MISC 7fff ADDR 3fff81080b5d 
MCG status:MCIP 
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS be00000000800400 MCGSTATUS 4
CPUID Vendor Intel Family 6 Model 30
PROCESSOR 0:106e5 TIME 1308792167 SOCKET 0 APIC 0
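
For reference, the flags in the decoded output above can be read straight off the STATUS value. The following is a minimal Python sketch, assuming the usual architectural bit layout of a machine-check status register (bit 63 valid down to bit 57 processor context corrupt, MCA error code in the low 16 bits); the function name and dictionary keys are made up for illustration and are not part of mcelog.

```python
# Sketch: decode the architectural bits of a machine-check status value.
# Bit positions are assumed from the standard MCi_STATUS layout; the
# function name and returned keys are illustrative only.

FLAGS = {
    63: "VAL (register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error enabled)",
    59: "MISCV (MCi_MISC register valid)",
    58: "ADDRV (MCi_ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

def decode_mci_status(status):
    """Return the set flags plus the two 16-bit error-code fields."""
    return {
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        "mca_error_code": status & 0xFFFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
    }

# STATUS value from the panic in this report.
decoded = decode_mci_status(0xBE00000000800400)
for flag in decoded["flags"]:
    print(flag)
print(hex(decoded["mca_error_code"]))
print(hex(decoded["model_specific_code"]))
```

For be00000000800400 this lists every flag except OVER, and the low 16 bits come out as 0x400, the code mcelog reports above as "Internal Timer error".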

Comment 3 J. Bruce Fields 2011-06-23 15:11:07 UTC
The CPU is a core i7-870, in this barebones: http://www.newegg.com/Product/Product.aspx?Item=N82E16856101098.

I'll try turning off hyperthreading.

Comment 4 J. Bruce Fields 2011-06-23 17:00:51 UTC
Created attachment 509574 [details]
dmesg including oops after ht turned off

After turning off hyperthreading, booting, and playing around for a few minutes (mainly with setting up a kvm guest), I got a kernel oops.  (dmesg attached).  No idea if it's related.

Hm, I forgot that the driver for the built-in network interface is buggy in Fedora >=14.  (See https://bugzilla.redhat.com/show_bug.cgi?id=654147, for a machine built from the same barebones.)  So that may be a factor as well.  I'll see if I can find a workaround for that bug to help isolate this one.

Comment 5 Chuck Ebbert 2011-06-24 05:29:50 UTC
This looks like broken hardware.

  16:	b9 10 00 00 00       	mov    $0x10,%ecx
  1b:	45 85 ed             	test   %r13d,%r13d
  1e:	c7 85 ac fe ff ff ff 	movl   $0xffffffff,-0x154(%rbp)
  25:	ff ff ff 
  28:	48 89 d7             	mov    %rdx,%rdi

   0:	f3 ab                	rep stos %eax,%es:(%rdi)

%rcx should contain 0x10 but it contains 0xffff8801f2ce382c
%rdi points to userspace when it should be a copy of the kernel pointer in %rdx

I would try installing some other OS to rule out hardware problems.

Comment 6 J. Bruce Fields 2011-06-24 11:22:27 UTC
"I would try installing some other OS to rule out hardware problems."

The same machine runs fine under Fedora 13, and has been for months.  I've also tried downgrading it to Fedora 13 in case there's a hardware problem that developed only recently, but am still unable to reproduce the problem under Fedora 13, whereas it happens within an hour or two of use under Fedora 15.

Comment 7 Chuck Ebbert 2011-06-25 07:09:32 UTC
Something very strange is going on. Can you try "iommu=soft" to rule out DMAR bugs? Do older F15 kernels work?

Comment 8 J. Bruce Fields 2011-07-11 18:52:01 UTC
After some further testing I've seen it freeze (and couldn't get debugging information) after adding iommu=soft to the kernel commandline.

I believe I've seen similar problems under older F15 kernels, but haven't retested to confirm that.

Apologies, I use the machine a lot while I'm working and am not getting a lot of time to boot it to Fedora 15 for testing.

Comment 9 Bill McGonigle 2011-10-20 22:44:30 UTC
I just started seeing a very similar [hardware error] on an MSI laptop with an AMD e350 after updating to kernel 2.6.40.6-0.fc15.x86_64 yesterday.  It happened three times in a row, so of course I went to grab my camera and now the machine has been fine since.  Anyway, adding myself to the cc: list for if/when it happens again.

Comment 10 Dave Jones 2012-04-11 15:34:06 UTC
Was this machine hibernated at all? I'm wondering if this was more fallout from the recent i915 memory corruption bug that got fixed.

Comment 11 J. Bruce Fields 2012-04-11 15:45:44 UTC
No, the machine never hibernates.

Comment 12 Psk 2012-05-22 09:56:55 UTC
Created attachment 585981 [details]
Photo of the error

Comment 13 Psk 2012-05-22 09:57:46 UTC
Created attachment 585982 [details]
Photo of the error

Comment 14 Psk 2012-05-22 10:02:51 UTC
Hi,
I'm a new Fedora user and I've got the same kind of error on my machine, about once per day.

Fedora 16
CPU: Corei7 920
Ram: 3x 2 Go DDR3

Hyperthreading is already disabled because of another problem with josm (https://bugzilla.redhat.com/show_bug.cgi?id=819345)

I don't have this problem when I'm working on Windows 7 on the same computer.

I've created 2 new attachments, which are photos of the error on my computer:
https://bugzilla.redhat.com/attachment.cgi?id=585981
https://bugzilla.redhat.com/attachment.cgi?id=585982

Comment 15 Josh Boyer 2012-07-11 17:53:14 UTC
Fedora 15 has reached its end of life as of June 26, 2012.  As a result, we will not be fixing any remaining bugs found in Fedora 15.

In the event that you have upgraded to a newer release and the bug you reported is still present, please reopen the bug and set the version field to the newest release you have encountered the issue with.  Before doing so, please ensure you are testing the latest kernel update in that release and attach any new and relevant information you may have gathered.

Thank you for taking the time to file a report.  We hope newer versions of Fedora suit your needs.

Comment 16 J. Bruce Fields 2012-10-07 12:23:14 UTC
I've attempted to install more recent Fedora versions several times, including most recently the F18 alpha, but continue to have random bugs and MCEs.  F13 continues to work.

I finally took the time to experiment some more.  It looks now like the bug began when CONFIG_INTEL_IDLE was turned on for F14.  Upstream report:

http://mid.gmane.org/<20121005222357.GC30139>

I'm not completely positive this is the same bug, but for now it looks likely.  I'll do some more work with the modified kernel and report the results.

I'd also like to work out how to create an F18 live CD with a modified kernel to see whether this makes F18 reliable for me.

Resetting the bug's state to ASSIGNED, but let me know if that's not the right thing to do.

Comment 17 J. Bruce Fields 2012-10-10 01:37:07 UTC
I just noticed intel_idle has a "max_cstate" parameter.  Booting with "intel_idle.max_cstate=0" also fixes the problem without the need to rebuild the kernel.

I can now get through a Fedora 18 install successfully, whereas previously it always crashed either during the install itself or in the initial post-boot configuration.

Comment 18 Matthew Garrett 2012-10-10 03:00:49 UTC
Any chance you can incrementally increase max_cstate until the point where it starts failing?

Comment 19 J. Bruce Fields 2012-10-10 13:45:54 UTC
Yep.  So far:

intel_idle.max_cstate=2 is bad
intel_idle.max_cstate=1 is good?
intel_idle.max_cstate=0 is good

A question mark for max_cstate=1 just because my quick "dd" reproducer hasn't been 100% reliable.  It's probably good, but I'll do my work with max_cstate=1 today (as I did yesterday with max_cstate=0) and report if it crashes.

This is all with Fedora 18 and 3.6.1-1.fc18.x86_64.

Comment 20 Len Brown 2012-10-11 04:30:57 UTC
Booting with intel_idle.max_cstate=0,
and then with intel_idle.max_cstate=1,
please show the output from

dmesg | grep idle
grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*

Comment 21 J. Bruce Fields 2012-10-11 14:29:30 UTC
To confirm: it survived all day yesterday with max_cstate=1.  So max_cstate=2 is the first that reproduces the bug.

With max_cstate=0:

# dmesg|grep idle
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.6.1-1.fc18.x86_64 root=UUID=0526310d-dcb8-4371-a785-752590fe62c1 ro rd.md=0 rd.lvm=0 rd.dm=0 rd.luks=0 rhgb quiet intel_idle.max_cstate=0
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.6.1-1.fc18.x86_64 root=UUID=0526310d-dcb8-4371-a785-752590fe62c1 ro rd.md=0 rd.lvm=0 rd.dm=0 rd.luks=0 rhgb quiet intel_idle.max_cstate=0
[    0.002931] process: using mwait in idle threads
[    0.922878] intel_idle: disabled
[    1.198689] cpuidle: using governor ladder
[    1.198690] cpuidle: using governor menu
# grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
grep: /sys/devices/system/cpu/cpu0/cpuidle/*/*: No such file or directory

With max_cstate=1:

# dmesg|grep idle
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.6.1-1.fc18.x86_64 root=UUID=0526310d-dcb8-4371-a785-752590fe62c1 ro rd.md=0 rd.lvm=0 rd.dm=0 rd.luks=0 rhgb quiet intel_idle.max_cstate=1
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.6.1-1.fc18.x86_64 root=UUID=0526310d-dcb8-4371-a785-752590fe62c1 ro rd.md=0 rd.lvm=0 rd.dm=0 rd.luks=0 rhgb quiet intel_idle.max_cstate=1
[    0.002926] process: using mwait in idle threads
[    0.923045] intel_idle: MWAIT substates: 0x1120
[    0.923050] intel_idle: v0.4 model 0x1E
[    0.923051] intel_idle: lapic_timer_reliable_states 0x2
[    0.923052] intel_idle: max_cstate 1 reached
[    0.923060] intel_idle: max_cstate 1 reached
[    0.923064] intel_idle: max_cstate 1 reached
[    0.923067] intel_idle: max_cstate 1 reached
[    0.923068] intel_idle: max_cstate 1 reached
[    1.198811] cpuidle: using governor ladder
[    1.198840] cpuidle: using governor menu
# grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
/sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state0/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state0/power:4294967295
/sys/devices/system/cpu/cpu0/cpuidle/state0/time:28659510
/sys/devices/system/cpu/cpu0/cpuidle/state0/usage:236965
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:MWAIT 0x00
/sys/devices/system/cpu/cpu0/cpuidle/state1/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state1/latency:3
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1-NHM
/sys/devices/system/cpu/cpu0/cpuidle/state1/power:4294967294
/sys/devices/system/cpu/cpu0/cpuidle/state1/time:63419910896
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:92907312
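
When comparing several boots, the "path:value" lines printed by that grep can be grouped per idle state. This is a small Python sketch of one way to do it; the function name is hypothetical, and it only assumes the output format shown above.

```python
from collections import defaultdict

def parse_cpuidle(lines):
    """Group 'path:value' lines by their cpuidle stateN directory."""
    states = defaultdict(dict)
    for line in lines:
        path, _, value = line.strip().partition(":")
        parts = path.split("/")
        # Skip anything that is not .../stateN/<attribute>, such as
        # grep's "No such file or directory" message.
        if len(parts) < 2 or not parts[-2].startswith("state"):
            continue
        states[parts[-2]][parts[-1]] = value
    return dict(states)

# A few lines copied from the max_cstate=1 output above.
sample = """\
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1-NHM
/sys/devices/system/cpu/cpu0/cpuidle/state1/latency:3
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:MWAIT 0x00
""".splitlines()

states = parse_cpuidle(sample)
for state, attrs in sorted(states.items()):
    print(state, attrs["name"], attrs["latency"])
```

With max_cstate=0 the grep prints only an error line, so the same function returns an empty dictionary, which matches "no cpuidle states exported".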

Comment 22 Len Brown 2012-10-11 15:58:46 UTC
It appears that when you disable intel_idle via intel_idle.max_cstate=0,
instead of running acpi_idle, you are running with no C-states at all.

Do you have ACPI C-states disabled in the BIOS or the Linux
acpi "processor" driver disabled?  Please go into BIOS SETUP
and select defaults and verify that you see the same thing.
Also look for BIOS options related to idle c-states.

Please attach the output of "acpidump" to this bug report.

In the intel_idle.max_cstate=0 case, on an idle system,
please show the output from
# turbostat -v sleep 1

turbostat and acpidump can be found in the latest upstream kernel tree
under utils/power/

Comment 23 J. Bruce Fields 2012-10-11 22:00:16 UTC
(In reply to comment #22)
> It appears that when you disable intel_idle via intel_idle.max_cstate=0,
> instead of running acpi_idle, you are running with no C-states at all.
> 
> Do you have ACPI C-states disabled in the BIOS or the Linux
> acpi "processor" driver disabled?

Apologies, I don't know how to answer either of those questions!

> Please go into BIOS SETUP
> and select defaults and verify that you see the same thing.
> Also look for BIOS options related to idle c-states.

The only possibly relevant items I see in the BIOS menus are "C1E support", "SpeedStep", and "TurboMode".  All are set to "enabled".

I did try restoring all BIOS defaults.  Output above in the max_cstate=0 case was unchanged.  (Still no /sys/devices/system/cpu/cpu0/cpuidle directory.)

> Please attach the output of "acpidump" to this bug report.
> 
> In the intel_idle.max_cstate=0 case, on an idle system,
> please show the output from
> # turbostat -v sleep 1
> 
> turbostat and acpidump can be found in the latest upstream kernel tree
> under utils/power/

(Actually looks like they're in tools/power/acpi and tools/power/x86/turbostat).  Thanks, I'll do that next.

Comment 24 J. Bruce Fields 2012-10-11 22:02:52 UTC
Created attachment 625706 [details]
acpidump output

Comment 25 J. Bruce Fields 2012-10-11 22:03:38 UTC
Created attachment 625707 [details]
turbostat output

Comment 26 Len Brown 2012-10-11 22:42:09 UTC
Please verify that this motherboard officially supports this processor,
that you are running the latest BIOS, and that the BIOS
supports this processor.

Even in ACPI mode, this box is running with C1 in idle only,
which is an indication that something is quite wrong.

in the FADT...

[05Fh 0095   1]                 _CST Support : E3
[060h 0096   2]                   C2 Latency : 0065
[062h 0098   2]                   C3 Latency : 03E9

which translate to 101 and 1001 decimal, which disable C2 and C3
in non-CST mode.  The E3 means that the BIOS wants the OS to
tell it that the OS has _CST support, but the tables you sent
don't have any _CST present.

Are there any dynamic tables in
/sys/firmware/acpi/tables/dynamic?
If yes, please attach them.

BTW. It is also interesting that your BIOS would offer to disable C1E,
as that would void the warranty on your processor.
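
The FADT arithmetic quoted above can be checked with a few lines of Python. This is only a sketch: the thresholds (a C2 latency above 100 or a C3 latency above 1000 means the state is not offered through the FADT) are the ACPI rule the comment relies on, and the function name is made up for illustration.

```python
# FADT rule assumed here: a C2 latency above 100 us, or a C3 latency
# above 1000 us, tells the OS that state is not supported via the FADT.
C2_MAX_US = 100
C3_MAX_US = 1000

def fadt_cstates(c2_latency, c3_latency):
    """Report whether the FADT latencies leave C2 and C3 usable."""
    return {
        "C2": c2_latency <= C2_MAX_US,
        "C3": c3_latency <= C3_MAX_US,
    }

# Hex values copied from the FADT dump above.
c2, c3 = int("0065", 16), int("03E9", 16)
print(c2, c3)
print(fadt_cstates(c2, c3))
```

Both values sit exactly one above their threshold, which is the conventional way for a BIOS to mark C2 and C3 as unavailable in non-_CST mode.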

Comment 27 Len Brown 2012-10-12 02:53:40 UTC
Looking at shuttle's web site, this product
claims to support the i7-870 processor.

However, their BIOS download page has only this description
for version 2010/09/01 BIOS:

"Improved stability for some CPUs."

So it would be a good idea to verify you've got that version or later.

http://global.shuttle.com/products/productsDownload?productId=1409

What do you see here?:
$ grep . /sys/devices/system/cpu/cpu0/cpufreq/*

One possibility is that there is an electrical problem on this
board and Shuttle tried to de-feature voltage scaling in their BIOS.
Under "Advanced", what is "Intel(R)SpeedStep(tm) tech" set to?

if it is off and if you enable it when C-states are off
and you see stability issues, that may indicate a voltage issue.

If you have an easy way to reproduce the failure, I'd be interested
to know if the settings under
"Advanced"/"Frequency Voltage Configuration"
have an effect.  In particular, does the system get more stable
if you increase the processor and DIMM voltages?

Please show the output from

# turbostat -M 0xe2 sleep 1

MSR 0xE2 is the MSR_PKG_CST_CONFIG_CONTROL register.
The bottom 3 bits say what the deepest enabled package C-state is.
If this is mis-configured, then using the core c-states could result
in a package c-state which has issues.  If bit 15 is clear, then
this MSR is unlocked and you could write this MSR with the bottom 3-bits
clear to disable package C-states.  Note this is a per-core MSR, so you'd
use the version of wrmsr with the -a capability.  turbostat or rdmsr -a
can tell you if it worked.

If the MSR is locked, then a low-tech way to prevent package c-states
(as a test) is to have 1 thread running (eg, a spin loop), and see
if the other 3 cores are able to get into a deep core c-state w/o problems.
(as shown by turbostat).
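
The "1 thread running" trick can be done with any busy loop. Below is a minimal Python sketch of such a cycle-soaker; the function name and its parameters are hypothetical, and the optional pinning relies on os.sched_setaffinity, which is only available on Linux.

```python
import os
import time

def spin(seconds, cpu=None):
    """Busy-loop for `seconds`, optionally pinned to one logical CPU."""
    if cpu is not None and hasattr(os, "sched_setaffinity"):
        # Linux-only: restrict this process to the chosen CPU.
        os.sched_setaffinity(0, {cpu})
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations

# Short demonstration run; a real test would spin for as long as the
# other cores are being watched in turbostat.
count = spin(0.05)
print(count)
```

While the loop runs, one core stays in C0, so the package as a whole should not be able to enter a package C-state, and turbostat can show whether the remaining cores still reach deep core C-states.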

Comment 28 J. Bruce Fields 2012-10-12 12:52:13 UTC
(In reply to comment #27)
> However, their BIOS download page has only this description
> for version 2010/09/01 BIOS:
> 
> "Improved stability for some CPUs."
> 
> So it would be a good idea to verify you've got that version or later.

In fact, the BIOS reports version 103 (06/18/10); thanks for the suggestion, I'll try their latest.  Results below are before doing that, and with max_cstate still 0:

> http://global.shuttle.com/products/productsDownload?productId=1409
> 
> What do you see here?:
> $ grep . /sys/devices/system/cpu/cpu0/cpufreq/*

/sys/devices/system/cpu/cpu0/cpufreq/affected_cpus:0
/sys/devices/system/cpu/cpu0/cpufreq/bios_limit:2934000
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:1200000
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:2934000
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq:1200000
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency:10000
/sys/devices/system/cpu/cpu0/cpufreq/related_cpus:0 1 2 3 4 5 6 7
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies:2934000 2933000 2800000 2667000 2533000 2400000 2267000 2133000 2000000 1867000 1733000 1600000 1467000 1333000 1200000 
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:conservative userspace powersave ondemand performance 
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:1200000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:acpi-cpufreq
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:ondemand
/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:2934000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:1200000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:<unsupported>


> One possibility is that there is an electrical problem on this
> board and Shuttle tried to de-feature voltage scaling in their BIOS.
> Under "Advanced", what is "Intel(R)SpeedStep(tm) tech" set to?
> 
> if it is off and if you enable it when C-states are off
> and you see stability issues, that may indicate a voltage issue.

That's set to "enabled" and always has been.

> If you have an easy way to reproduce the failure,

I think I can reproduce it reliably in under an hour.

> I'd be interested
> to know if the settings under
> "Advanced"/"Frequency Voltage Configuration"
> have an effect.  In particular, does the system get more stable
> if you increase the processor and DIMM voltages?

I could try that, sure.

> Please show the output from
> 
> # turbostat -M 0xe2 sleep 1

# ./turbostat -M 0xe2 sleep 1
cor CPU    %c0  GHz  TSC           MSR 0x0E2    %c1    %c3    %c6   %pc3   %pc6
          0.05 1.20 2.93  0x0000000000000000  99.95   0.00   0.00   0.00   0.00
  0   0   0.05 1.20 2.93  0x0000000000000003  99.95   0.00   0.00   0.00   0.00
  0   4   0.04 1.20 2.93  0x0000000000000003  99.96
  1   1   0.04 1.20 2.93  0x0000000000000003  99.96   0.00   0.00
  1   5   0.11 1.20 2.93  0x0000000000000003  99.89
  2   2   0.04 1.20 2.93  0x0000000000000003  99.96   0.00   0.00
  2   6   0.02 1.20 2.93  0x0000000000000003  99.98
  3   3   0.03 1.20 2.93  0x0000000000000003  99.97   0.00   0.00
  3   7   0.03 1.20 2.93  0x0000000000000003  99.97
1.001803 sec

> MSR 0xE2 is the MSR_PKG_CST_CONFIG_CONTROL register.
> The bottom 3 bits say what the deepest enabled package C-state is.
> If this is mis-configured, then using the core c-states could result
> in a package c-state which has issues.  If bit 15 is clear, then
> this MSR is unlocked and you could write this MSR with the bottom 3-bits
> clear to disable package C-states.  Note this is a per-core MSR, so you'd
> use the version of wrmsr with the -a capability.

"yum install msr-tools" gets me wr/rdmsr without any (documented) "-a" option, and googling isn't finding anything else.  Would

  for (( i=0; i<8; i++ )); do wrmsr -p$i 0xe2 0; done

do the job?

> turbostat or rdmsr -a can tell you if it worked.

Apologies, I'm not completely sure what you're asking for here.  The problem was only reproducible on booting with intel_idle.max_cstate >= 2.  So I should boot with max_cstate >= 2, then try the above wrmsr, then see if the problem still occurs?

Anyway, I'm assuming I should try the BIOS upgrade first.

> If the MSR is locked, then a low-tech way to prevent package c-states
> (as a test) is to have 1 thread running (eg, a spin loop), and see
> if the other 3 cores are able to get into a deep core c-state w/o problems.
> (as shown by turbostat).

Comment 29 Len Brown 2012-10-12 14:01:25 UTC
Yes, best to focus first on updating the BIOS.

After the upgrade, please re-send:

acpidump output
and with intel_idle.max_cstate=0
    grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*

    turbostat -M 0xe2 sleep 1

Thanks for verifying that P-states are enabled.
That suggests that the configuration isn't totally crippled.

Note, however, that the lack of deep C-state support
on this configuration may prevent you from reaching
maximum frequency:

9 * 133 = 1200 MHz max efficiency
22 * 133 = 2933 MHz TSC frequency
24 * 133 = 3200 MHz max turbo 4 active cores
24 * 133 = 3200 MHz max turbo 3 active cores
26 * 133 = 3467 MHz max turbo 2 active cores
27 * 133 = 3600 MHz max turbo 1 active cores

you can find out with a simple test.

# cat /dev/zero > /dev/null &
# cat /dev/zero > /dev/null &
# turbostat

and see if you get up to 3.4 GHz.
kill one of the threads and see if you can get up to 3.6 GHz.
It is possible that the lack of C-states deeper than C1
will limit turbo to 3.2 GHz.
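
The frequency table above is just ratio times base clock. Here is a small Python sketch of that arithmetic, assuming the "133" in the table stands for the nominal 133.33 MHz base clock (that is what makes 26 come out as 3467 rather than 3458); the names are illustrative.

```python
# Assumption: "133" in the table means the nominal 133.33 MHz base clock.
BCLK_MHZ = 400 / 3

# Ratios copied from the table above.
ratios = {
    "max efficiency": 9,
    "TSC frequency": 22,
    "max turbo 4 active cores": 24,
    "max turbo 3 active cores": 24,
    "max turbo 2 active cores": 26,
    "max turbo 1 active cores": 27,
}

def ratio_to_mhz(ratio):
    """Convert a clock ratio to a frequency in MHz, rounded."""
    return round(ratio * BCLK_MHZ)

for label, ratio in ratios.items():
    print(f"{ratio} * 133 = {ratio_to_mhz(ratio)} MHz {label}")
```

The 3.2 GHz ceiling mentioned above corresponds to ratio 24; anything turbostat reports beyond that means the 2-core or 1-core turbo ratios are being reached.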

Good news on MSR 0xE2.  First, bit 15 is clear, so this MSR
is unlocked and enabled for writing.  The 3 means that PC6
is enabled.  Set this MSR to 0 and re-test.  (and re-run
your test above to see if you can then get to 3.6 GHz:-)
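
To make the bit-twiddling explicit, here is a minimal Python sketch of how the 0xE2 value is being read: bit 15 as the lock bit and the low three bits as the package C-state limit, as described in this thread. The function names are hypothetical, and the sketch only computes values; the actual write would still go through wrmsr.

```python
LOCK_BIT = 1 << 15      # bit 15: MSR is locked against writes
PKG_LIMIT_MASK = 0x7    # bits 2:0: deepest enabled package C-state

def decode_pkg_cst_config(value):
    """Split an MSR 0xE2 value into the two fields discussed here."""
    return {
        "locked": bool(value & LOCK_BIT),
        "pkg_cstate_limit": value & PKG_LIMIT_MASK,
    }

def clear_pkg_limit(value):
    """Value to write back with the low 3 bits cleared; refuses if locked."""
    if value & LOCK_BIT:
        raise ValueError("MSR 0xE2 is locked (bit 15 set)")
    return value & ~PKG_LIMIT_MASK

current = 0x3  # value turbostat reported on every CPU in this report
print(decode_pkg_cst_config(current))
print(hex(clear_pkg_limit(current)))
```

For 0x3 the lock bit is clear and the limit field is 3, so the value to write back is simply 0.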

Comment 30 Len Brown 2012-10-12 14:04:50 UTC
Created attachment 626014 [details]
rdmsr.c

Here is the rdmsr.c that I use.
I modified it some time ago to add the -a parameter.
Looks like I failed to get that change back upstream.

Comment 31 Len Brown 2012-10-12 14:05:43 UTC
Created attachment 626015 [details]
wrmsr.c

This version has -a option

Comment 32 J. Bruce Fields 2012-10-14 15:30:59 UTC
Thanks!

The BIOS upgrade did indeed help: the machine's been running for a couple days with intel_idle.max_cstate=2 without any crashes.  But I also have a backup of the original BIOS and would be happy to reflash back to that if it would be useful.  (Presumably it was buggy, but should the kernel have been able to work around whatever the problem was?)

(For future reference, the BIOS upgrade was:
  # download and unzip update from http://global.shuttle.com/products/productsDownload?productId=1409

  yum install flashrom
  flashrom -pinternal -r backup.bin
  flashrom -pinternal -w SH55JSHU.107
)

Comment 33 J. Bruce Fields 2012-10-14 15:39:16 UTC
With new BIOS:
# cat /sys/module/intel_idle/parameters/max_cstate 
0
# grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
grep: /sys/devices/system/cpu/cpu0/cpuidle/*/*: No such file or directory

Comment 34 J. Bruce Fields 2012-10-14 15:41:42 UTC
Also with new BIOS and max_cstate=0, turbostat output looks the same?:

# ./turbostat -M 0xe2 sleep 1
cor CPU    %c0  GHz  TSC           MSR 0x0E2    %c1    %c3    %c6   %pc3   %pc6
          2.13 1.20 2.93  0x0000000000000000  97.87   0.00   0.00   0.00   0.00
  0   0   3.61 1.20 2.93  0x0000000000000003  96.39   0.00   0.00   0.00   0.00
  0   4   1.47 1.20 2.93  0x0000000000000003  98.53
  1   1   6.14 1.20 2.93  0x0000000000000003  93.86   0.00   0.00
  1   5   0.12 1.20 2.93  0x0000000000000003  99.88
  2   2   0.42 1.20 2.93  0x0000000000000003  99.58   0.00   0.00
  2   6   0.02 1.20 2.93  0x0000000000000003  99.98
  3   3   5.21 1.20 2.93  0x0000000000000003  94.79   0.00   0.00
  3   7   0.05 1.20 2.93  0x0000000000000003  99.95
1.001825 sec

Comment 35 J. Bruce Fields 2012-10-14 15:43:21 UTC
Created attachment 627002 [details]
acpidump output (new BIOS, max_cstate=0)

Comment 36 J. Bruce Fields 2012-10-14 16:13:39 UTC
"you can find out with a simple test"

Right, I never see anything in the "GHz" column over 3.20.

After:

# ./wrmsr -a 0x0E2 0
[root@pop ~]# ./rdmsr -a 0x0E2
0
0
0
0
0
0
0
0

there's no change--still nothing over 3.20.

Comment 37 Len Brown 2012-10-16 21:53:47 UTC
re: comment #32 "intel_idle.max_cstate=2 is now stable"

Promising news.

Please show the turbostat output for this case
to verify that we are getting c-state residency we expect.

Please also try with no intel_idle.max_cstate parameter at all,
(the out of the box case)
to see if we can get stable c6 residency in addition
to c3 residency.  Send turbostat output.

re: comment #33

this is with intel_idle not loaded, yes?
("dmesg |grep idle" will confirm what is loaded)

This is consistent with comment #32
where it seems that ACPI mode is still exporting just C1,
and thus you'll see no cpuidle stuff in sysfs.

Indeed, it is strange (and likely some sort of BIOS bug)
that you're getting only C1 in ACPI mode.

re: comment #35

I'll have to get back to you on this - maybe can figure
out why ACPI is exporting just C1 -- though we care here
more about the intel_idle case -- the ACPI case is primarily
for comparison.

re: comment #36

Okay, in the ACPI case where you have just C1,
you can never run faster than 3.2 GHz.
Certainly that isn't what customers will want.
Unclear if Shuttle disabled this on purpose,
say, they don't have power or cooling capacity,
or if this is a BIOS bug.  Of course it would be interesting
to see what windows does on this box.  It will use ACPI,
and its C-states should be visible in its perfmon utility.

re: clearing of MSR 0xE2.
This is interesting only if you have a failure with deep c-states
and this makes it go away.  But since we don't have a failure
to fix at the moment, this doesn't tell us anything.

Comment 38 J. Bruce Fields 2012-10-25 01:03:07 UTC
Apologies for the delayed response:

(In reply to comment #37)
> re: comment #32 "intel_idle.max_cstate=2 is now stable"
> 
> Promising news.
> 
> Please show the turbostat output for this case
> to verify that we are getting c-state residency we expect.

[root@pop turbostat]# cat /sys/module/intel_idle/parameters/max_cstate 
2
[root@pop turbostat]# ./turbostat sleep 1
cor CPU    %c0  GHz  TSC    %c1    %c3    %c6   %pc3   %pc6
          5.21 1.20 2.93   7.12  87.67   0.00  43.23   0.00
  0   0   3.99 1.20 2.93   2.03  93.98   0.00  43.23   0.00
  0   4   0.04 1.20 2.93   5.98
  1   1   8.50 1.20 2.93   2.69  88.81   0.00
  1   5   0.05 1.20 2.93  11.15
  2   2  12.56 1.20 2.93  15.93  71.51   0.00
  2   6  13.78 1.20 2.93  14.71
  3   3   2.69 1.20 2.93   0.93  96.39   0.00
  3   7   0.05 1.19 2.93   3.57
1.002127 sec

> Please also try with no intel_idle.max_cstate parameter at all,
> (the out of the box case)
> to see if we can get stable c6 residency in addition
> to c3 residency.  Send turbostat output.

# cat /sys/module/intel_idle/parameters/max_cstate 
7
[root@pop turbostat]# ./turbostat sleep 1
cor CPU    %c0  GHz  TSC    %c1    %c3    %c6   %pc3   %pc6
          4.42 1.25 2.93   7.83  46.71  41.04  38.62   4.58
  0   0   7.25 1.24 2.93   3.04  69.86  19.85  38.62   4.58
  0   4   0.39 1.62 2.93   9.90
  1   1   3.26 1.25 2.93   9.48  33.45  53.81
  1   5   5.55 1.25 2.93   7.18
  2   2   7.39 1.22 2.93   3.33  40.26  49.03
  2   6   0.36 1.66 2.93  10.36
  3   3   9.36 1.23 2.93   5.90  43.26  41.48
  3   7   1.79 1.37 2.93  13.47
1.002048 sec

Neato.

> re: comment #33
> 
> this is with intel_idle not loaded, yes?
> ("dmesg |grep idle" will confirm what is loaded)

I don't remember.... Booting with max_cstate=0 to check: that's right, it's not loaded.

> This is consistent with comment #32
> where it seems that ACPI mode is still exporting just C1,
> and thus you'll see no cpuidle stuff in sysfs.
> 
> Indeed, it is strange (and likely some sort of BIOS bug)
> that you're getting only C1 in ACPI mode.
> 
> re: comment #35
> 
> I'll have to get back to you on this - maybe can figure
> out why ACPI is exporting just C1 -- though we care here
> more about the intel_idle case -- the ACPI case is primarily
> for comparison.
> 
> re: comment #36
> 
> Okay, in the ACPI case where you have just C1,
> you can never run faster than 3.2 GHz.
> Certainly that isn't what customers will want.
> Unclear if Shuttle disabled this on purpose,
> say, they don't have power or cooling capacity,
> or if this is a BIOS bug.  Of course it would be interesting
> to see what windows does on this box.  It will use ACPI,
> and its C-states should be visible in its perfmon utility.

I'm pretty ignorant of Windows--unless there's some Windows equivalent to a live CD that I could get my hands on easily, getting it on this box is probably more of a project than I can take on right now.

Comment 39 Len Brown 2012-10-31 05:58:53 UTC
So your system is stable and working properly after the BIOS upgrade,
and with no special boot parameters, intel_idle is loading,
c6 and pc6 are being utilized?

I expect you will also find that turbo mode goes faster now.
Try a single-threaded cycle-soaker:
# cat /dev/zero > /dev/null &
and see if turbostat shows that you are now able to get
past 3.2 GHz.

If this is the case, then this bug is closed, yes?

The remaining mystery is why you see only C1 in legacy ACPI mode
(intel_idle.max_cstate=0).
That, of course, would be an ACPI-mode bug, not an intel_idle bug :-)
If you file that bug, I'll look at it.
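
The cycle-soaker check above can be wrapped so that the busy loop is always cleaned up after the measurement. This is only a sketch: the function name with_soaker is made up for illustration, and it assumes turbostat is invoked as ./turbostat as in the earlier transcripts.

```shell
#!/bin/sh
# Run a command while one single-threaded busy loop is active,
# then stop the loop and return the command's exit status.
with_soaker() {
    cat /dev/zero > /dev/null &      # the cycle-soaker from the comment above
    soaker=$!
    status=0
    "$@" || status=$?                # e.g. ./turbostat sleep 1
    kill "$soaker" 2>/dev/null || true
    wait "$soaker" 2>/dev/null || true
    return "$status"
}

# Example (needs root and a built turbostat):
#   with_soaker ./turbostat sleep 1
```

If turbo mode is working, one core in the turbostat output should show %c0 near 100 and a GHz value above the TSC rate.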

Comment 40 J. Bruce Fields 2012-10-31 13:58:40 UTC
> see if turbostat shows that you are now able to get past 3.2 GHz.

Yep, looks like it:

cor CPU    %c0  GHz  TSC    %c1    %c3    %c6   %pc3   %pc6
         14.86 3.46 2.93  18.26  42.67  24.21   0.00   0.00
  0   0   5.22 3.16 2.93   6.30  57.91  30.57   0.00   0.00
  0   4   1.96 3.05 2.93   9.56
  1   1   4.24 3.16 2.93   4.77  39.02  51.96
  1   5   1.10 2.74 2.93   7.91
  2   2  98.38 3.52 2.93   1.62   0.00   0.00
  2   6   0.13 3.33 2.93  99.87
  3   3   6.43 3.23 2.93   5.51  73.74  14.32
  3   7   1.42 3.22 2.93  10.52

> If this is the case, then this bug is closed, yes?

My one remaining concern aside from the ACPI mode behavior is whether the kernel could have worked around the buggy BIOS.  I'm lame for not thinking to check for a BIOS upgrade, but: my experience as a user was that a machine that had been stable for months under F13 suddenly started crashing on upgrade to F14, so my first thought was to blame the software....

That said, my immediate problems are solved so I'm not going to push for anything more unless you judge it's a big priority--I'm fine with closing the bug.

Comment 41 Len Brown 2012-10-31 19:27:52 UTC
I recommend closing this bug.

I don't think Linux can check for this issue in the general case --
since we have no idea what the BIOS changed for
"Improved stability for some CPUs"

In theory, we could add a specific DMI check for the bad BIOS version --
but we typically don't do that when there is a known good BIOS.
And this is a pretty low-volume system, making it hard to justify
carrying code to check BIOS version.
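
For anyone with a similar box, the BIOS version can be checked from userspace rather than in the kernel. The sketch below assumes /sys/class/dmi/id/bios_version is available; the function name bios_version_is and the version string "EXAMPLE-BAD-1.0" are placeholders, since the affected BIOS version string is not given here.

```shell
#!/bin/sh
# Return 0 if the BIOS version reported via DMI equals the given string.
# The optional second argument overrides the sysfs path (handy for testing).
bios_version_is() {
    want="$1"
    f="${2:-/sys/class/dmi/id/bios_version}"
    [ -r "$f" ] && [ "$(cat "$f")" = "$want" ]
}

# "EXAMPLE-BAD-1.0" is a placeholder, not the real affected version.
if bios_version_is "EXAMPLE-BAD-1.0"; then
    echo "BIOS matches the placeholder bad version; check for an update" >&2
fi
```

This is a userspace convenience only; it is not the in-kernel DMI check discussed above.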

Finally, this is an end-user assembled "bare bones" system.
The integrator selected and installed an i7-870, but failed
to notice that they paid extra for higher MHz, but the system
didn't deliver that MHz.  To say that FC13 was functioning would
be fair, but with no C-states and no turbo-mode, it wasn't
working properly, and it is likely that most system integrators
would have noticed that and installed the latest BIOS as part
of system integration.

I think that it is an additional bug that Linux in ACPI mode
(intel_idle.max_cstate=0) is not working properly on this system,
and I would be interested in debugging that one if you open
a new report for it.

Comment 42 J. Bruce Fields 2012-11-13 01:42:27 UTC
(In reply to comment #41)
> I recommend closing this bug.
> 
> I don't think Linux can check for this issue in the general case --
> since we have no idea what the BIOS changed for
> "Improved stability for some CPUs"
> 
> In theory, we could add a specific DMI check for the bad BIOS version --
> but we typically don't do that when there is a known good BIOS.
> And this is a pretty low-volume system, making it hard to justify
> carrying code to check BIOS version.

OK, makes sense.

> Finally, this is an end-user assembled "bare bones" system.
> The integrator selected and installed an i7-870, but failed
> to notice that they paid extra for higher MHz, but the system
> didn't deliver that MHz.  To say that FC13 was functioning would
> be fair, but with no C-states and no turbo-mode, it wasn't
> working properly, and it is likely that most system integrators
> would have noticed that and installed the latest BIOS as part
> of system integration.

Yeah, my bad; it worked and built my kernels fast enough, so I was happy....

> I think that it is an additional bug that Linux in ACPI mode
> (intel_idle.max_cstate=0) is not working properly on this system,
> and I would be interested in debugging that one if you open
> a new report for it.

OK, I've opened bug 875988.

Thanks for all your help!

