Bug 1152717 - I have kernel oops "mce: [Hardware Error]: Machine check events logged" but mcelog not provide any additional information
Status: CLOSED EOL
Product: Fedora
Classification: Fedora
Component: mcelog
Version: 21
Hardware: Unspecified Unspecified
Priority: unspecified Severity: unspecified
Assigned To: Prarit Bhargava
QA Contact: Fedora Extras Quality Assurance
 
Reported: 2014-10-14 15:06 EDT by Mikhail
Modified: 2015-12-02 11:23 EST
CC List: 3 users

Doc Type: Bug Fix
Last Closed: 2015-12-01 23:17:54 EST
Type: Bug


Attachments
oops-2014-10-14-22:19:50-716-0 (32.23 KB, application/x-gzip), 2014-10-14 15:06 EDT, Mikhail
kernel 3.17.7 log (128.37 KB, text/plain), 2015-01-01 12:23 EST, Mikhail
collection of oops (284.02 KB, application/x-7z-compressed), 2015-01-02 07:47 EST, Mikhail

Description Mikhail 2014-10-14 15:06:53 EDT
Created attachment 947005
oops-2014-10-14-22:19:50-716-0

Description of problem:
I have a kernel oops "mce: [Hardware Error]: Machine check events logged", but mcelog does not provide any additional information :(

[root@localhost ~]# mcelog 
[root@localhost ~]# cat /dev/mcelog
# rpm -q mcelog
mcelog-1.0-0.13.f0d7654.fc21.x86_64
Comment 1 Prarit Bhargava 2014-12-22 08:29:06 EST
Mikhail, I have done an update of the mcelog package.  Could you test and add karma so that it gets pushed to stable?

https://admin.fedoraproject.org/updates/FEDORA-2014-17348/mcelog-101-1.9bfaad8f92c5.fc21

P.
Comment 2 Christopher Beland 2014-12-29 12:32:19 EST
I installed mcelog-101-1.9bfaad8f92c5.fc21.x86_64 from updates-testing, but I'm still having this problem as well.

"systemctl status mcelog" reports:

● mcelog.service - Machine Check Exception Logging Daemon
   Loaded: loaded (/usr/lib/systemd/system/mcelog.service; enabled)
   Active: active (running) since Mon 2014-12-29 11:06:04 EST; 1h 19min ago
  Process: 751 ExecStartPre=/etc/mcelog/mcelog.setup (code=exited, status=0/SUCCESS)
 Main PID: 784 (mcelog)
   CGroup: /system.slice/mcelog.service
           └─784 /usr/sbin/mcelog --ignorenodev --daemon --foreground

Dec 29 12:20:57 localhost.localdomain mcelog[784]: MCGCAP c07 APICID 1 SOCKETID 0
Dec 29 12:20:57 localhost.localdomain mcelog[784]: CPUID Vendor Intel Family 6 Model 58
Dec 29 12:20:57 localhost.localdomain mcelog[784]: Hardware event. This is not a software error.
Dec 29 12:20:57 localhost.localdomain mcelog[784]: MCE 3
Dec 29 12:20:57 localhost.localdomain mcelog[784]: CPU 0 THERMAL EVENT TSC a336f7da332
Dec 29 12:20:57 localhost.localdomain mcelog[784]: TIME 1419873636 Mon Dec 29 12:20:36 2014
Dec 29 12:20:57 localhost.localdomain mcelog[784]: Processor 0 below trip temperature. Throttling disabled
Dec 29 12:20:57 localhost.localdomain mcelog[784]: STATUS 8802028a MCGSTATUS 0
Dec 29 12:20:57 localhost.localdomain mcelog[784]: MCGCAP c07 APICID 0 SOCKETID 0
Dec 29 12:20:57 localhost.localdomain mcelog[784]: CPUID Vendor Intel Family 6 Model 58

---

This is what abrt is telling me:

The kernel log indicates that hardware errors were detected.
The data was saved by kernel for processing by the mcelog tool.
However, neither /var/log/mcelog nor system log contain mcelog messages.
Most likely reason is that mcelog is not installed or not configured
to be started during boot.
Without this tool running, the binary data saved by kernel
is of limited usefulness.
(You can save this data anyway by running 'cat </dev/mcelog >FILE').
The recommended course of action is to install mcelog.
If another hardware error would occur, a user-readable description
of it will be saved in system log or /var/log/mcelog.

---

/var/log/mcelog does not exist.
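
In the excerpt above, the daemon's messages show up in the systemd journal even though /var/log/mcelog does not exist. The following is a minimal sketch, assuming the journal line prefix has the shape shown in that excerpt, that groups such lines into events and flags the thermal ones. The names split_events and is_thermal are illustrative and not part of mcelog.

```python
import re

# Journal line prefix as seen in the excerpt above (an assumption about the format):
# "Dec 29 12:20:57 localhost.localdomain mcelog[784]: <message>"
PREFIX = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+mcelog\[\d+\]:\s?")


def split_events(lines):
    """Group mcelog journal lines into events, each starting at 'Hardware event.'."""
    events = []
    current = None
    for line in lines:
        message = PREFIX.sub("", line.strip())
        if message.startswith("Hardware event."):
            current = []  # start collecting a new event
            events.append(current)
        elif current is not None:
            current.append(message)
        # Lines before the first marker belong to a truncated event and are skipped.
    return events


def is_thermal(event):
    """True if any line of the event mentions a thermal event."""
    return any("THERMAL EVENT" in message for message in event)


sample = [
    "Dec 29 12:20:57 localhost.localdomain mcelog[784]: CPUID Vendor Intel Family 6 Model 58",
    "Dec 29 12:20:57 localhost.localdomain mcelog[784]: Hardware event. This is not a software error.",
    "Dec 29 12:20:57 localhost.localdomain mcelog[784]: MCE 3",
    "Dec 29 12:20:57 localhost.localdomain mcelog[784]: CPU 0 THERMAL EVENT TSC a336f7da332",
]
for event in split_events(sample):
    print("thermal" if is_thermal(event) else "other", event)
```

The sample lines are copied from the journal output above. Real input could come from saved output of "journalctl -b -u mcelog.service".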
Comment 3 igor.redhat@gmail.com 2014-12-30 14:05:38 EST
I'm getting mcelog errors with mcelog-101-1.9bfaad8f92c5.fc21.x86_64 too. systemctl status mcelog doesn't report anything for me but here is the output of journalctl -b -u mcelog.service:

-- Logs begin at Fri 2014-08-01 21:53:38 EDT, end at Tue 2014-12-30 14:02:45 EST. --
Dec 30 13:43:38 iy50 mcelog[754]: Hardware event. This is not a software error.
Dec 30 13:43:38 iy50 mcelog[754]: MCE 0
Dec 30 13:43:38 iy50 mcelog[754]: CPU 0 BANK 5
Dec 30 13:43:38 iy50 mcelog[754]: MISC 38a0000086 ADDR ff881e00
Dec 30 13:43:38 iy50 mcelog[754]: TIME 1419965015 Tue Dec 30 13:43:35 2014
Dec 30 13:43:38 iy50 mcelog[754]: MCG status:
Dec 30 13:43:38 iy50 mcelog[754]: MCi status:
Dec 30 13:43:38 iy50 mcelog[754]: Error overflow
Dec 30 13:43:38 iy50 mcelog[754]: Uncorrected error
Dec 30 13:43:38 iy50 mcelog[754]: MCi_MISC register valid
Dec 30 13:43:38 iy50 mcelog[754]: MCi_ADDR register valid
Dec 30 13:43:38 iy50 mcelog[754]: Processor context corrupt
Dec 30 13:43:38 iy50 mcelog[754]: MCA: corrected filtering (some unreported errors in same region)
Dec 30 13:43:38 iy50 mcelog[754]: Generic CACHE Level-2 Generic Error
Dec 30 13:43:38 iy50 mcelog[754]: STATUS ee0000000040110a MCGSTATUS 0
Dec 30 13:43:38 iy50 mcelog[754]: MCGCAP c09 APICID 0 SOCKETID 0
Dec 30 13:43:38 iy50 mcelog[754]: CPUID Vendor Intel Family 6 Model 60
Dec 30 13:43:38 iy50 mcelog[754]: Hardware event. This is not a software error.
Dec 30 13:43:38 iy50 mcelog[754]: MCE 1
Dec 30 13:43:38 iy50 mcelog[754]: CPU 0 BANK 6
Dec 30 13:43:38 iy50 mcelog[754]: MISC 38a0000086 ADDR ff882600
Dec 30 13:43:38 iy50 mcelog[754]: TIME 1419965015 Tue Dec 30 13:43:35 2014

Running with kernel 3.17.7-300.fc21.x86_64. This happens every time on resume from suspend and doesn't seem to have any adverse effects.
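
The STATUS value in the journal output above is the raw machine-check status register, and lines such as "Error overflow" and "Uncorrected error" are mcelog's decoding of its high bits. Below is a minimal sketch of that decoding. It assumes the architectural IA32_MCi_STATUS layout from the Intel SDM: bits 63 down to 57 are VAL, OVER, UC, EN, MISCV, ADDRV and PCC, and bits 15:0 hold the MCA error code. The name decode_status is illustrative, and this is not mcelog's own code.

```python
# Architectural flag bits of IA32_MCi_STATUS (assumed from the Intel SDM layout).
FLAGS = [
    (63, "VAL"),    # register contents valid
    (62, "OVER"),   # error overflow
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting enabled
    (59, "MISCV"),  # MCi_MISC register valid
    (58, "ADDRV"),  # MCi_ADDR register valid
    (57, "PCC"),    # processor context corrupt
]


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into set flags and error-code fields."""
    return {
        "flags": [name for bit, name in FLAGS if (status >> bit) & 1],
        "mca_code": status & 0xFFFF,             # bits 15:0
        "model_code": (status >> 16) & 0xFFFF,   # bits 31:16
    }


decoded = decode_status(0xEE0000000040110A)  # STATUS value from the log above
print(decoded["flags"], hex(decoded["mca_code"]))
```

For the value logged above, the set flags line up with the messages mcelog printed: overflow, uncorrected, MISC and ADDR valid, and processor context corrupt, with no "Error enabled" line.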
Comment 4 Mikhail 2015-01-01 12:22:49 EST
(In reply to Prarit Bhargava from comment #1)
> Mikhail, I have done an update of the mcelog package.  Could you test and
> add karma so that it gets pushed to stable?
> 
> https://admin.fedoraproject.org/updates/FEDORA-2014-17348/mcelog-101-1.9bfaad8f92c5.fc21
> 
> P.

[root@localhost ~]# systemctl status mcelog
● mcelog.service - Machine Check Exception Logging Daemon
   Loaded: loaded (/usr/lib/systemd/system/mcelog.service; enabled)
   Active: active (running) since Thu 2015-01-01 18:11:52 YEKT; 4h 9min ago
  Process: 673 ExecStartPre=/etc/mcelog/mcelog.setup (code=exited, status=0/SUCCESS)
 Main PID: 714 (mcelog)
   CGroup: /system.slice/mcelog.service
           └─714 /usr/sbin/mcelog --ignorenodev --daemon --foreground

Jan 01 22:16:37 localhost.localdomain mcelog[714]: CPU 2 BANK 0
Jan 01 22:16:37 localhost.localdomain mcelog[714]: TIME 1420132597 Thu Jan  1 22:16:37 2015
Jan 01 22:16:37 localhost.localdomain mcelog[714]: MCG status:
Jan 01 22:16:37 localhost.localdomain mcelog[714]: MCi status:
Jan 01 22:16:37 localhost.localdomain mcelog[714]: Corrected error
Jan 01 22:16:37 localhost.localdomain mcelog[714]: Error enabled
Jan 01 22:16:37 localhost.localdomain mcelog[714]: MCA: Internal parity error
Jan 01 22:16:37 localhost.localdomain mcelog[714]: STATUS 90000040000f0005 MCGSTATUS 0
Jan 01 22:16:37 localhost.localdomain mcelog[714]: MCGCAP c09 APICID 4 SOCKETID 0
Jan 01 22:16:37 localhost.localdomain mcelog[714]: CPUID Vendor Intel Family 6 Model 60



[   55.529160] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[   68.867669] DMA-API: debugging out of memory - disabling
[   84.768597] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[   92.709445] pool (3211) used greatest stack depth: 11424 bytes left
[ 1158.635168] Render (5567) used greatest stack depth: 11200 bytes left
[ 1440.144384] kvm: zapping shadow pages for mmio generation wraparound
[ 1528.088490] nf_conntrack: automatic helper assignment is deprecated and it will be removed soon. Use the iptables CT target to attach helpers instead.
[ 2431.906716] qemu-system-x86 (6569) used greatest stack depth: 11088 bytes left
[ 2963.509258] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[ 3096.788500] ------------[ cut here ]------------
[ 3096.788516] WARNING: CPU: 3 PID: 2451 at lib/dma-debug.c:593 debug_dma_assert_idle+0x1a4/0x220()
[ 3096.788518] i915 0000:00:02.0: DMA-API: cpu touching an active dma mapped cacheline [cln=0x000000001e017fc0]
[ 3096.788519] Modules linked in: rfcomm fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT xt_conntrack cfg80211 ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnep hid_logitech_dj iTCO_wdt iTCO_vendor_support ppdev btrfs xor x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel raid6_pq vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi serio_raw snd_hda_intel snd_hda_controller snd_emu10k1 snd_hda_codec snd_util_mem snd_hwdep snd_rawmidi
[ 3096.788580]  snd_seq i2c_i801 snd_ac97_codec ac97_bus snd_seq_device emu10k1_gp gameport snd_pcm snd_timer mei_me lpc_ich snd mei mfd_core shpchp soundcore btusb bluetooth parport_pc rfkill parport tpm_infineon tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd binfmt_misc sunrpc uas usb_storage i915 i2c_algo_bit drm_kms_helper firewire_ohci firewire_core drm crc_itu_t r8169 mii video
[ 3096.788603] CPU: 3 PID: 2451 Comm: chrome Not tainted 3.17.7-300.fc21.x86_64+debug #1
[ 3096.788604] Hardware name: Gigabyte Technology Co., Ltd. Z87M-D3H/Z87M-D3H, BIOS F11 08/12/2014
[ 3096.788605]  0000000000000000 0000000039fb8405 ffff8807c9107c38 ffffffff8183413f
[ 3096.788616]  ffff8807c9107c80 ffff8807c9107c70 ffffffff810a348d ffff8807f2df9270
[ 3096.788618]  ffff8807eeb7e3c0 00001c6508481d28 ffffea001d6d33c0 ffff8804b4b57210
[ 3096.788620] Call Trace:
[ 3096.788623]  [<ffffffff8183413f>] dump_stack+0x4d/0x66
[ 3096.788627]  [<ffffffff810a348d>] warn_slowpath_common+0x7d/0xa0
[ 3096.788629]  [<ffffffff810a350c>] warn_slowpath_fmt+0x5c/0x80
[ 3096.788631]  [<ffffffff810ffaed>] ? trace_hardirqs_on+0xd/0x10
[ 3096.788634]  [<ffffffff814301f4>] debug_dma_assert_idle+0x1a4/0x220
[ 3096.788637]  [<ffffffff81206e24>] do_wp_page+0xf4/0x9a0
[ 3096.788639]  [<ffffffff812097dd>] ? handle_mm_fault+0x31d/0x1020
[ 3096.788640]  [<ffffffff81209dac>] handle_mm_fault+0x8ec/0x1020
[ 3096.788643]  [<ffffffff81068f7f>] ? __do_page_fault+0x1cf/0x620
[ 3096.788645]  [<ffffffff81068fe9>] __do_page_fault+0x239/0x620
[ 3096.788648]  [<ffffffff8140801e>] ? memzero_explicit+0xe/0x10
[ 3096.788650]  [<ffffffff81069401>] do_page_fault+0x31/0x70
[ 3096.788652]  [<ffffffff81840478>] page_fault+0x28/0x30
[ 3096.788653] ---[ end trace f5c75bbb4ecd1f18 ]---
[ 3096.788654] Mapped at:
[ 3096.788655]  [<ffffffff8142e7c2>] debug_dma_map_sg+0x52/0x190
[ 3096.788657]  [<ffffffffa00f16eb>] i915_gem_gtt_prepare_object+0xdb/0x110 [i915]
[ 3096.788680]  [<ffffffffa00f8c9b>] i915_gem_object_pin+0x4fb/0x7a0 [i915]
[ 3096.788688]  [<ffffffffa00eb9e5>] i915_gem_execbuffer_reserve_vma.isra.18+0x85/0x130 [i915]
[ 3096.788695]  [<ffffffffa00ebdb2>] i915_gem_execbuffer_reserve+0x322/0x350 [i915]
[ 4107.842482] qemu-system-x86 (11595) used greatest stack depth: 10544 bytes left
[ 5789.947486] perf interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 6565.251654] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[10166.951076] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[13768.603726] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[14704.304678] mce: [Hardware Error]: Machine check events logged
[root@localhost ~]# mcelog 
[root@localhost ~]# 


mcelog still does not log my problems :(
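
The only trace of the event in the kernel log above is the single "mce: [Hardware Error]" line. Here is a small sketch, assuming the bracketed seconds-since-boot timestamp format shown in that log, that picks such lines out of kernel log text, for example the saved output of dmesg. The name find_mce_lines is illustrative.

```python
import re

# Kernel log line format as shown above, e.g.
# "[14704.304678] mce: [Hardware Error]: Machine check events logged"
MCE_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+mce: \[Hardware Error\]: (.*)$")


def find_mce_lines(log_text):
    """Return (seconds_since_boot, message) for every mce hardware-error line."""
    hits = []
    for line in log_text.splitlines():
        match = MCE_LINE.match(line)
        if match:
            hits.append((float(match.group(1)), match.group(2)))
    return hits


log_text = """\
[13768.603726] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[14704.304678] mce: [Hardware Error]: Machine check events logged
"""
for seconds, message in find_mce_lines(log_text):
    print(f"{seconds:.6f}s after boot: {message}")
```

The timestamps found this way can be compared with the TIME lines in the mcelog journal output to see whether the daemon recorded anything for the same moment.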
Comment 5 Mikhail 2015-01-01 12:23:46 EST
Created attachment 974986
kernel 3.17.7 log
Comment 6 Mikhail 2015-01-01 18:00:46 EST
Demonstration: https://drive.google.com/file/d/0B0nwzlfiB4aQTnJERjN4RkpyQnM/view?usp=sharing


This hell begins to occur after I run Google Chrome in a virtual machine (Windows XP) in gnome-boxes.
Comment 7 Mikhail 2015-01-02 07:47:03 EST
Created attachment 975194
collection of oops
Comment 8 Fedora End Of Life 2015-11-04 09:28:11 EST
This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 9 Fedora End Of Life 2015-12-01 23:17:56 EST
Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
