Summary: | I have kernel oops "mce: [Hardware Error]: Machine check events logged" but mcelog does not provide any additional information | | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Mikhail <mikhail.v.gavrilov>
Component: | mcelog | Assignee: | Prarit Bhargava <prarit>
Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 21 | CC: | beland, igor.redhat, prarit
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: | |
Last Closed: | 2015-12-02 04:17:54 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Mikhail, I have done an update of the mcelog package. Could you test and add karma so that it gets pushed to stable?

https://admin.fedoraproject.org/updates/FEDORA-2014-17348/mcelog-101-1.9bfaad8f92c5.fc21

P.

I installed mcelog-101-1.9bfaad8f92c5.fc21.x86_64 from updates-testing, but I'm still having this problem as well. "systemctl status mcelog" reports:

● mcelog.service - Machine Check Exception Logging Daemon
   Loaded: loaded (/usr/lib/systemd/system/mcelog.service; enabled)
   Active: active (running) since Mon 2014-12-29 11:06:04 EST; 1h 19min ago
  Process: 751 ExecStartPre=/etc/mcelog/mcelog.setup (code=exited, status=0/SUCCESS)
 Main PID: 784 (mcelog)
   CGroup: /system.slice/mcelog.service
           └─784 /usr/sbin/mcelog --ignorenodev --daemon --foreground

Dec 29 12:20:57 localhost.localdomain mcelog[784]: MCGCAP c07 APICID 1 SOCKETID 0
Dec 29 12:20:57 localhost.localdomain mcelog[784]: CPUID Vendor Intel Family 6 Model 58
Dec 29 12:20:57 localhost.localdomain mcelog[784]: Hardware event. This is not a software error.
Dec 29 12:20:57 localhost.localdomain mcelog[784]: MCE 3
Dec 29 12:20:57 localhost.localdomain mcelog[784]: CPU 0 THERMAL EVENT TSC a336f7da332
Dec 29 12:20:57 localhost.localdomain mcelog[784]: TIME 1419873636 Mon Dec 29 12:20:36 2014
Dec 29 12:20:57 localhost.localdomain mcelog[784]: Processor 0 below trip temperature. Throttling disabled
Dec 29 12:20:57 localhost.localdomain mcelog[784]: STATUS 8802028a MCGSTATUS 0
Dec 29 12:20:57 localhost.localdomain mcelog[784]: MCGCAP c07 APICID 0 SOCKETID 0
Dec 29 12:20:57 localhost.localdomain mcelog[784]: CPUID Vendor Intel Family 6 Model 58

---
This is what abrt is telling me:

The kernel log indicates that hardware errors were detected. The data was saved by kernel for processing by the mcelog tool. However, neither /var/log/mcelog nor system log contain mcelog messages. Most likely reason is that mcelog is not installed or not configured to be started during boot.
Without this tool running, the binary data saved by kernel is of limited usefulness. (You can save this data anyway by running 'cat </dev/mcelog >FILE'). The recommended course of action is to install mcelog. If another hardware error would occur, a user-readable description of it will be saved in system log or /var/log/mcelog.
---

/var/log/mcelog does not exist.

I'm getting mcelog errors with mcelog-101-1.9bfaad8f92c5.fc21.x86_64 too. "systemctl status mcelog" doesn't report anything for me, but here is the output of "journalctl -b -u mcelog.service":

-- Logs begin at Fri 2014-08-01 21:53:38 EDT, end at Tue 2014-12-30 14:02:45 EST. --
Dec 30 13:43:38 iy50 mcelog[754]: Hardware event. This is not a software error.
Dec 30 13:43:38 iy50 mcelog[754]: MCE 0
Dec 30 13:43:38 iy50 mcelog[754]: CPU 0 BANK 5
Dec 30 13:43:38 iy50 mcelog[754]: MISC 38a0000086 ADDR ff881e00
Dec 30 13:43:38 iy50 mcelog[754]: TIME 1419965015 Tue Dec 30 13:43:35 2014
Dec 30 13:43:38 iy50 mcelog[754]: MCG status:
Dec 30 13:43:38 iy50 mcelog[754]: MCi status:
Dec 30 13:43:38 iy50 mcelog[754]: Error overflow
Dec 30 13:43:38 iy50 mcelog[754]: Uncorrected error
Dec 30 13:43:38 iy50 mcelog[754]: MCi_MISC register valid
Dec 30 13:43:38 iy50 mcelog[754]: MCi_ADDR register valid
Dec 30 13:43:38 iy50 mcelog[754]: Processor context corrupt
Dec 30 13:43:38 iy50 mcelog[754]: MCA: corrected filtering (some unreported errors in same region)
Dec 30 13:43:38 iy50 mcelog[754]: Generic CACHE Level-2 Generic Error
Dec 30 13:43:38 iy50 mcelog[754]: STATUS ee0000000040110a MCGSTATUS 0
Dec 30 13:43:38 iy50 mcelog[754]: MCGCAP c09 APICID 0 SOCKETID 0
Dec 30 13:43:38 iy50 mcelog[754]: CPUID Vendor Intel Family 6 Model 60
Dec 30 13:43:38 iy50 mcelog[754]: Hardware event. This is not a software error.
Dec 30 13:43:38 iy50 mcelog[754]: MCE 1
Dec 30 13:43:38 iy50 mcelog[754]: CPU 0 BANK 6
Dec 30 13:43:38 iy50 mcelog[754]: MISC 38a0000086 ADDR ff882600
Dec 30 13:43:38 iy50 mcelog[754]: TIME 1419965015 Tue Dec 30 13:43:35 2014

Running with kernel 3.17.7-300.fc21.x86_64. This happens every time on resume from suspend and doesn't seem to have any adverse effects.

(In reply to Prarit Bhargava from comment #1)
> Mikhail, I have done an update of the mcelog package. Could you test and
> add karma so that it gets pushed to stable?
>
> https://admin.fedoraproject.org/updates/FEDORA-2014-17348/mcelog-101-1.9bfaad8f92c5.fc21
>
> P.

[root@localhost ~]# systemctl status mcelog
● mcelog.service - Machine Check Exception Logging Daemon
   Loaded: loaded (/usr/lib/systemd/system/mcelog.service; enabled)
   Active: active (running) since Thu 2015-01-01 18:11:52 YEKT; 4h 9min ago
  Process: 673 ExecStartPre=/etc/mcelog/mcelog.setup (code=exited, status=0/SUCCESS)
 Main PID: 714 (mcelog)
   CGroup: /system.slice/mcelog.service
           └─714 /usr/sbin/mcelog --ignorenodev --daemon --foreground

Jan 01 22:16:37 localhost.localdomain mcelog[714]: CPU 2 BANK 0
Jan 01 22:16:37 localhost.localdomain mcelog[714]: TIME 1420132597 Thu Jan 1 22:16:37 2015
Jan 01 22:16:37 localhost.localdomain mcelog[714]: MCG status:
Jan 01 22:16:37 localhost.localdomain mcelog[714]: MCi status:
Jan 01 22:16:37 localhost.localdomain mcelog[714]: Corrected error
Jan 01 22:16:37 localhost.localdomain mcelog[714]: Error enabled
Jan 01 22:16:37 localhost.localdomain mcelog[714]: MCA: Internal parity error
Jan 01 22:16:37 localhost.localdomain mcelog[714]: STATUS 90000040000f0005 MCGSTATUS 0
Jan 01 22:16:37 localhost.localdomain mcelog[714]: MCGCAP c09 APICID 4 SOCKETID 0
Jan 01 22:16:37 localhost.localdomain mcelog[714]: CPUID Vendor Intel Family 6 Model 60

[ 55.529160] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[ 68.867669] DMA-API: debugging out of memory - disabling
[ 84.768597] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[ 92.709445] pool (3211) used greatest stack depth: 11424 bytes left
[ 1158.635168] Render (5567) used greatest stack depth: 11200 bytes left
[ 1440.144384] kvm: zapping shadow pages for mmio generation wraparound
[ 1528.088490] nf_conntrack: automatic helper assignment is deprecated and it will be removed soon. Use the iptables CT target to attach helpers instead.
[ 2431.906716] qemu-system-x86 (6569) used greatest stack depth: 11088 bytes left
[ 2963.509258] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[ 3096.788500] ------------[ cut here ]------------
[ 3096.788516] WARNING: CPU: 3 PID: 2451 at lib/dma-debug.c:593 debug_dma_assert_idle+0x1a4/0x220()
[ 3096.788518] i915 0000:00:02.0: DMA-API: cpu touching an active dma mapped cacheline [cln=0x000000001e017fc0]
[ 3096.788519] Modules linked in: rfcomm fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT xt_conntrack cfg80211 ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnep hid_logitech_dj iTCO_wdt iTCO_vendor_support ppdev btrfs xor x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel raid6_pq vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi serio_raw snd_hda_intel snd_hda_controller snd_emu10k1 snd_hda_codec snd_util_mem snd_hwdep snd_rawmidi
[ 3096.788580] snd_seq i2c_i801 snd_ac97_codec ac97_bus snd_seq_device emu10k1_gp gameport snd_pcm snd_timer mei_me lpc_ich snd mei mfd_core shpchp soundcore btusb bluetooth parport_pc rfkill parport tpm_infineon tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd binfmt_misc sunrpc uas usb_storage i915 i2c_algo_bit drm_kms_helper firewire_ohci firewire_core drm crc_itu_t r8169 mii video
[ 3096.788603] CPU: 3 PID: 2451 Comm: chrome Not tainted 3.17.7-300.fc21.x86_64+debug #1
[ 3096.788604] Hardware name: Gigabyte Technology Co., Ltd. Z87M-D3H/Z87M-D3H, BIOS F11 08/12/2014
[ 3096.788605] 0000000000000000 0000000039fb8405 ffff8807c9107c38 ffffffff8183413f
[ 3096.788616] ffff8807c9107c80 ffff8807c9107c70 ffffffff810a348d ffff8807f2df9270
[ 3096.788618] ffff8807eeb7e3c0 00001c6508481d28 ffffea001d6d33c0 ffff8804b4b57210
[ 3096.788620] Call Trace:
[ 3096.788623] [<ffffffff8183413f>] dump_stack+0x4d/0x66
[ 3096.788627] [<ffffffff810a348d>] warn_slowpath_common+0x7d/0xa0
[ 3096.788629] [<ffffffff810a350c>] warn_slowpath_fmt+0x5c/0x80
[ 3096.788631] [<ffffffff810ffaed>] ? trace_hardirqs_on+0xd/0x10
[ 3096.788634] [<ffffffff814301f4>] debug_dma_assert_idle+0x1a4/0x220
[ 3096.788637] [<ffffffff81206e24>] do_wp_page+0xf4/0x9a0
[ 3096.788639] [<ffffffff812097dd>] ? handle_mm_fault+0x31d/0x1020
[ 3096.788640] [<ffffffff81209dac>] handle_mm_fault+0x8ec/0x1020
[ 3096.788643] [<ffffffff81068f7f>] ? __do_page_fault+0x1cf/0x620
[ 3096.788645] [<ffffffff81068fe9>] __do_page_fault+0x239/0x620
[ 3096.788648] [<ffffffff8140801e>] ? memzero_explicit+0xe/0x10
[ 3096.788650] [<ffffffff81069401>] do_page_fault+0x31/0x70
[ 3096.788652] [<ffffffff81840478>] page_fault+0x28/0x30
[ 3096.788653] ---[ end trace f5c75bbb4ecd1f18 ]---
[ 3096.788654] Mapped at:
[ 3096.788655] [<ffffffff8142e7c2>] debug_dma_map_sg+0x52/0x190
[ 3096.788657] [<ffffffffa00f16eb>] i915_gem_gtt_prepare_object+0xdb/0x110 [i915]
[ 3096.788680] [<ffffffffa00f8c9b>] i915_gem_object_pin+0x4fb/0x7a0 [i915]
[ 3096.788688] [<ffffffffa00eb9e5>] i915_gem_execbuffer_reserve_vma.isra.18+0x85/0x130 [i915]
[ 3096.788695] [<ffffffffa00ebdb2>] i915_gem_execbuffer_reserve+0x322/0x350 [i915]
[ 4107.842482] qemu-system-x86 (11595) used greatest stack depth: 10544 bytes left
[ 5789.947486] perf interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 6565.251654] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[10166.951076] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[13768.603726] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[14704.304678] mce: [Hardware Error]: Machine check events logged

[root@localhost ~]# mcelog
[root@localhost ~]#

mcelog still does not log my problems :(

Created attachment 974986
kernel 3.17.7 log
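The STATUS values that mcelog printed in the comments above (ee0000000040110a and 90000040000f0005) can be cross-checked by hand. The Python sketch below is a minimal, hypothetical decoder that assumes the IA32_MCi_STATUS layout described in the Intel SDM, Vol. 3B (architectural flag bits 63 to 57, and the 16-bit MCA error code in bits 15:0). It only covers those flags, the simple error codes and the cache-hierarchy compound format; it is not mcelog's own implementation, and the names status_flags and mca_error_code are illustrative.

```python
"""Hypothetical decoder for Intel IA32_MCi_STATUS values (sketch only).

Assumes the architectural layout from the Intel SDM, Vol. 3B; model-specific
fields are ignored and only a subset of MCA error codes is recognised.
"""

# (bit position, short name) for the architectural flag bits 63..57
STATUS_FLAGS = [
    (63, "VAL"),    # register contents valid
    (62, "OVER"),   # error overflow
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting enabled
    (59, "MISCV"),  # MCi_MISC register valid
    (58, "ADDRV"),  # MCi_ADDR register valid
    (57, "PCC"),    # processor context corrupt
]

# Simple (non-compound) MCA error codes
SIMPLE_CODES = {
    0x0000: "No error",
    0x0001: "Unclassified",
    0x0002: "Microcode ROM parity error",
    0x0003: "External error",
    0x0004: "FRC error",
    0x0005: "Internal parity error",
}

# Sub-fields of the cache-hierarchy compound code 000F 0001 RRRR TTLL
REQUESTS = {
    0x0: "generic error", 0x1: "read", 0x2: "write", 0x3: "data read",
    0x4: "data write", 0x5: "instruction fetch", 0x6: "prefetch",
    0x7: "eviction", 0x8: "snoop",
}
TRANSACTIONS = {0b00: "instruction", 0b01: "data", 0b10: "generic"}
LEVELS = {0b00: "level-0", 0b01: "level-1", 0b10: "level-2", 0b11: "generic-level"}


def status_flags(status: int) -> list[str]:
    """Short names of the architectural flag bits set in an MCi_STATUS value."""
    return [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]


def mca_error_code(status: int) -> str:
    """Describe the MCA error code in bits 15:0 (partial coverage only)."""
    code = status & 0xFFFF
    if code in SIMPLE_CODES:
        return SIMPLE_CODES[code]
    filtered = bool(code & 0x1000)  # bit 12 (F): corrected filtering
    core = code & 0xEFFF            # same code with the F bit cleared
    if (core & 0xFF00) == 0x0100:   # cache hierarchy format
        request = REQUESTS.get((core >> 4) & 0xF, "reserved request")
        transaction = TRANSACTIONS.get((core >> 2) & 0b11, "reserved")
        level = LEVELS[core & 0b11]
        desc = f"{transaction} {level} cache error: {request}"
    else:
        desc = f"unhandled compound code {code:#06x}"
    return desc + (" (corrected filtering)" if filtered else "")


if __name__ == "__main__":
    # STATUS values copied from the mcelog output quoted in this report
    for value in (0xEE0000000040110A, 0x90000040000F0005):
        print(f"{value:016x}", status_flags(value), mca_error_code(value))
```

Run as a script, it prints one line per value. The first decodes to VAL, OVER, UC, MISCV, ADDRV and PCC with a corrected-filtering level-2 cache error, and the second to VAL and EN with an internal parity error, which agrees with the flags mcelog printed for those two events.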
Demonstration: https://drive.google.com/file/d/0B0nwzlfiB4aQTnJERjN4RkpyQnM/view?usp=sharing

This hell begins after I run Google Chrome in a virtual machine (Windows XP) in gnome-boxes.

Created attachment 975194
collection of oops
This message is a reminder that Fedora 21 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 21. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '21'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 21 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
Created attachment 947005
oops-2014-10-14-22:19:50-716-0

Description of problem:
I have kernel oops "mce: [Hardware Error]: Machine check events logged" but mcelog does not provide any additional information :(

[root@localhost ~]# mcelog
[root@localhost ~]# cat /dev/mcelog
# rpm -q mcelog
mcelog-1.0-0.13.f0d7654.fc21.x86_64