Bug 1498969

Summary: x86/mce: suspicious RCU usage
Product: [Fedora] Fedora
Reporter: Mikhail <mikhail.v.gavrilov>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 27
CC: airlied, ajax, bskeggs, eparis, esandeen, hdegoede, ichavero, itamar, jarodwilson, jeremy, jforbes, jglisse, jonathan, josef, jwboyer, kernel-maint, labbott, linville, mchehab, mjg59, nhorman, quintela, steved
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: kernel-4.13.12-200.fc26 kernel-4.13.12-100.fc25 kernel-4.13.12-300.fc27
Last Closed: 2017-11-14 01:59:59 UTC
Type: Bug
Attachments: dmesg (flags: none)

Description Mikhail 2017-10-05 16:46:59 UTC
Created attachment 1334917: dmesg

Description of problem:
Oct 05 11:40:17 localhost.localdomain kernel: mce: [Hardware Error]: Machine check events logged
Oct 05 11:40:17 localhost.localdomain kernel: 
Oct 05 11:40:17 localhost.localdomain kernel: =============================
Oct 05 11:40:17 localhost.localdomain kernel: WARNING: suspicious RCU usage
Oct 05 11:40:17 localhost.localdomain kernel: 4.13.4-301.fc27.x86_64+debug #1 Not tainted
Oct 05 11:40:17 localhost.localdomain kernel: -----------------------------
Oct 05 11:40:17 localhost.localdomain kernel: arch/x86/kernel/cpu/mcheck/dev-mcelog.c:60 suspicious mce_log_get_idx_check() usage!
Oct 05 11:40:17 localhost.localdomain kernel: 
                                              other info that might help us debug this:
Oct 05 11:40:17 localhost.localdomain kernel: 
                                              rcu_scheduler_active = 2, debug_locks = 1
Oct 05 11:40:17 localhost.localdomain kernel: 3 locks held by kworker/1:2/14637:
Oct 05 11:40:17 localhost.localdomain kernel:  #0:  ("events"){.+.+.+}, at: [<ffffffffaa0d2ac0>] process_one_work+0x1d0/0x6a0
Oct 05 11:40:17 localhost.localdomain kernel:  #1:  ((&mce_work)){+.+...}, at: [<ffffffffaa0d2ac0>] process_one_work+0x1d0/0x6a0
Oct 05 11:40:17 localhost.localdomain kernel:  #2:  ((x86_mce_decoder_chain).rwsem){++++..}, at: [<ffffffffaa0dc92f>] blocking_notifier_call_chain+0x2f/0x70
Oct 05 11:40:17 localhost.localdomain kernel: 
                                              stack backtrace:
Oct 05 11:40:17 localhost.localdomain kernel: CPU: 1 PID: 14637 Comm: kworker/1:2 Not tainted 4.13.4-301.fc27.x86_64+debug #1
Oct 05 11:40:17 localhost.localdomain kernel: Hardware name: Gigabyte Technology Co., Ltd. Z87M-D3H/Z87M-D3H, BIOS F11 08/12/2014
Oct 05 11:40:17 localhost.localdomain kernel: Workqueue: events mce_gen_pool_process
Oct 05 11:40:17 localhost.localdomain kernel: Call Trace:
Oct 05 11:40:17 localhost.localdomain kernel:  dump_stack+0x8e/0xd6
Oct 05 11:40:17 localhost.localdomain kernel:  lockdep_rcu_suspicious+0xc5/0x100
Oct 05 11:40:17 localhost.localdomain kernel:  dev_mce_log+0xf6/0x1e0
Oct 05 11:40:17 localhost.localdomain kernel:  notifier_call_chain+0x39/0x90
Oct 05 11:40:17 localhost.localdomain kernel:  blocking_notifier_call_chain+0x49/0x70
Oct 05 11:40:17 localhost.localdomain kernel:  mce_gen_pool_process+0x41/0x70
Oct 05 11:40:17 localhost.localdomain kernel:  process_one_work+0x253/0x6a0
Oct 05 11:40:17 localhost.localdomain kernel:  worker_thread+0x4d/0x3b0
Oct 05 11:40:17 localhost.localdomain kernel:  kthread+0x133/0x150
Oct 05 11:40:17 localhost.localdomain kernel:  ? process_one_work+0x6a0/0x6a0
Oct 05 11:40:17 localhost.localdomain kernel:  ? kthread_create_on_node+0x70/0x70
Oct 05 11:40:17 localhost.localdomain kernel:  ret_from_fork+0x2a/0x40
Oct 05 11:40:17 localhost.localdomain mcelog[762]: Hardware event. This is not a software error.
Oct 05 11:40:17 localhost.localdomain mcelog[762]: MCE 0
Oct 05 11:40:17 localhost.localdomain mcelog[762]: CPU 1 BANK 0 TSC 71eec2000849
Oct 05 11:40:17 localhost.localdomain mcelog[762]: TIME 1507185617 Thu Oct  5 11:40:17 2017
Oct 05 11:40:17 localhost.localdomain mcelog[762]: MCG status:
Oct 05 11:40:17 localhost.localdomain mcelog[762]: MCi status:
Oct 05 11:40:17 localhost.localdomain mcelog[762]: Corrected error
Oct 05 11:40:17 localhost.localdomain mcelog[762]: Error enabled
Oct 05 11:40:17 localhost.localdomain mcelog[762]: MCA: Internal parity error
Oct 05 11:40:17 localhost.localdomain mcelog[762]: STATUS 90000040000f0005 MCGSTATUS 0
Oct 05 11:40:17 localhost.localdomain mcelog[762]: MCGCAP c09 APICID 2 SOCKETID 0
Oct 05 11:40:17 localhost.localdomain mcelog[762]: CPUID Vendor Intel Family 6 Model 60


Could anybody look into this?
What does this error message mean?
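
For reference, the STATUS value in the mcelog output above can be decoded by hand. The following is a minimal Python sketch, assuming the IA32_MCi_STATUS bit layout documented in the Intel SDM; the function name decode_mci_status, the dictionary keys, and the short code table are hypothetical and are not part of mcelog or the kernel. It covers only the hardware event that mcelog reported, not the RCU warning itself.

```python
# Illustrative decoder for an IA32_MCi_STATUS value as printed by mcelog.
# Bit positions are assumed from the Intel SDM machine-check chapter;
# all names below are made up for this sketch.

# Small subset of the "simple" MCA error codes (bits 15:0).
SIMPLE_MCA_CODES = {
    0x0000: "No error",
    0x0001: "Unclassified",
    0x0005: "Internal parity error",
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into its main fields."""

    def bit(n: int) -> bool:
        return bool((status >> n) & 1)

    mca_code = status & 0xFFFF
    return {
        "valid": bit(63),        # VAL: register holds a valid error
        "overflow": bit(62),     # OVER: an earlier error was overwritten
        "uncorrected": bit(61),  # UC: clear means the error was corrected
        "enabled": bit(60),      # EN: error reporting was enabled
        "miscv": bit(59),        # MISCV: MCi_MISC holds extra data
        "addrv": bit(58),        # ADDRV: MCi_ADDR holds an address
        "pcc": bit(57),          # PCC: processor context corrupt
        # Bits 52:38, meaningful only when MCG_CAP bit 10 (CMCI_P) is set.
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": mca_code,
        "mca_description": SIMPLE_MCA_CODES.get(
            mca_code, "compound or unlisted code"
        ),
    }


if __name__ == "__main__":
    # STATUS value taken from the mcelog lines in this report.
    for key, value in decode_mci_status(0x90000040000F0005).items():
        print(f"{key}: {value}")
```

For the value logged here, the sketch reports a valid, enabled, corrected error with MCA code 0x0005, which agrees with the "Corrected error", "Error enabled" and "MCA: Internal parity error" lines printed by mcelog.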

Comment 1 Jeremy Cline 2017-10-10 19:52:31 UTC
Hi,

Thank you for taking the time to report this bug. I've brought it to the attention of the x86 MCE maintainers:

https://marc.info/?l=linux-kernel&m=150766207223899

Comment 2 Fedora Update System 2017-11-08 22:10:56 UTC
kernel-4.13.12-200.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-31d7720d7e

Comment 3 Fedora Update System 2017-11-08 22:11:29 UTC
kernel-4.13.12-300.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-abda708cee

Comment 4 Fedora Update System 2017-11-08 22:11:54 UTC
kernel-4.13.12-100.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-08a350c878

Comment 5 Fedora Update System 2017-11-09 19:54:52 UTC
kernel-4.13.12-300.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-abda708cee

Comment 6 Fedora Update System 2017-11-11 16:01:55 UTC
kernel-4.13.12-100.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-08a350c878

Comment 7 Fedora Update System 2017-11-11 17:29:10 UTC
kernel-4.13.12-200.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-31d7720d7e

Comment 8 Fedora Update System 2017-11-14 01:59:59 UTC
kernel-4.13.12-200.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

Comment 9 Fedora Update System 2017-11-14 08:51:00 UTC
kernel-4.13.12-100.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

Comment 10 Fedora Update System 2017-11-14 09:24:11 UTC
kernel-4.13.12-300.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.