Bug 603911
Summary: RHEL UV: x86_64: NMI Watchdog detected LOCKUP

Product: Red Hat Enterprise Linux 6
Reporter: George Beshers <gbeshers>
Component: kernel
Assignee: George Beshers <gbeshers>
Status: CLOSED CANTFIX
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent
Docs Contact:
Priority: high
Version: 6.0
CC: dwa, dzickus, martinez, michael.hagmann, peterm, prarit, srostedt, tee, travis
Target Milestone: rc
Target Release: 6.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Due to the way ftrace works when modifying the code during startup, the NMI watchdog causes too much noise and ftrace can not find a quiet period to instrument the code. Consequently, machines with more than 512 cpus will encounter issues with the NMI watchdog. Such issues will return error messages similar to "BUG: NMI Watchdog detected LOCKUP" and have either 'ftrace_modify_code' or 'ipi_handler' in the backtrace. To work around this issue, disable nmi_watchdog using the command:
nmi_watchdog=0
Story Points: ---
Clone Of:
Clones: 619578 (view as bug list)
Environment:
Last Closed: 2010-07-29 20:51:41 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 555548, 619578
Attachments:
Adding Steven to the cc list for his opinion. I am not sure how long ftrace is expected to take to do its code modification. Considering it seems to be syncing across 2048 cpus, I would think it would be appropriate to put a 'touch_nmi_watchdog' inside, say, 'do_ftrace_mod_code()'? Perhaps Steven could give us some insight here.

Cheers,
Don

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

Ouch, 2048 CPUs! But still, this is on module loading, which does not even run stop_machine. Oh God, I think this may be where we are stuck:

    static void wait_for_nmi(void)
    {
        if (!atomic_read(&nmi_running))
            return;

        do {
            cpu_relax();
        } while (atomic_read(&nmi_running));

        nmi_wait_count++;
    }

If we have 2048 CPUs, all of them running NMIs, it is possible that there's always an NMI running on one of the CPUs. To solve this, we would need to convert this into an RCU-like loop. That is, this only needs to wait for NMIs that are currently running; we can set it up to continue when all currently running NMIs have finished. It may take a bit of work.

To see if this is the case, does this box boot up fine without NMIs? You could also check by making the interval of NMI triggering longer.

I recently tried a similar patch with a TIMEOUT of 120 (rhel6 currently is 30). I still got some LOCKUPs with that setting. I have not tried TIMEOUT=0 with the latest patches; I have the machine reserved tomorrow morning.

George

I'm not talking about making the timeout longer, because if an NMI is always firing on one CPU, it will still lock up at any timeout. We need to lengthen the time between NMIs. That would require modifying the code itself, though.

George,

We can probably answer part of Steven's question by booting with nmi_watchdog=0.
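Steven's idea of an RCU-like loop -- wait only for the NMIs that are already in flight, rather than for a moment when no CPU is in an NMI at all -- can be sketched in userspace C11. This is an illustrative sketch only, not the kernel patch; the per-CPU structure and function names are assumptions:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace sketch of an RCU-style wait_for_nmi(): each CPU keeps a
 * nesting flag plus a completion sequence number bumped on NMI exit.
 * The waiter then only waits until every CPU that was in an NMI at the
 * start has exited it at least once; NMIs that begin afterwards are
 * ignored, so a constant background of NMIs can no longer stall us. */

#define NR_CPUS 8

static struct {
    atomic_bool in_nmi;     /* this CPU is currently inside an NMI   */
    atomic_long exit_seq;   /* bumped every time an NMI handler ends */
} nmi_state[NR_CPUS];

void nmi_enter(int cpu)
{
    atomic_store(&nmi_state[cpu].in_nmi, true);
}

void nmi_exit(int cpu)
{
    atomic_fetch_add(&nmi_state[cpu].exit_seq, 1);
    atomic_store(&nmi_state[cpu].in_nmi, false);
}

/* Returns once every NMI that was in flight at the time of the call
 * has finished. */
void wait_for_nmi_rcu_style(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!atomic_load(&nmi_state[cpu].in_nmi))
            continue;
        long seen = atomic_load(&nmi_state[cpu].exit_seq);
        while (atomic_load(&nmi_state[cpu].in_nmi) &&
               atomic_load(&nmi_state[cpu].exit_seq) == seen)
            ; /* cpu_relax() in kernel code */
    }
}
```

The key difference from the quoted wait_for_nmi() is that the condition is per-CPU and snapshot-based, so forward progress does not depend on a global counter ever reaching zero.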
That is probably the easiest for now. Let me attach a patch that will lengthen the timeout from every second on an active cpu to every 10 seconds. This will address Steven's other thought.

Cheers,
Don

Created attachment 424157 [details]
slow down nmis from every 1 second to 10 seconds
Created attachment 424503 [details]
console log with 'nmi_watchdog=0'
Created attachment 424504 [details]
console log with Don's nmi delay patch
Note that at the point where the log stops I left the
machine for about half an hour with no forward progress.
George
Forgot to note that 'nmi_watchdog=0' did boot successfully.

Cheers,
George

Thanks George, I just had a BZ mid-air collision asking that very question ;-) I guess the answer is for me to come up with an RCU-style wait_for_nmi() function.

-- Steve

Hi George,

Thanks for the console logs. In regards to the logs using my delay patch, the message "LOCKUP related delay of 10 on XXXX" is repeated forever, but for the life of me I can't find where in the kernel that is displayed. Did you put an extra patch in?

Cheers,
Don

Looking back at the code, an RCU-style wait_for_nmi() can not work, since we must wait for a point at which no CPU is running any NMI. I'm thinking that since all NMIs are accessing an atomic global variable when doing the ftrace modifications, the cacheline bouncing this causes with >2000 CPUs is making it to the point that it is locking up. I can add to the loop that if it fails to clear the nmi_running variable after X iterations, it bails and disables ftrace. It may just be that we can have ftrace or NMI watchdogs in a machine like that, but not both.

Another solution is to block the NMIs on exit. Have them spin until the last one does the work. This might work. I'll write up a patch to try it out.

Don,

Yes -- and I forgot about it :(. Do you think it materially altered the test? It's in ./arch/x86/kernel/apic/nmi.c:

    if (t % 10 == 0) {
        printk(KERN_WARNING "LOCKUP related delay of %ld on CPU%d\n",
               t, smp_processor_id());
    }

    if (local_read(&__get_cpu_var(alert_counter)) == CONFIG_DEBUG_NMI_TIMEOUT * nmi_hz)
        /*
         * die_nmi will return ONLY if NOTIFY_STOP happens..
         */
        die_nmi("BUG: NMI Watchdog detected LOCKUP", regs,
                panic_on_timeout);

George

(In reply to comment #16)
> Don,
>
> Yes -- and I forgot about it :(. Do you think it materially
> altered the test?

I can't tell until I see the _whole_ patch.
But yes, putting printks in there can slow down a processor's ability to generate interrupts and thus mislead the machine into thinking it has stuck cpus (because interrupts are not firing in a periodic fashion). Also note that during the check_nmi test, the nmi watchdog is configured to fire HZ (1000) times a second instead of the usual once a second (or once every 10 seconds with my patch).

But please, in the future, let me know all the patches you put on the kernel so I can accurately figure out what is going on.

Thanks,
Don

Created attachment 424537 [details]
patch actually used.
I have the large machine reserved for early tomorrow morning,
will re-run with a clean set of patches.
Sorry for the confusion.
George
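The alert_counter check quoted from nmi.c earlier in the thread is the heart of the detector: each watchdog NMI on a CPU that shows no forward progress bumps a per-CPU counter, and die_nmi() fires once the counter reaches CONFIG_DEBUG_NMI_TIMEOUT * nmi_hz. A minimal single-CPU userspace model of that accounting (names here are simplified stand-ins, not the kernel symbols):

```c
/* Model of the nmi.c accounting: each watchdog NMI that sees no forward
 * progress bumps alert_counter; progress (or an explicit
 * touch_nmi_watchdog()) resets it; a LOCKUP is declared when the
 * counter reaches DEBUG_NMI_TIMEOUT * NMI_HZ untouched ticks. */

#define NMI_HZ               1   /* watchdog NMIs per second (usually)  */
#define DEBUG_NMI_TIMEOUT   30   /* seconds of silence before "LOCKUP"  */

static long alert_counter;       /* per-CPU in the real code */

void touch_nmi_watchdog_sim(void)
{
    alert_counter = 0;
}

/* One watchdog tick; returns 1 when die_nmi() would be called. */
int nmi_watchdog_tick_sim(int cpu_made_progress)
{
    if (cpu_made_progress) {
        alert_counter = 0;
        return 0;
    }
    if (++alert_counter >= DEBUG_NMI_TIMEOUT * NMI_HZ)
        return 1;   /* "BUG: NMI Watchdog detected LOCKUP" */
    return 0;
}
```

This makes it clear why both proposed mitigations work: slowing the NMI rate lowers NMI_HZ, and touch_nmi_watchdog() keeps resetting the counter during long but legitimate work.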
Created attachment 424810 [details]
Rerun with the NMI delay patch
The system failed to boot, confirming the validity
of yesterday's run.
Created attachment 424811 [details]
Patch used against -33 to produce previous log
(In reply to comment #19)
> Created an attachment (id=424810) [details]
> Rerun with the NMI delay patch
>
> The system failed to boot, confirming the validity
> of yesterday's run.

Well, not really. It failed in the original spot at the beginning of this bz. The timeout characteristics were probably changed such that you didn't see the LOCKUP yet. But this hang is different than yesterday's hang.

What this test proves is that having the nmi trigger every 10 seconds instead of every 1 still doesn't resolve this issue. So far only disabling nmi_watchdog entirely works (which is probably a good workaround for now). Perhaps Steven can craft a patch that allows ftrace to do its magic without having to worry about all the NMIs.

Cheers,
Don

Created attachment 424905 [details]
pause nmis till all nmi's are done
George,
Can you try this patch. It's just a test patch. While the code modification is happening, it forces NMIs on other CPUs to wait till all other NMIs are done.
I can do even more, but I want to see if this fixes the issue first.
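The shape of the test patch, as described, is a gate that NMIs spin behind while the code modification is in progress. A hypothetical userspace rendering (not the attached patch; names are illustrative):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical rendering of the "pause NMIs" approach: while ftrace is
 * rewriting code, NMI handlers on other CPUs spin at their entry point
 * instead of touching the text being patched. */

static atomic_bool mod_code_in_progress;
static int nmis_serviced;

void ftrace_mod_code_begin(void)
{
    atomic_store(&mod_code_in_progress, true);
}

void ftrace_mod_code_end(void)
{
    atomic_store(&mod_code_in_progress, false);
}

void nmi_handler_sim(void)
{
    /* Hold the NMI here until the modification completes. */
    while (atomic_load(&mod_code_in_progress))
        ; /* cpu_relax() in kernel code */
    nmis_serviced++;
}
```

The trade-off, visible in the sketch, is that blocked NMIs still count as "in NMI" to the watchdog, which is consistent with the failure George reports next.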
Hi Steve,

This patch failed early on. I have access to the machine this evening and tomorrow morning.

George

    Booting Node 126, Processors #2032 #2033 #2034 #2035 #2036 #2037 #2038 #2039 Ok.
    Booting Node 127, Processors #2040 #2041 #2042 #2043 #2044 #2045 #2046 #2047 Ok.
    Brought up 2048 CPUs
    Total of 2048 processors activated (9289042.89 BogoMIPS).
    Testing NMI watchdog ... OK.
    BUG: NMI Watchdog detected LOCKUP on CPU0, ip ffffffff8102a0ca, registers:
    CPU 0
    Modules linked in:
    Modules linked in:
    Pid: 1, comm: swapper Not tainted 2.6.32-33.el6.SRostedt.x86_64 #1 Altix UV
    RIP: 0010:[<ffffffff8102a0ca>]  [<ffffffff8102a0ca>] set_mtrr+0xca/0x150
    RSP: 0018:ffff8c2ff8e13e80  EFLAGS: 00000006
    RAX: 00000000000007a5 RBX: 00000000ffffffff RCX: 3333333333333333
    RDX: 0000000000000020 RSI: 0000000000000800 RDI: ffffffffffffffff
    RBP: ffff8c2ff8e13ed0 R08: ffffffff818bb9b8 R09: 00000000000007c0
    R10: 0000000000000020 R11: 0000000000000000 R12: 0000000000000246
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process swapper (pid: 1, threadinfo ffff8c2ff8e12000, task ffff8c27f9e12040)
    Stack:
     00000001000007a5 0000000000000000 0000000000000000 00000000ffffffff
    <0> ffff8c2ff8e13ee0 0000000114d5c5e5 ffffffff818bb6c0 0000000000016700
    <0> 0000000000000800 0000000000090000 ffff8c2ff8e13ee0 ffffffff8102a1e0
    Call Trace:
     [<ffffffff8102a1e0>] mtrr_aps_init+0x30/0x40
     [<ffffffff818ea04b>] native_smp_cpus_done+0x111/0x150
     [<ffffffff818dc737>] kernel_init+0x165/0x29c
     [<ffffffff810141ca>] child_rip+0xa/0x20
     [<ffffffff818dc5d2>] ? kernel_init+0x0/0x29c
     [<ffffffff810141c0>] ? child_rip+0x0/0x20
    Code: 45 b0 83 fb ff c7 45 b4 01 00 00 00 74 7b 48 8b 05 dc 22 a1 00 41 0f b6 cf 4c 89 f2 4c 89 ee 89 df ff 50 08 eb 05 0f 1f 00 f3 90 <8b> 45 b0 85 c0 75 f7 8b 35 a9 78 88 00 48 8b 3d f2 1c a1 00 e8
    ---[ end trace 8a80a59122ce32a3 ]---
    Kernel panic - not syncing: Non maskable interrupt
    Pid: 1, comm: swapper Tainted: G      D    2.6.32-33.el6.SRostedt.x86_64 #1
    Call Trace:
     <NMI>  [<ffffffff814d72cd>] panic+0x78/0x137
     [<ffffffff8106a3d3>] ? print_oops_end_marker+0x23/0x30
     [<ffffffff814db5dc>] die_nmi+0xfc/0x100
     [<ffffffff814dbb8a>] nmi_watchdog_tick+0x1aa/0x200
     [<ffffffff814db103>] do_nmi+0x1a3/0x2d0
     [<ffffffff814da9e0>] nmi+0x20/0x30
     [<ffffffff8102a0ca>] ? set_mtrr+0xca/0x150
     <<EOE>>  [<ffffffff8102a1e0>] mtrr_aps_init+0x30/0x40
     [<ffffffff818ea04b>] native_smp_cpus_done+0x111/0x150
     [<ffffffff818dc737>] kernel_init+0x165/0x29c
     [<ffffffff810141ca>] child_rip+0xa/0x20
     [<ffffffff818dc5d2>] ? kernel_init+0x0/0x29c
     [<ffffffff810141c0>] ? child_rip+0x0/0x20

Created attachment 425787 [details]
Simply touch_nmi_watchdog()
George,
Remove the other patch and try this one. This is the simplest approach, all it does is to do a touch_nmi_watchdog() after each conversion of a mcount nop.
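As described, the patch simply pets the watchdog after each mcount-site conversion. Schematically (illustrative names only, not the attached patch):

```c
/* Schematic of the touch_nmi_watchdog() approach: pet the watchdog
 * after every mcount nop conversion so the long loop over all call
 * sites can never look like a locked-up CPU. */

static long alert_counter;   /* stand-in for the per-CPU watchdog counter */

void touch_nmi_watchdog_stub(void)
{
    alert_counter = 0;
}

static int convert_nop(int rec)
{
    alert_counter++;   /* pretend a watchdog NMI ticked during the work */
    (void)rec;
    return 0;          /* 0 == conversion succeeded */
}

int ftrace_convert_nops_sim(int nr_records)
{
    for (int rec = 0; rec < nr_records; rec++) {
        if (convert_nop(rec))
            return -1;
        touch_nmi_watchdog_stub();   /* the one-line fix */
    }
    return 0;
}
```

No matter how many records the loop walks, the counter is reset between records, so the watchdog threshold is never crossed by the conversion itself.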
Created attachment 426047 [details]
console log
This is with the touch_nmi_watchdog() patch installed.
George
George,

Are you sure something else isn't broken? That crash does not seem to be related to the patch I have at all. Did you remove the first patch?

Steve,

I started over fresh, but I didn't include Don's patch. I have the machine reserved again in a few hours and will try both.

George

Good news > Bad news.

First, I switched from the -33 based kernel to the -37 based kernel. It booted -- Steve's touch_nmi_watchdog() patch seems to have solved the timeout problem.

The bad news is that the mdelay in arch/x86/kernel/apic/nmi.c is too short. I tried changing it to 80 and you can see the results in the attachment. I will try 200 in the morning, but I am concerned about the future. We would like to achieve parity between distros and that means running on 4 racks and 4096 cpus / 16Tb of memory. A 3 rack 3000-cpu system is being assembled, but I don't think I can get on it until next week or (more likely) 2 weeks.

George

Created attachment 426320 [details]
messages file from 2048p/5tb system
NOTE: this booted without the nmi_watchdog=0.
George
(In reply to comment #28)
> Good news > Bad news.
>
> First, I switched from -33 based kernel to -37 based kernel.
>
> It booted -- Steve's touch_nmi_watchdog() patch seems to have
> solved the timeout problem.
>
> The bad news is that the mdelay in arch/x86/kernel/apic/nmi.c
> is too short. I tried changing it to 80 and you can see the
> results in the attachment.
>
> I will try 200 in the morning, but I am concerned about
> the future. We would like to achieve parity between distros
> and that means running on 4 racks and 4096 cpus / 16Tb of
> memory.

Well, I am pretty sure that will fail too. My guess is that uv_handle_nmi is grabbing all the nmi_watchdog NMIs, and as a result the nmi_watchdog can not re-arm itself. This leads to what appears as an NMI stuck message (because no NMIs are being generated on the system). Of course, without the full patch you applied to the system I can not tell for sure, as the uv_handle_nmi function does not seem to be in the -37.el6 kernel.

Or can you just test plain -37.el6 with just Steven's patch? Oh, you will probably need to add my patch too. I thought that made it into -37.el6 but it doesn't appear to.

Cheers,
Don

Ok, so I talked to George about this problem. I didn't realize a patch was posted that put uv_handle_nmi in the RHEL-6 kernel. After looking at that code, I realized that won't work with the nmi_watchdog. What happens is that uv_handle_nmi grabs all the nmis before the nmi_watchdog (and possibly others). As a result the nmi_watchdog never gets to re-arm, and uv_handle_nmi prints all the stack traces from nmis that were supposed to be for the nmi_watchdog.

So the uv_handle_nmi code needs to be modified to become a shared nmi handler; otherwise (even with nmi_watchdog=0 set) it can starve other nmi handlers in the system depending on when it gets registered in the die_chain (because it just assumes the nmi is for uv_handle_nmi).
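A shared handler of the kind suggested here would decline NMIs it did not request, along these lines. The NOTIFY_* semantics and all names are illustrative, not the SGI code; the real fix would track the request per CPU (e.g. with a cpumask like backtrace_mask), while a single flag keeps the sketch short:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative sketch of a *shared* uv-style NMI handler: it claims an
 * NMI only when a backtrace was explicitly requested, and otherwise
 * passes it down the die_chain so the nmi_watchdog can re-arm. */

enum { NOTIFY_DONE = 0, NOTIFY_STOP = 1 };

static atomic_bool backtrace_requested;

void request_all_cpu_backtrace_sim(void)
{
    atomic_store(&backtrace_requested, true);
    /* ...then send the NMI IPIs in real code... */
}

int uv_handle_nmi_sim(void)
{
    if (!atomic_exchange(&backtrace_requested, false))
        return NOTIFY_DONE;   /* not ours: let other handlers see it */
    /* dump_stack() here in real code */
    return NOTIFY_STOP;       /* consumed: it was our backtrace NMI */
}
```

With this shape, watchdog NMIs fall through to the watchdog handler instead of being swallowed, which is exactly the starvation being described.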
George mentioned that uv_handle_nmi was just a debug hook to print out backtraces on the system for all cpus. The kernel has something like that already, or if that doesn't work, SGI could implement a similar mechanism that at least sets a flag signaling to uv_handle_nmi that the nmi is most likely intended for it. In arch/x86/kernel/apic/nmi.c::arch_trigger_all_cpu_backtrace(), look at how the variable 'backtrace_mask' is used. Or just call arch_trigger_all_cpu_backtrace() if it meets SGI's needs.

Hope that helps.

Cheers,
Don

Created attachment 434727 [details]
console log

Note that this has the rt_sched_patch (see BZ607587) and a patch for kdump. The uv_nmi stackdump has been removed.

The main thing to note is that ipi_handler is always the *first* thing to catch the NMI Watchdog detected LOCKUP. So one cpu has trouble flushing its cache, locks up the cpu, then another one wants to panic because of another NMI lockup but can't, because the first cpu has interrupts disabled (and is spinning on a spinlock!), which prevents the panic ipi from finishing. This results in the 2000+ cpu pile-up. Lovely. The question is why couldn't cpu 769 flush its cache?

Cheers,
Don

Pushing to 6.1. If SGI feels this is a blocker for 6.0, feel free to talk with your PM and have it changed.

Cheers,
Don

I cloned this bz to track for 6.1. For 6.0, it is too late to fix this properly. It seems the workaround for this problem will be 'nmi_watchdog=0'. I will close this bz and add a tech note for the workaround; please use bz619578 to continue the discussion.

Tech note:

Machines with larger than 512 cpus will have known issues with the NMI watchdog. These issues will manifest itself as "BUG: NMI Watchdog detected LOCKUP" and have either 'ftrace_modify_code' or 'ipi_handler' in its backtrace. Due to the way ftrace works when modifying the code during startup, the NMI watchdog causes too much noise and ftrace can not find a quiet period in which to instrument the code.
Using nmi_watchdog=0, resolves this issue for now.

Due to slowness when invalidating the cache, the cpu may take longer than normal to finish. This causes the nmi watchdog to prematurely fire. Using nmi_watchdog=0, resolves this issue for now.

Tech note is for RHEL 6.0.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:

Machines with larger than 512 cpus will have known issues with the NMI watchdog. These issues will manifest itself as "BUG: NMI Watchdog detected LOCKUP" and have either 'ftrace_modify_code' or 'ipi_handler' in its backtrace. Due to the way ftrace works when modifying the code during startup, the NMI watchdog causes too much noise and ftrace can not find a quiet period in which to instrument the code. Using nmi_watchdog=0, resolves this issue for now.

Due to slowness when invalidating the cache, the cpu may take longer than normal to finish. This causes the nmi watchdog to prematurely fire. Using nmi_watchdog=0, resolves this issue for now.

Per our conversation yesterday. I ran into a hardware problem, so only limited testing; I still ended up with messages with the slower clock-tick with the count at 6 (60 seconds).

George

What did the stack trace look like; did it have the ipi_handler messages in it or the ftrace ones? Regardless, I was hoping that would have worked, because that would have been a nice generic time to use across everything. I guess increasing 6 to 12, 18 or 24 would be the next logical step. Of course, providing a commandline knob will probably make sense too if this gets too large.

Cheers,
Don

I lost the initial message -- scrollback wasn't set large enough. However, in previous testing it was almost always the ipi_handler.

George

Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:

    @@ -1,11 +1,3 @@
    -Machines with larger than 512 cpus will have known issues with the NMI
    -watchdog. These issues will manifest itself as "BUG: NMI Watchdog detected
    -LOCKUP" and have either 'ftrace_modify_code' or 'ipi_handler' in its backtrace.
    +Due to the way ftrace works when modifying the code during startup, the NMI watchdog causes too much noise and ftrace can not find a quiet period to instrument the code. Consequently, machines with more than 512 cpus will encounter issues with the NMI watchdog. Such issues will return error messages similar to "BUG: NMI Watchdog detected LOCKUP" and have either 'ftrace_modify_code' or 'ipi_handler' in the backtrace. To work around this issue, disable nmi_watchdog using the command:
    -Due to the way ftrace works when modifying the code during startup, the NMI
    +nmi_watchdog=0
    -watchdog causes too much noise and ftrace can not find a quiet period in which
    -to instrument the code. Using nmi_watchdog=0, resolves this issue for now.
    -
    -Due to slowness when invalidating the cache, the cpu may take longer than
    -normal to finish. This causes the nmi watchdog to prematurely fire. Using
    -nmi_watchdog=0, resolves this issue for now.
Created attachment 423971 [details]
console log

Description of problem:

The following comes up on a 512p UV system; the issue is more pronounced on a 2048p system. The problem is that we are getting NMI Watchdog messages, but only when CONFIG_DYNAMIC_FTRACE is turned on. A portion of the console log follows. The kernel is -33 with dzickus's patch plus:

    commit 72680a191b934377430032f93af15ef50aafb3a8
    Author: Hedi Berriche <hedi>
    Date:   Wed May 26 14:44:06 2010 -0700

        pids: increase pid_max based on num_possible_cpus

This backtrace is from uv35-sys.

    [ 72.214393] BUG: NMI Watchdog detected LOCKUP on CPU401, ip ffffffff810362ad, registers:
    [ 72.214393] CPU 401
    [ 72.214393] Modules linked in: dm_mod(+)(U)
    [ 72.214393]
    [ 72.214393] Modules linked in: dm_mod(+)(U)
    [ 72.214393] Pid: 6244, comm: modprobe Not tainted 2.6.32-33.el6.DZ_Hedi.x86_64 #1 Stoutland Platform
    [ 72.214393] RIP: 0010:[<ffffffff810362ad>]  [<ffffffff810362ad>] do_ftrace_mod_code+0x3d/0x100
    [ 72.214393] RSP: 0018:ffff8901fe725e08  EFLAGS: 00000097
    [ 72.214393] RAX: 000000000000000b RBX: ffff8901fe725e18 RCX: ffffffff80000000
    [ 72.214393] RDX: 0000000000000000 RSI: ffffffff81a655dc RDI: ffffffffa000a660
    [ 72.214393] RBP: ffff8901fe725e08 R08: 0000000000000000 R09: 0000000000000000
    [ 72.214393] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa000a660
    [ 72.214393] R13: ffffffff81a655b4 R14: ffffffff81a655dc R15: 0000001380aa269c
    [ 72.214393] FS:  00007fbcc326d700(0000) GS:ffff892007520000(0000) knlGS:0000000000000000
    [ 72.214393] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 72.214393] CR2: 00007fbcc322e00f CR3: 00000121fe52e000 CR4: 00000000000006e0
    [ 72.214393] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 72.214393] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [ 72.214393] Process modprobe (pid: 6244, threadinfo ffff8901fe724000, task ffff8901fe7c6b30)
    [ 72.214393] Stack:
    [ 72.214393]  ffff8901fe725e48 ffffffff81036559 ffff89e10087dbe8 ffffffff810e6057
    [ 72.214393] <0> ffff8921fe523150 ffffffffa000a660 ffffffffa000d860 0000000000000246
    [ 72.214393] <0> ffff8901fe725e58 ffffffff81036633 ffff8901fe725e98 ffffffff810e431e
    [ 72.214393] Call Trace:
    [ 72.214393]  [<ffffffff81036559>] ftrace_modify_code+0x79/0x90
    [ 72.214393]  [<ffffffff810e6057>] ? ring_buffer_time_stamp+0x7/0x10
    [ 72.214393]  [<ffffffffa000a660>] ? dm_attr_show+0x10/0x68 [dm_mod]
    [ 72.214393]  [<ffffffff81036633>] ftrace_make_nop+0x33/0x40
    [ 72.214393]  [<ffffffff810e431e>] ftrace_convert_nops+0xfe/0x290
    [ 72.214393]  [<ffffffff810e44f5>] ftrace_module_notify+0x45/0x50
    [ 72.214393]  [<ffffffff814dd035>] notifier_call_chain+0x55/0x80
    [ 72.214393]  [<ffffffff81095eea>] __blocking_notifier_call_chain+0x5a/0x80
    [ 72.214393]  [<ffffffff81095f26>] blocking_notifier_call_chain+0x16/0x20
    [ 72.214393]  [<ffffffff810aec58>] sys_init_module+0xa8/0x250
    [ 72.214393]  [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
    [ 72.214393] Code: 0f ae f0 48 c7 c1 00 00 00 80 31 c0 f0 0f b1 0d 1a f3 a2 00 85 c0 74 1e 31 d2 0f 1f 40 00 f3 90 48 89 d0 f0 0f b1 0d 03 f3 a2 00 <85> c0 75 ef 83 05 20 f3 a2 00 01 0f ae f0 48 8b 35 0e f3 a2 00
    [ 72.214393] ---[ end trace 2f5ff6cb68e61c06 ]---
    [ 72.214393] Kernel panic - not syncing: Non maskable interrupt
    [ 72.214393] Pid: 6244, comm: modprobe Tainted: G      D     2.6.32-33.el6.DZ_Hedi.x86_64 #1
    [ 72.214393] Call Trace:
    [ 72.214393]  <NMI>  [<ffffffff814d72cd>] panic+0x78/0x137
    [ 72.214393]  [<ffffffff8106a3d3>] ? print_oops_end_marker+0x23/0x30
    [ 72.214393]  [<ffffffff814db5dc>] die_nmi+0xfc/0x100
    [ 72.214393]  [<ffffffff814dbb8a>] nmi_watchdog_tick+0x1aa/0x200
    [ 72.214393]  [<ffffffff814db103>] do_nmi+0x1a3/0x2d0
    [ 72.214393]  [<ffffffffa000a660>] ? dm_attr_show+0x10/0x68 [dm_mod]
    [ 72.214393]  [<ffffffff814da9e0>] nmi+0x20/0x30
    [ 72.214393]  [<ffffffffa000a660>] ? dm_attr_show+0x10/0x68 [dm_mod]
    [ 72.214393]  [<ffffffffa000a660>] ? dm_attr_show+0x10/0x68 [dm_mod]
    [ 72.214393]  [<ffffffff810362ad>] ? do_ftrace_mod_code+0x3d/0x100
    [ 72.214393] <<EOE>>  [<ffffffff81036559>] ftrace_modify_code+0x79/0x90
    [ 72.214393]  [<ffffffff810e6057>] ? ring_buffer_time_stamp+0x7/0x10
    [ 72.214393]  [<ffffffffa000a660>] ? dm_attr_show+0x10/0x68 [dm_mod]
    [ 72.214393]  [<ffffffff81036633>] ftrace_make_nop+0x33/0x40
    [ 72.214393]  [<ffffffff810e431e>] ftrace_convert_nops+0xfe/0x290
    [ 72.214393]  [<ffffffff810e44f5>] ftrace_module_notify+0x45/0x50
    [ 72.214393]  [<ffffffff814dd035>] notifier_call_chain+0x55/0x80
    [ 72.214393]  [<ffffffff81095eea>] __blocking_notifier_call_chain+0x5a/0x80
    [ 72.214393]  [<ffffffff81095f26>] blocking_notifier_call_chain+0x16/0x20
    [ 72.214393]  [<ffffffff810aec58>] sys_init_module+0xa8/0x250
    [ 72.214393]  [<ffffffff81013172>] system_call_fastpath+0x16/0x1b

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info: