Bug 1431263
| Summary: | on aarch64, running the hw_watch_addr.stp systemtap examples cause a stuck CPU | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | David Smith <dsmith> |
| Component: | systemtap | Assignee: | David Smith <dsmith> |
| Status: | CLOSED ERRATA | QA Contact: | Martin Cermak <mcermak> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.3 | CC: | fche, jistone, lberk, mbenitez, mcermak, mjw |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | aarch64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | systemtap-3.1-1.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-08-01 09:34:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
On the debug kernel (4.5.0-15.el7.aarch64.debug), I see the following on the console: ==== [ 1456.231082] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [kworker/u16:1:15106] [ 1456.239059] Modules linked in: stap_054c2f68f83da5b9627cb81f07300d2_15586(OE) vfat fat sg i2c_xgene_slimpro xgene_rng gpio_xgene_sb i2c_core gpio_generic nfsd ip_tables xfs libcrc32c xgene_enet sdhci_acpi at803x realtek sdhci mmc_core ahci_xgene mdio_xgene libahci_platform dm_mirror dm_region_hash dm_log dm_mod [ 1456.266787] irq event stamp: 0 [ 1456.269826] hardirqs last enabled at (0): [< (null)>] (null) [ 1456.277200] hardirqs last disabled at (0): [<fffffe00000ce3dc>] copy_process.isra.32.part.33+0x454/0x16d8 [ 1456.286741] softirqs last enabled at (0): [<fffffe00000ce3dc>] copy_process.isra.32.part.33+0x454/0x16d8 [ 1456.296280] softirqs last disabled at (0): [< (null)>] (null) [ 1456.303658] [ 1456.305146] CPU: 3 PID: 15106 Comm: kworker/u16:1 Tainted: G W OE ------------ 4.5.0-15.el7.aarch64.debug #1 [ 1456.315884] Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang Board, BIOS 3.06.25 Oct 17 2016 [ 1456.325589] Workqueue: writeback wb_workfn (flush-253:0) [ 1456.330900] task: fffffe035fb59000 ti: fffffe00037f4000 task.ti: fffffe00037f4000 [ 1456.338356] PC is at domain_dirty_limits+0x44/0x30c [ 1456.343215] LR is at wb_over_bg_thresh+0xcc/0x1c8 [ 1456.347899] pc : [<fffffe00002129ec>] lr : [<fffffe0000215f24>] pstate: 20000145 [ 1456.355266] sp : fffffe00037f7aa0 [ 1456.358567] x29: fffffe00037f7aa0 x28: fffffe00010d2000 [ 1456.363878] x27: 0000000000000000 x26: fffffe00010d0000 [ 1456.369189] x25: fffffe000120c000 x24: 0000000000000000 [ 1456.374500] x23: fffffe0000ee0550 x22: fffffe0000ee05a0 [ 1456.379809] x21: fffffe00037f7b40 x20: 0000000000000000 [ 1456.385118] x19: fffffe0000ee0280 x18: 000003ffd504dcb0 [ 1456.390428] x17: 000003ffa02b1bfc x16: fffffe00002a89f4 [ 1456.395738] x15: 0000000000000004 x14: 000003ffd50502d0 [ 1456.401049] x13: 0000000000000001 x12: 0000000000000000 [ 1456.406358] x11: fffffe00008a5940 x10: 00000000000023e0 [ 1456.411669] x9 : fffffe00037f4000 x8 : 0000000000000000 [ 1456.416980] x7 : fffffe00002c9cd8 x6 : 0000000000000000 [ 1456.422290] x5 : fffffe0002499438 x4 : 0000000000034b64 [ 1456.427601] x3 : 0000000000000000 x2 : fffffe0001143848 [ 1456.432910] x1 : 0000000000000000 x0 : 0000000000000000 [ 1456.438220] ==== I've used the following script to see if 'perf' had any problems running the equivalent of the hw_watch_addr.stp example script:
====
#!/bin/sh
addr="0x`grep "vm_dirty_ratio" /proc/kallsyms | awk '{print $1}'`"
echo "probing $addr..."
perf record -e mem:${addr}:rw -ag sleep 5
perf report
====
The script runs with no problems. So, this looks like a systemtap issue, not a kernel issue.
Here's an update. After lots more investigation, I've found a small problem with the systemtap generated code, but it isn't enough to cause the problem we're seeing here. I'm seeing a large stream of these events before the cpu gets stuck - several thousand. I guessed that perhaps something in the hardware watch code also accessed the vm_dirty_ratio variable, causing an infinite stream of events. However, after creating my own kernel module (where I can control the access to the probed variable), I'm still seeing an infinite stream of events. I'll continue to debug this issue. According to <http://stackoverflow.com/questions/28280813/register-wide-hw-breakpoint-continually-triggers-handler-callback>: ==== This is by design. When an ARM hardware watchpoint is hit, it generates a Data Abort exception. On ARM, Data Abort exceptions trigger before the instruction that triggers them finishes1. This means that, in the exception handler, registers and memory locations affected by the instruction still hold their old values (or, in some cases, undefined values). As such, when the handler finishes, it must retry the aborted instruction so that the interrupted program runs as intended. If the watchpoint is still set when the handler returns, the instruction will trigger it again. This causes the loop... ==== Now the above is actually talking about ARMv7, and we're dealing with ARMv8 here. But, I'll bet a similar thing is happening. Next I plan to look at what 'perf' is doing and how it handles this problem. (In reply to David Smith from comment #4) > Here's an update. > > After lots more investigation, I've found a small problem with the systemtap > generated code, but it isn't enough to cause the problem we're seeing here. The "small problem" I mentioned here I fixed in upstream commit d0b322e. <https://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=commit;h=d0b322eddb2f0f248ced58df8e8448c6e557c81e> Here's the interesting section from the arm64 hardware breakpoint code (arch/arm64/hw_breakpoint.c):
====
/*
* Debug exception handlers.
*/
static int breakpoint_handler(unsigned long unused, unsigned int esr,
struct pt_regs *regs)
{
int i, step = 0, *kernel_step;
u32 ctrl_reg;
u64 addr, val;
struct perf_event *bp, **slots;
struct debug_info *debug_info;
struct arch_hw_breakpoint_ctrl ctrl;
slots = this_cpu_ptr(bp_on_reg);
addr = instruction_pointer(regs);
debug_info = ¤t->thread.debug;
for (i = 0; i < core_num_brps; ++i) {
rcu_read_lock();
bp = slots[i];
if (bp == NULL)
goto unlock;
/* Check if the breakpoint value matches. */
val = read_wb_reg(AARCH64_DBG_REG_BVR, i);
if (val != (addr & ~0x3))
goto unlock;
/* Possible match, check the byte address select to confirm. */
ctrl_reg = read_wb_reg(AARCH64_DBG_REG_BCR, i);
decode_ctrl_reg(ctrl_reg, &ctrl);
if (!((1 << (addr & 0x3)) & ctrl.len))
goto unlock;
counter_arch_bp(bp)->trigger = addr;
perf_bp_event(bp, regs);
/* Do we need to handle the stepping? */
if (is_default_overflow_handler(bp))
step = 1;
unlock:
rcu_read_unlock();
}
if (!step)
return 0;
if (user_mode(regs)) {
debug_info->bps_disabled = 1;
toggle_bp_registers(AARCH64_DBG_REG_BCR, DBG_ACTIVE_EL0, 0);
/* If we're already stepping a watchpoint, just return. */
if (debug_info->wps_disabled)
return 0;
if (test_thread_flag(TIF_SINGLESTEP))
debug_info->suspended_step = 1;
else
user_enable_single_step(current);
} else {
toggle_bp_registers(AARCH64_DBG_REG_BCR, DBG_ACTIVE_EL1, 0);
kernel_step = this_cpu_ptr(&stepping_kernel_bp);
if (*kernel_step != ARM_KERNEL_STEP_NONE)
return 0;
if (kernel_active_single_step()) {
*kernel_step = ARM_KERNEL_STEP_SUSPEND;
} else {
*kernel_step = ARM_KERNEL_STEP_ACTIVE;
kernel_enable_single_step(regs);
}
}
return 0;
}
====
If I'm reading that correctly, the kernel will handle single-stepping over the breakpoint, but only if the breakpoint uses the default handler (see the is_default_overflow_handler() section of code). Since we've got a custom handler, we're expected to handle this ourselves.
I filed a kernel bug against this issue (bug #1435450). For now, we'll have to disable 'kernel.data' probes in systemtap for arm64. Upstream commit 26f262a disables kernel.data probes on arm64. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2301 |
Description of problem: On 4.5.0-15.el7.aarch64, when the systemtap hw_watch_addr.stp example is run, the system reports a stuck CPU. Version-Release number of selected component (if applicable): systemtap-3.0-7.el7.aarch64.rpm kernel-4.5.0-15.el7.aarch64 How reproducible: Every time. Steps to Reproduce: 1. Run the hw_watch_addr.stp example (see below) Actual results: ==== # stap -v --all-modules hw_watch_addr.stp 0x`grep "vm_dirty_ratio" /proc/kallsyms | awk '{print $1}'` -T 5 Pass 1: parsed user script and 447 library scripts using 160896virt/45760res/6464shr/37184data kb, in 970usr/60sys/1079real ms. WARNING: Too many hardware breakpoint probes requested for arm64 (1 vs. 0) Pass 2: analyzed script: 2 probes, 2 functions, 0 embeds, 0 globals using 161664virt/45760res/6464shr/37952data kb, in 10usr/0sys/11real ms. Pass 3: translated to C into "/tmp/stap7FznIg/stap_f39c77b4ea97aaadfb4c753f4b720a85_1973_src.c" using 167104virt/53376res/7808shr/43392data kb, in 4000usr/470sys/4732real ms. Pass 4: compiled C into "stap_f39c77b4ea97aaadfb4c753f4b720a85_1973.ko" in 31610usr/1210sys/33113real ms. Pass 5: starting run. 0xfffffe0000cb7748 value is accessed 0xfffffe00001d2238 : domain_dirty_limits+0x3c/0x198 [kernel] 0xfffffe00001d45fc : wb_over_bg_thresh+0xcc/0x1c8 [kernel] 0xfffffe000026ff48 : wb_workfn+0x1a8/0x40c [kernel] 0xfffffe00000e0700 : process_one_work+0x15c/0x3b4 [kernel] 0xfffffe00000e09b8 : worker_thread+0x60/0x440 [kernel] 0xfffffe00000e72e0 : kthread+0xdc/0xf0 [kernel] 0xfffffe00000919c0 : ret_from_fork+0x10/0x50 [kernel] 0x0 (inexact) 0xfffffe0000cb7748 value is accessed 0xfffffe00001d2238 : domain_dirty_limits+0x3c/0x198 [kernel] 0xfffffe00001d45fc : wb_over_bg_thresh+0xcc/0x1c8 [kernel] 0xfffffe000026ff48 : wb_workfn+0x1a8/0x40c [kernel] 0xfffffe00000e0700 : process_one_work+0x15c/0x3b4 [kernel] 0xfffffe00000e09b8 : worker_thread+0x60/0x440 [kernel] 0xfffffe00000e72e0 : kthread+0xdc/0xf0 [kernel] 0xfffffe00000919c0 : ret_from_fork+0x10/0x50 [kernel] 0x0 (inexact) 0xfffffe0000cb7748 value is accessed 0xfffffe00001d2238 : domain_dirty_limits+0x3c/0x198 [kernel] 0xfffffe00001d45fc : wb_over_bg_thresh+0xcc/0x1c8 [kernel] 0xfffffe000026ff48 : wb_workfn+0x1a8/0x40c [kernel] 0xfffffe00000e0700 : process_one_work+0x15c/0x3b4 [kernel] 0xfffffe00000e09b8 : worker_thread+0x60/0x440 [kernel] 0xfffffe00000e72e0 : kthread+0xdc/0xf0 [kernel] 0xfffffe00000919c0 : ret_from_fork+0x10/0x50 [kernel] 0x0 (inexact) 0xfffffe0000cb7748 value is accessed 0xfffffe00001d2238 : domain_dirty_limits+0x3c/0x198 [kernel] 0xfffffe00001d45fc : wb_over_bg_thresh+0xcc/0x1c8 [kernel] 0xfffffe000026ff48 : wb_workfn+0x1a8/0x40c [kernel] 0xfffffe00000e0700 : process_one_work+0x15c/0x3b4 [kernel] 0xfffffe00000e09b8 : worker_thread+0x60/0x440 [kernel] 0xfffffe00000e72e0 : kthread+0xdc/0xf0 [kernel] 0xfffffe00000919c0 : ret_from_fork+0x10/0x50 [kernel] 0x0 (inexact) ... constant stream of the above output deleted ... ==== In addition, the following is reported on the console: ==== Mar 10 13:51:11 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: stap_f39c77b4ea97aaadfb4c753f4b720a85_3078: systemtap: 3.2/0.166, base: fffffdfffcb80000, memory: 6576data/42text/55ctx/2063net/124alloc kb, probes: 2 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u16:0:6] Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: Modules linked in: stap_f39c77b4ea97aaadfb4c753f4b720a85_3078(OE) vfat fat sg i2c_xgene_slimpro gpio_xgene_sb i2c_core xgene_rng gpio_generic hed nfsd ip_tables xfs libcrc32c mlx4_en vxlan ip6_udp_tunnel udp_tunnel ptp pps_core mlx4_core ahci_xgene libahci_platform dm_mirror dm_region_hash dm_log dm_mod Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: CPU: 2 PID: 6 Comm: kworker/u16:0 Tainted: G OE ------------ 4.5.0-15.el7.aarch64 #1 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: Hardware name: HPE ProLiant m400 Server/ProLiant m400 Server, BIOS U02 06/25/2016 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: Workqueue: writeback wb_workfn (flush-253:0) Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: task: fffffe0f691e4b00 ti: fffffe0f69214000 task.ti: fffffe0f69214000 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: PC is at domain_dirty_limits+0x3c/0x198 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: LR is at wb_over_bg_thresh+0xcc/0x1c8 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: pc : [<fffffe00001d2238>] lr : [<fffffe00001d45fc>] pstate: 20000145 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: sp : fffffe0f69217b70 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x29: fffffe0f69217b70 x28: fffffe0000c50000 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x27: fffffe0000d6aeb0 x26: fffffe0fec380358 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x25: fffffe0fec380368 x24: 0000000000000000 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x23: fffffe0fec380354 x22: fffffe0fec380370 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x21: fffffe0f69217bf0 x20: 0000000000000000 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x19: fffffe0fec3801e0 x18: 000003fffd92b930 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x17: 000000000000001f x16: 00000000000003ff Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x15: 0000000000000020 x14: ff5d2d0001a20400 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x13: ff5d2d0001a20400 x12: 0000000000000000 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x11: fffffe0000a0fff8 x10: 0000000000000000 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x9 : 0000000000000000 x8 : 0000000000001000 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x7 : fffffe0ff9f49000 x6 : fffffe0f62405200 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x5 : fffffe00bc3919e0 x4 : 00000000000f35d5 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x3 : 0000000000000000 x2 : fffffe0000cb7748 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x1 : 0000000000000000 x0 : 0000000000000000 Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: ==== Expected results: The example should cause a CPU to get stuck. Additional info: The same thing happens with the upstream version of systemtap. This is probably related up and old sourceware bug 15908. The symptoms are a bit different, but it is against the same example script.