Bug 1431263

Summary:	on aarch64, running the hw_watch_addr.stp systemtap examples cause a stuck CPU
Product:	Red Hat Enterprise Linux 7	Reporter:	David Smith <dsmith>
Component:	systemtap	Assignee:	David Smith <dsmith>
Status:	CLOSED ERRATA	QA Contact:	Martin Cermak <mcermak>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	7.3	CC:	fche, jistone, lberk, mbenitez, mcermak, mjw
Target Milestone:	rc
Target Release:	---
Hardware:	aarch64
OS:	Linux
Whiteboard:
Fixed In Version:	systemtap-3.1-1.el7	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-08-01 09:34:40 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description David Smith 2017-03-10 19:24:24 UTC

Description of problem:

On 4.5.0-15.el7.aarch64, when the systemtap hw_watch_addr.stp example is run, the system reports a stuck CPU. 

Version-Release number of selected component (if applicable):

systemtap-3.0-7.el7.aarch64.rpm
kernel-4.5.0-15.el7.aarch64

How reproducible:

Every time.

Steps to Reproduce:

1. Run the hw_watch_addr.stp example (see below)

Actual results:

====
# stap -v --all-modules hw_watch_addr.stp 0x`grep "vm_dirty_ratio" /proc/kallsyms | awk '{print $1}'` -T 5
Pass 1: parsed user script and 447 library scripts using 160896virt/45760res/6464shr/37184data kb, in 970usr/60sys/1079real ms.
WARNING: Too many hardware breakpoint probes requested for arm64 (1 vs. 0)
Pass 2: analyzed script: 2 probes, 2 functions, 0 embeds, 0 globals using 161664virt/45760res/6464shr/37952data kb, in 10usr/0sys/11real ms.
Pass 3: translated to C into "/tmp/stap7FznIg/stap_f39c77b4ea97aaadfb4c753f4b720a85_1973_src.c" using 167104virt/53376res/7808shr/43392data kb, in 4000usr/470sys/4732real ms.
Pass 4: compiled C into "stap_f39c77b4ea97aaadfb4c753f4b720a85_1973.ko" in 31610usr/1210sys/33113real ms.
Pass 5: starting run.
0xfffffe0000cb7748 value is accessed
 0xfffffe00001d2238 : domain_dirty_limits+0x3c/0x198 [kernel]
 0xfffffe00001d45fc : wb_over_bg_thresh+0xcc/0x1c8 [kernel]
 0xfffffe000026ff48 : wb_workfn+0x1a8/0x40c [kernel]
 0xfffffe00000e0700 : process_one_work+0x15c/0x3b4 [kernel]
 0xfffffe00000e09b8 : worker_thread+0x60/0x440 [kernel]
 0xfffffe00000e72e0 : kthread+0xdc/0xf0 [kernel]
 0xfffffe00000919c0 : ret_from_fork+0x10/0x50 [kernel]
 0x0 (inexact)
0xfffffe0000cb7748 value is accessed
 0xfffffe00001d2238 : domain_dirty_limits+0x3c/0x198 [kernel]
 0xfffffe00001d45fc : wb_over_bg_thresh+0xcc/0x1c8 [kernel]
 0xfffffe000026ff48 : wb_workfn+0x1a8/0x40c [kernel]
 0xfffffe00000e0700 : process_one_work+0x15c/0x3b4 [kernel]
 0xfffffe00000e09b8 : worker_thread+0x60/0x440 [kernel]
 0xfffffe00000e72e0 : kthread+0xdc/0xf0 [kernel]
 0xfffffe00000919c0 : ret_from_fork+0x10/0x50 [kernel]
 0x0 (inexact)
0xfffffe0000cb7748 value is accessed
 0xfffffe00001d2238 : domain_dirty_limits+0x3c/0x198 [kernel]
 0xfffffe00001d45fc : wb_over_bg_thresh+0xcc/0x1c8 [kernel]
 0xfffffe000026ff48 : wb_workfn+0x1a8/0x40c [kernel]
 0xfffffe00000e0700 : process_one_work+0x15c/0x3b4 [kernel]
 0xfffffe00000e09b8 : worker_thread+0x60/0x440 [kernel]
 0xfffffe00000e72e0 : kthread+0xdc/0xf0 [kernel]
 0xfffffe00000919c0 : ret_from_fork+0x10/0x50 [kernel]
 0x0 (inexact)
0xfffffe0000cb7748 value is accessed
 0xfffffe00001d2238 : domain_dirty_limits+0x3c/0x198 [kernel]
 0xfffffe00001d45fc : wb_over_bg_thresh+0xcc/0x1c8 [kernel]
 0xfffffe000026ff48 : wb_workfn+0x1a8/0x40c [kernel]
 0xfffffe00000e0700 : process_one_work+0x15c/0x3b4 [kernel]
 0xfffffe00000e09b8 : worker_thread+0x60/0x440 [kernel]
 0xfffffe00000e72e0 : kthread+0xdc/0xf0 [kernel]
 0xfffffe00000919c0 : ret_from_fork+0x10/0x50 [kernel]
 0x0 (inexact)
... constant stream of the above output deleted ...
====

In addition, the following is reported on the console:

====
Mar 10 13:51:11 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: stap_f39c77b4ea97aaadfb4c753f4b720a85_3078: systemtap: 3.2/0.166, base: fffffdfffcb80000, memory: 6576data/42text/55ctx/2063net/124alloc kb, probes: 2
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u16:0:6]
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: Modules linked in: stap_f39c77b4ea97aaadfb4c753f4b720a85_3078(OE) vfat fat sg i2c_xgene_slimpro gpio_xgene_sb i2c_core xgene_rng gpio_generic hed nfsd ip_tables xfs libcrc32c mlx4_en vxlan ip6_udp_tunnel udp_tunnel ptp pps_core mlx4_core ahci_xgene libahci_platform dm_mirror dm_region_hash dm_log dm_mod
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel:
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: CPU: 2 PID: 6 Comm: kworker/u16:0 Tainted: G           OE  ------------   4.5.0-15.el7.aarch64 #1
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: Hardware name: HPE ProLiant m400 Server/ProLiant m400 Server, BIOS U02 06/25/2016
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: Workqueue: writeback wb_workfn (flush-253:0)
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: task: fffffe0f691e4b00 ti: fffffe0f69214000 task.ti: fffffe0f69214000
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: PC is at domain_dirty_limits+0x3c/0x198
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: LR is at wb_over_bg_thresh+0xcc/0x1c8
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: pc : [<fffffe00001d2238>] lr : [<fffffe00001d45fc>] pstate: 20000145
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: sp : fffffe0f69217b70
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x29: fffffe0f69217b70 x28: fffffe0000c50000
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x27: fffffe0000d6aeb0 x26: fffffe0fec380358
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x25: fffffe0fec380368 x24: 0000000000000000
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x23: fffffe0fec380354 x22: fffffe0fec380370
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x21: fffffe0f69217bf0 x20: 0000000000000000
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x19: fffffe0fec3801e0 x18: 000003fffd92b930
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x17: 000000000000001f x16: 00000000000003ff
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x15: 0000000000000020 x14: ff5d2d0001a20400
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x13: ff5d2d0001a20400 x12: 0000000000000000
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x11: fffffe0000a0fff8 x10: 0000000000000000
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x9 : 0000000000000000 x8 : 0000000000001000
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x7 : fffffe0ff9f49000 x6 : fffffe0f62405200
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x5 : fffffe00bc3919e0 x4 : 00000000000f35d5
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x3 : 0000000000000000 x2 : fffffe0000cb7748
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel: x1 : 0000000000000000 x0 : 0000000000000000
Mar 10 13:51:40 hp-moonshot-03-c21.lab.eng.rdu.redhat.com kernel:
====


Expected results:

The example should cause a CPU to get stuck.

Additional info:

The same thing happens with the upstream version of systemtap.

This is probably related up and old sourceware bug 15908. The symptoms are a bit different, but it is against the same example script.

Comment 2 David Smith 2017-03-16 14:22:07 UTC

On the debug kernel (4.5.0-15.el7.aarch64.debug), I see the following on the console:

====
[ 1456.231082] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [kworker/u16:1:15106]
[ 1456.239059] Modules linked in: stap_054c2f68f83da5b9627cb81f07300d2_15586(OE) vfat fat sg i2c_xgene_slimpro xgene_rng gpio_xgene_sb i2c_core gpio_generic nfsd ip_tables xfs libcrc32c xgene_enet sdhci_acpi at803x realtek sdhci mmc_core ahci_xgene mdio_xgene libahci_platform dm_mirror dm_region_hash dm_log dm_mod
[ 1456.266787] irq event stamp: 0
[ 1456.269826] hardirqs last  enabled at (0): [<          (null)>]           (null)
[ 1456.277200] hardirqs last disabled at (0): [<fffffe00000ce3dc>] copy_process.isra.32.part.33+0x454/0x16d8
[ 1456.286741] softirqs last  enabled at (0): [<fffffe00000ce3dc>] copy_process.isra.32.part.33+0x454/0x16d8
[ 1456.296280] softirqs last disabled at (0): [<          (null)>]           (null)
[ 1456.303658] 
[ 1456.305146] CPU: 3 PID: 15106 Comm: kworker/u16:1 Tainted: G        W  OE  ------------   4.5.0-15.el7.aarch64.debug #1
[ 1456.315884] Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang Board, BIOS 3.06.25 Oct 17 2016
[ 1456.325589] Workqueue: writeback wb_workfn (flush-253:0)
[ 1456.330900] task: fffffe035fb59000 ti: fffffe00037f4000 task.ti: fffffe00037f4000
[ 1456.338356] PC is at domain_dirty_limits+0x44/0x30c
[ 1456.343215] LR is at wb_over_bg_thresh+0xcc/0x1c8
[ 1456.347899] pc : [<fffffe00002129ec>] lr : [<fffffe0000215f24>] pstate: 20000145
[ 1456.355266] sp : fffffe00037f7aa0
[ 1456.358567] x29: fffffe00037f7aa0 x28: fffffe00010d2000 
[ 1456.363878] x27: 0000000000000000 x26: fffffe00010d0000 
[ 1456.369189] x25: fffffe000120c000 x24: 0000000000000000 
[ 1456.374500] x23: fffffe0000ee0550 x22: fffffe0000ee05a0 
[ 1456.379809] x21: fffffe00037f7b40 x20: 0000000000000000 
[ 1456.385118] x19: fffffe0000ee0280 x18: 000003ffd504dcb0 
[ 1456.390428] x17: 000003ffa02b1bfc x16: fffffe00002a89f4 
[ 1456.395738] x15: 0000000000000004 x14: 000003ffd50502d0 
[ 1456.401049] x13: 0000000000000001 x12: 0000000000000000 
[ 1456.406358] x11: fffffe00008a5940 x10: 00000000000023e0 
[ 1456.411669] x9 : fffffe00037f4000 x8 : 0000000000000000 
[ 1456.416980] x7 : fffffe00002c9cd8 x6 : 0000000000000000 
[ 1456.422290] x5 : fffffe0002499438 x4 : 0000000000034b64 
[ 1456.427601] x3 : 0000000000000000 x2 : fffffe0001143848 
[ 1456.432910] x1 : 0000000000000000 x0 : 0000000000000000 
[ 1456.438220]
====

Comment 3 David Smith 2017-03-17 17:53:45 UTC

I've used the following script to see if 'perf' had any problems running the equivalent of the hw_watch_addr.stp example script:

====
#!/bin/sh
addr="0x`grep "vm_dirty_ratio" /proc/kallsyms | awk '{print $1}'`"
echo "probing $addr..."
perf record -e mem:${addr}:rw -ag sleep 5
perf report
====

The script runs with no problems. So, this looks like a systemtap issue, not a kernel issue.

Comment 4 David Smith 2017-03-21 19:00:41 UTC

Here's an update.

After lots more investigation, I've found a small problem with the systemtap generated code, but it isn't enough to cause the problem we're seeing here. I'm seeing a large stream of these events before the cpu gets stuck - several thousand.

I guessed that perhaps something in the hardware watch code also accessed the vm_dirty_ratio variable, causing an infinite stream of events. However, after creating my own kernel module (where I can control the access to the probed variable), I'm still seeing an infinite stream of events.

I'll continue to debug this issue.

Comment 5 David Smith 2017-03-21 21:32:05 UTC

According to <http://stackoverflow.com/questions/28280813/register-wide-hw-breakpoint-continually-triggers-handler-callback>:	

====
This is by design. When an ARM hardware watchpoint is hit, it generates a Data Abort exception. On ARM, Data Abort exceptions trigger before the instruction that triggers them finishes1. This means that, in the exception handler, registers and memory locations affected by the instruction still hold their old values (or, in some cases, undefined values). As such, when the handler finishes, it must retry the aborted instruction so that the interrupted program runs as intended. If the watchpoint is still set when the handler returns, the instruction will trigger it again. This causes the loop...
====

Now the above is actually talking about ARMv7, and we're dealing with ARMv8 here. But, I'll bet a similar thing is happening.

Next I plan to look at what 'perf' is doing and how it handles this problem.

Comment 6 David Smith 2017-03-21 21:35:13 UTC

(In reply to David Smith from comment #4)
> Here's an update.
> 
> After lots more investigation, I've found a small problem with the systemtap
> generated code, but it isn't enough to cause the problem we're seeing here.

The "small problem" I mentioned here I fixed in upstream commit d0b322e.

<https://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=commit;h=d0b322eddb2f0f248ced58df8e8448c6e557c81e>

Comment 7 David Smith 2017-03-23 17:48:39 UTC

Here's the interesting section from the arm64 hardware breakpoint code (arch/arm64/hw_breakpoint.c):

====
/*
 * Debug exception handlers.
 */
static int breakpoint_handler(unsigned long unused, unsigned int esr,
			      struct pt_regs *regs)
{
	int i, step = 0, *kernel_step;
	u32 ctrl_reg;
	u64 addr, val;
	struct perf_event *bp, **slots;
	struct debug_info *debug_info;
	struct arch_hw_breakpoint_ctrl ctrl;

	slots = this_cpu_ptr(bp_on_reg);
	addr = instruction_pointer(regs);
	debug_info = &current->thread.debug;

	for (i = 0; i < core_num_brps; ++i) {
		rcu_read_lock();

		bp = slots[i];

		if (bp == NULL)
			goto unlock;

		/* Check if the breakpoint value matches. */
		val = read_wb_reg(AARCH64_DBG_REG_BVR, i);
		if (val != (addr & ~0x3))
			goto unlock;

		/* Possible match, check the byte address select to confirm. */
		ctrl_reg = read_wb_reg(AARCH64_DBG_REG_BCR, i);
		decode_ctrl_reg(ctrl_reg, &ctrl);
		if (!((1 << (addr & 0x3)) & ctrl.len))
			goto unlock;

		counter_arch_bp(bp)->trigger = addr;
		perf_bp_event(bp, regs);

		/* Do we need to handle the stepping? */
		if (is_default_overflow_handler(bp))
			step = 1;
unlock:
		rcu_read_unlock();
	}

	if (!step)
		return 0;

	if (user_mode(regs)) {
		debug_info->bps_disabled = 1;
		toggle_bp_registers(AARCH64_DBG_REG_BCR, DBG_ACTIVE_EL0, 0);

		/* If we're already stepping a watchpoint, just return. */
		if (debug_info->wps_disabled)
			return 0;

		if (test_thread_flag(TIF_SINGLESTEP))
			debug_info->suspended_step = 1;
		else
			user_enable_single_step(current);
	} else {
		toggle_bp_registers(AARCH64_DBG_REG_BCR, DBG_ACTIVE_EL1, 0);
		kernel_step = this_cpu_ptr(&stepping_kernel_bp);

		if (*kernel_step != ARM_KERNEL_STEP_NONE)
			return 0;

		if (kernel_active_single_step()) {
			*kernel_step = ARM_KERNEL_STEP_SUSPEND;
		} else {
			*kernel_step = ARM_KERNEL_STEP_ACTIVE;
			kernel_enable_single_step(regs);
		}
	}

	return 0;
}
====

If I'm reading that correctly, the kernel will handle single-stepping over the breakpoint, but only if the breakpoint uses the default handler (see the is_default_overflow_handler() section of code). Since we've got a custom handler, we're expected to handle this ourselves.

Comment 8 David Smith 2017-03-23 20:29:54 UTC

I filed a kernel bug against this issue (bug #1435450). For now, we'll have to disable 'kernel.data' probes in systemtap for arm64.

Comment 9 David Smith 2017-03-23 21:22:40 UTC

Upstream commit 26f262a disables kernel.data probes on arm64.

Comment 12 errata-xmlrpc 2017-08-01 09:34:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2301