Bug 1546179 - nettop.stp fails with read fault on rhel-alt-7.5/s390x
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemtap
Version: 7.5-Alt
Hardware: s390x
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Frank Ch. Eigler
QA Contact: Martin Cermak
Docs Contact: Vladimír Slávik
URL:
Whiteboard:
Depends On:
Blocks: 1477664 1505884 1609081
Reported: 2018-02-16 14:26 UTC by Martin Cermak
Modified: 2021-06-10 14:40 UTC (History)
CC List: 15 users

Fixed In Version: systemtap-3.3-1.el7
Doc Type: No Doc Update
Doc Text:
Clone Of: 1506230
Environment:
Last Closed: 2018-10-30 10:46:00 UTC
Target Upstream Version:
Embargoed:



Links
System: Red Hat Product Errata
ID: RHEA-2018:3168
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2018-10-30 10:47:19 UTC

Description Martin Cermak 2018-02-16 14:26:04 UTC
+++ This bug was initially created as a clone of Bug #1506230 +++

=======
# stap nettop.stp  -c 'wget -q https://ftp.spline.de/pub/OpenBSD/ftplist'
ERROR: read fault [man error::fault] at 0x76af7f20 near identifier '$skb' at /usr/share/systemtap/tapset/linux/networking.stp:84:27
  PID   UID DEV     XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND        
19715     0 enccw0.0.8000       2       0       0       0 wget           

WARNING: Number of errors: 1, skipped probes: 0
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]
#
=======

Note that -P doesn't help. SSH or ICMP traffic isn't sufficient to reproduce the problem, but HTTPS or FTP traffic seems to trigger it.

Comment 2 David Smith 2018-02-27 21:20:01 UTC
OK, I think I know what is going on here. The following kernel commit changed the sk_buff structure:

====
commit bffa72cf7f9df842f0016ba03586039296b4caaf
Author: Eric Dumazet <edumazet>
Date:   Tue Sep 19 05:14:24 2017 -0700

    net: sk_buff rbnode reorg
    
    skb->rbnode shares space with skb->next, skb->prev and skb->tstamp
====

The sk_buff structure now looks like this:

====
struct sk_buff {
	union {
		struct {
			/* These two members must be first. */
			struct sk_buff		*next;
			struct sk_buff		*prev;

			union {
				struct net_device	*dev;
				/* Some protocols might use this space to store information,
				 * while device pointer would be NULL.
				 * UDP receive path is one user.
				 */
				unsigned long		dev_scratch;
			};
		};
		struct rb_node	rbnode; /* used in netem & tcp stack */
	};
        ... stuff deleted ...
};
====
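
To make the overlap concrete, here is a minimal user-space sketch in C. It is not kernel code: struct sk_buff_model and dev_aliases_rb_left() are hypothetical names made up for illustration, and struct rb_node is assumed to be a simplified copy of the kernel's three-word layout. Under those assumptions, it checks that dev (and dev_scratch) occupy the same bytes as rbnode.rb_left.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: simplified copy of the kernel's three-word struct rb_node. */
struct rb_node {
	unsigned long __rb_parent_color;
	struct rb_node *rb_right;
	struct rb_node *rb_left;
};

struct net_device;	/* opaque in this sketch */

/* Hypothetical model of only the leading union quoted above. */
struct sk_buff_model {
	union {
		struct {
			struct sk_buff_model *next;
			struct sk_buff_model *prev;
			union {
				struct net_device *dev;
				unsigned long dev_scratch;
			};
		};
		struct rb_node rbnode;
	};
};

/* Returns 1 when dev starts at the same offset as rbnode.rb_left. */
int dev_aliases_rb_left(void)
{
	return offsetof(struct sk_buff_model, dev) ==
	       offsetof(struct sk_buff_model, rbnode) +
	       offsetof(struct rb_node, rb_left);
}

#ifdef SKB_SKETCH_MAIN
/* Tiny driver, only compiled when SKB_SKETCH_MAIN is defined. */
int main(void)
{
	assert(dev_aliases_rb_left());
	printf("dev offset:         %zu\n",
	       offsetof(struct sk_buff_model, dev));
	printf("dev_scratch offset: %zu\n",
	       offsetof(struct sk_buff_model, dev_scratch));
	printf("rbnode.rb_left:     %zu\n",
	       offsetof(struct sk_buff_model, rbnode) +
	       offsetof(struct rb_node, rb_left));
	return 0;
}
#endif
```

Compiling this as C11 with SKB_SKETCH_MAIN defined prints the three offsets. In this model they coincide, so whatever was last written through rbnode is what a read of dev returns.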

The systemtap read fault error is coming from the following tapset line:

	dev_name = kernel_string($skb->dev->name)

(Of course, I'll remind readers that a "read fault" error is really a good thing - that's systemtap realizing that this address isn't valid and giving an error instead of trying to read a bad address and potentially crashing the kernel.)

In this case, the problem I see in finding a solution is that I don't see a way of knowing which member of each of the two unions holds a valid value:

1) In this particular skb, is skb->rbnode valid or is the unnamed structure containing next/prev valid?

2) Assuming this particular skb's unnamed structure containing next/prev is valid, is dev or dev_scratch valid?
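
The two questions above can be illustrated with a small user-space sketch in C. Everything here is an assumption for illustration only: struct sk_buff_model, raw_dev_slot() and rbtree_skb_dev_is_bogus() are hypothetical names, and the structures are simplified stand-ins for the kernel's. The point it tries to show is that the union carries no tag, so a blind read of the dev slot returns a plausible-looking pointer whichever member was actually in use.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: simplified copy of the kernel's three-word struct rb_node. */
struct rb_node {
	unsigned long __rb_parent_color;
	struct rb_node *rb_right;
	struct rb_node *rb_left;
};

/* Hypothetical stand-in; the real net_device is far larger. */
struct net_device {
	char name[16];
};

/* Hypothetical model of only the leading union of sk_buff. */
struct sk_buff_model {
	union {
		struct {
			struct sk_buff_model *next;
			struct sk_buff_model *prev;
			union {
				struct net_device *dev;
				unsigned long dev_scratch;
			};
		};
		struct rb_node rbnode;
	};
};

/* A blind read of the dev slot, with no knowledge of which member is live. */
struct net_device *raw_dev_slot(const struct sk_buff_model *skb)
{
	return skb->dev;
}

/* Returns 1 if an skb used as an rb-tree node reports rb_left as its "dev". */
int rbtree_skb_dev_is_bogus(void)
{
	struct sk_buff_model child, skb;

	memset(&child, 0, sizeof(child));
	memset(&skb, 0, sizeof(skb));
	skb.rbnode.rb_left = &child.rbnode;	/* skb is linked into a tree */

	return (void *)raw_dev_slot(&skb) == (void *)&child.rbnode;
}

#ifdef SKB_SKETCH_MAIN
/* Tiny driver, only compiled when SKB_SKETCH_MAIN is defined. */
int main(void)
{
	struct net_device eth;
	struct sk_buff_model skb;

	memset(&eth, 0, sizeof(eth));
	memset(&skb, 0, sizeof(skb));
	skb.dev = &eth;				/* skb is on a list, dev is real */
	printf("list skb:    dev slot = %p (really a net_device)\n",
	       (void *)raw_dev_slot(&skb));

	assert(rbtree_skb_dev_is_bogus());
	printf("rb-tree skb: dev slot holds rb_left, not a net_device\n");
	return 0;
}
#endif
```

In both cases the slot is non-NULL, so a NULL check on dev cannot tell the cases apart in this model. Dereferencing the second value as a net_device would read unrelated memory.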

The s390x is probably seeing the read fault error more than other platforms because it has always been more "sensitive" to bad addresses.

The quickest solution would be to surround that tapset code with 'try { ... } catch { ... }' and return something like "UNKNOWN" as the dev_name field, but that doesn't really follow the spirit of the following bug, where we tried to eradicate strings like that:

<https://sourceware.org/bugzilla/show_bug.cgi?id=15044>

Comment 3 David Smith 2018-03-02 20:06:41 UTC
Fixed upstream in commit 2f6fcfc68. Accesses to sk_buff structures are now surrounded by try/catch in probes.

Comment 11 errata-xmlrpc 2018-10-30 10:46:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:3168

