Bug 1546179
| Summary: | nettop.stp fails with read fault on rhel-alt-7.5/s390x | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Martin Cermak <mcermak> |
| Component: | systemtap | Assignee: | Frank Ch. Eigler <fche> |
| Status: | CLOSED ERRATA | QA Contact: | Martin Cermak <mcermak> |
| Severity: | medium | Docs Contact: | Vladimír Slávik <vslavik> |
| Priority: | medium | ||
| Version: | 7.5-Alt | CC: | bgollahe, chorn, cww, dsmith, fche, fj-lsoft-kernel-it, fj-lsoft-rh-dump, jistone, lberk, lherbolt, mbenitez, mcermak, mjw, pasik, vslavik |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | s390x | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | systemtap-3.3-1.el7 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1506230 | Environment: | |
| Last Closed: | 2018-10-30 10:46:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1477664, 1505884, 1609081 | ||
Description
Martin Cermak
2018-02-16 14:26:04 UTC
OK, I think I know what is going on here. The following kernel commit changed the sk_buff structure:
====
commit bffa72cf7f9df842f0016ba03586039296b4caaf
Author: Eric Dumazet <edumazet>
Date: Tue Sep 19 05:14:24 2017 -0700
net: sk_buff rbnode reorg
skb->rbnode shares space with skb->next, skb->prev and skb->tstamp
====
The sk_buff structure now looks like this:
====
struct sk_buff {
	union {
		struct {
			/* These two members must be first. */
			struct sk_buff		*next;
			struct sk_buff		*prev;
			union {
				struct net_device	*dev;
				/* Some protocols might use this space to store information,
				 * while device pointer would be NULL.
				 * UDP receive path is one user.
				 */
				unsigned long		dev_scratch;
			};
		};
		struct rb_node	rbnode; /* used in netem & tcp stack */
	};
	... stuff deleted ...
};
====
The systemtap read fault error is coming from the following tapset line:
dev_name = kernel_string($skb->dev->name)
(Of course, I'll remind readers that a "read fault" error is really a good thing - that's systemtap realizing that this address isn't valid and giving an error instead of trying to read a bad address and potentially crashing the kernel.)
In this case, the obstacle to finding a solution is that I don't see a way of knowing which member of each of the two unions holds a valid value:
1) In this particular skb, is skb->rbnode valid or is the unnamed structure containing next/prev valid?
2) Assuming this particular skb's unnamed structure containing next/prev is valid, is dev or dev_scratch valid?
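To make the ambiguity concrete, here is a minimal user-space sketch in C. It is not kernel code: the `mock_*` types are hypothetical stand-ins that only copy the union layout quoted above, and the three-word `mock_rb_node` is an assumption about the shape of `struct rb_node`. It shows that `dev` and `rbnode` occupy the same bytes, so a device pointer stored earlier no longer reads back once the skb has been used as a tree node.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types; layout only, not the real definitions. */
struct mock_rb_node {
	unsigned long		rb_parent_color;
	struct mock_rb_node	*rb_right;
	struct mock_rb_node	*rb_left;
};

struct mock_net_device {
	char name[16];
};

/* Same union nesting as the sk_buff excerpt quoted in this report. */
struct mock_sk_buff {
	union {
		struct {
			struct mock_sk_buff	*next;
			struct mock_sk_buff	*prev;
			union {
				struct mock_net_device	*dev;
				unsigned long		dev_scratch;
			};
		};
		struct mock_rb_node	rbnode;
	};
};

/* Returns 1 when the first byte of dev lies inside the storage used by rbnode. */
static int dev_overlaps_rbnode(void)
{
	size_t dev_off = offsetof(struct mock_sk_buff, dev);
	size_t rb_off = offsetof(struct mock_sk_buff, rbnode);

	return dev_off >= rb_off && dev_off < rb_off + sizeof(struct mock_rb_node);
}

/*
 * Stores a device pointer, then writes through the rbnode member.
 * Returns 1 if dev still points at the original device, 0 otherwise.
 */
static int dev_survives_rbnode_use(void)
{
	static struct mock_net_device eth = { "eth0" };
	static struct mock_rb_node other;
	struct mock_sk_buff skb;

	memset(&skb, 0, sizeof(skb));
	skb.dev = &eth;
	assert(skb.dev == &eth);	/* dev is valid right after it is set */

	skb.rbnode.rb_left = &other;	/* reuses the bytes that held dev */
	return skb.dev == &eth;
}
```

In this mock, nothing in the structure records which member was written last. That is the difficulty described in points 1) and 2): a reader of `dev` cannot tell from the skb alone whether the bytes are a device pointer, scratch data, or part of a tree node.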
s390x probably sees the read fault error more often than other platforms because it has always been more "sensitive" to bad addresses.
The quickest solution would be to surround that tapset code with 'try { ... } catch { ... }' and return the dev_name field as something like "UNKNOWN", but that doesn't really follow the spirit of the following bug, where we tried to eradicate strings like that:
<https://sourceware.org/bugzilla/show_bug.cgi?id=15044>
Fixed upstream in commit 2f6fcfc68. Accesses to sk_buff structures are now surrounded by try/catch in probes.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2018:3168