Bug 1416204 - [RFE] systemtap script to monitor NFS traffic
Summary: [RFE] systemtap script to monitor NFS traffic
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemtap   
(Show other bugs)
Version: 7.2
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: rc
: ---
Assignee: Frank Ch. Eigler
QA Contact: Martin Cermak
Vladimír Slávik
URL:
Whiteboard:
Keywords: FutureFeature
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-01-24 20:58 UTC by Efraim Marquez-Arreaza
Modified: 2017-08-01 09:34 UTC (History)
8 users (show)

Fixed In Version: systemtap-3.1-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 09:34:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
nfsd-trace-sb.stp (974 bytes, text/plain)
2017-01-24 21:16 UTC, Frank Ch. Eigler
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2301 normal SHIPPED_LIVE systemtap bug fix and enhancement update 2017-08-01 12:41:08 UTC

Description Efraim Marquez-Arreaza 2017-01-24 20:58:47 UTC
Description of problem:
Customer would like a systemtap script to monitor NFS traffic

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Linux <host> 3.10.0-327.13.1.el7.x86_64 #1 SMP Mon Feb 29 13:22:02 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
---
systemtap-2.8-10.el7.x86_64
systemtap-client-2.8-10.el7.x86_64
systemtap-devel-2.8-10.el7.x86_64
systemtap-runtime-2.8-10.el7.x86_64
systemtap-sdt-devel-2.8-10.el7.x86_64

How reproducible:
N/A

Steps to Reproduce:
N/A

Actual results:

# stap nfsdtop.stp
lookups
reads
client		reads   bytes
<IP>:40194	24      6291456

# stap nfsdtop.stp 1
lookups
client                  last file
<IP>:41475     .dump_root_1108.log.swp

Expected results:

TIME	CLIENT	R/W    OFFSET(KB)  PATHNAME
Fri Jan 6 00:55:28 2017,<IP>, W, 464,324, /export/nfs/<directory>/file.log
Fri Jan 6 00:55:28 2017,<IP>, R, 464,321, /export/nfs/<directory>/file.log1

Additional info:

Customer has a similar script in Solaris/perl/dtrace "nfs_mon.pl" that has no license/copyright restriction

Comment 2 Frank Ch. Eigler 2017-01-24 21:16 UTC
Created attachment 1244089 [details]
nfsd-trace-sb.stp

A stap script that is closer to the model pasted.
It's too specialized to be included as a part of systemtap per se, so this is BZ more like a support request than a bug fix.

Comment 4 Efraim Marquez-Arreaza 2017-02-08 20:27:45 UTC
Customer tested the script and saw two (2) main issues:

#1 the script stops after some time after Pass 5 is completed, but customer wants it to be keep running until it exits manually:

Wed Feb  8 18:21:26 2017,<IP>:61442, W, 2056, /export/<dir>/batchCmd.EQUITYSP-3443.D1T.20170208_131351.try1.log
Wed Feb  8 18:21:26 2017,<IP>:55811, W, 4212, /export/<dir>/batchCmd.SWAPS-322.D1T.20170208_115118.try5.log
Wed Feb  8 18:21:26 2017,<IP>:55811, W, 1868, /export/<dir>/batchCmd.EQUITYSSO-94.D1T.20170208_131200.try2.log
Wed Feb  8 18:21:26 2017,<IP>:65027, nfsd.proc3.lookup, mergesession.mgd.scf, WARNING: Number of errors: 1, skipped probes: 1
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run completed in 0usr/340sys/1000real ms.
Pass 5: run failed.  [man error::pass5]

#2 the time stamp is wrong in the output (almost 5 hours off)

Script:
Wed Feb  8 18:21:26

Server
[root@server tmp]# date
Wed Feb  8 13:23:13 EST 2017

Comment 5 Frank Ch. Eigler 2017-02-09 02:11:49 UTC
(In reply to Efraim Marquez-Arreaza from comment #4)
> Customer tested the script and saw two (2) main issues:
> 
> #1 the script stops after some time after Pass 5 is completed, but customer
> wants it to be keep running until it exits manually:
> 
> Wed Feb  8 18:21:26 2017,<IP>:61442, W, 2056,
> /export/<dir>/batchCmd.EQUITYSP-3443.D1T.20170208_131351.try1.log
> Wed Feb  8 18:21:26 2017,<IP>:55811, W, 4212,
> /export/<dir>/batchCmd.SWAPS-322.D1T.20170208_115118.try5.log
> Wed Feb  8 18:21:26 2017,<IP>:55811, W, 1868,
> /export/<dir>/batchCmd.EQUITYSSO-94.D1T.20170208_131200.try2.log
> Wed Feb  8 18:21:26 2017,<IP>:65027, nfsd.proc3.lookup,
> mergesession.mgd.scf, WARNING: Number of errors: 1, skipped probes: 1
> WARNING: /usr/bin/staprun exited with status: 1
> Pass 5: run completed in 0usr/340sys/1000real ms.
> Pass 5: run failed.  [man error::pass5]

Hm, I wonder what the error message was.
But anyway, try running the script again with

 # stap --suppress-handler-errors  [....]


> #2 the time stamp is wrong in the output (almost 5 hours off)
> Script:
> Wed Feb  8 18:21:26
> Server
> [root@server tmp]# date
> Wed Feb  8 13:23:13 EST 2017

Probably a timezone issue: in the script, change ctime(...) to tz_ctime(...) to convert to the local time zone.  I am not sure where the extra two minutes' differential comes in though.

Comment 6 Efraim Marquez-Arreaza 2017-02-09 18:27:25 UTC
root@server tmp]# stap --suppress-handler-errors nfsd-trace-sb.stp
TIME    CLIENT  R/W     OFFSET(KB)      PATHNAME
Thu Feb  9 17:28:52 2017,<IP3>.42:65027, W, 9456, Thu Feb  9 17:28:54 2017,<IP3>.112:65027, W, 5716, Thu Feb  9 17:28:54 2017,<IP3>.112:65027, nfsd.proc3.commit, 4096,5853184, Thu Feb  9 17:28:56 2017,<IP3>.42:65027, W, 5712, Thu Feb  9 17:28:56 2017,<IP3>.42:65027, nfsd.proc3.commit, 4096,5849088, Thu Feb  9 17:28:56 2017,<IP3>.104:23298, W, 9460, Thu Feb  9 17:28:57 2017,<IP3>.41:65283, W, 9456, Thu Feb  9 17:28:57 2017,<IP3>.104:23298, W, 9460, Thu Feb  9 17:28:57 2017,<IP3>.112:65027, W, 5716, Thu Feb  9 17:28:57 2017,<IP3>.112:65027, nfsd.proc3.commit, 4096,5853184, Thu Feb  9 17:28:57 2017,<IP3>.112:65027, W, 5716, Thu Feb  9 17:28:57 2017,<IP3>.112:65027, nfsd.proc3.commit, 4096,5853184, Thu Feb  9 17:28:57 2017,<IP3>.72:65027, W, 9456, Thu Feb  9 17:28:57 2017,<IP3>.112:65027, W, 5716, Thu Feb  9 17:28:57 2017,<IP3>.112:65027, nfsd.proc3.commit, 4096,5853184, Thu Feb  9 17:28:58 2017,<IP3>.104:23298, W, 9460, Thu Feb  9 17:28:59 2017,<IP3>.112:65027, W, 5716, Thu Feb  9 17:28:59 2017,<IP3>.112:65027, nfsd.proc3.commit, 4096,5853184, Thu Feb  9 17:29:01 2017,<IP2>.52:13315, nfsd.proc3.lookup, .login, Thu Feb  9 17:29:01 2017,<IP2>.52:13315, R, 0, Thu Feb  9 17:29:01 2017,<IP2>.52:13315, nfsd.proc3.lookup, .cshrc, Thu Feb  9 17:29:01 2017,<IP2>.52:13315, R, 0, Thu Feb  9 17:29:01 2017,<IP2>.52:13315, nfsd.proc3.lookup, .netrc, Thu Feb  9 17:29:01 2017,<IP2>.52:13315, nfsd.proc3.lookup, .profile, Thu Feb  9 17:29:01 2017,<IP2>.52:13315, R, 0, Thu Feb  9 17:29:01 2017,<IP2>.52:13315, nfsd.proc3.lookup, .rhosts, Thu Feb  9 17:29:01 2017,<IP2>.52:13315, nfsd.proc3.lookup, .shosts, Thu Feb  9 17:29:01 2017,<IP3>.50:65283, W, 9452, Thu Feb  9 17:29:01 2017,<IP3>.84:65283, nfsd.proc3.lookup,



[root@server tmp]# stap -v nfsd-trace-sb.stp -g 
Pass 1: parsed user script and 115 library scripts using 219344virt/32464res/3264shr/29432data kb, in 240usr/10sys/254real ms.
Pass 2: analyzed script: 23 probes, 63 functions, 9 embeds, 1 global using 344996virt/159532res/4620shr/155084data kb, in 4550usr/510sys/5054real ms.
Pass 3: using cached /root/.systemtap/cache/55/stap_550f690a9cea68652910e0cb76b5ebc9_84452.c
Pass 4: using cached /root/.systemtap/cache/55/stap_550f690a9cea68652910e0cb76b5ebc9_84452.ko
Pass 5: starting run.
TIME    CLIENT  R/W     OFFSET(KB)      PATHNAME
ERROR: read fault [man error::fault] at 0x0000000000000038 (addr) near operator '->' at nfsd-trace-sb.stp:26:31
Thu Feb  9 17:32:11 2017,<IP>:65283, W, 9456, WARNING: Number of errors: 1, skipped probes: 1
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run completed in 10usr/350sys/1626real ms.
Pass 5: run failed.  [man error::pass5]

Comment 7 Frank Ch. Eigler 2017-02-09 18:38:45 UTC
OK, 

ERROR: read fault [man error::fault] at 0x0000000000000038 (addr) near operator '->' at nfsd-trace-sb.stp:26:31

answers my question about what the error was.  It's not a showstopper; must just be some unusual combination of nfs-operation and timing.  The --suppress-handler-errors option works on it.  Consider

# echo --suppress-handler-errors >> $HOME/.systemtap/rc

to make it a permanent default option.

Comment 11 errata-xmlrpc 2017-08-01 09:34:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2301


Note You need to log in before you can comment on or make changes to this bug.