Hide Forgot
Description of problem: Customer would like a systemtap script to monitor NFS traffic Version-Release number of selected component (if applicable): Red Hat Enterprise Linux Server release 7.2 (Maipo) Linux <host> 3.10.0-327.13.1.el7.x86_64 #1 SMP Mon Feb 29 13:22:02 EST 2016 x86_64 x86_64 x86_64 GNU/Linux --- systemtap-2.8-10.el7.x86_64 systemtap-client-2.8-10.el7.x86_64 systemtap-devel-2.8-10.el7.x86_64 systemtap-runtime-2.8-10.el7.x86_64 systemtap-sdt-devel-2.8-10.el7.x86_64 How reproducible: N/A Steps to Reproduce: N/A Actual results: # stap nfsdtop.stp lookups reads client reads bytes <IP>:40194 24 6291456 # stap nfsdtop.stp 1 lookups client last file <IP>:41475 .dump_root_1108.log.swp Expected results: TIME CLIENT R/W OFFSET(KB) PATHNAME Fri Jan 6 00:55:28 2017,<IP>, W, 464,324, /export/nfs/<directory>/file.log Fri Jan 6 00:55:28 2017,<IP>, R, 464,321, /export/nfs/<directory>/file.log1 Additional info: Customer has a similar script in Solaris/perl/dtrace "nfs_mon.pl" that has no license/copyright restriction
Created attachment 1244089 [details] nfsd-trace-sb.stp A stap script that is closer to the model pasted. It's too specialized to be included as a part of systemtap per se, so this is BZ more like a support request than a bug fix.
Customer tested the script and saw two (2) main issues: #1 the script stops after some time after Pass 5 is completed, but customer wants it to be keep running until it exits manually: Wed Feb 8 18:21:26 2017,<IP>:61442, W, 2056, /export/<dir>/batchCmd.EQUITYSP-3443.D1T.20170208_131351.try1.log Wed Feb 8 18:21:26 2017,<IP>:55811, W, 4212, /export/<dir>/batchCmd.SWAPS-322.D1T.20170208_115118.try5.log Wed Feb 8 18:21:26 2017,<IP>:55811, W, 1868, /export/<dir>/batchCmd.EQUITYSSO-94.D1T.20170208_131200.try2.log Wed Feb 8 18:21:26 2017,<IP>:65027, nfsd.proc3.lookup, mergesession.mgd.scf, WARNING: Number of errors: 1, skipped probes: 1 WARNING: /usr/bin/staprun exited with status: 1 Pass 5: run completed in 0usr/340sys/1000real ms. Pass 5: run failed. [man error::pass5] #2 the time stamp is wrong in the output (almost 5 hours off) Script: Wed Feb 8 18:21:26 Server [root@server tmp]# date Wed Feb 8 13:23:13 EST 2017
(In reply to Efraim Marquez-Arreaza from comment #4) > Customer tested the script and saw two (2) main issues: > > #1 the script stops after some time after Pass 5 is completed, but customer > wants it to be keep running until it exits manually: > > Wed Feb 8 18:21:26 2017,<IP>:61442, W, 2056, > /export/<dir>/batchCmd.EQUITYSP-3443.D1T.20170208_131351.try1.log > Wed Feb 8 18:21:26 2017,<IP>:55811, W, 4212, > /export/<dir>/batchCmd.SWAPS-322.D1T.20170208_115118.try5.log > Wed Feb 8 18:21:26 2017,<IP>:55811, W, 1868, > /export/<dir>/batchCmd.EQUITYSSO-94.D1T.20170208_131200.try2.log > Wed Feb 8 18:21:26 2017,<IP>:65027, nfsd.proc3.lookup, > mergesession.mgd.scf, WARNING: Number of errors: 1, skipped probes: 1 > WARNING: /usr/bin/staprun exited with status: 1 > Pass 5: run completed in 0usr/340sys/1000real ms. > Pass 5: run failed. [man error::pass5] Hm, I wonder what the error message was. But anyway, try running the script again with # stap --suppress-handler-errors [....] > #2 the time stamp is wrong in the output (almost 5 hours off) > Script: > Wed Feb 8 18:21:26 > Server > [root@server tmp]# date > Wed Feb 8 13:23:13 EST 2017 Probably a timezone issue: in the script, change ctime(...) to tz_ctime(...) to convert to the local time zone. I am not sure where the extra two minutes' differential comes in though.
root@server tmp]# stap --suppress-handler-errors nfsd-trace-sb.stp TIME CLIENT R/W OFFSET(KB) PATHNAME Thu Feb 9 17:28:52 2017,<IP3>.42:65027, W, 9456, Thu Feb 9 17:28:54 2017,<IP3>.112:65027, W, 5716, Thu Feb 9 17:28:54 2017,<IP3>.112:65027, nfsd.proc3.commit, 4096,5853184, Thu Feb 9 17:28:56 2017,<IP3>.42:65027, W, 5712, Thu Feb 9 17:28:56 2017,<IP3>.42:65027, nfsd.proc3.commit, 4096,5849088, Thu Feb 9 17:28:56 2017,<IP3>.104:23298, W, 9460, Thu Feb 9 17:28:57 2017,<IP3>.41:65283, W, 9456, Thu Feb 9 17:28:57 2017,<IP3>.104:23298, W, 9460, Thu Feb 9 17:28:57 2017,<IP3>.112:65027, W, 5716, Thu Feb 9 17:28:57 2017,<IP3>.112:65027, nfsd.proc3.commit, 4096,5853184, Thu Feb 9 17:28:57 2017,<IP3>.112:65027, W, 5716, Thu Feb 9 17:28:57 2017,<IP3>.112:65027, nfsd.proc3.commit, 4096,5853184, Thu Feb 9 17:28:57 2017,<IP3>.72:65027, W, 9456, Thu Feb 9 17:28:57 2017,<IP3>.112:65027, W, 5716, Thu Feb 9 17:28:57 2017,<IP3>.112:65027, nfsd.proc3.commit, 4096,5853184, Thu Feb 9 17:28:58 2017,<IP3>.104:23298, W, 9460, Thu Feb 9 17:28:59 2017,<IP3>.112:65027, W, 5716, Thu Feb 9 17:28:59 2017,<IP3>.112:65027, nfsd.proc3.commit, 4096,5853184, Thu Feb 9 17:29:01 2017,<IP2>.52:13315, nfsd.proc3.lookup, .login, Thu Feb 9 17:29:01 2017,<IP2>.52:13315, R, 0, Thu Feb 9 17:29:01 2017,<IP2>.52:13315, nfsd.proc3.lookup, .cshrc, Thu Feb 9 17:29:01 2017,<IP2>.52:13315, R, 0, Thu Feb 9 17:29:01 2017,<IP2>.52:13315, nfsd.proc3.lookup, .netrc, Thu Feb 9 17:29:01 2017,<IP2>.52:13315, nfsd.proc3.lookup, .profile, Thu Feb 9 17:29:01 2017,<IP2>.52:13315, R, 0, Thu Feb 9 17:29:01 2017,<IP2>.52:13315, nfsd.proc3.lookup, .rhosts, Thu Feb 9 17:29:01 2017,<IP2>.52:13315, nfsd.proc3.lookup, .shosts, Thu Feb 9 17:29:01 2017,<IP3>.50:65283, W, 9452, Thu Feb 9 17:29:01 2017,<IP3>.84:65283, nfsd.proc3.lookup, [root@server tmp]# stap -v nfsd-trace-sb.stp -g Pass 1: parsed user script and 115 library scripts using 219344virt/32464res/3264shr/29432data kb, in 240usr/10sys/254real ms. Pass 2: analyzed script: 23 probes, 63 functions, 9 embeds, 1 global using 344996virt/159532res/4620shr/155084data kb, in 4550usr/510sys/5054real ms. Pass 3: using cached /root/.systemtap/cache/55/stap_550f690a9cea68652910e0cb76b5ebc9_84452.c Pass 4: using cached /root/.systemtap/cache/55/stap_550f690a9cea68652910e0cb76b5ebc9_84452.ko Pass 5: starting run. TIME CLIENT R/W OFFSET(KB) PATHNAME ERROR: read fault [man error::fault] at 0x0000000000000038 (addr) near operator '->' at nfsd-trace-sb.stp:26:31 Thu Feb 9 17:32:11 2017,<IP>:65283, W, 9456, WARNING: Number of errors: 1, skipped probes: 1 WARNING: /usr/bin/staprun exited with status: 1 Pass 5: run completed in 10usr/350sys/1626real ms. Pass 5: run failed. [man error::pass5]
OK, ERROR: read fault [man error::fault] at 0x0000000000000038 (addr) near operator '->' at nfsd-trace-sb.stp:26:31 answers my question about what the error was. It's not a showstopper; must just be some unusual combination of nfs-operation and timing. The --suppress-handler-errors option works on it. Consider # echo --suppress-handler-errors >> $HOME/.systemtap/rc to make it a permanent default option.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2301