Bug 1416569 - [LLNL 7.4 FEAT] provide a way to clear stats
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: device-mapper-multipath
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 7.4
Assigned To: Ben Marzinski
QA Contact: Lin Li
Docs Contact: Steven J. Levine
Keywords: FutureFeature
Depends On:
Blocks: 1289208 1332226 1446211 1384257 1448945
Reported: 2017-01-25 15:33 EST by Ben Woodard
Modified: 2017-08-01 12:34 EDT
CC List: 10 users

See Also:
Fixed In Version: device-mapper-multipath-0.4.9-101.el7
Doc Type: Enhancement
Doc Text:
New "multipathd reset multipaths stats" commands Multipath now supports two new "multipathd" commands: "multipathd reset multipaths stats" and "multipathd reset multipath" _dev_ "stats". These commands reset the device stats that `multipathd` tracks for all the devices, or the specified device, respectively. This allows users to reset their device stats after they make changes to them.
Story Points: ---
Clone Of:
Clones: 1448945
Environment:
Last Closed: 2017-08-01 12:34:26 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers:
Tracker: Red Hat Product Errata RHBA-2017:1961
Priority: normal
Status: SHIPPED_LIVE
Summary: device-mapper-multipath bug fix and enhancement update
Last Updated: 2017-08-01 13:56:09 EDT

Description Ben Woodard 2017-01-25 15:33:04 EST
Description of problem:
We recently had a drive that was constantly disappearing and reappearing due to a misbehaving SAS phy. We were able to identify the drive by looking at its multipath error counters:


# multipathd show multipaths stats
name              path_faults switch_grp map_loads total_q_time q_timeouts
35000c5008583fe17 0           0          1         0            0         
35000c5008584088b 0           0          1         0            0         
35000c5008582ac83 0           0          1         0            0         
...
35000c500855d78b7 0           0          1         0            0         
35000c500855d7c0b 0           0          1         0            0         
35000c50085688837 13989       0          6         2            1         
35000c500855d7d0f 0           0          1         0            0         


We fixed the phy, and now we want to zero out the path_faults counter. We think there should be a new command in multipathd that allows us to clear the stats.

Reloading doesn't clear the stats. That is,
# multipathd reload multipath 35000c50085688837
didn't help.

Restarting the service did clear the counts, but doing this on a large SAS array has too many unwanted side effects.

[root@mpath-1 ~]# multipathd show multipaths stats
name   path_faults switch_grp map_loads total_q_time q_timeouts
mpathb 2           1          1         0            0        
mpathc 0           0          1         0            0        
 
[root@mpath-1 ~]# service multipathd restart
ok
Stopping multipathd daemon: [  OK  ]
Starting multipathd daemon: [  OK  ]
 
[root@mpath-1 ~]# multipathd show multipaths stats
name   path_faults switch_grp map_loads total_q_time q_timeouts
mpathb 0           0          1         0            0        
mpathc 0           0          1         0            0



We realize that this will likely require two parts: one in the kernel to define the interface for clearing the stats, and one in userspace to add a command like:

# multipath clear multipaths stats 35000c50085688837

We also understand that getting this resolved will require coordination with upstream and will take some time.
Comment 1 Ben Marzinski 2017-01-26 04:04:12 EST
Actually, these stats are tracked completely in multipathd, and adding a new multipathd command is pretty straightforward and isolated, so this should be able to make 7.4.
Comment 2 Ben Marzinski 2017-02-15 19:31:09 EST
A fix for this has been posted upstream.  I'll pull it into RHEL-7.4 when I get a pm-ack.
Comment 3 Mark Thacker 2017-02-15 20:38:29 EST
adding pm_ack
Comment 4 Ben Marzinski 2017-02-16 19:19:50 EST
I've added two new multipathd commands:

reset multipaths stats
reset multipath <dev> stats

The first resets the stats on all multipath devices. The second resets the stats on the specified device.
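
For example, using the WWID of the problem device from the original report, the one-shot invocations are:

# multipathd reset multipaths stats
# multipathd reset multipath 35000c50085688837 stats

The same commands (minus the leading "multipathd") also work at the interactive "multipathd -k" prompt.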
Comment 6 Travis Gummels 2017-02-17 14:06:42 EST
LLNL,

Packages are here for testing.

http://people.redhat.com/tgummels/partners/.lc-d839231e87c805b7b71e764e0ed05825

Travis
Comment 9 Steven J. Levine 2017-05-12 13:23:25 EDT
Ben (Marzinski): In the latest RHEL 7.4 build that I have (it's a couple of weeks old, admittedly), I'm not seeing this new option in the multipathd(8) man page. Is it there in the most recent builds? Should it be there?

In the DM-Multipath document, I refer to that man page for the documentation of the specific commands.
Comment 10 Ben Marzinski 2017-05-30 12:57:25 EDT
Yeah, that should be in the man page, and isn't. I've fixed that, but it won't make it in until the next time the package is respun for another reason.
Comment 13 Lin Li 2017-06-11 22:53:15 EDT
Verified on device-mapper-multipath-0.4.9-111.el7
[root@storageqe-06 ~]# rpm -qa | grep multipath
device-mapper-multipath-0.4.9-111.el7.x86_64
device-mapper-multipath-libs-0.4.9-111.el7.x86_64

[root@storageqe-06 ~]# multipathd  -k
multipathd> --help
multipath-tools v0.4.9 (05/33, 2016)
CLI commands reference:
reset maps|multipaths stats   <------------------  
reset map|multipath $map stats  <------------------


[root@storageqe-06 ~]# multipathd show multipaths stats
name                              path_faults switch_grp map_loads total_q_time q_timeouts
360a98000324669436c2b45666c56786d 0           0          1         0            0         
360a98000324669436c2b45666c56786f 0           0          1         0            0         
360a98000324669436c2b45666c567871 0           0          1         0            0         
360a98000324669436c2b45666c567873 0           0          1         0            0         
360a98000324669436c2b45666c567875 0           0          0         0            0         

[root@storageqe-06 ~]# multipathd reset multipath 360a98000324669436c2b45666c56786d stats
ok

[root@storageqe-06 ~]# multipathd show multipaths stats
name                              path_faults switch_grp map_loads total_q_time q_timeouts
360a98000324669436c2b45666c56786d 0           0          0         0            0         
360a98000324669436c2b45666c56786f 0           0          1         0            0         
360a98000324669436c2b45666c567871 0           0          1         0            0         
360a98000324669436c2b45666c567873 0           0          1         0            0         
360a98000324669436c2b45666c567875 0           0          0         0            0         


[root@storageqe-06 ~]# multipathd reset multipaths stats
ok
[root@storageqe-06 ~]# multipathd show multipaths stats
name                              path_faults switch_grp map_loads total_q_time q_timeouts
360a98000324669436c2b45666c56786d 0           0          0         0            0         
360a98000324669436c2b45666c56786f 0           0          0         0            0         
360a98000324669436c2b45666c567871 0           0          0         0            0         
360a98000324669436c2b45666c567873 0           0          0         0            0         
360a98000324669436c2b45666c567875 0           0          0         0            0
Comment 14 errata-xmlrpc 2017-08-01 12:34:26 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1961
