Bug 1416569

Summary: [LLNL 7.4 FEAT] provide a way to clear stats
Product: Red Hat Enterprise Linux 7 Reporter: Ben Woodard <woodard>
Component: device-mapper-multipath Assignee: Ben Marzinski <bmarzins>
Status: CLOSED ERRATA QA Contact: Lin Li <lilin>
Severity: medium Docs Contact: Steven J. Levine <slevine>
Priority: medium    
Version: 7.4 CC: agk, bmarzins, heinzm, hutter2, lilin, msnitzer, mthacker, prajnoha, tgummels, woodard
Target Milestone: rc Keywords: FutureFeature
Target Release: 7.4   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: device-mapper-multipath-0.4.9-101.el7 Doc Type: Enhancement
Doc Text:
New "multipathd reset multipaths stats" commands
Multipath now supports two new "multipathd" commands: "multipathd reset multipaths stats" and "multipathd reset multipath" _dev_ "stats". These commands reset the device stats that `multipathd` tracks for all devices or for the specified device, respectively. This allows users to reset their device stats after they make changes to the devices.
Story Points: ---
Clone Of:
: 1448945 Environment:
Last Closed: 2017-08-01 16:34:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1289208, 1332226, 1384257, 1446211, 1448945    

Description Ben Woodard 2017-01-25 20:33:04 UTC
Description of problem:
We recently had a drive that was constantly disappearing and reappearing due to a misbehaving SAS phy. We were able to notice the drive by looking at its multipath error counters:


# multipathd show multipaths stats
name              path_faults switch_grp map_loads total_q_time q_timeouts
35000c5008583fe17 0           0          1         0            0         
35000c5008584088b 0           0          1         0            0         
35000c5008582ac83 0           0          1         0            0         
...
35000c500855d78b7 0           0          1         0            0         
35000c500855d7c0b 0           0          1         0            0         
35000c50085688837 13989       0          6         2            1         
35000c500855d7d0f 0           0          1         0            0         
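
On a large array, picking the bad map out of this table by eye is tedious. A minimal sketch of how the output could be scanned, assuming the column layout shown above; the helper names `parse_stats` and `maps_with_path_faults` are hypothetical and not part of multipath-tools:

```python
def parse_stats(text):
    """Parse `multipathd show multipaths stats` output into {name: {counter: int}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()          # first column is "name"
    stats = {}
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            continue                   # skip rows that do not match the header
        stats[fields[0]] = {k: int(v) for k, v in zip(header[1:], fields[1:])}
    return stats


def maps_with_path_faults(stats, threshold=0):
    """Return the names of maps whose path_faults counter exceeds threshold."""
    return sorted(n for n, c in stats.items() if c["path_faults"] > threshold)


# Three rows copied from the table above.
SAMPLE = """\
name              path_faults switch_grp map_loads total_q_time q_timeouts
35000c500855d7c0b 0           0          1         0            0
35000c50085688837 13989       0          6         2            1
35000c500855d7d0f 0           0          1         0            0
"""

if __name__ == "__main__":
    print(maps_with_path_faults(parse_stats(SAMPLE)))  # ['35000c50085688837']
```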


We fixed the phy, and now we want to zero out the path_faults counter. We think that there should be a new command in multipathd which allows us to clear the stats. 

Reloading the map doesn't clear the stats; for example,
# multipathd reload multipath 35000c50085688837 
didn't help.

Restarting the service did clear the counts, but doing this on a large SAS array has too many unwanted side effects.

[root@mpath-1 ~]# multipathd show multipaths stats
name   path_faults switch_grp map_loads total_q_time q_timeouts
mpathb 2           1          1         0            0        
mpathc 0           0          1         0            0        
 
[root@mpath-1 ~]# service multipathd restart
ok
Stopping multipathd daemon: [  OK  ]
Starting multipathd daemon: [  OK  ]
 
[root@mpath-1 ~]# multipathd show multipaths stats
name   path_faults switch_grp map_loads total_q_time q_timeouts
mpathb 0           0          1         0            0        
mpathc 0           0          1         0            0



We realize that this will likely end up needing two parts: one part in the kernel, which defines the interface to clear the stats, and one part in userspace, which adds a command like:

# multipath clear multipaths stats 35000c50085688837

We also understand that getting this resolved will require coordination with upstream and will take some time.

Comment 1 Ben Marzinski 2017-01-26 09:04:12 UTC
Actually, these stats are tracked completely in multipathd, and adding a new multipathd command is pretty straightforward and isolated, so this should be able to make 7.4.

Comment 2 Ben Marzinski 2017-02-16 00:31:09 UTC
A fix for this has been posted upstream.  I'll pull it into RHEL-7.4 when I get a pm-ack.

Comment 3 Mark Thacker 2017-02-16 01:38:29 UTC
adding pm_ack

Comment 4 Ben Marzinski 2017-02-17 00:19:50 UTC
I've added two new multipathd commands

reset multipaths stats
reset multipath <dev> stats

The first resets the stats on all multipath devices. The second resets the stats on the specified device.
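
For anyone scripting this, a rough sketch of a wrapper around the two commands. The function names `reset_stats_argv` and `reset_stats` are made up for illustration; only the command words themselves come from this comment. Running it needs root and a running multipathd.

```python
import subprocess


def reset_stats_argv(dev=None):
    """Build the argv for one of the two reset commands.

    dev=None resets the stats on all multipath devices;
    otherwise only the named device is reset.
    """
    if dev is None:
        return ["multipathd", "reset", "multipaths", "stats"]
    return ["multipathd", "reset", "multipath", dev, "stats"]


def reset_stats(dev=None):
    """Run the reset command and return whatever multipathd printed."""
    result = subprocess.run(
        reset_stats_argv(dev), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Building the argv separately from running it keeps the command construction checkable on a machine without multipathd.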

Comment 6 Travis Gummels 2017-02-17 19:06:42 UTC
LLNL,

Packages are here for testing.

http://people.redhat.com/tgummels/partners/.lc-d839231e87c805b7b71e764e0ed05825

Travis

Comment 9 Steven J. Levine 2017-05-12 17:23:25 UTC
Ben (Marzinski): In the latest RHEL 7.4 release that I have built (it's a couple of weeks old, admittedly) I'm not seeing this new option in the multipathd(8) man page. Is it there in the most recent builds? Should it be there?

In the DM-Multipath document I refer to that man page for the documentation for the specific commands.

Comment 10 Ben Marzinski 2017-05-30 16:57:25 UTC
Yeah, that should be in the man page, and isn't. I've fixed that, but it won't make it in until the next time the package is respun for another reason.

Comment 13 Lin Li 2017-06-12 02:53:15 UTC
Verified on device-mapper-multipath-0.4.9-111.el7
[root@storageqe-06 ~]# rpm -qa | grep multipath
device-mapper-multipath-0.4.9-111.el7.x86_64
device-mapper-multipath-libs-0.4.9-111.el7.x86_64

[root@storageqe-06 ~]# multipathd  -k
multipathd> --help
multipath-tools v0.4.9 (05/33, 2016)
CLI commands reference:
reset maps|multipaths stats   <------------------  
reset map|multipath $map stats  <------------------


[root@storageqe-06 ~]# multipathd show multipaths stats
name                              path_faults switch_grp map_loads total_q_time q_timeouts
360a98000324669436c2b45666c56786d 0           0          1         0            0         
360a98000324669436c2b45666c56786f 0           0          1         0            0         
360a98000324669436c2b45666c567871 0           0          1         0            0         
360a98000324669436c2b45666c567873 0           0          1         0            0         
360a98000324669436c2b45666c567875 0           0          0         0            0         

[root@storageqe-06 ~]# multipathd reset multipath 360a98000324669436c2b45666c56786d stats
ok

[root@storageqe-06 ~]# multipathd show multipaths stats
name                              path_faults switch_grp map_loads total_q_time q_timeouts
360a98000324669436c2b45666c56786d 0           0          0         0            0         
360a98000324669436c2b45666c56786f 0           0          1         0            0         
360a98000324669436c2b45666c567871 0           0          1         0            0         
360a98000324669436c2b45666c567873 0           0          1         0            0         
360a98000324669436c2b45666c567875 0           0          0         0            0         


[root@storageqe-06 ~]# multipathd reset multipaths stats
ok
[root@storageqe-06 ~]# multipathd show multipaths stats
name                              path_faults switch_grp map_loads total_q_time q_timeouts
360a98000324669436c2b45666c56786d 0           0          0         0            0         
360a98000324669436c2b45666c56786f 0           0          0         0            0         
360a98000324669436c2b45666c567871 0           0          0         0            0         
360a98000324669436c2b45666c567873 0           0          0         0            0         
360a98000324669436c2b45666c567875 0           0          0         0            0
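
The before/after comparison above can also be done mechanically. A small sketch, assuming the two tables have already been read into dictionaries keyed by map name; `changed_counters` is a hypothetical helper, and the values are the first two rows around the single-map reset:

```python
def changed_counters(before, after):
    """Return {map: {counter: (old, new)}} for every counter that differs."""
    changes = {}
    for name, old in before.items():
        new = after.get(name, {})
        diff = {k: (v, new[k]) for k, v in old.items() if k in new and new[k] != v}
        if diff:
            changes[name] = diff
    return changes


ZERO = {"path_faults": 0, "switch_grp": 0, "map_loads": 0,
        "total_q_time": 0, "q_timeouts": 0}

# State before "multipathd reset multipath 360a...786d stats".
BEFORE = {
    "360a98000324669436c2b45666c56786d": dict(ZERO, map_loads=1),
    "360a98000324669436c2b45666c56786f": dict(ZERO, map_loads=1),
}
# State after: only the named map was reset.
AFTER = {
    "360a98000324669436c2b45666c56786d": dict(ZERO),
    "360a98000324669436c2b45666c56786f": dict(ZERO, map_loads=1),
}

if __name__ == "__main__":
    # {'360a98000324669436c2b45666c56786d': {'map_loads': (1, 0)}}
    print(changed_counters(BEFORE, AFTER))
```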

Comment 14 errata-xmlrpc 2017-08-01 16:34:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1961