Description of problem:

We recently had a drive that was constantly disappearing and reappearing due to a misbehaving SAS phy. We were able to identify the drive by looking at its multipath error counters:

    # multipathd show multipaths stats
    name              path_faults switch_grp map_loads total_q_time q_timeouts
    35000c5008583fe17 0           0          1         0            0
    35000c5008584088b 0           0          1         0            0
    35000c5008582ac83 0           0          1         0            0
    ...
    35000c500855d78b7 0           0          1         0            0
    35000c500855d7c0b 0           0          1         0            0
    35000c50085688837 13989       0          6         2            1
    35000c500855d7d0f 0           0          1         0            0

We fixed the phy, and now we want to zero out the path_faults counter. We think that there should be a new multipathd command that allows us to clear the stats.

Reloading doesn't clear the stats, i.e.

    # multipathd reload multipath 35000c50085688837

didn't help. Restarting the service did clear the counts, but doing this on a large SAS array has too many unwanted side effects.

    [root@mpath-1 ~]# multipathd show multipaths stats
    name   path_faults switch_grp map_loads total_q_time q_timeouts
    mpathb 2           1          1         0            0
    mpathc 0           0          1         0            0
    [root@mpath-1 ~]# service multipathd restart
    ok
    Stopping multipathd daemon: [ OK ]
    Starting multipathd daemon: [ OK ]
    [root@mpath-1 ~]# multipathd show multipaths stats
    name   path_faults switch_grp map_loads total_q_time q_timeouts
    mpathb 0           0          1         0            0
    mpathc 0           0          1         0            0

We realize that this will likely end up needing two parts: one part in the kernel that defines the interface to clear the stats, and one part in userspace to add a command like:

    # multipath clear multipaths stats 35000c50085688837

We also understand that getting this resolved will require coordination with upstream and will take some time.
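On an array with many maps, picking the one bad device out of the stats table by eye is error-prone. The following is a minimal sketch of a filter that prints only the maps with a nonzero path_faults counter. The function name nonzero_path_faults is hypothetical, and the sketch assumes the six-column layout shown in the output above, with path_faults in the second column.

```shell
#!/bin/sh
# nonzero_path_faults is a hypothetical helper, not a multipathd command.
# It reads the output of "multipathd show multipaths stats" on stdin and
# prints "<name> <path_faults>" for each map whose counter is above zero.
# Assumption: six whitespace-separated columns, path_faults in column 2.
nonzero_path_faults() {
    # Skip the header row (first field "name") and any short lines
    # such as "..." that do not carry all six columns.
    awk 'NF >= 6 && $1 != "name" && $2 + 0 > 0 { print $1, $2 }'
}

# Usage on a host running multipathd:
#   multipathd show multipaths stats | nonzero_path_faults
```

Because the helper only reads stdin, it can also be run against saved output from a support bundle rather than a live daemon.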
Actually, these stats are tracked completely in multipathd, so no kernel change is needed. Adding a new multipathd command is pretty straightforward and isolated, so this should be able to make 7.4.
A fix for this has been posted upstream. I'll pull it into RHEL-7.4 when I get a pm-ack.
adding pm_ack
I've added two new multipathd commands:

    reset multipaths stats
    reset multipath <dev> stats

The first resets the stats on all multipath devices. The second resets the stats on the specified device.
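As a rough illustration of how the per-device command might be used after a repair, here is a minimal sketch that resets one map and then re-reads the stats to confirm that path_faults is back at 0. The function name reset_and_check is hypothetical. The sketch uses only the commands named in this report, assumes path_faults is the second column of the stats output, and deliberately does not rely on the exit status of multipathd, since that behavior is not described here.

```shell
#!/bin/sh
# reset_and_check is a hypothetical helper, not part of multipath-tools.
# It resets the stats for one map, then verifies the result by reading
# the stats table back.
reset_and_check() {
    map=${1:-}
    if [ -z "$map" ]; then
        echo "usage: reset_and_check <map>" >&2
        return 2
    fi

    # Reset the counters for this one map (command from this report).
    # Its reply ("ok") is discarded; the check below is the real test.
    multipathd reset multipath "$map" stats >/dev/null

    # Re-read the table and pick out path_faults (assumed column 2).
    faults=$(multipathd show multipaths stats |
        awk -v m="$map" '$1 == m { print $2 }')

    # Succeed only if the map is listed and its counter now reads 0.
    [ "$faults" = "0" ]
}

# Usage on a host running multipathd:
#   reset_and_check 35000c50085688837 && echo "path_faults cleared"
```

Checking the table afterwards, rather than trusting the command's reply, also catches the case where the map name was mistyped and nothing was reset.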
LLNL, Packages are here for testing. http://people.redhat.com/tgummels/partners/.lc-d839231e87c805b7b71e764e0ed05825 Travis
Ben (Marzinski): In the latest RHEL 7.4 release that I have built (it's a couple of weeks old, admittedly) I'm not seeing this new option in the multipathd(8) man page. Is it there in the most recent builds? Should it be there? In the DM-Multipath document I refer to that man page for the documentation for the specific commands.
Yeah, that should be in the man page, and isn't. I've fixed that, but it won't make it in until the next time the package is respun for another reason.
Verified on device-mapper-multipath-0.4.9-111.el7

    [root@storageqe-06 ~]# rpm -qa | grep multipath
    device-mapper-multipath-0.4.9-111.el7.x86_64
    device-mapper-multipath-libs-0.4.9-111.el7.x86_64
    [root@storageqe-06 ~]# multipathd -k
    multipathd> --help
    multipath-tools v0.4.9 (05/33, 2016)
    CLI commands reference:
     reset maps|multipaths stats     <------------------
     reset map|multipath $map stats  <------------------

    [root@storageqe-06 ~]# multipathd show multipaths stats
    name                              path_faults switch_grp map_loads total_q_time q_timeouts
    360a98000324669436c2b45666c56786d 0           0          1         0            0
    360a98000324669436c2b45666c56786f 0           0          1         0            0
    360a98000324669436c2b45666c567871 0           0          1         0            0
    360a98000324669436c2b45666c567873 0           0          1         0            0
    360a98000324669436c2b45666c567875 0           0          0         0            0
    [root@storageqe-06 ~]# multipathd reset multipath 360a98000324669436c2b45666c56786d stats
    ok
    [root@storageqe-06 ~]# multipathd show multipaths stats
    name                              path_faults switch_grp map_loads total_q_time q_timeouts
    360a98000324669436c2b45666c56786d 0           0          0         0            0
    360a98000324669436c2b45666c56786f 0           0          1         0            0
    360a98000324669436c2b45666c567871 0           0          1         0            0
    360a98000324669436c2b45666c567873 0           0          1         0            0
    360a98000324669436c2b45666c567875 0           0          0         0            0
    [root@storageqe-06 ~]# multipathd reset multipaths stats
    ok
    [root@storageqe-06 ~]# multipathd show multipaths stats
    name                              path_faults switch_grp map_loads total_q_time q_timeouts
    360a98000324669436c2b45666c56786d 0           0          0         0            0
    360a98000324669436c2b45666c56786f 0           0          0         0            0
    360a98000324669436c2b45666c567871 0           0          0         0            0
    360a98000324669436c2b45666c567873 0           0          0         0            0
    360a98000324669436c2b45666c567875 0           0          0         0            0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1961