Bug 1554516
Summary: | multipathd should show per-disk path_faults | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Tony Hutter <hutter2> | |
Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> | |
Status: | CLOSED ERRATA | QA Contact: | Lin Li <lilin> | |
Severity: | low | Docs Contact: | Steven J. Levine <slevine> | |
Priority: | unspecified | |||
Version: | 7.4 | CC: | agk, bmarzins, heinzm, jbrassow, lilin, msnitzer, prajnoha, rhandlin | |
Target Milestone: | rc | |||
Target Release: | --- | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | device-mapper-multipath-0.4.9-120.el7 | Doc Type: | Release Note | |
Doc Text: |
New `%0` wildcard added for the "multipathd show paths format" command to show path failures
The "multipathd show paths format" command now supports the `%0` wildcard to display path failures. Support for this wildcard makes it easier for users to track which paths have been failing in a multipath device.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1627884 (view as bug list) | Environment: | ||
Last Closed: | 2018-10-30 11:27:28 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1627884 |
Description
Tony Hutter
2018-03-12 20:18:26 UTC
Currently, the only way to get this information is to run "dmsetup status" and grab the information from there:

```
# dmsetup status mpathc
0 488120320 multipath 2 0 0 0 2 1 A 0 1 2 8:32 A 0 0 1 E 0 1 2 8:64 A 1 0 1
```

This is a multipath device with two paths, 8:32 and 8:64. The letter immediately following the path major:minor gives the path state (A for active, F for failed). Both paths are currently active. The number after that is the number of times the path has failed. So, 8:32 has never failed, and 8:64 has failed one time.

I can add this information to the path format wildcards (probably as %x, to match the multipath wildcard), so that you could display it with multipathd's formatted output, using something like:

```
# multipathd show paths format "%d %x"
```

> I can add this information to the path format wildcards (probably as %x, to
> match the multipath wildcard, so that you could display it with multipathd's
> formatted output, using something like

That would be amazing! I see that 'multipath' has both:

```
%x  failures
%0  path_faults
```

Could you tell me the difference between the two? My guess is that we'd be interested in both.

Background: we regularly monitor multipath stats with Splunk, and this would make it easy to see which of the SAS links to our disks is bad. I'd also like to integrate it into the ZFS commands so that we could view mpath stats inline with the disk status, similar to what we've done with other stats:

https://github.com/zfsonlinux/zfs/pull/7245
https://github.com/zfsonlinux/zfs/pull/7178

"failures" is the number of times a multipath device has lost all of its paths and stopped queueing IO; these are cases where IO going to the multipath device could get failed up to a higher layer, such as the filesystem. "path_faults" is the number of times that any of a multipath device's paths has switched from the active to the failed state. If you think %0 would make more sense for the path's failure wildcard, I'm fine with using either.
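Until the new wildcard lands, the "dmsetup status" encoding described above can be scraped directly. A minimal sketch in Python (the `parse_path_faults` helper and its regex are my own illustration, not part of multipath-tools), assuming each path appears as a `major:minor state fail_count` triple exactly as described:

```python
import re

def parse_path_faults(status_line):
    """Pull (major:minor, state, fail_count) triples out of one
    'dmsetup status' multipath line. Relies only on the layout
    described above: the device number, then A (active) or F
    (failed), then the number of times the path has failed."""
    return [(dev, state, int(fails))
            for dev, state, fails in
            re.findall(r'(\d+:\d+)\s+([AF])\s+(\d+)', status_line)]

# Output of "dmsetup status mpathc" from the report above:
line = ("0 488120320 multipath 2 0 0 0 2 1 "
        "A 0 1 2 8:32 A 0 0 1 E 0 1 2 8:64 A 1 0 1")
print(parse_path_faults(line))  # [('8:32', 'A', 0), ('8:64', 'A', 1)]
```

The device-number pattern is distinctive enough that the other counters in the status line are ignored, so the header fields do not need to be parsed.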
Yea, it sounds like path_faults (%0) would be better for 'paths', since it is tracked per slave path.

Path failures are now viewable using the %0 wildcard.

Verified on device-mapper-multipath-0.4.9-122.el7:

1. Check the installed packages:

```
[root@storageqe-06 ~]# rpm -qa | grep multipath
device-mapper-multipath-libs-0.4.9-122.el7.x86_64
device-mapper-multipath-0.4.9-122.el7.x86_64
```

2. List the multipath topology:

```
[root@storageqe-06 ~]# multipath -ll
360a98000324669436c2b45666c56786d dm-2 NETAPP ,LUN
size=20G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:0:0 sdl 8:176 active ready running
| `- 4:0:1:0 sdg 8:96  active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:1:0 sdq 65:0  active ready running
  `- 4:0:0:0 sdb 8:16  active ready running
360a98000324669436c2b45666c567875 dm-0 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:0:4 sdp 8:240 active ready running
| `- 4:0:1:4 sdk 8:160 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:1:4 sdu 65:64 active ready running
  `- 4:0:0:4 sdf 8:80  active ready running
360a98000324669436c2b45666c567873 dm-1 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:0:3 sdo 8:224 active ready running
| `- 4:0:1:3 sdj 8:144 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:1:3 sdt 65:48 active ready running
  `- 4:0:0:3 sde 8:64  active ready running
360a98000324669436c2b45666c567871 dm-3 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:0:2 sdn 8:208 active ready running
| `- 4:0:1:2 sdi 8:128 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:1:2 sds 65:32 active ready running
  `- 4:0:0:2 sdd 8:48  active ready running
360a98000324669436c2b45666c56786f dm-6 NETAPP ,LUN
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:0:1 sdm 8:192 active ready running
| `- 4:0:1:1 sdh 8:112 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:1:1 sdr 65:16 active ready running
  `- 4:0:0:1 sdc 8:32  active ready running
```

3. Check the per-map statistics:

```
[root@storageqe-06 ~]# multipathd list multipaths stats
name                              path_faults switch_grp map_loads total_q_time q_timeouts
360a98000324669436c2b45666c56786d 0           0          1         0            0
360a98000324669436c2b45666c56786f 0           0          1         0            0
360a98000324669436c2b45666c567871 0           0          1         0            0
360a98000324669436c2b45666c567873 0           0          1         0            0
360a98000324669436c2b45666c567875 0           0          1         0            0
```

4. Display per-path failures with the new wildcard:

```
[root@storageqe-06 ~]# multipathd show paths format "%0"
failures
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3236
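The two-column form of the new wildcard is straightforward to feed into monitoring. A small sketch (the `printf` sample stands in for live `multipathd show paths format "%d %0"` output, which needs a running multipathd; the alert wording is my own):

```shell
# Sample two-column output (device, failure count); in production this
# would come from: multipathd show paths format "%d %0"
printf 'dev failures\nsdl 0\nsdg 3\nsdq 0\n' |
  awk '$2 + 0 > 0 { print $1 " has " $2 " path failures" }'
```

Adding `0` coerces the header's "failures" field to zero, so the header row is skipped without any extra state.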