Bug 1651772 - VDSM's multipath health check output is incorrect
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.2.6
Hardware: Unspecified
OS: Linux
Priority: high
Severity: medium
Target Milestone: ovirt-4.4.1
: 4.3.0
Assignee: Amit Bawer
QA Contact: Ilan Zuckerman
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-11-20 18:51 UTC by Gordon Watson
Modified: 2022-03-13 16:09 UTC
CC: 14 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-23 16:16:06 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-37514 0 None None None 2021-12-10 18:19:22 UTC
Red Hat Knowledge Base (Solution) 3707141 0 None None None 2019-02-11 19:13:12 UTC
Red Hat Product Errata RHBA-2020:3822 0 None None None 2020-09-23 16:16:22 UTC
oVirt gerrit 105938 0 master ABANDONED mpathhealth: Exclude paths with no sysfs device names 2021-01-15 17:59:30 UTC
oVirt gerrit 105939 0 master ABANDONED udev: Drop device events with no sysfs name translation for MD_PATH 2021-01-15 17:59:29 UTC
oVirt gerrit 105948 0 master MERGED mpathhealth, udev: Remove device name resolution from monitored paths 2021-01-15 17:59:29 UTC
oVirt gerrit 106014 0 master MERGED mpathhealth, udev: Switch to a polling multipath health monitoring 2021-01-15 17:59:31 UTC

Description Gordon Watson 2018-11-20 18:51:48 UTC
Description of problem:

VDSM is reporting failed multipath paths when there are none, and the devices that it reports are incorrect.


Version-Release number of selected component (if applicable):

RHV 4.2.6
RHEL 7.5 host:
  kernel-3.10.0-862.11.6.el7.x86_64 
  vdsm-4.20.35-1.el7ev.x86_64       

How reproducible:

Not (yet).

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 5 Sandro Bonazzola 2019-01-28 09:40:17 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 9 Amit Bawer 2019-12-25 09:50:55 UTC
I think I see what's going on here:

The multipath health monitor works as follows:

1. Registers to udev multipath events listener and starts gathering udev device events.

2. Starts the health monitor:

2.1. Runs "dmsetup status --target multipath"
2.2. Parses the command output from (2.1) and populates its state dictionary:

    To exemplify, the output line "DM_UUID: 0 314572800 multipath 1 0 0 3 1 E 0 1 0 8:64 A 0 E 0 1 0 8:176 F"
    has two "Enabled" paths, "8:64" and "8:176", where the first is "Active" and the second is "Failed",
    so "failed_paths" for the relevant DM_UUID dictionary entry would be updated with device_name("8:176").
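    The path-state parsing just described can be sketched roughly as follows (a minimal illustration, not
    VDSM's actual code; the helper name and return shape are assumptions):

```python
import re

def parse_status_line(line):
    """Parse one 'dmsetup status --target multipath' output line into
    (dm_uuid, failed_paths, valid_count). Minimal sketch only."""
    dm_uuid, status = line.split(":", 1)
    tokens = status.split()
    failed, valid = [], 0
    for i, tok in enumerate(tokens):
        # Path entries appear as a "major:minor" token followed by a
        # one-letter path state: "A" (active) or "F" (failed).
        if re.fullmatch(r"\d+:\d+", tok) and i + 1 < len(tokens):
            if tokens[i + 1] == "F":
                failed.append(tok)
            elif tokens[i + 1] == "A":
                valid += 1
    return dm_uuid.strip(), failed, valid

line = "DM_UUID: 0 314572800 multipath 1 0 0 3 1 E 0 1 0 8:64 A 0 E 0 1 0 8:176 F"
parse_status_line(line)  # → ('DM_UUID', ['8:176'], 1)
```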

    The device_name("8:176") translation uses os.path.basename(os.path.realpath('/sys/dev/block/8:176')),
    which is supposed to turn the "major:minor" sysfs notation into its "sdX" device name.
    This works fine as long as a device is attached to the system for this "major:minor" entry;
    however, if it was detached after step (2.1), the sysfs path being resolved no longer exists,
    and os.path.realpath(...) returns its input as it came, causing "8:176" to be added to failed_paths instead of "sdX".
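    The os.path.realpath fallback described above is easy to demonstrate (the path in the last line is
    deliberately nonexistent; the point is only that an unresolvable path comes back as given):

```python
import os.path

def device_name(major_minor):
    # Mirrors the translation described above: resolve the sysfs
    # symlink and take the last path component.
    return os.path.basename(os.path.realpath("/sys/dev/block/" + major_minor))

# For a nonexistent entry, realpath() cannot resolve anything and
# returns the path it was given, so the "major:minor" string leaks
# through instead of an "sdX" name:
os.path.basename(os.path.realpath("/no-such-dir/8:176"))  # → '8:176'
```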


3. Periodically handles the gathered udev listener events and updates the state dictionary: 

    The events are correlated with the relevant LUN by their DM_UUID field, and in case of reinstating/failing a device, 
    the DM_PATH field is used to correlate with the relevant device already in "failed_paths", 
    after being translated by the same device_name(...) method described in (2.2):

    - If the DM_PATH="8:176" has only a failing event, then it would be left as is in the "failed_paths" set for this entry, 
      leaving what we see in the health report in "major:minor" notation.

    - If the DM_PATH="8:176" has a reinstate udev event, then it would have a "sdX" symlink in sysfs when device_name(...) is called 
      to handle it, translating the DM_PATH="8:176" to its named "sdX" notation. This would fail the correlation with the original "8:176" 
      failed_paths entry, leaving it in the "failed_paths" report.
 

As for the "multipath -l" output not matching the "sdX" named devices in the health report, 
this can happen when the command is run later in time, after path and device statuses have changed since the last health report.

Comment 10 Amit Bawer 2019-12-29 09:14:26 UTC
Hi Ben,

The current approach to resolving the uncorrelated device paths between the "dmsetup status" output used for the initial health report and the udev events' MD_PATH info is to drop the sysfs name resolution used to translate major:minor device notations to their sd* names. 
Since we only use those names for logging, and the actual relevant information is the number of failing paths, this shouldn't pose a problem, at least as far as Vdsm and Engine are concerned. However, if you have input on this, it would be appreciated.

Comment 11 Amit Bawer 2020-01-01 10:48:06 UTC
Recreation scenario:

1) systemctl stop vdsmd

2) Choose LUN not used for any domain

In this example we will work on dm-5:

multipath -ll
...
360014053abd2bb76ac742f989490c053 dm-5 LIO-ORG,iqn1-06
size=100G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 3:0:0:6 sdp  8:240  active ready running
  |- 4:0:0:6 sdz  65:144 active ready running
  |- 5:0:0:6 sdaj 66:48  active ready running
  `- 2:0:0:6 sdf  8:80   active ready running


3) Take one device path offline: 

echo offline > /sys/dev/block/8\:80/device/state

4) After a few seconds run multipath -ll to see device for 8:80 went faulty

multipath -ll
...
360014053abd2bb76ac742f989490c053 dm-5 LIO-ORG,iqn1-06
size=100G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 3:0:0:6 sdp  8:240  active ready running
  |- 4:0:0:6 sdz  65:144 active ready running
  |- 5:0:0:6 sdaj 66:48  active ready running
  `- 2:0:0:6 sdf  8:80   failed faulty offline

5) Backup the original dmsetup: mv /sbin/dmsetup /sbin/dmsetup.orig 


6) Replace original /sbin/dmsetup file with the following bash script,
make sure to set DEV_NAME value with the device name for your 
offline device from step (4):

#!/bin/bash
DEV_NAME="sdf"
/sbin/dmsetup.orig "$@"
rc=$?
[[ -f /sys/block/${DEV_NAME}/device/delete ]] && echo 1 > /sys/block/${DEV_NAME}/device/delete
exit $rc

7) chmod +x /sbin/dmsetup

8) systemctl start vdsmd

9) After the Host becomes active, you will see in vdsm.log a multipath_health report
with the major:minor numbers of the detached multipath device instead of "sdf":

2020-01-01 05:06:22,947-0500 INFO  (jsonrpc/7) [vdsm.api] FINISH multipath_health return={'360014053abd2bb76ac742f989490c053': {'failed_paths': ['8:80'], 'valid_paths': 3}}

10) Multipath will show the device "sdf" path is missing for dm-5:

multipath -ll
...
360014053abd2bb76ac742f989490c053 dm-5 LIO-ORG,iqn1-06
size=100G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 3:0:0:6 sdp  8:240  active ready running
  |- 4:0:0:6 sdz  65:144 active ready running
  `- 5:0:0:6 sdaj 66:48  active ready running

11) You can re-discover the deleted device path from the "New Domain" web UI dialog: choose the iSCSI type and "discover all" for the target machine IP and port.

12) After rediscovering all dm-5 paths, you can put some of them offline as in step (3) and get a mixed report with "sdX" and "8:80" in failed_paths.

- Remember to restore dmsetup when done: mv /sbin/dmsetup.orig /sbin/dmsetup

Comment 12 Ben Marzinski 2020-01-06 17:58:39 UTC
(In reply to Amit Bawer from comment #10)
> Hi Ben,
> 
> Current approach to resolve the uncorrelated device paths between the
> "dmsetup status" output used for the health initial report and udev events
> MD_PATH info, is to drop the sysfs name resolution used to translate
> major:minor device notations to their sd* names. 
> Since we only use those names for logging and the actual relevant
> information is the quantity of the failing paths, this shouldn't impose a
> problem, at least as far Vdsm and Engine should consider. However, if you
> have inputs on this they would be appreciated.

This seems reasonable to me.  Another alternative is to get the info from multipathd itself. Obviously, this involves talking to the daemon, and there could be delays if the daemon is getting swamped with events, or isn't running.  But both of those seem like things you would be interested in knowing anyway.

# multipathd show maps json

will give you JSON-formatted information about all the multipath devices, including all the paths in device name and major:minor format, as well as their current states (whether they are offline or online, what their current kernel state is, and the last result of the multipathd checker; for reference, dmsetup shows you their current kernel state).
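A sketch of consuming that output (the JSON key names below — "maps", "path_groups", "paths", "chk_st" — are assumptions about the multipathd output layout, not verified against a specific multipath-tools version):

```python
import json
import subprocess

def count_path_states(maps_json):
    """Tally paths per multipath map by checker state.
    Operates on already-parsed JSON so it can be unit tested."""
    result = {}
    for m in maps_json.get("maps", []):
        states = {}
        for pg in m.get("path_groups", []):
            for path in pg.get("paths", []):
                st = path.get("chk_st", "unknown")
                states[st] = states.get(st, 0) + 1
        result[m.get("name")] = states
    return result

# In production this would be fed from the daemon, e.g.:
#   out = subprocess.check_output(["multipathd", "show", "maps", "json"])
#   count_path_states(json.loads(out))
```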

Comment 13 Amit Bawer 2020-01-07 09:50:54 UTC
(In reply to Ben Marzinski from comment #12)
Thanks Ben,

The current intention is to have a dedicated multipath health monitoring thread, running at a fixed polling interval (10 seconds by default, configured as part of Vdsm's storage monitoring intervals). This direction drops the handling of udev events and simply invokes the "dmsetup status --target multipath" command at fixed intervals to get the current device path statuses. 
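The polling direction described above can be sketched like this (a minimal illustration, not VDSM's actual structure; the injectable runner is only there for testability):

```python
import subprocess
import threading

class MpathHealthMonitor:
    """Periodically runs a status command and hands the raw output
    to a callback. Minimal sketch of a polling health monitor."""

    def __init__(self, callback, interval=10.0, runner=None):
        self._callback = callback
        self._interval = interval
        # Injectable runner so the loop can be tested without dmsetup.
        self._runner = runner or self._run_dmsetup
        self._done = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    @staticmethod
    def _run_dmsetup():
        return subprocess.check_output(
            ["dmsetup", "status", "--target", "multipath"], text=True)

    def _loop(self):
        while not self._done.is_set():
            self._callback(self._runner())
            # Event.wait() doubles as an interruptible sleep.
            self._done.wait(self._interval)

    def start(self):
        self._thread.start()

    def stop(self):
        self._done.set()
        self._thread.join()
```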

Given the "multipathd show maps json" option mentioned, does it have an edge over using "dmsetup status" now that we poll every X seconds?

If the multipathd daemon is not running, we might need a different or additional status report, since we only report when there are failed paths.
"dmsetup status" would show the last known status even if multipathd is down; in that case "multipathd show maps json" would show an empty mapping.

However, multipathd is periodically re-invoked as part of the Vdsm storage refresh, so I am not sure the latter approach has extra benefits in this case.

Comment 14 Ben Marzinski 2020-01-07 17:35:27 UTC
(In reply to Amit Bawer from comment #13)
> Given the "multipathd show maps json" option mentioned, does it have an edge
> over using "dmsetup status" when we poll now every X seconds?

Not really. It does give you both the major:minor and the device name at once, so you don't need to do the mapping yourself. Also, the JSON might be easier to deal with, and the multipathd commands have a C library interface that avoids exec'ing a command, but I don't know whether either of those points matters in your situation.

Comment 15 Amit Bawer 2020-01-13 11:56:41 UTC
Hi QE,

Please test this with a Vdsm master 4.40 build containing the following commits:

0f4c895245d9c1c4352c936d16706ce410f48fd2 mpathhealth, udev: Switch to a polling multipath health monitoring
3d82ff4bb8911d1f6608ed48c19b4a2dfcdbfe54 mpathhealth, udev: Remove device name resolution from monitored paths

Checking the scenario described in comment #11 is less relevant for this build since those changes drop device name resolution for multipath health monitoring.

What is recommended is to check that changes in multipath device statuses are reflected the same way in the multipath -ll output on the host and in the
multipath_health monitoring report in vdsm.log.

In engine.log, this should be seen under multipathHealth when the log level is DEBUG.

Please note that the multipath health monitor polls the status at the interval specified by the 'sd_health_check_delay' Vdsm config parameter (default: 10 seconds), so changes in multipath statuses should appear in the logs within this interval.

Also note there is another issue regarding Engine not displaying those reports in the UI event viewer unless the Host is re-activated [1].


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1786999

Comment 18 Ilan Zuckerman 2020-03-29 13:03:58 UTC
Tested on: 

ovirt-engine-4.4.0-0.29.master.el8ev.noarch
vdsm-4.40.9-1.el8ev.x86_64


Verified according to these steps:

1. Set engine log to DEBUG + vdsm log to INFO

2. Locate the block device to manipulate: '8:16' [1]

3. Set it offline: echo offline > /sys/dev/block/8\:16/device/state

4. Verify it is indeed offline [2]

5. Expected: engine to log with 'multipathHealth' about the faulty path - VERIFIED [3]

6. Expected vdsm to log with 'multipath_health' about the faulty path - VERIFIED [4]

7. Set the device back to ONLINE and verify with multipath [5]

8. Expected: engine is NOT reporting about the path with 'multipathHealth' - VERIFIED

9. Expected: vdsm is NOT reporting about the path with 'multipath_health' - VERIFIED


[1]
[root@storage-ge5-vdsm2 ~]# multipath -ll
360002ac0000000000000003100021f6b dm-0 3PARdata,VV
size=150G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=enabled
  `- 4:0:0:0 sdb 8:16  active ready running

[2]
[root@storage-ge5-vdsm2 ~]# multipath -ll
360002ac0000000000000003100021f6b dm-0 3PARdata,VV
size=150G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=0 status=active
  `- 4:0:0:0 sdb 8:16  failed faulty offline

[3]
"multipathHealth": {"360002ac0000000000000003100021f6b": {"failed_paths": ["8:16"], "valid_paths": 0}}, "memAvailable": 3472, "memCommitted": 0, "memFree": 3216, "swapTotal": 2047, "swapFree": 2038, "vmCount": 0, "vmActive": 0, "vmMigrating": 0, "incomingVmMigrations": 0, "outgoingVmMigrations": 0, "dateTime": "2020-03-29T12:35:32 GMT", "momStatus": "active", "ksmState": false, "ksmPages": 100, "ksmMergeAcrossNodes": true, "memShared": 0, "ksmCpu": 0, "netConfigDirty": "False", "haStats": {"configured": false, "active": false, "score": 0, "globalMaintenance": false, "localMaintenance": false}}}

[4]
2020-03-29 15:42:04,332+0300 INFO  (jsonrpc/0) [vdsm.api] FINISH multipath_health return={'360002ac0000000000000003100021f6b': {'failed_paths': ['8:16'], 'valid_paths': 0}} from=::ffff:10.35.161.131,60902, task_id=54015c6d-a807-441a-a18b-eda638a7c486 (api:54)

[5]
[root@storage-ge5-vdsm2 ~]# echo running > /sys/dev/block/8\:16/device/state
[root@storage-ge5-vdsm2 ~]# multipath -ll
360002ac0000000000000003100021f6b dm-0 3PARdata,VV
size=150G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=enabled
  `- 4:0:0:0 sdb 8:16  active ready running

Comment 23 errata-xmlrpc 2020-09-23 16:16:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) 4.4.z [ovirt-4.4.2]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3822

