Bug 1526010 - Storage: Incorrect valid_paths in multipath events in some cases when several paths change states at the same time
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.20.15
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.2.1
Target Release: ---
Assignee: Fred Rolland
QA Contact: Lilach Zitnitski
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-12-14 15:31 UTC by Fred Rolland
Modified: 2018-02-12 11:47 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-02-12 11:47:30 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.2+




Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 85456 0 master MERGED udev: support dm_seqnum 2017-12-16 17:58:57 UTC
oVirt gerrit 85469 0 master MERGED udev: add failing test for unordered udev events 2017-12-16 17:59:00 UTC
oVirt gerrit 85470 0 master MERGED udev: use latest dm_seqnum for valid_paths 2017-12-16 17:59:03 UTC

Description Fred Rolland 2017-12-14 15:31:32 UTC
Description of problem:

Events from udev can arrive in a different order than the one in which device mapper creates them.
When that happens, the current implementation of the multipath health monitor can report a wrong number of valid_paths.
As a result, the engine could generate incorrect events.
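
For illustration only, here is a minimal sketch of the idea suggested by the linked patch titles ("use latest dm_seqnum for valid_paths"): keep the highest sequence number seen per device and ignore any event that is not newer. This is not the vdsm code. The class and method names are hypothetical, and the seqnum and valid_paths arguments are assumed to come from the DM_SEQNUM and DM_NR_VALID_PATHS properties of the udev event.

```python
class MultipathHealthTracker(object):
    """Hypothetical tracker: keeps valid_paths from the newest event only."""

    def __init__(self):
        # device uuid -> (highest seqnum seen, valid_paths reported with it)
        self._latest = {}

    def handle_event(self, uuid, seqnum, valid_paths):
        """Apply the event if it is newer than the last one applied.

        Returns True if the event was applied, False if it was stale.
        """
        last = self._latest.get(uuid)
        if last is not None and seqnum <= last[0]:
            # Event was created before the one already applied; using it
            # would overwrite the current count with an older value.
            return False
        self._latest[uuid] = (seqnum, valid_paths)
        return True

    def valid_paths(self, uuid):
        """Return the valid_paths value from the newest applied event."""
        return self._latest[uuid][1]
```

With three paths failing at once, device mapper would create events with valid_paths 2, 1 and 0 in that order. If the first event is delivered last, a tracker that simply stores the most recently received value ends up at 2, while the sketch above stays at 0.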

How reproducible:
Block several paths of the same device at the same time:


Steps to Reproduce:
1. Set the paths offline:
echo "offline" > /sys/block/sdb/device/state
echo "offline" > /sys/block/sdj/device/state
echo "offline" > /sys/block/sdr/device/state

3514f0c5a516008d4 dm-1 XtremIO ,XtremApp        
size=150G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=active
  |- 3:0:0:1 sdb 8:16   failed faulty offline
  |- 4:0:0:1 sdj 8:144  failed faulty offline
  `- 5:0:0:1 sdr 65:16  failed faulty offline
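
The offline commands in the steps above can be scripted so that all paths change state as close together as possible. This is only a sketch: the function name and the sysfs_root parameter are made up for illustration, the device names must be adapted to the host, and writing to sysfs requires root.

```python
import os


def set_path_state(devices, state, sysfs_root="/sys/block"):
    """Write state to <sysfs_root>/<dev>/device/state for each device.

    sysfs_root is a hypothetical parameter, added so the helper can be
    tried against a scratch directory instead of the real sysfs.
    """
    for name in devices:
        path = os.path.join(sysfs_root, name, "device", "state")
        with open(path, "w") as f:
            # Same effect as: echo "<state>" > <path>
            f.write(state + "\n")
```

For the device in this report the call would be set_path_state(["sdb", "sdj", "sdr"], "offline"). Writing "running" instead of "offline" brings the paths back.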

Actual results:
multipath_health map:
{'valid_paths': 2, 'failed_paths': [u'sdb', u'sdj', u'sdr']}}

Expected results:
multipath_health map:
{'valid_paths': 0, 'failed_paths': [u'sdb', u'sdj', u'sdr']}}
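
The difference between the actual and expected results can be checked mechanically. The sketch below assumes that every path of the device is either valid or listed in failed_paths, and that the total number of paths is known (3 for the device above). The function name is hypothetical.

```python
def health_is_consistent(entry, total_paths):
    """Return True if valid_paths matches total_paths minus failed paths.

    Assumes each path is either valid or present in failed_paths.
    """
    expected_valid = total_paths - len(entry["failed_paths"])
    return entry["valid_paths"] == expected_valid


# Values taken from the actual and expected results in this report.
actual = {"valid_paths": 2, "failed_paths": ["sdb", "sdj", "sdr"]}
expected = {"valid_paths": 0, "failed_paths": ["sdb", "sdj", "sdr"]}
```

Here health_is_consistent(actual, 3) is False and health_is_consistent(expected, 3) is True.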


Additional info:
This does not always reproduce.

Comment 1 Nir Soffer 2017-12-14 20:31:34 UTC
Targeting 4.2.1; this should be an easy fix.

Comment 2 Lilach Zitnitski 2018-01-10 09:15:46 UTC
--------------------------------------
Tested with the following code:
----------------------------------------
rhvm-4.2.1-0.2.el7.noarch
vdsm-4.20.11-1.el7ev.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. Echo "offline" to the state files of the multipath device's paths

Actual results:
"multipathHealth": {
        "3514f0c5a51600283": {
            "valid_paths": 1,
            "failed_paths": [
                "sdb",
                "sdj",
                "sdz"
            ]
        }

Expected results:

Moving to VERIFIED!

Comment 3 Sandro Bonazzola 2018-02-12 11:47:30 UTC
This bug is included in the oVirt 4.2.1 release, published on February 12th, 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

