Bug 1526010

Summary: Storage: Incorrect valid_paths in multipath events in some cases when several paths change states at the same time
Product: [oVirt] vdsm
Reporter: Fred Rolland <frolland>
Component: General
Assignee: Fred Rolland <frolland>
Status: CLOSED CURRENTRELEASE
QA Contact: Lilach Zitnitski <lzitnits>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.20.15
CC: bmarzins, bugs, nsoffer
Target Milestone: ovirt-4.2.1
Flags: rule-engine: ovirt-4.2+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-02-12 11:47:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Fred Rolland 2017-12-14 15:31:32 UTC
Description of problem:

Events from udev can arrive in a different order than the order in which device mapper creates them.
When that happens, the current multipath health implementation can report an incorrect valid_paths count.
As a result, the engine could generate incorrect events.
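
The effect of reordered events can be illustrated with a small, self-contained Python sketch. This is not vdsm code: the event dictionaries, the `seq` field and both function names are hypothetical, and the sequence-number check is only one possible way to ignore stale counts, not necessarily the fix that was merged.

```python
def apply_events(events):
    """Naive tracker: trusts valid_paths from whichever event arrives last."""
    health = {"valid_paths": None, "failed_paths": []}
    for ev in events:
        health["valid_paths"] = ev["valid_paths"]
        if ev["path"] not in health["failed_paths"]:
            health["failed_paths"].append(ev["path"])
    return health


def apply_events_ordered(events):
    """Only accepts valid_paths from the newest event seen so far.

    Assumes every event carries a monotonically increasing sequence
    number (the hypothetical "seq" key).
    """
    health = {"valid_paths": None, "failed_paths": []}
    last_seq = -1
    for ev in events:
        if ev["path"] not in health["failed_paths"]:
            health["failed_paths"].append(ev["path"])
        if ev["seq"] > last_seq:
            last_seq = ev["seq"]
            health["valid_paths"] = ev["valid_paths"]
    return health


# Events in the order device mapper generated them.
generated = [
    {"seq": 1, "path": "sdb", "valid_paths": 2},
    {"seq": 2, "path": "sdj", "valid_paths": 1},
    {"seq": 3, "path": "sdr", "valid_paths": 0},
]

# Delivery order in this example: the first generated event arrives last.
delivered = [generated[1], generated[2], generated[0]]

naive = apply_events(delivered)
ordered = apply_events_ordered(delivered)
```

In this model the naive tracker ends with valid_paths == 2 while all three paths are listed as failed, which matches the shape of the actual results reported below. The sequence-aware tracker ends with valid_paths == 0.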

How reproducible:
Block several paths of the same device at the same time:


Steps to Reproduce:
1.
echo "offline" > /sys/block/sdb/device/state
echo "offline" > /sys/block/sdj/device/state
echo "offline" > /sys/block/sdr/device/state

3514f0c5a516008d4 dm-1 XtremIO ,XtremApp        
size=150G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=active
  |- 3:0:0:1 sdb 8:16   failed faulty offline
  |- 4:0:0:1 sdj 8:144  failed faulty offline
  `- 5:0:0:1 sdr 65:16  failed faulty offline
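
The reproduction step above could also be scripted so that the paths are set offline back to back. The following is a minimal sketch; the function name `set_path_state` and the `sysfs_root` parameter are made up for illustration, and writing to the real /sys tree requires root.

```python
import os


def set_path_state(devices, state="offline", sysfs_root="/sys"):
    """Write `state` to each device's sysfs state file, one after another."""
    for dev in devices:
        path = os.path.join(sysfs_root, "block", dev, "device", "state")
        with open(path, "w") as f:
            f.write(state)


# Example (as root): set_path_state(["sdb", "sdj", "sdr"])
# To bring the paths back: set_path_state(["sdb", "sdj", "sdr"], state="running")
```

The `sysfs_root` parameter exists only so the helper can be exercised against a scratch directory instead of the live system.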

Actual results:
multipath_health map:
{'valid_paths': 2, 'failed_paths': [u'sdb', u'sdj', u'sdr']}}

Expected results:
multipath_health map:
{'valid_paths': 0, 'failed_paths': [u'sdb', u'sdj', u'sdr']}}


Additional info:
This does not always reproduce.

Comment 1 Nir Soffer 2017-12-14 20:31:34 UTC
Targeting 4.2.1; this should be an easy fix.

Comment 2 Lilach Zitnitski 2018-01-10 09:15:46 UTC
--------------------------------------
Tested with the following code:
----------------------------------------
rhvm-4.2.1-0.2.el7.noarch
vdsm-4.20.11-1.el7ev.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. echo offline to multipath devices state files

Actual results:
"multipathHealth": {
        "3514f0c5a51600283": {
            "valid_paths": 1,
            "failed_paths": [
                "sdb",
                "sdj",
                "sdz"
            ]
        }

Expected results:

Moving to VERIFIED!

Comment 3 Sandro Bonazzola 2018-02-12 11:47:30 UTC
This bugzilla is included in oVirt 4.2.1 release, published on Feb 12th 2018.

Since the problem described in this bug report should be resolved in oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.