Bug 1696817 - Excessive trespass in EMC VNX5400 array when detaching snaplun
Summary: Excessive trespass in EMC VNX5400 array when detaching snaplun
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: device-mapper-multipath
Version: 7.6
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Ben Marzinski
QA Contact: Lin Li
URL:
Whiteboard:
Depends On:
Blocks: 1711360
 
Reported: 2019-04-05 17:00 UTC by Stan Saner
Modified: 2023-09-07 19:53 UTC
CC: 10 users

Fixed In Version: device-mapper-multipath-0.4.9-129.el7
Doc Type: Bug Fix
Doc Text:
Cause: multipath was not correctly setting the shared multipath context for the EMC path checker if the path was discovered before the multipath device was created. Consequence: EMC devices were not able to distinguish passive paths to inactive snapshots from regular passive paths. Fix: Multipath now correctly initializes the shared multipath context for the EMC path checker. Result: EMC devices are now able to distinguish passive paths to inactive snapshots from regular passive paths.
Clone Of:
Environment:
Last Closed: 2020-03-31 19:47:09 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Test procedure record performed by the customer (54.06 KB, text/plain)
2019-04-05 17:04 UTC, Stan Saner
no flags Details
SOSreport collected after the test procedure in ERR_TRACE1.txt (11.05 MB, application/x-xz)
2019-04-05 17:06 UTC, Stan Saner
no flags Details
multipath -ll and sg_vpd for page 0xC0 during trespasses on EMC reproducer (5.89 KB, text/plain)
2019-04-28 14:05 UTC, Stan Saner
no flags Details
messages from the EMC reproducer during excessive trespasses (10.44 MB, application/gzip)
2019-04-28 14:11 UTC, Stan Saner
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:1066 0 None None None 2020-03-31 19:47:31 UTC

Description Stan Saner 2019-04-05 17:00:48 UTC
Description of problem:
-----------------------
The EMC VNX5400 LUNs are constantly failing over between SPA & SPB (trespassing).
A large telecoms customer uses EMC VNX5400 snapshot LUNs as a vehicle to back up the raw devices of a database. The backup server has permission on the array to access the snapshot LUNs. These LUNs have no actual data until a snapshot session on the source LUNs is attached to the snapshot LUN.

When the backup server host is added to the Storage group (which allows the server to see these LUNs), excessive trespassing between SPA and SPB back and forth ensues.

The customer had been using this functionality for years while the backup server ran Solaris. The problem appeared after they transitioned to RHEL.


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
kernel            3.10.0-957.10.1.el7.x86_64
device-mapper-multipath-0.4.9-123.el7.x86_64                
device-mapper-multipath-libs-0.4.9-123.el7.x86_64           


How reproducible:
-----------------
Solid reproducibility at the customer site. Every time the steps to reproduce are followed, the excessive ping-pong style trespass is triggered.

EMC attempted to reproduce in-house following this procedure, first with thin LUNs and later also with real LUNs, without success.

We have not managed to figure out what the difference is, i.e. why the problem cannot be reproduced outside the customer environment. Knowing that difference would likely amount to identifying the root cause.


Steps to Reproduce:
-------------------
1. Connect a RHEL 7.6 server to an EMC VNX 5400.
2. Create a Storage Group on the VNX for the server
3. Create 3 Snapview Snapshot LUNs and add them to the Storage Group (these LUNs present as real LUNs to the host but do not actually have any data until a snapshot is attached to them)
4. Add the host to the Storage group (which allows the server to see these LUNs)
5. rescan-scsi-bus.sh


Actual results:
---------------
After step 5 in the above procedure, the RHEL 7.6 server starts to complain a lot in /var/log/messages. The LUNs trespass from Storage Processor A (SPA) to SPB and back (ping-pong style). This essentially leaves the LUNs unusable by the customer.


Expected results:
-----------------
No trespassing between SPA and SPB ping pong style.


Additional info:
----------------
Several iterations of multipath.conf changes have been tried, as described below; some of them allowed partial reproduction on the EMC reproducer, but ultimately the customer setup still shows excessive trespassing.

I will attach the log from the latest test, performed with the aim of following the movement of the "preferred" bit (the Pref bit as reported by the sg_rtpg utility) between target port groups, plus the corresponding SOSreport, with a bit of history of the investigation.

Comment 2 Stan Saner 2019-04-05 17:04:16 UTC
Created attachment 1552605 [details]
Test procedure record performed by the customer

Heavily commented test procedure log created by the customer

Comment 3 Stan Saner 2019-04-05 17:06:58 UTC
Created attachment 1552606 [details]
SOSreport collected after the test procedure in ERR_TRACE1.txt

SOSreport collected after the test procedure in ERR_TRACE1.txt

Comment 4 Stan Saner 2019-04-05 17:14:36 UTC
Currently this is the only workaround that allows the customer to perform the backups using the snaplun.

The workaround steps:

- Find the WWID of the snapshot LUN
- Within /etc/multipath.conf, prior to the scan, add to the blacklist section:

blacklist {
wwid "36001405432339bfc0c64e88af039d828" #### Change to match the snapshot LUN
}

- systemctl restart multipathd   ### safe to do with running LUNs
- Run the scan
- When the LUN is in session state, remove the blacklist entry (comment it out with a #)
- systemctl restart multipathd 
- Run the backup

Comment 5 Stan Saner 2019-04-05 17:27:08 UTC
Review of the initial section of ERR_TRACE1.txt, which preceded a reboot and ensured that the initramfs was synced up with the multipath configuration by executing

# dracut -v -f -a multipath /boot/initramfs-3.10.0-957.10.1.el7.x86_64.img 3.10.0-957.10.1.el7.x86_64

to see whether that might be behind the issue. That did not help; the problem is still present.



Verbosity restored to level 2 (in some of the previous test runs the customer had this set to 0)

# cat /etc/multipath.conf|egrep -v '^[[:blank:]]*#'

defaults {
        user_friendly_names yes
        find_multipaths yes
        verbosity 2                # RH wanted this added
        flush_on_last_del yes      # RH wanted this added
}

devices {
  device {
  vendor "DGC"
  product ".*"
  product_blacklist "LUNZ"
  path_grouping_policy group_by_prio
  path_selector "round-robin 0"
  path_checker emc_clariion
  features "0"                           # RH wanted this changed
  hardware_handler "1 alua"
  prio alua
  prio_args "exclusive_pref_bit"
  failback immediate
  rr_weight uniform
  no_path_retry fail                     # RH wanted this changed
  detect_path_checker no
  }
}

------

From the SOSreport, which was collected _after_ the whole test completed:

sos_commands/multipath/multipath_-v4_-ll

mpathe (36006016005e03e0018c4aa2ede55e911) dm-0 DGC     ,RAID 10
size=2.0G features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 2:0:2:2 sdj 8:144 active ready running
| `- 2:0:3:2 sdl 8:176 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 2:0:1:2 sdh 8:112 active ready running
  `- 2:0:0:2 sdf 8:80  active ready running


Apr 04 17:05:43 | reported target port group is 2
Apr 04 17:05:43 | aas = 80 [active/optimized] [preferred]    <-- the way it should be
Apr 04 17:05:43 | sdj: alua prio = 50

Apr 04 17:05:43 | reported target port group is 2
Apr 04 17:05:43 | aas = 80 [active/optimized] [preferred]    <-- the way it should be
Apr 04 17:05:43 | sdl: alua prio = 50

Apr 04 17:05:43 | reported target port group is 1    
Apr 04 17:05:43 | aas = 01 [active/non-optimized]     <-- non-optimized hasn't got pref bit set, expected
Apr 04 17:05:43 | sdh: alua prio = 10

Apr 04 17:05:43 | reported target port group is 1
Apr 04 17:05:43 | aas = 01 [active/non-optimized]     <-- non-optimized hasn't got pref bit set, expected
Apr 04 17:05:43 | sdf: alua prio = 10


Now optimized paths have the preferred bit set.


Correlating the events from the test with log entries

###############
Attach the snaplun as HLU 2 to the snap_grp_test SG. The source lun 9 has a current and default owner of SPB
################
[root@snaplun-test ~]# date
Wed Apr  3 09:31:55 IST 2019    ---|
                                   |
                                   |
###############                    |---- quite a big time gap here, will need to ask Mossey for the reason
Rescan the scsi bus                |
################                   |
[root@snaplun-test ~]# date        |
Thu Apr  4 15:06:10 IST 2019   ----|


Rescan scsi bus between Thu Apr  4 15:06:10 IST 2019  ---  Thu Apr  4 15:07:40 IST 2019

During the bus scan, drives

sdi (2:0:0:2), sdj (2:0:1:2), sdk (2:0:2:2), sdh (2:0:3:0), sdl (2:0:3:2)

report SK/ASC/ASCQ 0x5/0x25/0x1, for example:

Apr  4 15:07:05 snaplun-test kernel: sd 2:0:0:2: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr  4 15:07:05 snaplun-test kernel: sd 2:0:0:2: [sdi] tag#0 Sense Key : Illegal Request [current]
Apr  4 15:07:05 snaplun-test kernel: sd 2:0:0:2: [sdi] tag#0 ASC=0x25 ASCQ=0x1
Apr  4 15:07:05 snaplun-test kernel: sd 2:0:0:2: [sdi] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
...
Apr  4 15:07:10 snaplun-test multipathd: mpathe: sdi - emc_clariion_checker: Active path to inactive snapshot WWN 6006016005e03e0018c4aa2ede55e911.


As a result of SK/ASC/ASCQ 0x5/0x25/0x1, the emc_clariion path checker should call SET_INACTIVE_SNAP(c), which will fail the paths to an inactive snapshot LU.
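For reference, that test boils down to decoding the fixed-format sense data of the failed read. A minimal C sketch of the check, condensed from the checker source quoted in full in comment 16:

/* Sketch of the sense-data test the emc_clariion checker applies to a
 * failed READ on an active path (see the full source in comment 16).
 * sb is the SCSI sense buffer returned with the failed command. */
static int is_inactive_snapshot_sense(const unsigned char *sb)
{
        return (sb[2] & 0x0f) == 0x05 &&  /* Sense Key: Illegal Request */
               sb[12] == 0x25 &&          /* ASC  */
               sb[13] == 0x01;            /* ASCQ */
}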

################
At 15:08:37 the first trespass was noticed on SPB
################
Excessive trespasses detected. Lun: 6006016005E03E00:18C4AA2EDE55E911 .
Contact Your Service Provider. Run HAVT on the host(s) connected to the LUN in question.
0000040007002c00d3040000004016a1004016a1000000000000000000000000000000000000000071164000

Excessive trespasses detected. Lun: 6006016046903E00:B2776099D673E811 .
Contact Your Service Provider. Run HAVT on the host(s) connected to the LUN in question.
0000040007002c00d3040000004016a1004016a1000000000000000000000000000000000000000071164000

################
At 15:08:37 the first trespass was noticed on SPA
################
Excessive trespasses detected. Lun: 6006016005E03E00:18C4AA2EDE55E911 .
Contact Your Service Provider. Run HAVT on the host(s) connected to the LUN in question.
0000040007002c00d3040000004016a1004016a1000000000000000000000000000000000000000071164000

Excessive trespasses detected. Lun: 6006016046903E00:B2776099D673E811 .
Contact Your Service Provider. Run HAVT on the host(s) connected to the LUN in question.
0000040007002c00d3040000004016a1004016a1000000000000000000000000000000000000000071164000


This is all the expected initial state; the problem is present.


------ now rebuild initramfs and reboot Thu Apr  4 15:14:56 IST 2019 ------

The problem persists even after reboot, so the issue was not a missing or mismatching multipath config in the initramfs.

Comment 6 Ben Marzinski 2019-04-05 18:44:28 UTC
Looking at this, it seems that the emc_clariion path checker is saying that the paths are working when, in truth, IO seems to fail on all of them. Here's a random snippet:

********************

*** Multipath runs out of paths ****

Apr  4 15:07:42 snaplun-test multipathd: mpathe: remaining active paths: 0

*** checker says that passive path sdl is healthy ***

Apr  4 15:07:44 snaplun-test multipathd: mpathe: sdl - emc_clariion_checker: Passive path is healthy.
Apr  4 15:07:44 snaplun-test multipathd: 8:176: reinstated

*** sdl gets reinstated and made active ***

Apr  4 15:07:44 snaplun-test kernel: device-mapper: multipath: Reinstating path 8:176.
Apr  4 15:07:44 snaplun-test kernel: sd 2:0:3:2: Asymmetric access state changed
Apr  4 15:07:44 snaplun-test multipathd: mpathe: remaining active paths: 1

*** IO fails to sdl, causing the kernel to fail the path ***

Apr  4 15:07:44 snaplun-test kernel: sd 2:0:3:2: [sdl] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr  4 15:07:44 snaplun-test kernel: sd 2:0:3:2: [sdl] tag#0 Sense Key : Illegal Request [current]
Apr  4 15:07:44 snaplun-test kernel: sd 2:0:3:2: [sdl] tag#0 ASC=0x25 ASCQ=0x1
Apr  4 15:07:44 snaplun-test kernel: sd 2:0:3:2: [sdl] tag#0 CDB: Read(10) 28 00 00 3f ff 80 00 00 08 00
Apr  4 15:07:44 snaplun-test kernel: sd 2:0:3:2: alua: port group 02 state N preferred supports tolUsNA
Apr  4 15:07:44 snaplun-test kernel: sd 2:0:3:2: alua: port group 02 state A preferred supports tolUsNA
Apr  4 15:07:44 snaplun-test kernel: sd 2:0:3:2: [sdl] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr  4 15:07:44 snaplun-test kernel: sd 2:0:3:2: [sdl] tag#0 Sense Key : Illegal Request [current]
Apr  4 15:07:44 snaplun-test kernel: sd 2:0:3:2: [sdl] tag#0 ASC=0x25 ASCQ=0x1
Apr  4 15:07:44 snaplun-test kernel: sd 2:0:3:2: [sdl] tag#0 CDB: Read(10) 28 00 00 3f ff 80 00 00 08 00
Apr  4 15:07:44 snaplun-test kernel: device-mapper: multipath: Failing path 8:176.
Apr  4 15:07:45 snaplun-test kernel: sd 2:0:2:2: Asymmetric access state changed
Apr  4 15:07:45 snaplun-test multipathd: mpathe: sdk - emc_clariion_checker: Active path to inactive snapshot WWN 6006016005e03e0018c4aa2ede55e911.
Apr  4 15:07:45 snaplun-test multipathd: sdl: mark as failed

*** multipathd is back where it started, and will do the same thing the next time the checker runs ***

Apr  4 15:07:45 snaplun-test multipathd: mpathe: remaining active paths: 0

***********************

Now it seems very possible that the multipath target is failing paths when it shouldn't be; this is what's causing the ping-ponging. The multipath target is only supposed to fail paths on errors that could be transport related (see the sketch at the end of this comment). However, it doesn't appear that IO to the multipath device will complete even if you solve this, since the kernel repeatedly tries every path and gets failures on all of them. With no_path_retry set to fail, the IO should fail up once all paths are lost, and messages like this

Apr  4 15:07:41 snaplun-test kernel: Buffer I/O error on dev dm-3, logical block 524272, async page read
Apr  4 15:07:41 snaplun-test kernel: Buffer I/O error on dev dm-3, logical block 0, async page read

show that it is. This means it's not just one request that multipath isn't handling correctly; lots of IO (perhaps all of it) is failing. Do you know why this is? It's clearly failing at the SCSI devices, below the multipath layer.
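To illustrate the rule mentioned above, here is a simplified sketch of the intended end-of-IO decision; the error classes and names are illustrative, not the actual kernel source:

#include <stdio.h>

/* Illustrative error classes; the real dm-multipath target derives
 * these from the SCSI result and sense data of the failed request. */
enum err_class { ERR_NONE, ERR_TRANSPORT, ERR_TARGET };

struct path_sketch { const char *name; int failed; };

/* Fail the path only for errors that could be transport related; a
 * target-side error (such as the ILLEGAL REQUEST above) would recur
 * on every path, so failing the path only produces the ping-pong
 * seen in the logs. */
static int end_io_sketch(struct path_sketch *p, enum err_class e)
{
        if (e == ERR_NONE)
                return 0;        /* completed cleanly */
        if (e == ERR_TRANSPORT) {
                p->failed = 1;   /* fail the path; retry IO elsewhere */
                return 1;
        }
        return -1;               /* propagate target errors upward */
}

int main(void)
{
        struct path_sketch sdl = { "sdl", 0 };
        end_io_sketch(&sdl, ERR_TARGET);
        printf("%s failed=%d (a target error should not fail the path)\n",
               sdl.name, sdl.failed);
        return 0;
}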

Comment 7 Ben Marzinski 2019-04-05 19:03:26 UTC
If there really isn't some sort of problem with the devices causing this, one possible answer could be that these requests are failing because the path has already become passive again. Are there multiple nodes accessing the same storage? If so, is it possible that different nodes would see different priorities for the same path to the storage?  If that's possible, then possibly one node keeps switching to one path group, and another node switches back. If that's the case, you need to change

failback immediate

to

failback followover

in your device config from /etc/multipath.conf.  The "followover" method will not failback to the higher priority pathgroup unless the path checker had previously said that the paths in that pathgroup were faulty.  This means that if the lower priority paths are active, but the higher priority paths are still working, multipath assumes that some other node made these paths active for a good reason, and doesn't failback.  If the higher priority paths actually fail in the path checker, and then come back up, multipathd assumes that it is safe to use the higher priority paths. Since the path checker never reports your paths are down, multipathd wouldn't failback in your case.
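A minimal sketch of that followover rule, assuming an illustrative structure (this is not the libmultipath source):

/* was_faulty records whether the path checker ever reported the
 * highest-priority group's paths as down; field and function names
 * are hypothetical. */
struct pg_sketch { int priority; int was_faulty; };

static int should_failback_followover(const struct pg_sketch *best,
                                      const struct pg_sketch *current)
{
        if (current->priority >= best->priority)
                return 0;   /* already on the best path group */
        if (!best->was_faulty)
                return 0;   /* its paths never failed: assume another
                               node switched groups deliberately */
        return 1;           /* the group failed and recovered, so it
                               is safe to fail back */
}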

Comment 9 Stan Saner 2019-04-08 10:24:53 UTC
Breaking news from EMC and the customer.

On the Friday 5th April teleconference with EMC and the customer, we realized there was a small difference in kernel version between the EMC test system and the customer setup:

Customer system:
----------------
kernel                                                      3.10.0-957.10.1.el7.x86_64
device-mapper-multipath-0.4.9-123.el7.x86_64                Thu Mar 21 14:42:37 2019
device-mapper-multipath-libs-0.4.9-123.el7.x86_64           Thu Mar 21 14:42:35 2019


EMC test system:
----------------
kernel                                                      3.10.0-957.el7.x86_64
device-mapper-multipath-0.4.9-123.el7.x86_64                Tue Apr  2 11:56:44 2019
device-mapper-multipath-libs-0.4.9-123.el7.x86_64           Tue Apr  2 11:56:44 2019


The kernel 3.10.0-957.10.1.el7.x86_64  corresponds to 

Internal name                   hotfix          Batch #   Errata        Build date
----------------------------------------------------------------------------------
kernel-3.10.0-957.10.1.el7	release/hotfix	3	  #39393	2019-02-07


As mentioned in the problem description, EMC have been trying to reproduce the problem on their test setup without success.
However, after updating the kernel to match the customer's 3.10.0-957.10.1.el7.x86_64, the problem can now be reproduced reliably!

So one of the code commits between 3.10.0-957.10.1.el7.x86_64 and 3.10.0-957.el7.x86_64 is triggering the problem.

EMC experimented with multipath.conf settings; the following configuration produced the most stable behaviour, and at one point stopped the ping-pong style trespasses:


defaults {
        user_friendly_names yes
        find_multipaths yes
        verbosity 4                # RH wanted this added
        flush_on_last_del yes      # RH wanted this added
}

devices {
        device {
        vendor "DGC"
        product ".*"
        product_blacklist "LUNZ"
        path_grouping_policy group_by_prio
        path_selector "round-robin 0"
        path_checker "emc_clariion"
        features "0"                           # RH wanted this changed
        hardware_handler "1 alua"
        prio "alua"
        prio_args "exclusive_pref_bit"
        failback "immediate"
        rr_weight "uniform"
        no_path_retry "fail"                     # RH wanted this changed
        hwtable_regex_match no                 # pp added
        detect_prio no                         # pp added
        retain_attached_hw_handler no          # pp added
        detect_path_checker no                 # Redhat fix
        }
}


But this config, when then tested by the customer, refuted that result: the trespasses still continue.

The main lead is currently the code differences between kernels 3.10.0-957.10.1.el7.x86_64 and 3.10.0-957.el7.x86_64

Comment 10 Stan Saner 2019-04-08 11:25:56 UTC
Further details about the EMC test setup behaviour with the kernel 3.10.0-957.10.1.el7.x86_64 and the modifications to multipath.conf as seen in the previous comment:

---- quote ----
I run further tests this morning in my lab and I got trespasses again. The parameter change helps with the issue but the issue is still not fixed, even in my lab. 3 snapshots are attached and detached …. And after the 3 are detached, I have excessive trespasses against 1 of the 3 snapshots … and not every time the same one … but every time, and curiously enough, just against 1 of them.

I didn’t have trespasses before updating to kernel 3.10.0-957.10.1.el7.x86_64. I have it now, even with the modified “multipath.conf” file (BTW, I also tried “failback followover” and it makes no difference).
--- unquote ---

Comment 11 Stan Saner 2019-04-08 14:27:36 UTC
We know that kernel-3.10.0-957.el7 does not have the problem and kernel-3.10.0-957.10.1.el7 does.

What may at first sight look like a tiny version difference between kernel-3.10.0-957.el7 and kernel-3.10.0-957.10.1.el7 is actually 165 code commits, so not a small jump.

$ git log --pretty=oneline kernel-3.10.0-957.el7..kernel-3.10.0-957.10.1.el7 | wc -l
165

Not all of them are applicable to SCSI, Fibre Channel or block layer.

Since EMC have a reproducer, for the sake of speed we have embarked on a bisect methodology to narrow down which code change provokes this VNX5400 array behaviour. I'll update once we have narrowed down the range sufficiently.

Comment 12 Stan Saner 2019-04-10 12:09:45 UTC
Latest results of testing from EMC overturned previous claims in comments #9, #10, #11.

After reverting back to kernel-3.10.0-957.el7 from kernel-3.10.0-957.5.1.el7 and getting ready for the next step of testing with the .2.1.el7 kernel, they suddenly detected the excessive trespassing in the array even with the kernel-3.10.0-957.el7 that was previously deemed symptom-free.
This radically changes the whole story.

Write-up from EMC testing:

---- quote start ----
the result of the “emc_clariion” test is not always the same … and this is a concern. When replacing the path_checker with “readsector0”, all paths are marked down and that’s it: no trespass, nothing, and everything is clean. When I reattach the snapshot, all the paths recover and are correctly differentiated


With path_checker readsector0


Attached :

# multipath -ll
mpathd (36006016013513600390e350aca56e911) dm-2 DGC     ,RAID 5
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| `- 12:0:0:0 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 12:0:1:0 sde 8:64 active ready running

mpathc (36006016029513600e8fe5029ca56e911) dm-3 DGC     ,RAID 5
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| `- 12:0:1:2 sdg 8:96 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 12:0:0:2 sdd 8:48 active ready running

mpathb (3600601601351360088b797acae54e911) dm-4 DGC     ,VRAID
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| `- 12:0:0:1 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 12:0:1:1 sdf 8:80 active ready running

 

Detached :

# multipath -ll
mpathd (36006016013513600390e350aca56e911) dm-2 DGC     ,RAID 5
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 12:0:0:0 sdb 8:16 failed faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 12:0:1:0 sde 8:64 failed faulty running

mpathc (36006016029513600e8fe5029ca56e911) dm-3 DGC     ,RAID 5
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 12:0:1:2 sdg 8:96 failed faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 12:0:0:2 sdd 8:48 failed faulty running

mpathb (3600601601351360088b797acae54e911) dm-4 DGC     ,VRAID
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 12:0:0:1 sdc 8:32 failed faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 12:0:1:1 sdf 8:80 failed faulty running

 
Re-attached :

# multipath -ll
mpathd (36006016013513600390e350aca56e911) dm-2 DGC     ,RAID 5
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| `- 12:0:0:0 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 12:0:1:0 sde 8:64 active ready running

mpathc (36006016029513600e8fe5029ca56e911) dm-3 DGC     ,RAID 5
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| `- 12:0:1:2 sdg 8:96 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 12:0:0:2 sdd 8:48 active ready running

mpathb (3600601601351360088b797acae54e911) dm-4 DGC     ,VRAID
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| `- 12:0:0:1 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 12:0:1:1 sdf 8:80 active ready running

 

With path_checker emc_clariion:


Attached :
 
# multipath -ll
mpathd (36006016013513600390e350aca56e911) dm-2 DGC     ,RAID 5
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| `- 12:0:0:0 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 12:0:1:0 sde 8:64 active ready running

mpathc (36006016029513600e8fe5029ca56e911) dm-3 DGC     ,RAID 5
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| `- 12:0:1:2 sdg 8:96 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 12:0:0:2 sdd 8:48 active ready running

mpathb (3600601601351360088b797acae54e911) dm-4 DGC     ,VRAID
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| `- 12:0:0:1 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 12:0:1:1 sdf 8:80 active ready running


# multipath -ll
mpathd (36006016013513600390e350aca56e911) dm-4 DGC     ,RAID 5
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 12:0:0:0 sdb 8:16 active faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 12:0:1:0 sde 8:64 failed faulty running

mpathc (36006016029513600e8fe5029ca56e911) dm-3 DGC     ,RAID 5
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 12:0:1:2 sdg 8:96 failed faulty running
`-+- policy='round-robin 0' prio=0 status=active
  `- 12:0:0:2 sdd 8:48 active faulty running

mpathb (3600601601351360088b797acae54e911) dm-2 DGC     ,VRAID
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 12:0:0:1 sdc 8:32 failed faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 12:0:1:1 sdf 8:80 failed faulty running

 
As you can see, on the 2 RAID 5 devices (thick devices), one of the paths is not marked “failed”.  This is causing trespasses … until it stabilizes and I then get

 
# multipath -ll
mpathd (36006016013513600390e350aca56e911) dm-4 DGC     ,RAID 5
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=90 status=enabled
| `- 12:0:0:0 sdb 8:16 failed ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 12:0:1:0 sde 8:64 failed faulty running

mpathc (36006016029513600e8fe5029ca56e911) dm-3 DGC     ,RAID 5
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=90 status=enabled
| `- 12:0:1:2 sdg 8:96 failed ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 12:0:0:2 sdd 8:48 failed faulty running

mpathb (3600601601351360088b797acae54e911) dm-2 DGC     ,VRAID
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=90 status=enabled
| `- 12:0:0:1 sdc 8:32 failed ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 12:0:1:1 sdf 8:80 failed faulty running

 
And here, trespasses stop. But if I wait long enough, I get the “active/failed” combination again and trespasses happen again.

Of course, if I re-attach, everything is back to normal (or at least the active path is marked “active ready”; the passive path is either “active ready” or “failed ready”)

 
# multipath -ll
mpathd (36006016013513600390e350aca56e911) dm-4 DGC     ,RAID 5
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| `- 12:0:0:0 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 12:0:1:0 sde 8:64 active ready running

mpathc (36006016029513600e8fe5029ca56e911) dm-3 DGC     ,RAID 5
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=90 status=enabled
| `- 12:0:1:2 sdg 8:96 failed ready running
`-+- policy='round-robin 0' prio=50 status=active
  `- 12:0:0:2 sdd 8:48 active ready running

mpathb (3600601601351360088b797acae54e911) dm-2 DGC     ,VRAID
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=90 status=enabled
| `- 12:0:0:1 sdc 8:32 failed ready running
`-+- policy='round-robin 0' prio=50 status=active
  `- 12:0:1:1 sdf 8:80 active ready running
 

After switching back to the basic kernel release, kernel-3.10.0-957.el7, I was getting the ”excessive trespass” issue although during my previous tests, a couple of days earlier, I had found a multipath.conf configuration file which seemed to prevent the trespass.

At the end of the afternoon, yesterday, I was again under the impression that I had found a new configuration file which was preventing the trespasses. I run many tests, with snapshot detach / re-attach /reboot .. and I was convinced that I had found a solid solution ... at least at that kernel level.

Then I upgraded the kernel to the proposed level, kernel-3.10.0-957.2.1.el7. I immediately got the issue again.

I changed the path checker from “emc_clariion” to “readsector0” .. and the issue was gone (I had observed the same thing previously when I was having issue at the basic level). I run many tests and the fix was solid.

I changed the path checker back to “emc_clariion” and got the issue immediately.

I thought at this stage that the issue was in this increment between the basic kernel and this patched release … and to prove it, I downgraded to the basic level.

At the basic level, unfortunately, I got the issue immediately again. I then spent many hours, in vain, to get a good configuration file : this was impossible. The only working configuration is when I replace the path checker with “readsector0”. This means that the success of the test not only depends on the kernel level (at the end, I am no longer convinced that this is playing any role), but it also depends on which path is the preferred path when you run the tests, the default SP owner or the other SP because in the meantime there was a trespass. It is possible there is just 1 combination which works, and with 3 snapshots, the probability for having the right combination for all of them is not high ... and this probably explains what I observe.

I collected the logs from the lab. I now have the source code for the ”emc_clariion” path checker. I had a look at the code and I saw cases where the path checker can return “up” when, in my opinion, the snapshot being detached means all the paths should return “down”. Nevertheless, I will let E-lab review this.

For the time being, I would advise customer to use “readsector0” instead of “emc_clariion”. This is a good alternative which works better than “emc_clariion” although “emc_clariion” was written explicitly to take care of the detached snapshots! I am confident that it will work with the customer kernel as well (I am waiting for his confirmation).
---- quote end ----

Comment 13 Stan Saner 2019-04-28 14:03:54 UTC
We have a much better set of test data from EMC, obtained from their reproducer using the emc_clariion path checker while the ping-pong style trespasses are happening.

Attaching EMC_test2_sg_vpd.txt file with commented outputs from

- multipath -ll
- sg_vpd -H -l -v -p 0xc0 /dev/sdb

plus test2_messages.txt file which is /var/log/messages covering the time of the test

Comment 14 Stan Saner 2019-04-28 14:05:03 UTC
Created attachment 1559620 [details]
multipath -ll and sg_vpd for page 0xC0 during trespasses on EMC reproducer

Comment 15 Stan Saner 2019-04-28 14:11:47 UTC
Created attachment 1559621 [details]
messages from the EMC reproducer during excessive trespasses

gzipped test2_messages.txt from EMC reproducer to fit within the file size limit for Bugzilla attachments

Comment 16 Stan Saner 2019-04-28 14:19:54 UTC
A detailed look at the interaction of the emc_clariion path checker with the VNX5400 array in EMC's reproducer, using sg_vpd command output to collect Vital Product Data (VPD) from page 0xC0.

In EMC's "multipath -ll" output, for the last dm, /dev/sdf changes between enabled/failed and active/active.
where the 2 commands were run within a 1 second interval.

mpathb (3600601601351360088b797acae54e911) dm-4 DGC     ,VRAID
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=90 status=enabled
| `- 12:0:0:1 sdc 8:32 failed ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 12:0:1:1 sdf 8:80 failed faulty running             <--- sdf failed


mpathb (3600601601351360088b797acae54e911) dm-4 DGC     ,VRAID
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 12:0:0:1 sdc 8:32 failed faulty running
`-+- policy='round-robin 0' prio=0 status=active
  `- 12:0:1:1 sdf 8:80 active faulty running            <-- sdf active


At the time of running the sg_vpd command against /dev/sdf, EMC got:

# sg_vpd -H -l -v -p 0xc0 /dev/sdf
Firmware numbers (Seagate) VPD Page:
    inquiry cdb: 12 01 c0 00 fc 00
    inquiry: pass-through requested 252 bytes but got 66 bytes
 00     00 c0 00 3e 02 00 e4 0a  01 00 60 06 01 60 13 51    ...>......`..`.Q
 10     36 00 88 b7 97 ac ae 54  e9 11 03 03 06 00 04 00    6......T........
 20     00 00 00 00 00 00 00 00  00 00 00 00 0a 3c 23 99    .............<#.
 30     00 10 43 4b 4d 30 30 31  33 34 36 30 31 39 34 39    ..CKM00134601949
 40     00 00                                               ..


Tracing the code execution of the emc_clariion path checker and the corresponding messages printed by the driver. Most of the heavy lifting is done by the libcheck_check() function:

int libcheck_check (struct checker * c)
{
...

        if (ioctl(c->fd, SG_IO, &io_hdr) < 0) {
                MSG(c, "emc_clariion_checker: sending query command failed");
                return PATH_DOWN;
        }


>>> this passes; we get a valid reply to the ioctl, so the error message "sending query command failed" is not seen in the logs

        if (io_hdr.info & SG_INFO_OK_MASK) {
                MSG(c, "emc_clariion_checker: query command indicates error");
                return PATH_DOWN;
        }

>>> also passes for sdf  (does not pass for sda, but that is not a snaplun)

        if (/* Verify the code page - right page & revision */
            sense_buffer[1] != 0xc0 || sense_buffer[9] != 0x00) {
                MSG(c, "emc_clariion_checker: Path unit report page in "
                    "unknown format");
                return PATH_DOWN;

 00     00 c0 00 3e 02 00 e4 0a  01 00 60 06 01 60 13 51    ...>......`..`.Q
           ^^                       ^^
           OK, byte(1)==0xC0        OK, byte(9)==00

>>> this passes, it is indeed a VPD page 0xc0, so we do not see "emc_clariion_checker: Path unit report page in unknown format" message in the logs

        if ( /* Effective initiator type */
                sense_buffer[27] != 0x03
                /*
                 * Failover mode should be set to 1 (PNR failover mode)
                 * or 4 (ALUA failover mode).
                 */
                || (((sense_buffer[28] & 0x07) != 0x04) &&
                    ((sense_buffer[28] & 0x07) != 0x06))
                /* Arraycommpath should be set to 1 */
                || (sense_buffer[30] & 0x04) != 0x04) {
                MSG(c, "emc_clariion_checker: Path not correctly configured "
                    "for failover");
                return PATH_DOWN;
        }

 10     36 00 88 b7 97 ac ae 54  e9 11 03 03 06 00 04 00    6......T........
                                          ^^ ^^    ^^
                                          OK, byte(27)==0x03, byte(28)==06, byte(30)==04

>>> this passes, so the path is configured correctly

        if ( /* LUN operations should indicate normal operations */
                sense_buffer[48] != 0x00) {
                MSG(c, "emc_clariion_checker: Path not available for normal "
                    "operations");
                return PATH_SHAKY;
        }

 30     00 10 43 4b 4d 30 30 31  33 34 36 30 31 39 34 39    ..CKM00134601949
        ^^
        OK byte(48)==00

>>> pass, the path is available for normal operation

        if ( /* LUN should at least be bound somewhere and not be LUNZ */
                sense_buffer[4] == 0x00) {
                MSG(c, "emc_clariion_checker: Logical Unit is unbound "
                    "or LUNZ");
                return PATH_DOWN;
        }

 00     00 c0 00 3e 02 00 e4 0a  01 00 60 06 01 60 13 51    ...>......`..`.Q
                    ^^
                    byte(4)==02, active path


>>> here the LUN _is_ bound somewhere and isn't LUNZ, so we do NOT return with PATH_DOWN, we continue normally

[....]

        /*
         * Issue read on active path to determine if inactive snapshot.
         */
        if (sense_buffer[4] == 2) {/* if active path */

 00     00 c0 00 3e 02 00 e4 0a  01 00 60 06 01 60 13 51    ...>......`..`.Q
                    ^^
                    byte(4)==02, active path


>>> byte(4)==2, so if this is TRUE, this is an active path


                unsigned char buf[4096];

                memset(buf, 0, 4096);
                ret = sg_read(c->fd, &buf[0], 4096,
                              sbb = &sb[0], SENSE_BUFF_LEN, c->timeout);
                if (ret == PATH_DOWN) {
                        hexadecimal_to_ascii(ct->wwn, wwnstr);

                        /*
                         * Check for inactive snapshot LU this way.  Must
                         * fail these.
                         */

                        if (((sbb[2]&0xf) == 5) && (sbb[12] == 0x25) &&    
                            (sbb[13]==1)) {

>>> this is where the inactive snaplun should reply with SK/ASC/ASCQ 0x5/0x25/0x1 
                                /*
                                 * Do this so that we can fail even the
                                 * passive paths which will return
                                 * 02/04/03 not 05/25/01 on read.
                                 */
                                SET_INACTIVE_SNAP(c);
                                MSG(c, "emc_clariion_checker: Active "
                                        "path to inactive snapshot WWN %s.",
                                        wwnstr);

>>> and indeed we see messages like that:

Apr  7 18:21:25 dellpr730-G kernel: sd 12:0:1:1: [sdf] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr  7 18:21:25 dellpr730-G kernel: sd 12:0:1:1: [sdf] tag#2 Sense Key : Illegal Request [current]
Apr  7 18:21:25 dellpr730-G kernel: sd 12:0:1:1: [sdf] tag#2 ASC=0x25 ASCQ=0x1
Apr  7 18:21:25 dellpr730-G kernel: sd 12:0:1:1: [sdf] tag#2 CDB: Read(10) 28 00 02 7f ff 80 00 00 08 00


                        } else




But if at the time of running sg_vpd this path was inactive, the 'if' statement would be FALSE and we continue to the "else" branch:


        } else {
                if (IS_INACTIVE_SNAP(c)) {                           <--- we WILL call IS_INACTIVE_SNAP
                        hexadecimal_to_ascii(ct->wwn, wwnstr);
                        MSG(c, "emc_clariion_checker: Passive "      <-- and we print the message
                                "path to inactive snapshot WWN %s.",
                                wwnstr);
                        ret = PATH_DOWN;
                } else {
                        MSG(c,
                            "emc_clariion_checker: Passive path is healthy.");
                        ret = PATH_UP;  /* not ghost */
                }
        }


>>> and indeed we see also messages like that:
>>> Apr  7 18:19:54 dellpr730-G multipathd: sdf: checker msg is "emc_clariion_checker: Passive path to inactive snapshot WWN 600601601351360088b797acae54e911."

without the preceding 

Sense Key : Illegal Request [current]
tag#0 ASC=0x25 ASCQ=0x1



so at a certain time sg_vpd would have returned something like this for /dev/sdf:

 00     00 c0 00 3e 01 01 e4 0a  00 00 60 06 01 60 29 51    ...>......`..`)Q
                    ^^
                    byte(4)==01 inactive path


[....]


Summary:
--------
[1.] the key input that determines whether the path checker deems a given path active or passive is byte 4 of VPD page 0xC0:

00     00 c0 00 3e 01 01 e4 0a  00 00 60 06 01 60 29 51    ...>......`..`)Q
                   ^^
                   this byte(4)==01 inactive path   or 02 == active path

[2.] we can see this byte is changing its value between 01 (inactive path) and 02 (active path) based on the messages printed for sdf:

Apr  9 23:29:08 dellpr730-G multipathd: mpathb: sdf - emc_clariion_checker: Passive path is healthy.
- this one is printed only when the byte(4) indicates passive path

Apr  9 23:29:19 dellpr730-G multipathd: mpathb: sdf - emc_clariion_checker: Active path to inactive snapshot WWN 600601601351360088b797acae54e911.
- this one is printed when the byte(4) indicates active path

[3.] this byte(4) flipping from 01 to 02 and back is most likely the result of the trespass event

[4.] The emc_clariion path checker issues a SCSI read on the path only if the path was seen as _active_ (i.e. byte(4)==02), to check whether we are interacting with an inactive snapshot. If it is an inactive snapshot, we expect the read to fail with a response from the array of SK/ASC/ASCQ 0x5/0x25/0x1. An active snapshot would simply return data to that read.

[5.] For an inactive path, no SCSI read is sent. We only check whether this LUN was previously marked as an inactive snapshot. If it was, we print:

Apr 12 09:51:24 dellpr730-G multipathd: mpathb: sdf - emc_clariion_checker: Passive path to inactive snapshot WWN 600601601351360088b797acae54e911.

If it wasn't marked as an inactive snapshot, we print this kind of message:

Apr 12 09:54:33 dellpr730-G multipathd: mpathb: sdf - emc_clariion_checker: Passive path is healthy.

I think this is the key. Seeing both kinds of messages for a passive path, one for the LUN being marked as an inactive snapshot, the other for an active LUN (not necessarily a snapshot but any kind of LUN), suggests the path checker is losing track of the fact that it is an inactive snapshot LUN (see the condensed sketch after this summary).

[6.] I asked EMC whether it could be a problem that the emc_clariion path checker always uses the READ(10) command to verify whether we are dealing with an inactive snapshot. How particular is the array about which flavour of SCSI READ we use (Read(6), Read(10), Read(12) or Read(16))?

sg_read() appears to only use READ(10)


int
sg_read (int sg_fd, unsigned char * buff, int buff_len,
         unsigned char * sense, int sense_len, unsigned int timeout)
{
        /* defaults */
        int blocks;
        long long start_block = 0;
        int bs = 512;
        int cdbsz = 10;
        int * diop = NULL;

        unsigned char rdCmd[cdbsz];
        unsigned char *sbb = sense;
        struct sg_io_hdr io_hdr;
        int res;
        int rd_opcode[] = {0x8, 0x28, 0xa8, 0x88};    <-- opcode array
        int sz_ind;
        struct stat filestatus;
        int retry_count = 3;

        if (fstat(sg_fd, &filestatus) != 0)
                return PATH_DOWN;
        bs = (filestatus.st_blksize > 4096)? 4096: filestatus.st_blksize;
        blocks = buff_len / bs;
        memset(rdCmd, 0, cdbsz);
        sz_ind = 1;                      <-- never changes
        rdCmd[0] = rd_opcode[sz_ind];    <--- it would pick {0x8, >>>0x28<<<, 0xa8, 0x88} from the opcode array
...
}

[7.] I asked EMC to share the definition of the VPD page 0xC0 with us so that we can be sure we are interpreting the sg_vpd output correctly.
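Condensing points [1.] through [5.], a simplified C sketch of the checker's decision flow (helper names are illustrative; the quoted excerpt elides the success path, where the shared flag is presumably cleared):

enum { SKETCH_PATH_DOWN, SKETCH_PATH_UP };

/* vpd_byte4: byte 4 of VPD page 0xC0 (0x02 = active, 0x01 = passive).
 * read_failed_05_25_01: READ(10) on the active path failed with
 * SK/ASC/ASCQ 0x5/0x25/0x1.  inactive_snap points at state that must
 * be shared by all paths of the LUN for the scheme to work. */
static int emc_check_sketch(unsigned char vpd_byte4,
                            int read_failed_05_25_01,
                            int *inactive_snap)
{
        if (vpd_byte4 == 0x02) {          /* active path: issue READ(10) */
                if (read_failed_05_25_01) {
                        *inactive_snap = 1;   /* remember for passive paths */
                        return SKETCH_PATH_DOWN;
                }
                *inactive_snap = 0;           /* assumed: cleared on success */
                return SKETCH_PATH_UP;
        }
        /* passive path: no READ is sent; rely on the shared flag */
        return *inactive_snap ? SKETCH_PATH_DOWN : SKETCH_PATH_UP;
}

If inactive_snap is not genuinely shared between the paths of the LUN, the passive-path branch loses the inactive-snapshot marking and reports the path healthy, which is exactly the mix of messages seen in [5.] (and matches the root cause later identified in comment 23).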

Comment 21 Ben Marzinski 2019-08-27 23:12:41 UTC
I've made rhel-7.6 test packages which should hopefully fix this issue. They are available here:

http://people.redhat.com/~bmarzins/device-mapper-multipath/rpms/RHEL7/bz1696817/

If you could get EMC to test these packages with their reproducer, that would be great.

Comment 22 Stan Saner 2019-08-28 11:00:34 UTC
(In reply to Ben Marzinski from comment #21)
> I've made rhel-7.6 test packages which should hopefully fix this issue. They
> are available here:
> 
> http://people.redhat.com/~bmarzins/device-mapper-multipath/rpms/RHEL7/
> bz1696817/
> 
> If you could get EMC to test these packages with their reproducer, that
> would be great.


Hi Ben,

   I will ask EMC to test these packages.

Could you give a bit more context what is in these packages?
Do EMC need to install _all_ of these packages? I assume they do and I'll ask them accordingly.

Could you disclose what you changed in them? I can see you built these on top of the device-mapper-multipath-0.4.9-123
which is shown as the latest in dist-git, so I cannot see those changes if this is your private branch.

I'll let you know what EMC say.

Regards,
Stan Saner

Comment 23 Ben Marzinski 2019-08-28 14:38:20 UTC
(In reply to Stanislav Saner from comment #22)
> (In reply to Ben Marzinski from comment #21)
> > I've made rhel-7.6 test packages which should hopefully fix this issue. They
> > are available here:
> > 
> > http://people.redhat.com/~bmarzins/device-mapper-multipath/rpms/RHEL7/
> > bz1696817/
> > 
> > If you could get EMC to test these packages with their reproducer, that
> > would be great.
> 
> 
> Hi Ben,
> 
>    I will ask EMC to test these packages.
> 
> Could you give a bit more context what is in these packages?
> Do EMC need to install _all_ of these packages? I assume they do and I'll
> ask them accordingly.

Nope. They only need to upgrade the packages that they have installed.  If they aren't running rhel-7.6, I can make them test packages for whatever they are running.

> Could you disclose what you changed in them?

Sure. The EMC path checker is the only checker function that uses a shared checker context, where all the paths for a multipath device share a context. It uses this to remember if a path is an inactive snapshot. It turns out that when paths are discovered before a multipath device is created, the shared context isn't created correctly, causing it to not correctly track inactive snapshots.  This is a problem on, AFAICT, every version of the multipath, including the latest upstream code. The fix simply makes multipath initialize the context correctly in this case.
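A minimal sketch of the failure mode as described, assuming a checker API where each path's checker holds the address of a per-multipath shared slot (names are hypothetical, not the actual libmultipath code):

#include <stdlib.h>

struct shared_ctx_sketch { int inactive_snap; };

struct checker_sketch {
        void **mp_ctx;   /* address of the multipath device's shared slot;
                            left NULL when the path was discovered before
                            the multipath device existed (the bug) */
};

/* Per the description above, the fix amounts to (re)running this
 * initialization once the multipath device exists, so that every
 * path's checker points at the same allocated shared context. */
static void checker_mp_init_sketch(struct checker_sketch *c, void **slot)
{
        c->mp_ctx = slot;
        if (slot && !*slot)
                *slot = calloc(1, sizeof(struct shared_ctx_sketch));
}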

> I can see you built these on
> top of the device-mapper-multipath-0.4.9-123
> which is shown as the latest in dist-git, so I cannot see those changes if
> this is your private branch.

I just pushed my private branch to dist-git. It's private-bmarzins-bz1696817-rhel-7.6, if you want to take a look.

> I'll let you know what EMC say.
> 
> Regards,
> Stan Saner

Comment 24 Stan Saner 2019-08-28 16:11:31 UTC
(In reply to Ben Marzinski from comment #23)

> Nope. They only need to upgrade the packages that they have installed.  If
> they aren't running rhel-7.6, I can make them test packages for whatever
> they are running.

Understood.

> > Could you disclose what you changed in them?
> 
> Sure. The EMC path checker is the only checker function that uses a shared
> checker context, where all the paths for a multipath device share a context.
> It uses this to remember if a path is an inactive snapshot. It turns out
> that when paths are discovered before a multipath device is created, the
> shared context isn't created correctly, causing it to not correctly track
> inactive snapshots.  This is a problem on, AFAICT, every version of the
> multipath, including the latest upstream code. The fix simply makes
> multipath initialize the context correctly in this case.

Thanks, that's very useful in case I need to "sell" the testing to EMC.

> I just pushed by private branch to dist-git. It's
> private-bmarzins-bz1696817-rhel-7.6, if you want to take a look.

Excellent, I can see the changes now. They are quite extensive.

The EMC engineer controlling the reproduction setup has been on holiday (based on his auto-reply), returning tomorrow (29th Aug), so this is perfectly timed. I'll get the packages to him once he sets up an ftp account or some other channel, as he has no access to the Red Hat support case for the end customer where I could attach the rpms.

Comment 25 Ben Marzinski 2019-08-29 20:23:16 UTC
I've built a package with a fix for setting the shared multipath context for the path checker.

Comment 27 Stan Saner 2019-10-08 11:19:31 UTC
Finally we have the results of the test packages in the EMC reproducer. The results are very positive. Bottom line: the problem is solved with these packages along with the “detect_path_checker no” statement in the /etc/multipath.conf file.

--- start of quote ---
I first set up a configuration where I could recreate the issue. 1 snap lun was created and presented to a test server. The snap could be activated and deactivated. IOs were run against the snap devices continuously while the snap was activated/deactivated, and every 10 minutes the host was rebooted. The multipath.conf was configured as per Red Hat recommendations. With the regular kernel, excessive trespasses were being observed every 15 minutes.

    I replaced device-mapper-multipath-0.4.9-123.el7.x86_64.rpm, device-mapper-multipath-libs-0.4.9-123.el7.x86_64.rpm and kpartx-0.4.9-123.el7.x86_64.rpm with the packages you had provided (these were the only 3 relevant packages). Because I had not received instructions to install these packages, I forced an installation of device-mapper-multipath-0.4.9-123.el7.bz1696817.x86_64.rpm, device-mapper-multipath-libs-0.4.9-123.el7.bz1696817.x86_64.rpm and kpartx-0.4.9-123.el7.bz1696817.x86_64.rpm, and I removed the regular corresponding packages. This is the only method I was able to use (a normal update/installation failed because of conflicting packages)

    I then tested for many hours, still with the multipath.conf file configured as per Red Hat recommendations, and I could no longer recreate the issue. When the snap was deactivated, the paths quickly went to their final state:

From:

mpathg (360060160295136009494d149efe8e911) dm-2 DGC     ,VRAID
size=20G features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 14:0:1:3 sdq 65:0  active ready running
| `- 13:0:1:3 sdi 8:128 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 14:0:0:3 sdm 8:192 active ready running
  `- 13:0:0:3 sde 8:64  active ready running

To:

mpathg (360060160295136009494d149efe8e911) dm-2 DGC     ,VRAID
size=20G features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 14:0:1:3 sdq 65:0  active faulty running
| `- 13:0:1:3 sdi 8:128 active faulty running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 14:0:0:3 sdm 8:192 active ready running
  `- 13:0:0:3 sde 8:64  active ready running


then

mpathg (360060160295136009494d149efe8e911) dm-2 DGC     ,VRAID

size=20G features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=90 status=enabled
| |- 14:0:1:3 sdq 65:0  failed ready running
| `- 13:0:1:3 sdi 8:128 failed faulty running
`-+- policy='round-robin 0' prio=50 status=enabled
  |- 14:0:0:3 sdm 8:192 failed faulty running
  `- 13:0:0:3 sde 8:64  failed ready running


And finally

mpathg (360060160295136009494d149efe8e911) dm-2 DGC     ,VRAID
size=20G features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=90 status=enabled
| |- 14:0:1:3 sdq 65:0  failed ready running
| `- 13:0:1:3 sdi 8:128 failed ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 14:0:0:3 sdm 8:192 failed faulty running
  `- 13:0:0:3 sde 8:64  failed faulty running
 

And after an ultimate trespass :

mpathg (360060160295136009494d149efe8e911) dm-2 DGC     ,VRAID
size=20G features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| |- 14:0:1:3 sdq 65:0  failed faulty running
| `- 13:0:1:3 sdi 8:128 failed faulty running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 14:0:0:3 sdm 8:192 failed ready running
  `- 13:0:0:3 sde 8:64  failed ready running

               

In messages, the expected records are found every 5 seconds:

Oct  8 13:21:22 dellpr730-o multipathd: mpathg: sde - emc_clariion_checker: Passive path to inactive snapshot WWN 60060160295136009494d149efe8e911.
Oct  8 13:21:22 dellpr730-o multipathd: mpathg: sdi - emc_clariion_checker: Active path to inactive snapshot WWN 60060160295136009494d149efe8e911.
Oct  8 13:21:22 dellpr730-o multipathd: mpathg: sdm - emc_clariion_checker: Passive path to inactive snapshot WWN 60060160295136009494d149efe8e911.
Oct  8 13:21:22 dellpr730-o multipathd: mpathg: sdq - emc_clariion_checker: Active path to inactive snapshot WWN 60060160295136009494d149efe8e911.
Oct  8 13:21:27 dellpr730-o multipathd: mpathg: sde - emc_clariion_checker: Passive path to inactive snapshot WWN 60060160295136009494d149efe8e911.
Oct  8 13:21:27 dellpr730-o multipathd: mpathg: sdi - emc_clariion_checker: Active path to inactive snapshot WWN 60060160295136009494d149efe8e911.
Oct  8 13:21:27 dellpr730-o multipathd: mpathg: sdm - emc_clariion_checker: Passive path to inactive snapshot WWN 60060160295136009494d149efe8e911.
Oct  8 13:21:27 dellpr730-o multipathd: mpathg: sdq - emc_clariion_checker: Active path to inactive snapshot WWN 60060160295136009494d149efe8e911.


    I then changed the multipath.conf file and removed all the parameters Red Hat had suggested to change, including the “detect_path_checker no” statement …. And with this change, the issue immediately re-occurred! With that change applied, when I tried to reboot the system with a snap deactivated, it caused the excessive trespasses messages on the VNX ((71164000) Excessive trespasses detected. Lun: 6006016029513600:9494D149EFE8E911 . Contact Your Service Provider. Run HAVT on the host(s) connected to the LUN in question. 0000040007002c00d3040000004016a1004016a1000000000000000000000000000000000000000071164000  disktarg) and the server refused to boot.

    I then had a hard time reactivating the snap because the SP owner was constantly changing, and it was only after this reactivation that I was able to reboot. I repeated the experiment 3 times with the same result. I just added back the “detect_path_checker no” statement in multipath.conf and once again, I could no longer recreate the issue (I didn’t apply the other Red Hat suggested changes as they are less relevant for the issue).

 

As a conclusion, fixing the issue requires 2 changes: 
- apply the Red Hat fixes and 
- add the “detect_path_checker no” statement in /etc/multipath.conf file.

 

Here is the final multipath.conf file:

## Use user friendly names, instead of using WWIDs as names.

defaults {
        user_friendly_names yes
        find_multipaths yes
        verbosity 2
        flush_on_last_del yes     
}

devices {
        device {
        vendor "DGC"
        product ".*"
        product_blacklist "LUNZ"
        path_grouping_policy group_by_prio
        path_selector "round-robin 0"
        path_checker "emc_clariion"
        features "1 queue_if_no_path"
        hardware_handler "1 alua"
        prio "alua"
        prio_args "exclusive_pref_bit"
        failback "immediate"
        rr_weight "uniform"
        no_path_retry 60
        detect_path_checker no
        }
}
--- end of quote ---

Comment 30 errata-xmlrpc 2020-03-31 19:47:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1066

