Bug 1623601
| Summary: | Cannot execute multipath -F | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Yevhenii Shapovalov <yshapova> |
| Component: | iSCSI | Assignee: | Mike Christie <mchristi> |
| Status: | CLOSED ERRATA | QA Contact: | Manohar Murthy <mmurthy> |
| Severity: | high | Docs Contact: | Bara Ancincova <bancinco> |
| Priority: | high | | |
| Version: | 3.1 | CC: | agunn, ceph-eng-bugs, ceph-qe-bugs, dn-infra-peta-pers, hnallurv, jdillama, mchristi, mkasturi, tchandra, tpetr, tserlin, vakulkar, vumrao |
| Target Milestone: | z1 | Flags: | vakulkar: automate_bug+, mkasturi: needinfo+ |
| Target Release: | 3.1 | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | tcmu-runner-1.4.0-0.3.el7cp | Doc Type: | Bug Fix |
Doc Text:

.The DM-Multipath device's path no longer bounces between the failed and active states, causing I/O failures, hangs, and performance issues

In Red Hat Enterprise Linux 7.5, the kernel's ALUA layer reduced the number of times an initiator retries the SCSI sense code `ALUA State Transition`. This code is returned from the target side by the `tcmu-runner` service when taking the RBD exclusive lock during a failover or failback scenario and during device discovery. As a consequence, the maximum number of retries was reached before the discovery process completed, and the SCSI layer returned a failure to the multipath I/O layer. The multipath I/O layer tried the next available path, where the same problem occurred. This behavior caused a loop of path checking, resulting in failed I/O and management operations on the multipath device. In addition, the logs on the initiator node printed messages about devices being removed and then re-added. This bug has been fixed, and the aforementioned operations no longer fail.

(A sketch for observing this behavior on an initiator follows the field table below.)
| Story Points: | --- | | |
|---|---|---|---|
| Clone Of: | | | |
| | 1624040 (view as bug list) | Environment: | |
| Last Closed: | 2018-11-09 00:59:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1584264 | | |
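As referenced in the Doc Text above, the failure mode shows up on the initiator as paths flapping between the failed and active states while the kernel logs ALUA sense data. A minimal sketch for watching this, assuming a DM-Multipath map named `mpatha` and standard RHEL 7 tooling; the map name and exact log wording will differ per host and kernel version:

```bash
# Watch the multipath topology; paths cycling between "failed" and
# "active" across refreshes is the symptom described in the Doc Text.
watch -n 2 'multipath -ll mpatha'

# In a second shell, follow the kernel log for ALUA transition sense
# data and path events reported while the target holds the RBD
# exclusive lock (message wording varies by kernel version).
journalctl -k -f | grep -iE 'alua|asymmetric access state|path'
```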
Description
Yevhenii Shapovalov
2018-08-29 17:46:13 UTC
The complete sequence of logs can be viewed in Reportportal (log in as ceph/ceph):
http://cistatus.ceph.redhat.com/ui/#cephci/launches/New_filter%7Cpage.page=1&page.size=50&filter.cnt.name=iscsi&page.sort=start_time%2Cnumber%2CDESC/5b841e83b3cb1f000162d78d?page.page=1&page.size=50&page.sort=start_time%2CASC

Click on each test case for a logs link that shows the exact steps run: set up the cluster using ceph-ansible, create the iSCSI gateway using gwcli, then issue multipath commands and exercise I/O.

Jenkins run with the same issue:
https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RHCS%203.x%20RHEL%207.5/job/ceph-iscsi-sanity-3.x-rhel7.5/31/consoleFull

How can I get the target and initiator side logs for these? I was going to say it looks like that recurring multipathd udev/systemd hang issue, but in the multipath -ll output you pasted, one of the devices has an APD and the other device has only one path up. Are you doing path-down tests during this test? Did multipathd crash by any chance?

(In reply to Yevhenii Shapovalov from comment #5)
> jenkins run with same issue
> https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RHCS%203.x%20RHEL%207.5/job/ceph-iscsi-sanity-3.x-rhel7.5/31/consoleFull

This is a different issue, right? BZ https://bugzilla.redhat.com/show_bug.cgi?id=1623650

(In reply to Yevhenii Shapovalov from comment #5)
> jenkins run with same issue
> https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RHCS%203.x%20RHEL%207.5/job/ceph-iscsi-sanity-3.x-rhel7.5/31/consoleFull

This looks like it is BZ https://bugzilla.redhat.com/show_bug.cgi?id=1623650

Mike, the failover test happens after this one. We purge the cluster and recreate the setup for the failover test, but the IPs remain the same. I have given you the iSCSI client IP and login details along with the Ceph cluster IP details, so we can keep the system in the same state for debugging. Thanks

It looks like this is caused by a RHEL 7.5 change to the alua driver:

    commit a553c7f7a6001c17d6575f182772add9a113ca92
    Author: Mike Snitzer <snitzer>
    Date:   Fri Oct 27 16:11:45 2017 -0400

        [scsi] scsi_dh_alua: Recheck state on unit attention

        Message-id: <1509120727-26247-37-git-send-email-snitzer>
        Patchwork-id: 195279
        O-Subject: [RHEL7.5 PATCH v2 36/58] scsi_dh_alua: Recheck state on unit attention
        Bugzilla: 1499107
        RH-Acked-by: Jerry Snitselaar <jsnitsel>
        RH-Acked-by: Ewan Milne <emilne>

        BZ: 1499107

        commit 2b35865e7a290d313c3d156c0c2074b4c4ffaf52
        Author: Hannes Reinecke <hare>
        Date:   Fri Feb 19 09:17:13 2016 +0100

            scsi_dh_alua: Recheck state on unit attention

            When we receive a unit attention code of 'ALUA state changed' we
            should recheck the state, as it might be due to an implicit ALUA
            state transition. This allows us to return NEEDS_RETRY instead of
            ADD_TO_MLQUEUE, allowing to terminate the retries after a certain
            time. At the same time a workqueue item might already be queued,
            which should be started immediately to avoid any delays.

            Reviewed-by: Bart Van Assche <bart.vanassche>
            Reviewed-by: Christoph Hellwig <hch>
            Signed-off-by: Hannes Reinecke <hare>
            Signed-off-by: Martin K. Petersen <martin.petersen>
            Signed-off-by: Rafael Aquini <aquini>

The problem is that we return "alua state transition" while grabbing the lock. This can take a second or two. In RHEL 7.4 the alua layer handled this error by retrying for up to around 5 minutes. With the change above we get 5 retries, but they are burned very quickly, in a matter of hundreds of milliseconds.

So what happens is: the I/O is failed, the path is failed, other paths are tried, and this repeats. multipathd then gets backed up handling path re-enablement, and this can go on for the entire test.

I can fix this with a one-line patch in tcmu-runner for 3.1.z. I am also going to work on an upstream/RHEL fix.
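For context, a minimal initiator-side sketch of the kind of run the comments above describe: log in to the gateways, drive I/O while gateway failover/failback happens, then attempt the flush the bug title refers to. The portal IPs, map name, and fio parameters are placeholders, and the ceph-ansible/gwcli gateway setup is assumed to already exist:

```bash
# Discover and log in to both iSCSI gateway portals (placeholder IPs).
iscsiadm -m discovery -t sendtargets -p 192.168.122.101
iscsiadm -m discovery -t sendtargets -p 192.168.122.102
iscsiadm -m node --login

# Confirm DM-Multipath assembled paths through both gateways.
multipath -ll

# Exercise I/O against the multipath device (placeholder map name)
# while failover/failback is exercised on the target side.
fio --name=iscsi-io --filename=/dev/mapper/mpatha --direct=1 \
    --ioengine=libaio --rw=randwrite --bs=4k --iodepth=16 \
    --runtime=120 --time_based

# After I/O has stopped, try to flush the unused maps. On the affected
# builds multipathd is still busy failing and reinstating paths, so the
# flush from the bug title does not complete cleanly.
multipath -F
```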
*** Bug 1623650 has been marked as a duplicate of this bug. ***

@Mike, should this be in the 3.1 Release notes as a known issue?

*** Bug 1626836 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3530
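A short verification sketch, under the assumption of the Fixed In Version listed above, for checking that a gateway carries the fixed package and that the flush from the bug title now succeeds on an idle initiator:

```bash
# On each iSCSI gateway: confirm the errata build (or newer) is installed
# and that the service is running.
rpm -q tcmu-runner            # expect tcmu-runner-1.4.0-0.3.el7cp or later
systemctl status tcmu-runner --no-pager

# On the initiator, with no I/O running against the maps:
multipath -ll                 # paths should be steady, no flapping
multipath -F                  # flush should now return without error
multipath -ll                 # no maps remain after a successful flush
```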