Bug 1950172 - multipath device should be explicitly removed to avoid delay in multipathd
Summary: multipath device should be explicitly removed to avoid delay in multipathd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-os-brick
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z7
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Gorka Eguileor
QA Contact: Tzach Shefi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-04-15 23:21 UTC by Takashi Kajinami
Modified: 2021-12-09 20:19 UTC (History)
CC: 10 users

Fixed In Version: python-os-brick-2.10.5-1.20201114041632.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-09 20:18:39 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1924652 0 None None None 2021-04-16 00:02:44 UTC
OpenStack gerrit 785818 0 None MERGED multipath/iscsi: remove devices from multipath monitoring 2021-05-03 15:52:00 UTC
OpenStack gerrit 788795 0 None MERGED multipath/iscsi: remove devices from multipath monitoring 2021-05-09 23:28:24 UTC
Red Hat Bugzilla 1949369 1 low CLOSED path devices are not suddenly removed after flushing a multipath device 2023-10-09 08:23:32 UTC
Red Hat Issue Tracker OSP-3064 0 None None None 2021-11-18 11:36:18 UTC
Red Hat Product Errata RHBA-2021:3762 0 None None None 2021-12-09 20:19:00 UTC

Description Takashi Kajinami 2021-04-15 23:21:57 UTC
Description of problem:

Please refer to bz1949369 for the discussion about multipathd behavior.

We noticed that after hard rebooting an instance, the instance sometimes uses a single iSCSI device instead of a multipath device, with the following warning message:
 No dm was created, connection to volume is probably bad and will perform poorly.

We confirmed that there are no errors or problems with iSCSI device attachment.
However, the multipath device (dm-X) is not created even though the "multipathd add" command succeeds, and os-brick falls back to a single path device because dm-X is not available.

After investigation and discussion with the engineers covering multipathd, we found the following situation (see the diagnostic sketch after this list):
 - Recent multipathd versions delay path removal when they receive a burst of udev events.
 - When os-brick detaches a multipath volume, it flushes the multipath device and then removes the path devices directly within a short time. This is likely to cause such a burst of udev events.
 - While multipathd is still deferring the path removal, the next volume attachment starts very shortly afterwards. The attachment tries to create the multipath device again, but because the old orphan paths have not yet been removed, multipathd refuses to create it.
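
One way to observe this state on the compute node (a diagnostic sketch, not output taken from this report) is to compare the paths multipathd still monitors with the multipath maps that actually exist; paths whose removal was deferred linger as orphans with no map:

# Diagnostic sketch: look for leftover paths right after a detach.
multipathd show paths   # deferred paths typically linger here as orphans, with no map
multipath -ll           # compare against the multipath maps that actually exist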

Because os-brick requires very timely device removal, it should not rely on multipathd removing device paths based on udev events; it should explicitly request path removal when detaching a device.
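
For illustration, here is a minimal sketch of the detach sequence this implies on the compute node (device and map names are placeholders, and the exact calls made by the upstream fix may differ): flush the map, then explicitly tell multipathd to drop each path before deleting the SCSI devices, instead of waiting for multipathd's udev-driven cleanup.

# Sketch only: explicit cleanup on detach (placeholder names).
multipath -f "$MAP_WWID"                     # flush the multipath device map
for dev in sda sdb sdc sdd; do
    multipathd del path /dev/$dev            # remove the path from multipathd monitoring now
    echo 1 > /sys/block/$dev/device/delete   # then delete the SCSI device itself
done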


Version-Release number of selected component (if applicable):


How reproducible:
The issue reproduces frequently.

Steps to Reproduce:
1. Create an instance with a multipath volume attached
2. Stop and start the instance

Actual results:
The instance sometimes uses a single device instead of a multipath device

Expected results:
The instance always uses a multipath device
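
To tell which case occurred after a start, check the disk source on the compute node (the same checks used in the verification below; the instance name is an example). A healthy attachment points at /dev/dm-X, a degraded one at a single /dev/sdX path.

# Check whether the attached volume uses the multipath device or a single path.
virsh dumpxml instance-00000002 | grep "source dev"   # expect dev='/dev/dm-X'
multipath -ll                                         # the volume's map should list all of its paths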


Additional info:

Comment 23 Tzach Shefi 2021-08-16 14:17:38 UTC
Verified on:
python3-os-brick-2.10.5-1.20210706143310.634fb4a.el8ost.noarch

Deployed a system with NetApp iSCSI as the Cinder backend, multipath enabled.

Booted an instance:
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                          |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | ACTIVE | -          | Running     | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+

Create a cinder volume on netapp:
(overcloud) [stack@undercloud-0 ~]$ cinder create 4 --volume-type netapp --name netapp_vol1
+--------------------------------+--------------------------------------+
| Property                       | Value                                |
+--------------------------------+--------------------------------------+
| attachments                    | []                                   |
| availability_zone              | nova                                 |
| bootable                       | false                                |
| consistencygroup_id            | None                                 |
| created_at                     | 2021-08-16T13:04:33.000000           |
| description                    | None                                 |
| encrypted                      | False                                |
| id                             | f1da9a20-2b04-4b3f-aaf7-5a215a427e4d |
| metadata                       | {}                                   |
| migration_status               | None                                 |
| multiattach                    | False                                |
| name                           | netapp_vol1                          |
| os-vol-host-attr:host          | None                                 |
| os-vol-mig-status-attr:migstat | None                                 |
| os-vol-mig-status-attr:name_id | None                                 |
| os-vol-tenant-attr:tenant_id   | df930227ed194c069ca864faad9226e4     |
| replication_status             | None                                 |
| size                           | 4                                    |
| snapshot_id                    | None                                 |
| source_volid                   | None                                 |
| status                         | creating                             |
| updated_at                     | None                                 |
| user_id                        | cacbcf58b6914d69b68082f254c1d9ed     |
| volume_type                    | netapp                               |
+--------------------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+
| ID                                   | Status    | Name        | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+
| f1da9a20-2b04-4b3f-aaf7-5a215a427e4d | available | netapp_vol1 | 4    | netapp      | false    |             |
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+

Attach volume to instance:
(overcloud) [stack@undercloud-0 ~]$ nova volume-attach inst1 f1da9a20-2b04-4b3f-aaf7-5a215a427e4d
+-----------------------+--------------------------------------+
| Property              | Value                                |
+-----------------------+--------------------------------------+
| delete_on_termination | False                                |
| device                | /dev/vdb                             |
| id                    | f1da9a20-2b04-4b3f-aaf7-5a215a427e4d |
| serverId              | ec2833f7-551d-4da0-823e-5fbebca580ae |
| tag                   | -                                    |
| volumeId              | f1da9a20-2b04-4b3f-aaf7-5a215a427e4d |
+-----------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ cinder show f1da9a20-2b04-4b3f-aaf7-5a215a427e4d
+--------------------------------+------------------------------------------+
| Property                       | Value                                    |
+--------------------------------+------------------------------------------+
| attached_servers               | ['ec2833f7-551d-4da0-823e-5fbebca580ae'] |
| attachment_ids                 | ['1db4c768-d96e-4b1d-86ba-0003a601c057'] |
| availability_zone              | nova                                     |
| bootable                       | false                                    |
| consistencygroup_id            | None                                     |
| created_at                     | 2021-08-16T13:04:33.000000               |
| description                    | None                                     |
| encrypted                      | False                                    |
| id                             | f1da9a20-2b04-4b3f-aaf7-5a215a427e4d     |
| metadata                       |                                          |
| migration_status               | None                                     |
| multiattach                    | False                                    |
| name                           | netapp_vol1                              |
| os-vol-host-attr:host          | hostgroup@tripleo_netapp#cinder_volumes  |
| os-vol-mig-status-attr:migstat | None                                     |
| os-vol-mig-status-attr:name_id | None                                     |
| os-vol-tenant-attr:tenant_id   | df930227ed194c069ca864faad9226e4         |
| replication_status             | None                                     |
| size                           | 4                                        |
| snapshot_id                    | None                                     |
| source_volid                   | None                                     |
| status                         | in-use                                   |
| updated_at                     | 2021-08-16T13:05:45.000000               |
| user_id                        | cacbcf58b6914d69b68082f254c1d9ed         |
| volume_type                    | netapp                                   |
+--------------------------------+------------------------------------------+


Let's confirm multipath is enabled and in use.
On compute-0, where inst1 is hosted:

[root@compute-0 ~]# multipath -ll
3600a0980383146486f2b524858793352 dm-0 NETAPP,LUN C-Mode
size=4.0G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 2:0:0:0 sda 8:0  active ready running
| `- 3:0:0:0 sdb 8:16 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 5:0:0:0 sdd 8:48 active ready running
  `- 4:0:0:0 sdc 8:32 active ready running


Virsh dump of disk device:
()[root@compute-0 /]# virsh dumpxml instance-00000002 | grep -A 3 -B 5 f1da 
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/dm-0' index='4'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>f1da9a20-2b04-4b3f-aaf7-5a215a427e4d</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>


We're good, as we can see the Cinder volume is attached using a "dm" multipath device.

Now let's stop and start the instance a few times.
After every start, I'll recheck the disk.


(overcloud) [stack@undercloud-0 ~]$ #cycle .1
(overcloud) [stack@undercloud-0 ~]$ nova stop inst1
Request to stop server inst1 has been accepted.
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status  | Task State | Power State | Networks                          |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | SHUTOFF | -          | Shutdown    | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
(overcloud) [stack@undercloud-0 ~]$ nova start inst1
Request to start server inst1 has been accepted.
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                          |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | ACTIVE | -          | Running     | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+

()[root@compute-0 /]# virsh dumpxml instance-00000002 | grep -A 3 -B 5 f1da 
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/dm-0' index='1'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>f1da9a20-2b04-4b3f-aaf7-5a215a427e4d</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>


(overcloud) [stack@undercloud-0 ~]$ #cycle .2
(overcloud) [stack@undercloud-0 ~]$ nova stop inst1
Request to stop server inst1 has been accepted.
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status  | Task State | Power State | Networks                          |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | SHUTOFF | -          | Shutdown    | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
(overcloud) [stack@undercloud-0 ~]$ nova start inst1
Request to start server inst1 has been accepted.
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                          |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | ACTIVE | -          | Running     | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+

()[root@compute-0 /]# virsh dumpxml instance-00000002 | grep -A 3 -B 5 f1da 
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/dm-0' index='1'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>f1da9a20-2b04-4b3f-aaf7-5a215a427e4d</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>



(overcloud) [stack@undercloud-0 ~]$ #cycle .3 
(overcloud) [stack@undercloud-0 ~]$ nova stop inst1
Request to stop server inst1 has been accepted.
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status  | Task State | Power State | Networks                          |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | SHUTOFF | -          | Shutdown    | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
(overcloud) [stack@undercloud-0 ~]$ nova start inst1
Request to start server inst1 has been accepted.
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                          |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | ACTIVE | -          | Running     | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+

()[root@compute-0 /]# virsh dumpxml instance-00000002 | grep -A 3 -B 5 f1da 
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/dm-0' index='1'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>f1da9a20-2b04-4b3f-aaf7-5a215a427e4d</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>


(overcloud) [stack@undercloud-0 ~]$ #cycle .4
(overcloud) [stack@undercloud-0 ~]$ nova stop inst1
Request to stop server inst1 has been accepted.
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status  | Task State | Power State | Networks                          |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | SHUTOFF | -          | Shutdown    | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
(overcloud) [stack@undercloud-0 ~]$ nova start inst1
Request to start server inst1 has been accepted.
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                          |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | ACTIVE | -          | Running     | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+

()[root@compute-0 /]# virsh dumpxml instance-00000002 | grep -A 3 -B 5 f1da 
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/dm-0' index='1'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>f1da9a20-2b04-4b3f-aaf7-5a215a427e4d</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>


Looks good to verify: tested 4 stop/start cycles,
all of which resulted in the expected dm/multipath attachment of the Cinder volume.

Just to be extra sure, I used the bash loop below to retest 20 cycles:
set -x
# Stop/start the instance 20 times and record the attached disk source after each cycle.
for i in {1..20}
do
   echo cycle$i
   openstack server stop inst1
   sleep 3
   openstack server list
   openstack server start inst1
   sleep 10
   openstack server list
   # Capture the libvirt disk definition for the volume (f1da...) from the compute node.
   ssh heat-admin.24.12 sudo podman exec -it nova_libvirt virsh dumpxml instance-00000002 | grep -A 3 -B 5 f1da >> log.txt
done

The resulting log.txt shows use of the dm/multipath device on all 20 attempts (see below).
We are good to verify.

(undercloud) [stack@undercloud-0 ~]$ grep source log.txt 
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>

Comment 35 errata-xmlrpc 2021-12-09 20:18:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762

