Description of problem:

Please refer to bz1949369 for the discussion about multipathd.

We noticed that after hard rebooting an instance, the instance sometimes uses a single iSCSI device instead of a multipath device, with the following warning message:

  No dm was created, connection to volume is probably bad and will perform poorly.

We confirmed that there are no errors or problems with the iSCSI device attachment. However, the multipath device (dm-X) is not created even though the "multipathd add" command succeeds, and os-brick falls back to a single-path device because dm-X is not available.

After investigation and discussion with the engineers covering multipathd, we found the following situation:
- Recent multipathd delays path removal when it receives a burst of udev events.
- When os-brick detaches a multipath volume, it flushes the multipath device and then removes the path devices directly within a short time. This is likely what causes the "burst" of udev events.
- Multipathd therefore delays the path removal, but the next volume attachment starts very shortly afterwards. The multipath device should then be created again, but because the old orphan paths have not been removed at that point, multipathd refuses to create it.

Because os-brick requires very timely device removal, it should not rely on multipathd removing device paths based on udev events, but should explicitly request that the paths be removed when detaching a device (see the sketch under "Additional info" below).

Version-Release number of selected component (if applicable):

How reproducible:
The issue is frequently reproduced.

Steps to Reproduce:
1. Create an instance with a multipath volume attached
2. Stop and start the instance

Actual results:
The instance sometimes uses a single device instead of a multipath device.

Expected results:
The instance always uses a multipath device.

Additional info:
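The following is an illustrative sketch only, not the actual os-brick change shipped for this bug: on detach, flush the map and then explicitly ask multipathd to drop each path instead of waiting for udev-driven cleanup. The WWID and path device names (sda-sdd) are examples taken from the verification environment below and are assumed to be known by the caller.

  # Sketch: explicit, timely path removal on detach (example WWID and paths, assumed known)
  WWID=3600a0980383146486f2b524858793352        # example multipath WWID
  PATHS="sda sdb sdc sdd"                       # example path devices belonging to that map

  multipath -f "$WWID"                          # flush the multipath device map first
  for dev in $PATHS; do
      multipathd del path "$dev"                # explicitly tell multipathd to forget the path
      echo 1 > "/sys/block/$dev/device/delete"  # then remove the SCSI device from the kernel
  done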
Verified on:
python3-os-brick-2.10.5-1.20210706143310.634fb4a.el8ost.noarch

Deployed a system with netapp iSCSI as Cinder backend, multipath enabled.

Booted an instance:

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | ACTIVE | - | Running | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+

Create a cinder volume on netapp:

(overcloud) [stack@undercloud-0 ~]$ cinder create 4 --volume-type netapp --name netapp_vol1
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2021-08-16T13:04:33.000000 |
| description | None |
| encrypted | False |
| id | f1da9a20-2b04-4b3f-aaf7-5a215a427e4d |
| metadata | {} |
| migration_status | None |
| multiattach | False |
| name | netapp_vol1 |
| os-vol-host-attr:host | None |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | df930227ed194c069ca864faad9226e4 |
| replication_status | None |
| size | 4 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| updated_at | None |
| user_id | cacbcf58b6914d69b68082f254c1d9ed |
| volume_type | netapp |
+--------------------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+
| f1da9a20-2b04-4b3f-aaf7-5a215a427e4d | available | netapp_vol1 | 4 | netapp | false | |
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+

Attach volume to instance:

(overcloud) [stack@undercloud-0 ~]$ nova volume-attach inst1 f1da9a20-2b04-4b3f-aaf7-5a215a427e4d
+-----------------------+--------------------------------------+
| Property | Value |
+-----------------------+--------------------------------------+
| delete_on_termination | False |
| device | /dev/vdb |
| id | f1da9a20-2b04-4b3f-aaf7-5a215a427e4d |
| serverId | ec2833f7-551d-4da0-823e-5fbebca580ae |
| tag | - |
| volumeId | f1da9a20-2b04-4b3f-aaf7-5a215a427e4d |
+-----------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ cinder show f1da9a20-2b04-4b3f-aaf7-5a215a427e4d
+--------------------------------+------------------------------------------+
| Property | Value |
+--------------------------------+------------------------------------------+
| attached_servers | ['ec2833f7-551d-4da0-823e-5fbebca580ae'] |
| attachment_ids | ['1db4c768-d96e-4b1d-86ba-0003a601c057'] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2021-08-16T13:04:33.000000 |
| description | None |
| encrypted | False |
| id | f1da9a20-2b04-4b3f-aaf7-5a215a427e4d |
| metadata | |
| migration_status | None |
| multiattach | False |
| name | netapp_vol1 |
| os-vol-host-attr:host | hostgroup@tripleo_netapp#cinder_volumes |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | df930227ed194c069ca864faad9226e4 |
| replication_status | None |
| size | 4 |
| snapshot_id | None |
| source_volid | None |
| status | in-use |
| updated_at | 2021-08-16T13:05:45.000000 |
| user_id | cacbcf58b6914d69b68082f254c1d9ed |
| volume_type | netapp |
+--------------------------------+------------------------------------------+

Let's confirm multipath is enabled/in-use. On compute-0, where inst1 is hosted:

[root@compute-0 ~]# multipath -ll
3600a0980383146486f2b524858793352 dm-0 NETAPP,LUN C-Mode
size=4.0G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 2:0:0:0 sda 8:0  active ready running
| `- 3:0:0:0 sdb 8:16 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 5:0:0:0 sdd 8:48 active ready running
  `- 4:0:0:0 sdc 8:32 active ready running

Virsh dump of disk device:

()[root@compute-0 /]# virsh dumpxml instance-00000002 | grep -A 3 -B 5 f1da
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/dm-0' index='4'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>f1da9a20-2b04-4b3f-aaf7-5a215a427e4d</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>

We're good, as we see the cinder volume attached using a "dm" multipath device.

Now let's stop and start the instance a few times. After every start, I'll recheck the disk again.

(overcloud) [stack@undercloud-0 ~]$ #cycle .1
(overcloud) [stack@undercloud-0 ~]$ nova stop inst1
Request to stop server inst1 has been accepted.

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | SHUTOFF | - | Shutdown | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+

(overcloud) [stack@undercloud-0 ~]$ nova start inst1
Request to start server inst1 has been accepted.
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | ACTIVE | - | Running | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+

()[root@compute-0 /]# virsh dumpxml instance-00000002 | grep -A 3 -B 5 f1da
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/dm-0' index='1'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>f1da9a20-2b04-4b3f-aaf7-5a215a427e4d</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>

(overcloud) [stack@undercloud-0 ~]$ #cycle .2
(overcloud) [stack@undercloud-0 ~]$ nova stop inst1
Request to stop server inst1 has been accepted.

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | SHUTOFF | - | Shutdown | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+

(overcloud) [stack@undercloud-0 ~]$ nova start inst1
Request to start server inst1 has been accepted.

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | ACTIVE | - | Running | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+

()[root@compute-0 /]# virsh dumpxml instance-00000002 | grep -A 3 -B 5 f1da
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/dm-0' index='1'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>f1da9a20-2b04-4b3f-aaf7-5a215a427e4d</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>

(overcloud) [stack@undercloud-0 ~]$ #cycle .3
(overcloud) [stack@undercloud-0 ~]$ nova stop inst1
Request to stop server inst1 has been accepted.
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | SHUTOFF | - | Shutdown | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+

(overcloud) [stack@undercloud-0 ~]$ nova start inst1
Request to start server inst1 has been accepted.

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | ACTIVE | - | Running | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+

()[root@compute-0 /]# virsh dumpxml instance-00000002 | grep -A 3 -B 5 f1da
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/dm-0' index='1'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>f1da9a20-2b04-4b3f-aaf7-5a215a427e4d</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>

(overcloud) [stack@undercloud-0 ~]$ #cycle .4
(overcloud) [stack@undercloud-0 ~]$ nova stop inst1
Request to stop server inst1 has been accepted.

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | SHUTOFF | - | Shutdown | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+

(overcloud) [stack@undercloud-0 ~]$ nova start inst1
Request to start server inst1 has been accepted.
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ec2833f7-551d-4da0-823e-5fbebca580ae | inst1 | ACTIVE | - | Running | internal=192.168.0.28, 10.0.0.230 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+

()[root@compute-0 /]# virsh dumpxml instance-00000002 | grep -A 3 -B 5 f1da
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/dm-0' index='1'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>f1da9a20-2b04-4b3f-aaf7-5a215a427e4d</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>

Looks good to verify: tested 4 stop/start cycles, all of which resulted in the expected dm/multipath attachment of the cinder volume.

Just to be extra sure, used the bash loop below to retest 20 cycles:

set -x
for i in {1..20}
do
    echo cycle$i
    openstack server stop inst1
    sleep 3
    openstack server list
    openstack server start inst1
    sleep 10
    openstack server list
    ssh heat-admin.24.12 sudo podman exec -it nova_libvirt virsh dumpxml instance-00000002 | grep -A 3 -B 5 f1da >> log.txt
done

The resulting log.txt indicated use of dm/multipath on all 20 attempts (see below). We are good to verify.

(undercloud) [stack@undercloud-0 ~]$ grep source log.txt
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
      <source dev='/dev/dm-0' index='1'/>
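As an optional sanity check on the loop output (a sketch only, not part of the original verification; the log.txt file name and the cycle count of 20 are taken from the loop above), the pass/fail condition could be asserted automatically instead of eyeballing the grep output:

  # Sketch: fail loudly if any of the 20 cycles attached the volume without a dm device
  dm_count=$(grep -c "source dev='/dev/dm-" log.txt)
  sd_count=$(grep -c "source dev='/dev/sd" log.txt || true)
  if [ "$dm_count" -eq 20 ] && [ "$sd_count" -eq 0 ]; then
      echo "PASS: all 20 cycles attached the volume via a multipath (dm) device"
  else
      echo "FAIL: dm=$dm_count, sd=$sd_count - at least one cycle used a single-path device" >&2
  fi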
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762