This bug was initially created as a copy of Bug #1883351

I am copying this bug because:

Description of problem:
Customer rebooted 3 instances and none of them can boot; the BIOS reports the error message: "Boot failed: not a bootable disk"

The following is the work that was done for 1 of those instances. (More info to come in private comments.)

From the storage perspective all looks good. The volume is attached to the instance and then the instance is booted. This instance is Linux with only 1 disk. Multipath shows the path active and up. I dd'd the volume and got data. I've compared the domain XML from a working instance and the non-working instance and couldn't see anything wrong. We need your help to figure out what is wrong.

Version-Release number of selected component (if applicable):
OSP13 z11
openstack-nova-compute:13.0-129

How reproducible:
100% for the last 3 instances rebooted

Steps to Reproduce:
1. Reboot an instance
2.
3.

Actual results:
Instance won't boot

Expected results:
Instance boots

Additional info:
More info to come below in private comments
Looking at the detach during the reboot (req-2c678262-2d40-4ab4-885c-c7739660cce3) I noticed that we didn't flush the volume during the detach process. Looking at the code I've found that there is a tricky bug in os-brick where FC won't flush on disconnect if we failed to create a multipath device and returned a single path instead (it will also affect encrypted volumes, but these are unencrypted so that doesn't affect us). This issue is not present if we are just configured to use a single path.
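The failure mode above can be illustrated with a minimal sketch (function and key names here are hypothetical, for illustration only; this is not the actual os-brick code): when multipath is requested but the dm device never forms, the connector falls back to a single path, and the disconnect path then skips the flush because it only flushes when a multipath device name is present.

```python
def disconnect_volume(device_info, use_multipath, fixed=False):
    """Illustrative sketch of the flush-on-disconnect logic.

    Returns the list of devices that would have been flushed, so the
    buggy vs. fixed behavior is easy to compare. Names are hypothetical.
    """
    flushed = []
    multipath_id = device_info.get('multipath_id')

    if use_multipath:
        if multipath_id:
            # Normal multipath case: flush the dm device
            # (stands in for 'multipath -f <multipath_id>').
            flushed.append(multipath_id)
        elif fixed:
            # The fix: the dm device never formed and we fell back to a
            # single path, so flush that path instead
            # (stands in for 'blockdev --flushbufs <path>').
            flushed.append(device_info['path'])
        # BUG (fixed=False): multipath requested, no dm device formed,
        # and nothing is flushed at all.
    else:
        # Plain single-path configuration: always flushed, which is why
        # the issue does not appear without multipath enabled.
        flushed.append(device_info['path'])
    return flushed
```

With `use_multipath=True` and no `multipath_id`, the buggy variant returns an empty list (no flush), while the fixed variant flushes the fallback single path.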
*** Bug 1941749 has been marked as a duplicate of this bug. ***
Hey,
Confirming verification/triggering steps with you:
An OSP 13z16 deployment with HPE 3par FC with multipath enabled. Boot an instance from a bootable Cinder volume (Cirros or RHEL, it doesn't matter), reboot the instance, and confirm that the OS still boots up correctly. Then I should break one of the paths and reboot a second time. Plus, for good measure, swap the broken FC path and reboot a third time. Would this cover testing? Thanks
Verified on:
python2-os-brick-2.3.9-10.el7ost.noarch

A deployment with Cinder using a 3par iSCSI multipath backend.
Enable nova privsep debugging on the compute node.
Boot an instance from an image, creating the boot volume/device:

(overcloud) [stack@undercloud-0 ~]$ nova boot instA --flavor tiny --block-device source=image,id=94611c9f-3614-4131-b614-2dce71f89014,dest=volume,size=1,shutdown=preserve,bootindex=0
+--------------------------------------+-------------------------------------------------+
| Property                             | Value                                           |
+--------------------------------------+-------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                          |
| OS-EXT-AZ:availability_zone          |                                                 |
| OS-EXT-SRV-ATTR:host                 | -                                               |
| OS-EXT-SRV-ATTR:hostname             | insta                                           |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                               |
| OS-EXT-SRV-ATTR:instance_name        |                                                 |
| OS-EXT-SRV-ATTR:kernel_id            |                                                 |
| OS-EXT-SRV-ATTR:launch_index         | 0                                               |
| OS-EXT-SRV-ATTR:ramdisk_id           |                                                 |
| OS-EXT-SRV-ATTR:reservation_id       | r-9i6v9dpq                                      |
| OS-EXT-SRV-ATTR:root_device_name     | -                                               |
| OS-EXT-SRV-ATTR:user_data            | -                                               |
| OS-EXT-STS:power_state               | 0                                               |
| OS-EXT-STS:task_state                | scheduling                                      |
| OS-EXT-STS:vm_state                  | building                                        |
| OS-SRV-USG:launched_at               | -                                               |
| OS-SRV-USG:terminated_at             | -                                               |
| accessIPv4                           |                                                 |
| accessIPv6                           |                                                 |
| adminPass                            | BUe77kSB9KWz                                    |
| config_drive                         |                                                 |
| created                              | 2021-04-25T12:22:17Z                            |
| description                          | -                                               |
| flavor:disk                          | 1                                               |
| flavor:ephemeral                     | 0                                               |
| flavor:extra_specs                   | {}                                              |
| flavor:original_name                 | tiny                                            |
| flavor:ram                           | 64                                              |
| flavor:swap                          | 0                                               |
| flavor:vcpus                         | 1                                               |
| hostId                               |                                                 |
| host_status                          |                                                 |
| id                                   | ea970b3b-0cf7-483f-955d-92180e6f3489            |
| image                                | Attempt to boot from volume - no image supplied |
| key_name                             | -                                               |
| locked                               | False                                           |
| metadata                             | {}                                              |
| name                                 | instA                                           |
| os-extended-volumes:volumes_attached | []                                              |
| progress                             | 0                                               |
| security_groups                      | default                                         |
| status                               | BUILD                                           |
| tags                                 | []                                              |
| tenant_id                            | 7a03b9e1a60b4addbed4d7f05cbbfbf2                |
| updated                              | 2021-04-25T12:22:17Z                            |
| user_id                              | d02169b8feb847e6860a3248a96694dc                |
+--------------------------------------+-------------------------------------------------+

On the nova compute log, as expected we only see a "target_iqn" key rather than "target_iqns":

2021-04-25 12:22:42.552 6 DEBUG os_brick.initiator.connectors.iscsi [req-b6718507-cbb4-4900-813b-846280eb8edd d02169b8feb847e6860a3248a96694dc 7a03b9e1a60b4addbed4d7f05cbbfbf2 - default default] ==> connect_volume: call u"{'args': (<os_brick.initiator.connectors.iscsi.ISCSIConnector object at 0x7f0e98cd3d90>, {u'target_discovered': True, u'encrypted': False, u'qos_specs': None, u'target_iqn': u'iqn.2000-05.com.3pardata:21220002ac021f6b', u'target_portal': u'10.35.146.4:3260', u'target_lun': 3, u'access_mode': u'rw'}), 'kwargs': {}}" trace_logging_wrapper /usr/lib/python2.7/site-packages/os_brick/utils.py:146

Now let's update nova.conf's volume_use_multipath=true and restart nova compute.

Hard reboot the instance:

(overcloud) [stack@undercloud-0 ~]$ nova reboot instA --hard --poll
Request to reboot server <Server: instA> has been accepted.
Server rebooting...
Finished
Wait for server <Server: instA> reboot.
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+--------------------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                                   |
+--------------------------------------+-------+--------+------------+-------------+--------------------------------------------+
| e0fed6a2-3137-44af-b528-dffbb3a1f862 | instA | ACTIVE | -          | Running     | public=10.0.0.181, 2620:52:0:13b8::1000:25 |
+--------------------------------------+-------+--------+------------+-------------+--------------------------------------------+

After the config change plus hard rebooting the instance, first of all the instance recovered from the reboot, which is good; it failed before.
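The single-path vs. multipath distinction in the log above comes down to which keys the Cinder driver puts in the connection properties: plural keys (target_iqns/target_portals/target_luns) indicate the driver returned multipath information, singular keys only a single path. A minimal illustrative check, assuming that convention (this is a sketch, not the actual os-brick implementation):

```python
def describes_multipath(connection_properties):
    """Illustrative check: a driver that returns multipath information
    uses the plural target_iqns/target_portals/target_luns keys instead
    of the singular target_iqn/target_portal/target_lun ones."""
    return all(key in connection_properties
               for key in ('target_iqns', 'target_portals', 'target_luns'))


# Properties like the ones in the connect_volume log above: only the
# singular keys are present, so the attachment stays single-path even
# with volume_use_multipath=true on the nova side.
single_path_props = {
    'target_iqn': 'iqn.2000-05.com.3pardata:21220002ac021f6b',
    'target_portal': '10.35.146.4:3260',
    'target_lun': 3,
}
```

This is why enabling volume_use_multipath alone is not enough for an already-attached volume: the multipath outcome also depends on what the backend/driver returns at attach time.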
We also notice the block device flushing.

disconnect_volume:

1686 2021-04-25 12:25:17.863 6 DEBUG os_brick.initiator.connectors.iscsi [req-7e6864e0-6833-42c5-9482-8b03de192729 d02169b8feb847e6860a3248a96694dc 7a03b9e1a60b4addbed4d7f05cbbfbf2 - default default] ==> disconnect_volume: call u"{'args': (<os_brick.initiator.connectors.iscsi.ISCSIConnector object at 0x7fba41058dd0>, {u'device_path': u'/dev/sdc', u'target_discovered': True, u'encrypted': False, u'qos_specs': None, u'target_iqn': u'iqn.2000-05.com.3pardata:21220002ac021f6b', u'target_portal': u'10.35.146.4:3260', u'target_lun': 3, u'access_mode': u'rw'}, None), 'kwargs': {}}" trace_logging_wrapper /usr/lib/python2.7/site-packages/os_brick/utils.py:146
1687 2021-04-25 12:25:17.864 6 DEBUG oslo_concurrency.lockutils [req-7e6864e0-6833-42c5-9482-8b03de192729 d02169b8feb847e6860a3248a96694dc 7a03b9e1a60b4addbed4d7f05cbbfbf2 - default default] Lock "connect_volume" acquired by "os_brick.initiator.connectors.iscsi.disconnect_volume" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
1688 2021-04-25 12:25:17.866 6 INFO oslo.privsep.daemon [req-7e6864e0-6833-42c5-9482-8b03de192729 d02169b8feb847e6860a3248a96694dc 7a03b9e1a60b4addbed4d7f05cbbfbf2 - default default] Running privsep helper: ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', 'privsep-helper', '--config-file', '/usr/share/nova/nova-dist.conf', '--config-file', '/etc/nova/nova.conf', '--privsep_context', 'os_brick.privileged.default', '--privsep_sock_path', '/tmp/tmpIQUgQp/privsep.sock']

blockdev flushbufs:

1729 2021-04-25 12:25:18.822 6 DEBUG os_brick.initiator.connectors.iscsi [req-7e6864e0-6833-42c5-9482-8b03de192729 d02169b8feb847e6860a3248a96694dc 7a03b9e1a60b4addbed4d7f05cbbfbf2 - default default] Resulting device map defaultdict(<function <lambda> at 0x7fba3853e758>, {(u'10.35.146.4:3260', u'iqn.2000-05.com.3pardata:21220002ac021f6b'): (set([u'sdc']), set([u'sda', u'sdb']))}) _get_connection_devices /usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py:837
1730 2021-04-25 12:25:18.822 6 DEBUG os_brick.initiator.linuxscsi [req-7e6864e0-6833-42c5-9482-8b03de192729 d02169b8feb847e6860a3248a96694dc 7a03b9e1a60b4addbed4d7f05cbbfbf2 - default default] Removing multipathed devices sdc remove_connection /usr/lib/python2.7/site-packages/os_brick/initiator/linuxscsi.py:271
1731 2021-04-25 12:25:18.823 6 DEBUG os_brick.initiator.linuxscsi [req-7e6864e0-6833-42c5-9482-8b03de192729 d02169b8feb847e6860a3248a96694dc 7a03b9e1a60b4addbed4d7f05cbbfbf2 - default default] Flushing IO for device /dev/sdc flush_device_io /usr/lib/python2.7/site-packages/os_brick/initiator/linuxscsi.py:320
1732 2021-04-25 12:25:18.823 105 DEBUG oslo.privsep.daemon [-] privsep: request[140437786170224]: (3, 'os_brick.privileged.rootwrap.execute_root', ('blockdev', '--flushbufs', u'/dev/sdc'), {'attempts': 3, 'interval': 10, 'timeout': 300}) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:443
1733 2021-04-25 12:25:18.824 105 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): blockdev --flushbufs /dev/sdc execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
1734 2021-04-25 12:25:18.869 105 DEBUG oslo_concurrency.processutils [-] CMD "blockdev --flushbufs /dev/sdc" returned: 0 in 0.045s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409

re-connect_volume:

1761 2021-04-25 12:25:19.016 6 DEBUG nova.virt.libvirt.volume.iscsi [req-7e6864e0-6833-42c5-9482-8b03de192729 d02169b8feb847e6860a3248a96694dc 7a03b9e1a60b4addbed4d7f05cbbfbf2 - default default] Calling os-brick to attach iSCSI Volume connect_volume /usr/lib/python2.7/site-packages/nova/virt/libvirt/volume/iscsi.py:63

Do note, just a reminder: while this instance (instance-00000006) is up and running after a reboot, its boot Cinder volume remains on a single path; this depends on the backend/driver returning multipath information.
I also booted a second instance (instance-00000007) from a Cinder volume. This instance was booted after the nova config change and, as expected, is already multipath-aware on its boot volume.

[root@compute-0 nova]# virsh list
 Id    Name                           State
----------------------------------------------------
 9     instance-00000006              running
 10    instance-00000007              running

[root@compute-0 nova]# virsh domblklist instance-00000006   -> the original instance
Target     Source
------------------------------------------------
vda        /dev/sdc

[root@compute-0 nova]# virsh domblklist instance-00000007   -> the second instance
Target     Source
------------------------------------------------
vda        /dev/dm-0

Again, this is expected behaviour for the 3par backend/driver implementation; other drivers might vary. To resolve this on the original instance, it should be shelved and then unshelved.
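The shelve/unshelve remedy can be scripted. A minimal sketch, assuming a client object with novaclient-style shelve/unshelve/get methods; the helper name, polling loop, and timeout values are illustrative assumptions, not part of this bug report:

```python
import time


def recreate_attachment(servers_api, server_id, timeout=600, interval=5):
    """Shelve and then unshelve an instance so its boot volume attachment
    is recreated with the current connector settings (e.g. multipath).

    'servers_api' is any object exposing novaclient-style shelve(),
    unshelve() and get() methods; get() returns an object with a
    .status attribute.
    """

    def wait_for(status):
        # Poll until the server reaches the expected status.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if servers_api.get(server_id).status == status:
                return
            time.sleep(interval)
        raise RuntimeError('timed out waiting for status %s' % status)

    servers_api.shelve(server_id)
    wait_for('SHELVED_OFFLOADED')   # the volume is detached at this point
    servers_api.unshelve(server_id)
    wait_for('ACTIVE')              # re-attached using current settings
```

Shelve/unshelve works here where a plain reboot does not, because the volume is fully detached and then re-attached, giving os-brick a chance to build the multipath device with the new configuration.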
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 13.0 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2385