Bug 1883551 - [OSP13] Multiple instances can't boot with "Boot failed: not a bootable disk" after simple reboot
Summary: [OSP13] Multiple instances can't boot with "Boot failed: not a bootable disk"...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-os-brick
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z16
Target Release: 13.0 (Queens)
Assignee: Gorka Eguileor
QA Contact: Tzach Shefi
URL:
Whiteboard:
Duplicates: 1941749
Depends On:
Blocks:
 
Reported: 2020-09-29 14:37 UTC by Gorka Eguileor
Modified: 2022-10-03 14:45 UTC
CC List: 4 users

Fixed In Version: python-os-brick-2.3.9-10.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-16 10:58:54 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1897787 0 None None None 2020-10-01 10:31:48 UTC
OpenStack gerrit 755478 0 None MERGED FC: Fix not flushing on detach 2021-05-03 08:51:14 UTC
Red Hat Issue Tracker OSP-411 0 None None None 2022-10-03 14:45:33 UTC
Red Hat Product Errata RHBA-2021:2385 0 None None None 2021-06-16 10:59:26 UTC

Description Gorka Eguileor 2020-09-29 14:37:22 UTC
This bug was initially created as a copy of Bug #1883351

I am copying this bug because: 



Description of problem:
A customer rebooted 3 instances, and all of them now fail to boot with the BIOS error message: "Boot failed: not a bootable disk"

The following is the work that was done for one of those instances (more info to come in private comments below).

From the storage perspective, everything looks good:
the volume is attached to the instance and then the instance is booted.

This instance runs Linux with only one disk.

Multipath shows the path as active and up.
I read the volume with dd and got data back.
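
For reference, a read check along these lines confirms the backend is serving data (the exact device path here is hypothetical):

# dd if=/dev/dm-0 of=/dev/null bs=1M count=100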

I've compared the libvirt domain XML of a working instance against the non-working instance and couldn't see anything wrong.

We need your help to figure out what is wrong.


Version-Release number of selected component (if applicable):
OSP13 z11
openstack-nova-compute:13.0-129

How reproducible:
100% for the last 3 instances rebooted

Steps to Reproduce:
1. Reboot an instance
2.
3.

Actual results:
The instance won't boot.

Expected results:
The instance boots.

Additional info:
More info to come below in private comments

Comment 1 Gorka Eguileor 2020-09-29 14:38:10 UTC
Looking at the detach phase of the reboot (req-2c678262-2d40-4ab4-885c-c7739660cce3) I noticed that we didn't flush the volume during the detach process.

Looking at the code, I've found a tricky bug in os-brick where FC won't flush on disconnect if we failed to create a multipath device and returned a single path instead (it will also affect encrypted volumes, but these volumes are unencrypted, so that doesn't apply here).

This issue is not present when multipath is disabled and we are intentionally using a single path.
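
To illustrate the shape of the problem, here is a minimal Python sketch of the detach logic described above. This is not os-brick's actual code and the helper names are hypothetical; the real fix is the upstream change "FC: Fix not flushing on detach" (gerrit 755478) linked above.

import subprocess

def flush_device_io(device_path):
    # Flush buffered writes for a single block device; this is the
    # "blockdev --flushbufs /dev/sdX" call seen in the logs below.
    subprocess.check_call(['blockdev', '--flushbufs', device_path])

def disconnect_volume(device_info, remove_scsi_devices):
    multipath_name = device_info.get('multipath_id')
    if multipath_name:
        # Multipath case: flush and remove the dm device first.
        subprocess.check_call(['multipath', '-f', multipath_name])
    else:
        # The fixed behaviour: flush the lone device before removal.
        # The bug was that this flush was skipped when multipath creation
        # had failed at attach time and a bare /dev/sdX was used instead,
        # so buffered writes never reached the backend.
        flush_device_io(device_info['path'])
    remove_scsi_devices(device_info)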

Comment 4 Gorka Eguileor 2021-03-22 17:07:58 UTC
*** Bug 1941749 has been marked as a duplicate of this bug. ***

Comment 12 Tzach Shefi 2021-04-12 08:17:14 UTC
Hey, 

Confirming the verification/triggering steps with you:
An OSP 13 z16 deployment with an HPE 3par FC backend and multipath enabled.

Boot an instance from a bootable Cinder volume (Cirros or RHEL, it doesn't matter),
reboot the instance, and confirm that the OS still boots up correctly?

Then I should break one of the paths and reboot a second time.
Plus, for good measure, swap the broken FC path and reboot a third time.

Would this cover the testing?
Thanks

Comment 14 Tzach Shefi 2021-04-27 10:08:00 UTC
Verified on:
python2-os-brick-2.3.9-10.el7ost.noarch


A deployment with Cinder using a 3par iSCSI multipath backend.
Enabled nova privsep debugging on the compute node.

Boot an instance from an image, creating a boot volume/device:

(overcloud) [stack@undercloud-0 ~]$ nova boot instA --flavor tiny  --block-device source=image,id=94611c9f-3614-4131-b614-2dce71f89014,dest=volume,size=1,shutdown=preserve,bootindex=0
+--------------------------------------+-------------------------------------------------+
| Property                             | Value                                           |
+--------------------------------------+-------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                          |
| OS-EXT-AZ:availability_zone          |                                                 |
| OS-EXT-SRV-ATTR:host                 | -                                               |
| OS-EXT-SRV-ATTR:hostname             | insta                                           |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                               |
| OS-EXT-SRV-ATTR:instance_name        |                                                 |
| OS-EXT-SRV-ATTR:kernel_id            |                                                 |
| OS-EXT-SRV-ATTR:launch_index         | 0                                               |
| OS-EXT-SRV-ATTR:ramdisk_id           |                                                 |
| OS-EXT-SRV-ATTR:reservation_id       | r-9i6v9dpq                                      |
| OS-EXT-SRV-ATTR:root_device_name     | -                                               |
| OS-EXT-SRV-ATTR:user_data            | -                                               |
| OS-EXT-STS:power_state               | 0                                               |
| OS-EXT-STS:task_state                | scheduling                                      |
| OS-EXT-STS:vm_state                  | building                                        |
| OS-SRV-USG:launched_at               | -                                               |
| OS-SRV-USG:terminated_at             | -                                               |
| accessIPv4                           |                                                 |
| accessIPv6                           |                                                 |
| adminPass                            | BUe77kSB9KWz                                    |
| config_drive                         |                                                 |
| created                              | 2021-04-25T12:22:17Z                            |
| description                          | -                                               |
| flavor:disk                          | 1                                               |
| flavor:ephemeral                     | 0                                               |
| flavor:extra_specs                   | {}                                              |
| flavor:original_name                 | tiny                                            |
| flavor:ram                           | 64                                              |
| flavor:swap                          | 0                                               |
| flavor:vcpus                         | 1                                               |
| hostId                               |                                                 |
| host_status                          |                                                 |
| id                                   | ea970b3b-0cf7-483f-955d-92180e6f3489            |
| image                                | Attempt to boot from volume - no image supplied |
| key_name                             | -                                               |
| locked                               | False                                           |
| metadata                             | {}                                              |
| name                                 | instA                                           |
| os-extended-volumes:volumes_attached | []                                              |
| progress                             | 0                                               |
| security_groups                      | default                                         |
| status                               | BUILD                                           |
| tags                                 | []                                              |
| tenant_id                            | 7a03b9e1a60b4addbed4d7f05cbbfbf2                |
| updated                              | 2021-04-25T12:22:17Z                            |
| user_id                              | d02169b8feb847e6860a3248a96694dc                |
+--------------------------------------+-------------------------------------------------+

In the nova-compute log, as expected, we only see a "target_iqn" key rather than the plural "target_iqns" key (i.e. a single-path attachment):

2021-04-25 12:22:42.552 6 DEBUG os_brick.initiator.connectors.iscsi [req-b6718507-cbb4-4900-813b-846280eb8edd d02169b8feb847e6860a3248a96694dc 7a03b9e1a60b4addbed4d7f05cbbfbf2 - default default] ==> connect_volume: call u"{'args': (<os_brick.initiator.connectors.iscsi.ISCSIConnector object at 0x7f0e98cd3d90>, {u'target_discovered': True, u'encrypted': False, u'qos_specs': None, u'target_iqn': u'iqn.2000-05.com.3pardata:21220002ac021f6b', u'target_portal': u'10.35.146.4:3260', u'target_lun': 3, u'access_mode': u'rw'}), 'kwargs': {}}" trace_logging_wrapper /usr/lib/python2.7/site-packages/os_brick/utils.py:146
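
For orientation, a tiny illustrative check (not os-brick's actual logic) of what distinguishes the two cases; a multipath-capable backend would also return plural 'target_iqns'/'target_portals'/'target_luns' lists:

# Connection properties as logged above use singular keys.
props = {'target_iqn': 'iqn.2000-05.com.3pardata:21220002ac021f6b',
         'target_portal': '10.35.146.4:3260',
         'target_lun': 3}
print('target_iqns' in props)  # False -> single-path attachment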


Now let's set volume_use_multipath=true in nova.conf and restart nova-compute.
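
For reference, the setting lives in nova.conf on the compute node; in Queens it sits under the [libvirt] section:

[libvirt]
volume_use_multipath = True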

Hard reboot the instance:
(overcloud) [stack@undercloud-0 ~]$ nova reboot instA --hard --poll                                                                                                                                                                          
Request to reboot server <Server: instA> has been accepted.

Server rebooting...
Finished
Wait for server <Server: instA> reboot.


(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+--------------------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                                   |
+--------------------------------------+-------+--------+------------+-------------+--------------------------------------------+
| e0fed6a2-3137-44af-b528-dffbb3a1f862 | instA | ACTIVE | -          | Running     | public=10.0.0.181, 2620:52:0:13b8::1000:25 |
+--------------------------------------+-------+--------+------------+-------------+--------------------------------------------+


After the config change plus a hard reboot of the instance,
first of all, the instance recovered from the reboot, which is good; it failed to before.

We also see the block device being flushed:

Disconnect_volume:
2021-04-25 12:25:17.863 6 DEBUG os_brick.initiator.connectors.iscsi [req-7e6864e0-6833-42c5-9482-8b03de192729 d02169b8feb847e6860a3248a96694dc 7a03b9e1a60b4addbed4d7f05cbbfbf2 - default default] ==> disconnect_volume: call u"{'args': (<os_brick.initiator.connectors.iscsi.ISCSIConnector object at 0x7fba41058dd0>, {u'device_path': u'/dev/sdc', u'target_discovered': True, u'encrypted': False, u'qos_specs': None, u'target_iqn': u'iqn.2000-05.com.3pardata:21220002ac021f6b', u'target_portal': u'10.35.146.4:3260', u'target_lun': 3, u'access_mode': u'rw'}, None), 'kwargs': {}}" trace_logging_wrapper /usr/lib/python2.7/site-packages/os_brick/utils.py:146
2021-04-25 12:25:17.864 6 DEBUG oslo_concurrency.lockutils [req-7e6864e0-6833-42c5-9482-8b03de192729 d02169b8feb847e6860a3248a96694dc 7a03b9e1a60b4addbed4d7f05cbbfbf2 - default default] Lock "connect_volume" acquired by "os_brick.initiator.connectors.iscsi.disconnect_volume" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
2021-04-25 12:25:17.866 6 INFO oslo.privsep.daemon [req-7e6864e0-6833-42c5-9482-8b03de192729 d02169b8feb847e6860a3248a96694dc 7a03b9e1a60b4addbed4d7f05cbbfbf2 - default default] Running privsep helper: ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', 'privsep-helper', '--config-file', '/usr/share/nova/nova-dist.conf', '--config-file', '/etc/nova/nova.conf', '--privsep_context', 'os_brick.privileged.default', '--privsep_sock_path', '/tmp/tmpIQUgQp/privsep.sock']



blockdev flushbufs:
2021-04-25 12:25:18.822 6 DEBUG os_brick.initiator.connectors.iscsi [req-7e6864e0-6833-42c5-9482-8b03de192729 d02169b8feb847e6860a3248a96694dc 7a03b9e1a60b4addbed4d7f05cbbfbf2 - default default] Resulting device map defaultdict(<function <lambda> at 0x7fba3853e758>, {(u'10.35.146.4:3260', u'iqn.2000-05.com.3pardata:21220002ac021f6b'): (set([u'sdc']), set([u'sda', u'sdb']))}) _get_connection_devices /usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py:837
2021-04-25 12:25:18.822 6 DEBUG os_brick.initiator.linuxscsi [req-7e6864e0-6833-42c5-9482-8b03de192729 d02169b8feb847e6860a3248a96694dc 7a03b9e1a60b4addbed4d7f05cbbfbf2 - default default] Removing multipathed devices sdc remove_connection /usr/lib/python2.7/site-packages/os_brick/initiator/linuxscsi.py:271
2021-04-25 12:25:18.823 6 DEBUG os_brick.initiator.linuxscsi [req-7e6864e0-6833-42c5-9482-8b03de192729 d02169b8feb847e6860a3248a96694dc 7a03b9e1a60b4addbed4d7f05cbbfbf2 - default default] Flushing IO for device /dev/sdc flush_device_io /usr/lib/python2.7/site-packages/os_brick/initiator/linuxscsi.py:320
2021-04-25 12:25:18.823 105 DEBUG oslo.privsep.daemon [-] privsep: request[140437786170224]: (3, 'os_brick.privileged.rootwrap.execute_root', ('blockdev', '--flushbufs', u'/dev/sdc'), {'attempts': 3, 'interval': 10, 'timeout': 300}) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:443
2021-04-25 12:25:18.824 105 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): blockdev --flushbufs /dev/sdc execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2021-04-25 12:25:18.869 105 DEBUG oslo_concurrency.processutils [-] CMD "blockdev --flushbufs /dev/sdc" returned: 0 in 0.045s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409


re-connect_volume:
2021-04-25 12:25:19.016 6 DEBUG nova.virt.libvirt.volume.iscsi [req-7e6864e0-6833-42c5-9482-8b03de192729 d02169b8feb847e6860a3248a96694dc 7a03b9e1a60b4addbed4d7f05cbbfbf2 - default default] Calling os-brick to attach iSCSI Volume connect_volume /usr/lib/python2.7/site-packages/nova/virt/libvirt/volume/iscsi.py:63


Just a reminder: while this instance (instance-00000006) is up and running after the reboot,
its boot Cinder volume remains attached over a single path; this depends on the backend/driver returning multipath information.

I also booted a second instance (instance-00000007) from a Cinder volume;
since it booted after the nova config change, it is already multipath-aware on its boot volume, as expected.

[root@compute-0 nova]# virsh list
 Id    Name                           State
----------------------------------------------------
 9     instance-00000006              running
 10    instance-00000007              running

[root@compute-0 nova]# virsh domblklist instance-00000006   -> the original instance.
Target     Source
------------------------------------------------
vda        /dev/sdc

[root@compute-0 nova]# virsh domblklist instance-00000007   -> the second instance.
Target     Source
------------------------------------------------
vda        /dev/dm-0

Again, this is expected behaviour for the 3par backend/driver implementation; other drivers might vary.
To resolve this on the original instance, it should be shelved and then unshelved, as shown below.
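
For reference, that cycle would look like this, using the instance from this run:

(overcloud) [stack@undercloud-0 ~]$ nova shelve instA
(overcloud) [stack@undercloud-0 ~]$ nova unshelve instA

On unshelve, the volume is reattached using the current (multipath-enabled) configuration.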

Comment 20 errata-xmlrpc 2021-06-16 10:58:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 13.0 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2385

