The Fibre Channel connector does not flush data on disconnect/detach, which can cause partial data to be written to the volume.
Created attachment 1782639 [details]
Nova compute log

Verified on: python3-os-brick-2.10.5-1.20201114041631.el8ost.noarch

On a 3PAR FC multipath system, with privsep logs enabled on the compute node: on one of the compute nodes I intentionally switched back to volume_use_multipath=false and restarted the nova docker container. I then booted an instance, instA, from a boot volume; as expected, the instance booted up in single-path mode. I then reverted to volume_use_multipath=true and restarted the nova docker container, after which I successfully hard-rebooted the instance.

In the attached compute log, the reboot begins:

2021-05-09 14:12:42.276 7 INFO nova.compute.manager [req-b15f3264-7bc3-43ee-9519-f9fcfc5467fa b6d4233590524903bb53fb2c5d751b82 e61d0920577f4abb9254ee80a7fdb600 - default default] [instance: bf18e05e-234a-4dd9-9c99-e7465311f712] Rebooting instance

A little later os-brick starts disconnecting:

2021-05-09 14:12:43.604 7 DEBUG os_brick.initiator.connectors.fibre_channel [req-b15f3264-7bc3-43ee-9519-f9fcfc5467fa b6d4233590524903bb53fb2c5d751b82 e61d0920577f4abb9254ee80a7fdb600 - default default] ==> disconnect_volume: ...

Then, after trying and failing to find a multipath device, it gives up and simply flushes the individual device /dev/sdb:

2021-05-09 14:12:56.456 112 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): blockdev --flushbufs /dev/sdb execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372

This confirms the bug is now resolved: a compute node originally running in single-path mode was switched over to multipath mode, and the instance running on it survived a reboot without losing access to its boot volume.
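The fallback seen in the log above can be sketched in a few lines of Python. This is a minimal illustration of the disconnect logic, not the real os-brick implementation; the helper names and the in-memory multipath table are hypothetical:

```python
# Hypothetical sketch: on disconnect, try to resolve a multipath (dm-)
# device for the volume; if none exists (e.g. the volume was attached
# in single-path mode), fall back to flushing each individual SCSI
# device, as os-brick does with "blockdev --flushbufs /dev/sdb".

def find_multipath_device(wwn, mpath_table):
    """Return the dm- device for a WWN, or None if no multipath exists."""
    return mpath_table.get(wwn)

def disconnect_volume(wwn, devices, mpath_table, flushed):
    mpath = find_multipath_device(wwn, mpath_table)
    if mpath is not None:
        # Multipath case: flush the whole dm- device once.
        flushed.append(mpath)
    else:
        # Single-path fallback: flush each individual device.
        for dev in devices:
            flushed.append(dev)

# Volume attached in single-path mode: no entry in the multipath table.
flushed = []
disconnect_volume("fake-wwn", ["/dev/sdb"], {}, flushed)
print(flushed)  # ['/dev/sdb']
```

The key point the verification exercised is the else branch: before the fix, a volume attached in single-path mode but disconnected while multipath was enabled could skip the flush entirely.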
Adding an extra bit of post-verification info: I had also booted a second instance, instB, on the compute-1 node after re-enabling volume_use_multipath=true. As expected, this instance booted up using mpath:

[root@compute-1 ~]# multipath -ll
360002ac0000000000000155300021f6b dm-0 3PARdata,VV
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 16:0:1:1 sdc 8:32  active ready running
  |- 16:0:0:1 sdd 8:48  active ready running
  |- 16:0:2:1 sde 8:64  active ready running
  |- 7:0:1:1  sdf 8:80  active ready running
  |- 7:0:0:1  sdg 8:96  active ready running
  `- 7:0:2:1  sdh 8:112 active ready running

We don't see a DM device for instA, as its original connection state hasn't yet been refreshed, so it remains in single-path mode. To refresh instA's connection info we must shelve/unshelve it; doing so refreshes the connection info and moves instA to multipath:

(overcloud) [stack@seal41 ~]$ nova shelve instA
+--------------------------------------+-------+-------------------+------------+-------------+-----------------------+
| ID                                   | Name  | Status            | Task State | Power State | Networks              |
+--------------------------------------+-------+-------------------+------------+-------------+-----------------------+
| bf18e05e-234a-4dd9-9c99-e7465311f712 | instA | SHELVED_OFFLOADED | spawning   | Shutdown    | internal=192.168.0.16 |
| 1ad69879-450b-4b85-aed5-f44d7519805e | instB | ACTIVE            | -          | Running     | internal=192.168.0.20 |
+--------------------------------------+-------+-------------------+------------+-------------+-----------------------+
(overcloud) [stack@seal41 ~]$ nova unshelve instA
(overcloud) [stack@seal41 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks              |
+--------------------------------------+-------+--------+------------+-------------+-----------------------+
| bf18e05e-234a-4dd9-9c99-e7465311f712 | instA | ACTIVE | -          | Running     | internal=192.168.0.16 |
| 1ad69879-450b-4b85-aed5-f44d7519805e | instB | ACTIVE | -          | Running     | internal=192.168.0.20 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------+

instA is now also running in multipath; it started up on (moved to) compute-0 due to the shelve operation:

[heat-admin@compute-0 ~]$ sudo -i
[root@compute-0 ~]# multipath -ll
May 13 07:38:31 | /etc/multipath.conf line 13, duplicate keyword: skip_kpartx
360002ac0000000000000155200021f6b dm-0 3PARdata,VV
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 15:0:1:0 sdb 8:16 active ready running
  |- 15:0:2:0 sdc 8:32 active ready running
  |- 15:0:0:0 sdd 8:48 active ready running
  |- 16:0:0:0 sde 8:64 active ready running
  |- 16:0:2:0 sdf 8:80 active ready running
  `- 16:0:1:0 sdg 8:96 active ready running

instB remains on compute-1:

[root@compute-1 ~]# multipath -ll
360002ac0000000000000155300021f6b dm-0 3PARdata,VV
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 16:0:1:1 sdc 8:32  active ready running
  |- 16:0:0:1 sdd 8:48  active ready running
  |- 16:0:2:1 sde 8:64  active ready running
  |- 7:0:1:1  sdf 8:80  active ready running
  |- 7:0:0:1  sdg 8:96  active ready running
  `- 7:0:2:1  sdh 8:112 active ready running
[root@compute-1 ~]#

Thus, after switching compute-1 from single path to multipath and shelving/unshelving instA, both instances now use multipathing.
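The multipath checks above can be automated with a small parser. A rough sketch, not part of any OpenStack tooling: it counts active path lines in `multipath -ll` output, so an instance's volume can be classified as multipath (more than one path) versus single-path:

```python
# Hypothetical helper: count active paths in `multipath -ll` output.
# Path lines look like "|- 16:0:1:1 sdc 8:32 active ready running".
import re

def count_active_paths(multipath_ll_output):
    pattern = re.compile(r'\d+:\d+:\d+:\d+\s+sd\w+\s+\d+:\d+\s+active')
    return len(pattern.findall(multipath_ll_output))

sample = """\
360002ac0000000000000155300021f6b dm-0 3PARdata,VV
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 16:0:1:1 sdc 8:32 active ready running
  |- 16:0:0:1 sdd 8:48 active ready running
  `- 7:0:2:1 sdh 8:112 active ready running
"""
print(count_active_paths(sample))  # 3
```

With the six-path output shown above for instA and instB, this would report 6; a volume still in single-path mode would show no dm- device at all.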
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.6 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2097