Bug 1940548
Summary: Fibre channel not flushing data on disconnect
Product: Red Hat OpenStack
Component: python-os-brick
Version: 16.1 (Train)
Status: CLOSED ERRATA
Severity: high
Priority: high
Hardware: Unspecified
OS: Unspecified
Reporter: Rajat Dhasmana <rdhasman>
Assignee: Rajat Dhasmana <rdhasman>
QA Contact: Tzach Shefi <tshefi>
CC: alonare, apevec, astupnik, geguileo, jschluet, jvisser, lhh, pgrist, pmannidi, sputhenp, udesale
Keywords: Triaged
Target Milestone: z6
Target Release: 16.1 (Train on RHEL 8.2)
Fixed In Version: python-os-brick-2.10.5-1.20201114041630.el8ost
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2021-05-26 13:52:17 UTC
Bug Blocks: 1936314
Description
Rajat Dhasmana
2021-03-18 15:34:54 UTC
Created attachment 1782639 [details]
Nova compute log
Verified on:
python3-os-brick-2.10.5-1.20201114041631.el8ost.noarch
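The installed version can be checked on the compute node, e.g. (the container name is an assumption for a containerized 16.1 deployment; adjust as needed):

sudo podman exec nova_compute rpm -q python3-os-brick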
On a 3PAR FC multipath system, I enabled privsep logs on the compute node.
On one of the compute nodes I intentionally switched back to
volume_use_multipath=false and restarted the nova compute container.
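For reference, on my setup that toggle amounts to roughly the following; the config path and container name are typical for a containerized 16.1 compute node and may differ per deployment:

# /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf
[libvirt]
volume_use_multipath = False

# then restart the compute container so the change is picked up
sudo podman restart nova_compute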
Then I booted an instance, instA, from a boot volume.
As expected, the instance booted up in single-path mode.
Then I reverted to volume_use_multipath=true and restarted the nova compute container again.
After that I successfully hard-rebooted the instance.
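The hard reboot itself is just the standard call, e.g.:

(overcloud) [stack@seal41 ~]$ nova reboot --hard instA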
In the attached compute log, a couple of lines further down, we see the reboot begin:
2021-05-09 14:12:42.276 7 INFO nova.compute.manager [req-b15f3264-7bc3-43ee-9519-f9fcfc5467fa b6d4233590524903bb53fb2c5d751b82 e61d0920577f4abb9254ee80a7fdb600 - default default] [instance: bf18e05e-234a-4dd9-9c99-e7465311f712] Rebooting instance
A little later os-brick starts disconnecting:
2021-05-09 14:12:43.604 7 DEBUG os_brick.initiator.connectors.fibre_channel [req-b15f3264-7bc3-43ee-9519-f9fcfc5467fa b6d4233590524903bb53fb2c5d751b82 e61d0920577f4abb9254ee80a7fdb600 - default default] ==> disconnect_volume: ...
Then, after trying (and failing) to find the multipath device, it
finally gives up and just flushes the individual device /dev/sdb:
2021-05-09 14:12:56.456 112 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): blockdev --flushbufs /dev/sdb execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372
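For anyone reproducing this verification, the disconnect-and-flush sequence can be pulled out of the compute log with a grep along these lines (the log path assumes the default containerized log location and may differ in other deployments):

grep -E 'disconnect_volume|blockdev --flushbufs' /var/log/containers/nova/nova-compute.log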
This confirms the bug is now resolved: a compute node originally running in single-path mode was switched over to multipath mode, and the instance running on it survived a reboot without losing access to its boot volume.
Adding an extra bit of post-verification info: I also booted a second instance, instB, on the compute-1 node after re-enabling volume_use_multipath=true. As expected, this instance booted up using multipath:

[root@compute-1 ~]# multipath -ll
360002ac0000000000000155300021f6b dm-0 3PARdata,VV
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 16:0:1:1 sdc 8:32  active ready running
  |- 16:0:0:1 sdd 8:48  active ready running
  |- 16:0:2:1 sde 8:64  active ready running
  |- 7:0:1:1  sdf 8:80  active ready running
  |- 7:0:0:1  sdg 8:96  active ready running
  `- 7:0:2:1  sdh 8:112 active ready running

We don't see a DM device for instA, as its original connection info hadn't been refreshed yet, so it remains in single-path mode. To refresh instA's connection info we shelve and unshelve it; doing so refreshes the connection info and moves instA to multipath.

(overcloud) [stack@seal41 ~]$ nova shelve instA
+--------------------------------------+-------+-------------------+------------+-------------+-----------------------+
| ID                                   | Name  | Status            | Task State | Power State | Networks              |
+--------------------------------------+-------+-------------------+------------+-------------+-----------------------+
| bf18e05e-234a-4dd9-9c99-e7465311f712 | instA | SHELVED_OFFLOADED | spawning   | Shutdown    | internal=192.168.0.16 |
| 1ad69879-450b-4b85-aed5-f44d7519805e | instB | ACTIVE            | -          | Running     | internal=192.168.0.20 |
+--------------------------------------+-------+-------------------+------------+-------------+-----------------------+
(overcloud) [stack@seal41 ~]$ nova unshelve instA
(overcloud) [stack@seal41 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks              |
+--------------------------------------+-------+--------+------------+-------------+-----------------------+
| bf18e05e-234a-4dd9-9c99-e7465311f712 | instA | ACTIVE | -          | Running     | internal=192.168.0.16 |
| 1ad69879-450b-4b85-aed5-f44d7519805e | instB | ACTIVE | -          | Running     | internal=192.168.0.20 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------+

instA is now also running with multipath; it started up on compute-0 because the shelve/unshelve operation moved it there.

instA:
[heat-admin@compute-0 ~]$ sudo -i
[root@compute-0 ~]# multipath -ll
May 13 07:38:31 | /etc/multipath.conf line 13, duplicate keyword: skip_kpartx
360002ac0000000000000155200021f6b dm-0 3PARdata,VV
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 15:0:1:0 sdb 8:16 active ready running
  |- 15:0:2:0 sdc 8:32 active ready running
  |- 15:0:0:0 sdd 8:48 active ready running
  |- 16:0:0:0 sde 8:64 active ready running
  |- 16:0:2:0 sdf 8:80 active ready running
  `- 16:0:1:0 sdg 8:96 active ready running

instB remains on compute-1:

[root@compute-1 ~]# multipath -ll
360002ac0000000000000155300021f6b dm-0 3PARdata,VV
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 16:0:1:1 sdc 8:32  active ready running
  |- 16:0:0:1 sdd 8:48  active ready running
  |- 16:0:2:1 sde 8:64  active ready running
  |- 7:0:1:1  sdf 8:80  active ready running
  |- 7:0:0:1  sdg 8:96  active ready running
  `- 7:0:2:1  sdh 8:112 active ready running
[root@compute-1 ~]#

Thus, after switching compute-1 from single-path to multipath and shelving/unshelving instA, both instances now use multipathing.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.6 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2097