Bug 1940548 - Fibre channel not flushing data on disconnect
Summary: Fibre channel not flushing data on disconnect
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-os-brick
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z6
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Rajat Dhasmana
QA Contact: Tzach Shefi
URL:
Whiteboard:
Depends On:
Blocks: 1936314
 
Reported: 2021-03-18 15:34 UTC by Rajat Dhasmana
Modified: 2024-06-14 00:53 UTC
11 users

Fixed In Version: python-os-brick-2.10.5-1.20201114041630.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-26 13:52:17 UTC
Target Upstream Version:
Embargoed:


Attachments
Nova compute log (1.40 MB, text/plain)
2021-05-13 07:28 UTC, Tzach Shefi


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1897787 0 None None None 2021-03-19 10:02:54 UTC
OpenStack gerrit 777409 0 None MERGED FC: Fix not flushing on detach 2021-03-19 10:02:54 UTC
Red Hat Issue Tracker OSP-414 0 None None None 2022-10-03 14:47:37 UTC
Red Hat Product Errata RHBA-2021:2097 0 None None None 2021-05-26 13:53:03 UTC

Description Rajat Dhasmana 2021-03-18 15:34:54 UTC
The Fibre Channel connector is not flushing data on disconnect/detach, which can leave partially written data on the volume.
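The linked fix (gerrit 777409, "FC: Fix not flushing on detach") makes the connector flush device buffers before tearing the device down. A minimal, hypothetical sketch of the intended behavior follows; the function names are illustrative and are not os-brick's actual API:

```python
import subprocess


def flush_commands(multipath_name, device_paths):
    """Return the flush commands a connector should run on detach.

    Illustrative sketch only: if a device-mapper multipath device
    exists, flush it as a whole; otherwise fall back to flushing
    each individual block device. Skipping this fallback flush is
    the step the buggy code missed.
    """
    if multipath_name:
        # 'multipath -f <name>' flushes and removes the dm device.
        return [["multipath", "-f", multipath_name]]
    # Single-path fallback: flush the buffers of each device.
    return [["blockdev", "--flushbufs", dev] for dev in device_paths]


def run_flush(multipath_name, device_paths):
    # Actually execute the flushes (requires root on a real host).
    for cmd in flush_commands(multipath_name, device_paths):
        subprocess.run(cmd, check=True)
```

With no multipath device present, the sketch falls back to per-device `blockdev --flushbufs` calls, which matches the log excerpt in comment 9 below.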

Comment 9 Tzach Shefi 2021-05-13 07:28:47 UTC
Created attachment 1782639 [details]
Nova compute log

Verified on:
python3-os-brick-2.10.5-1.20201114041631.el8ost.noarch

On a 3PAR FC multipath system, I enabled privsep logs on the compute node.

On one of the compute nodes I intentionally switched back to
volume_use_multipath=false and restarted the nova docker container.

Then I booted an instance, instA, from a boot volume.
As expected, the instance booted up in single-path mode.

Then I reverted to volume_use_multipath=true and restarted the nova docker container.
After that, I successfully hard-rebooted the instance.
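For reference, the toggle flipped above is the nova libvirt driver's multipath setting. The containerized file path shown in the comment below is an assumption and varies by deployment:

```ini
# e.g. /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf
[libvirt]
# false = attach volumes over a single path;
# true  = use device-mapper multipath for volume attachments
volume_use_multipath = true
```

The nova container must be restarted for the change to take effect, and (as verified below) already-attached volumes keep their old connection state until it is refreshed, e.g. by shelve/unshelve.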

In the attached compute log we see the reboot begin:

  2021-05-09 14:12:42.276 7 INFO nova.compute.manager [req-b15f3264-7bc3-43ee-9519-f9fcfc5467fa b6d4233590524903bb53fb2c5d751b82 e61d0920577f4abb9254ee80a7fdb600 - default default] [instance: bf18e05e-234a-4dd9-9c99-e7465311f712] Rebooting instance

A little later, os-brick starts disconnecting:

  2021-05-09 14:12:43.604 7 DEBUG os_brick.initiator.connectors.fibre_channel [req-b15f3264-7bc3-43ee-9519-f9fcfc5467fa b6d4233590524903bb53fb2c5d751b82 e61d0920577f4abb9254ee80a7fdb600 - default default] ==> disconnect_volume: ...

Then, after trying and failing to find the multipath device, it
finally gives up and just flushes the individual device /dev/sdb:

  2021-05-09 14:12:56.456 112 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): blockdev --flushbufs /dev/sdb execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372


This confirms the bug is now resolved:
a compute node originally running in single-path mode was switched over to multipath mode, and the instance running on it survived a reboot without losing access to its boot volume.
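The verification above hinges on spotting the `blockdev --flushbufs` call in the compute log. A small self-contained sketch of that check (the helper name is hypothetical, and the sample lines are modelled on the excerpt above):

```python
import re

# Match the flush call that os-brick logs on disconnect.
FLUSH_RE = re.compile(r"blockdev --flushbufs (/dev/\w+)")


def flushed_devices(log_text):
    """Return the devices os-brick flushed, in log order."""
    return FLUSH_RE.findall(log_text)


# Sample lines modelled on the log excerpt above:
sample = (
    "14:12:43.604 DEBUG os_brick.initiator.connectors.fibre_channel "
    "==> disconnect_volume\n"
    "14:12:56.456 DEBUG oslo_concurrency.processutils [-] "
    "Running cmd (subprocess): blockdev --flushbufs /dev/sdb\n"
)
print(flushed_devices(sample))  # ['/dev/sdb']
```

An empty result on a detach that should have flushed would indicate the pre-fix behavior.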

Comment 10 Tzach Shefi 2021-05-13 07:55:55 UTC
Adding an extra bit of post-verification info:
I also booted a second instance, instB, on the compute-1 node after re-enabling volume_use_multipath=true.

As expected, this instance booted up using multipath:
[root@compute-1 ~]# multipath -ll
360002ac0000000000000155300021f6b dm-0 3PARdata,VV
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 16:0:1:1 sdc 8:32  active ready running
  |- 16:0:0:1 sdd 8:48  active ready running
  |- 16:0:2:1 sde 8:64  active ready running
  |- 7:0:1:1  sdf 8:80  active ready running
  |- 7:0:0:1  sdg 8:96  active ready running
  `- 7:0:2:1  sdh 8:112 active ready running

We don't see a DM device for instA, as its original connection state hadn't yet been refreshed;
it therefore remains in single-path mode.

To refresh instA's connection info we must use nova shelve/unshelve.
Doing so refreshes the connection info and moves instA to the multipath state.

(overcloud) [stack@seal41 ~]$ nova shelve instA
+--------------------------------------+-------+-------------------+------------+-------------+-----------------------+
| ID                                   | Name  | Status            | Task State | Power State | Networks              |
+--------------------------------------+-------+-------------------+------------+-------------+-----------------------+
| bf18e05e-234a-4dd9-9c99-e7465311f712 | instA | SHELVED_OFFLOADED | spawning   | Shutdown    | internal=192.168.0.16 |
| 1ad69879-450b-4b85-aed5-f44d7519805e | instB | ACTIVE            | -          | Running     | internal=192.168.0.20 |
+--------------------------------------+-------+-------------------+------------+-------------+-----------------------+

(overcloud) [stack@seal41 ~]$ nova unshelve instA
(overcloud) [stack@seal41 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks              |
+--------------------------------------+-------+--------+------------+-------------+-----------------------+
| bf18e05e-234a-4dd9-9c99-e7465311f712 | instA | ACTIVE | -          | Running     | internal=192.168.0.16 |
| 1ad69879-450b-4b85-aed5-f44d7519805e | instB | ACTIVE | -          | Running     | internal=192.168.0.20 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------+

instA is now also running in multipath; it started up on (moved to) compute-0 as a result of the shelve operation.

instA:
[heat-admin@compute-0 ~]$ sudo -i                                                                                                                                                                           
[root@compute-0 ~]# multipath -ll                                                                                                                                                                           
May 13 07:38:31 | /etc/multipath.conf line 13, duplicate keyword: skip_kpartx                                                                                                                               
360002ac0000000000000155200021f6b dm-0 3PARdata,VV                                                                                                                                                          
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 15:0:1:0 sdb 8:16 active ready running
  |- 15:0:2:0 sdc 8:32 active ready running
  |- 15:0:0:0 sdd 8:48 active ready running
  |- 16:0:0:0 sde 8:64 active ready running
  |- 16:0:2:0 sdf 8:80 active ready running
  `- 16:0:1:0 sdg 8:96 active ready running

instB remains on compute-1:
[root@compute-1 ~]# multipath -ll
360002ac0000000000000155300021f6b dm-0 3PARdata,VV
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 16:0:1:1 sdc 8:32  active ready running
  |- 16:0:0:1 sdd 8:48  active ready running
  |- 16:0:2:1 sde 8:64  active ready running
  |- 7:0:1:1  sdf 8:80  active ready running
  |- 7:0:0:1  sdg 8:96  active ready running
  `- 7:0:2:1  sdh 8:112 active ready running
[root@compute-1 ~]# 

Thus, after switching compute-1 from single-path to multipath and shelving/unshelving instA,
both instances now use multipathing.

Comment 16 errata-xmlrpc 2021-05-26 13:52:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.6 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2097

