+++ This bug was initially created as a clone of Bug #1655815 +++

Description of problem:
iSCSI connections created by nova may cause a Compute node to hang on system shutdown. Nova leaves an instance's iSCSI volume connection open even after the instance has been stopped. However, nova's iSCSI connections are not "visible" to the Compute host, so when the host later attempts to shut down, the lingering iSCSI connection causes the host's shutdown procedure to hang.

Steps to Reproduce:
1. Deploy RHOSP with an iSCSI-backed cinder volume
2. Create an instance on the Compute node and attach a volume
3. Stop (but don't delete) the instance on the Compute node
4. Shut down the Compute node

Actual results:
The Compute node stalls during OS shutdown when using an iSCSI-backed volume

Expected results:
The Compute node does not stall during OS shutdown when using an iSCSI-backed volume
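To see the lingering connection described above, you can list open iSCSI sessions on the compute host after the instance is stopped. A minimal sketch, assuming a captured `iscsiadm -m session` line (the target IQN and portal address below are illustrative, not taken from this bug):

```shell
#!/bin/sh
# Sketch: detect lingering iSCSI sessions that could block host shutdown.
# On a real compute node the input would come from:  iscsiadm -m session
# Here we parse a sample capture (hypothetical IQN) to show the check itself.
sessions='tcp: [1] 192.168.24.7:3260,1 iqn.2010-10.org.openstack:volume-14b52d3a (non-flash)'

# Count lines that name an iSCSI qualified name (iqn.*)
count=$(printf '%s\n' "$sessions" | grep -c 'iqn\.')

if [ "$count" -gt 0 ]; then
    echo "WARNING: $count lingering iSCSI session(s); shutdown may hang"
fi
```

With the fix merged, a stopped instance should leave no such session behind, and the warning would not fire.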
The patch has merged on stable/rocky.
Verified on:
openstack-tripleo-heat-templates-9.2.1-0.20190119154856.fe11ade.el7ost.noarch

Booted an instance, then created and attached two cinder (lvm) volumes to it:

#cinder list
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name         | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
| 14b52d3a-c11a-465b-aad0-ea3634412c30 | in-use | Pansible_vol | 1    | tripleo     | true     | f6c6635e-a89b-407b-b99d-46093d7cebce |
| d2769423-beaa-4f9b-81ee-c8979f65ff35 | in-use | RO           | 1    | tripleo     | false    | f6c6635e-a89b-407b-b99d-46093d7cebce |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+

#nova stop f6c6635e-a89b-407b-b99d-46093d7cebce
Request to stop server f6c6635e-a89b-407b-b99d-46093d7cebce has been accepted.

#nova list
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status  | Task State | Power State | Networks                          |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| f6c6635e-a89b-407b-b99d-46093d7cebce | inst1 | SHUTOFF | -          | Shutdown    | internal=192.168.0.21, 10.0.0.235 |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+

ssh into the compute node and issue the shutdown command:

[root@compute-0 ~]# shutdown

Monitored the shutdown progress via virsh console (virt OSPD deployment). The compute node shut down within a few seconds as expected; no stalling observed. Looks good to verify.
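The verification above waits for the instance to reach SHUTOFF before rebooting the host. That check can be sketched by parsing the `nova list` table row for the instance; the snippet below works on the captured row from this comment (in a real run you would pipe live `nova list` output instead):

```shell
#!/bin/sh
# Sketch: confirm the instance reached SHUTOFF before testing host shutdown.
# Sample row captured from the `nova list` output in the verification comment.
nova_list='| f6c6635e-a89b-407b-b99d-46093d7cebce | inst1 | SHUTOFF | - | Shutdown | internal=192.168.0.21, 10.0.0.235 |'

# Split on '|'; field 4 is the Status column. Strip padding spaces.
status=$(printf '%s\n' "$nova_list" | awk -F'|' '/f6c6635e/ {gsub(/ /, "", $4); print $4}')

if [ "$status" = "SHUTOFF" ]; then
    echo "instance stopped; safe to test host shutdown"
fi
```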
BTW, I suspect I'd previously hit this same issue in overcloud delete cases, where deletion of the whole overcloud would fail due to stalled compute nodes (deletion performs a shutdown); the root cause is likely the same as this bz. I've retested on a second deployment with a Cinder volume backed by iSCSI 3PAR storage. Again, shutdown of the compute node wasn't stalled.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0446
The needinfo request(s) on this closed bug have been removed, as they had been unresolved for 1000 days.