Bug 1665554

Summary: [OSP14] Compute-node gets stalled during OS shutdown when using iSCSI-backend volume
Product: Red Hat OpenStack Reporter: Alan Bishop <abishop>
Component: openstack-tripleo-heat-templates Assignee: Alan Bishop <abishop>
Status: CLOSED ERRATA QA Contact: Gurenko Alex <agurenko>
Severity: high Docs Contact:
Priority: high    
Version: 14.0 (Rocky) CC: mariel, mburns, pgrist, rheslop, tshefi, tvignaud
Target Milestone: z1 Keywords: Triaged, ZStream
Target Release: 14.0 (Rocky)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-9.2.1-0.20190119154856.fe11ade.el7ost Doc Type: Bug Fix
Doc Text:
Previously, iSCSI connections that were created by the containerized nova service were not visible to the host. As a result, the connections were not closed during the shutdown sequence, which caused hosts to hang. With this fix, iSCSI connection information is available to the host and the connections shut down gracefully.
Story Points: ---
Clone Of: 1655815 Environment:
Last Closed: 2019-03-18 13:03:28 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1655815    

Description Alan Bishop 2019-01-11 18:33:19 UTC
+++ This bug was initially created as a clone of Bug #1655815 +++

Description of problem:

iSCSI connections created by nova may cause a Compute node to hang on system shutdown. Nova leaves an instance's iSCSI volume connection open, even after the instance has been stopped. However, nova's iSCSI connections are not "visible" to the Compute host. Later, when the Compute host attempts to shut down, the lingering nova iSCSI connection causes the host's shutdown procedure to hang.
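
One rough way to check whether a stopped instance has left iSCSI connections open is to count the active sessions before shutting the node down. The helper below is only a sketch: the function name count_iscsi_sessions is hypothetical, and it assumes the usual one-line-per-session output of `iscsiadm -m session` (for example `tcp: [1] <portal> <target iqn> ...`). It reads that text from stdin so it can be tried against saved output as well.

```shell
# Count iSCSI sessions in `iscsiadm -m session` style output read from stdin.
# Assumption: each session is printed on one line starting with
# "<transport>: [<session id>] ", e.g. "tcp: [1] ...".
count_iscsi_sessions() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the helper usable under "set -e".
    grep -c '^[a-z][a-z]*: \[[0-9][0-9]*\] ' || true
}
```

On a Compute node this might be used as `iscsiadm -m session 2>/dev/null | count_iscsi_sessions`. A non-zero count after all instances are stopped would be consistent with the lingering connections described above, though it does not by itself show whether the host's shutdown procedure is able to close them.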

Steps to Reproduce:

 1. Deploy RHOSP with an iSCSI-backed cinder volume backend
 2. Create an instance on a Compute node and attach a volume
 3. Stop (but don't delete) the instance on the Compute node
 4. Shut down the Compute node
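
Steps 2 and 3 could be scripted roughly as follows. This is a sketch under stated assumptions, not the reporter's actual procedure: the names inst1 and vol1 are placeholders, the instance is assumed to exist already on the Compute node, and the helper functions run and reproduce_steps are hypothetical. By default the script only prints the commands (dry run).

```shell
# Sketch of reproduction steps 2 and 3; resource names are placeholders.
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce_steps() {
    server="${1:-inst1}"
    volume="${2:-vol1}"
    # Step 2: create a 1 GB volume and attach it to the existing instance.
    run openstack volume create --size 1 "$volume"
    run openstack server add volume "$server" "$volume"
    # Step 3: stop, but do not delete, the instance.
    run openstack server stop "$server"
    # Step 4 is done on the Compute node itself (shutdown as root).
}
```

Calling `reproduce_steps inst1 vol1` prints the three commands. Setting DRY_RUN=0 would execute them against whatever cloud the current credentials point at, so the dry run is the safer default.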

Actual results:

 Compute-node gets stalled during OS shutdown when using iSCSI-backend volume

Expected results:

 Compute-node doesn't get stalled during OS shutdown when using iSCSI-backend volume

Comment 1 Alan Bishop 2019-01-15 21:46:43 UTC
The patch has merged on stable/rocky.

Comment 4 Tzach Shefi 2019-02-24 11:42:52 UTC
Verified on:
openstack-tripleo-heat-templates-9.2.1-0.20190119154856.fe11ade.el7ost.noarch

Booted an instance, created+attached two cinder (lvm) volumes to it

#cinder list
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name         | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
| 14b52d3a-c11a-465b-aad0-ea3634412c30 | in-use | Pansible_vol | 1    | tripleo     | true     | f6c6635e-a89b-407b-b99d-46093d7cebce |
| d2769423-beaa-4f9b-81ee-c8979f65ff35 | in-use | RO           | 1    | tripleo     | false    | f6c6635e-a89b-407b-b99d-46093d7cebce |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+

#nova stop f6c6635e-a89b-407b-b99d-46093d7cebce
Request to stop server f6c6635e-a89b-407b-b99d-46093d7cebce has been accepted.

#nova list
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status  | Task State | Power State | Networks                          |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| f6c6635e-a89b-407b-b99d-46093d7cebce | inst1 | SHUTOFF | -          | Shutdown    | internal=192.168.0.21, 10.0.0.235 |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+


ssh into compute, issue shutdown command
[root@compute-0 ~]# shutdown 

Monitored shutdown progress using virsh console (virt OSPD deployment).
The compute node shut down within a few seconds as expected, no stalling observed.
Looks good to verify. 

BTW, I suspect I previously hit this same issue in overcloud delete cases,
where deletion of the whole overcloud would fail because of stalled compute nodes (deletion shuts the nodes down). I suspect the root cause is the same as this bz.

I've retested on a second deployment with a Cinder volume backed by iSCSI 3PAR storage.
Again, shutdown of the compute node wasn't stalled.

Comment 6 errata-xmlrpc 2019-03-18 13:03:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0446
