+++ This bug was initially created as a clone of Bug #1655815 +++

Description of problem:
iSCSI connections created by nova may cause a Compute node to hang on system shutdown. Nova leaves an instance's iSCSI volume connection open even after the instance has been stopped. However, nova's iSCSI connections are not "visible" to the Compute host, so when the host later attempts to shut down, the lingering iSCSI connection causes the host's shutdown procedure to hang.

Steps to Reproduce:
1. Deploy RHOSP with an iSCSI-backed cinder volume
2. Create an instance on the Compute node and attach a volume
3. Stop (but don't delete) the instance on the Compute node
4. Shut down the Compute node

Actual results:
The Compute node stalls during OS shutdown when using an iSCSI-backed volume

Expected results:
The Compute node does not stall during OS shutdown when using an iSCSI-backed volume
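To see the lingering connection described above, you can list open iSCSI sessions on the compute host after the instance is stopped. A minimal sketch, assuming a captured `iscsiadm -m session` line (the target IQN and portal address below are illustrative, not taken from this bug):

```shell
#!/bin/sh
# Sketch: detect lingering iSCSI sessions that could block host shutdown.
# On a real compute node the input would come from:  iscsiadm -m session
# Here we parse a sample capture (hypothetical IQN) to show the check itself.
sessions='tcp: [1] 192.168.24.7:3260,1 iqn.2010-10.org.openstack:volume-14b52d3a (non-flash)'

# Count lines that name an iSCSI qualified name (iqn.*)
count=$(printf '%s\n' "$sessions" | grep -c 'iqn\.')

if [ "$count" -gt 0 ]; then
    echo "WARNING: $count lingering iSCSI session(s); shutdown may hang"
fi
```

With the fix merged, a stopped instance should leave no such session behind, and the warning would not fire.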
The patch has merged on stable/rocky.
Verified on:
openstack-tripleo-heat-templates-9.2.1-0.20190119154856.fe11ade.el7ost.noarch

Booted an instance, then created and attached two cinder (lvm) volumes to it:

#cinder list
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name         | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
| 14b52d3a-c11a-465b-aad0-ea3634412c30 | in-use | Pansible_vol | 1    | tripleo     | true     | f6c6635e-a89b-407b-b99d-46093d7cebce |
| d2769423-beaa-4f9b-81ee-c8979f65ff35 | in-use | RO           | 1    | tripleo     | false    | f6c6635e-a89b-407b-b99d-46093d7cebce |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+

#nova stop f6c6635e-a89b-407b-b99d-46093d7cebce
Request to stop server f6c6635e-a89b-407b-b99d-46093d7cebce has been accepted.

#nova list
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status  | Task State | Power State | Networks                          |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+
| f6c6635e-a89b-407b-b99d-46093d7cebce | inst1 | SHUTOFF | -          | Shutdown    | internal=192.168.0.21, 10.0.0.235 |
+--------------------------------------+-------+---------+------------+-------------+-----------------------------------+

ssh into the compute node and issue the shutdown command:

[root@compute-0 ~]# shutdown

Monitored the shutdown progress via virsh console (virt OSPD deployment). The compute node shut down within a few seconds as expected; no stalling observed. Looks good to verify.
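The verification above waits for the instance to reach SHUTOFF before rebooting the host. That check can be sketched by parsing the `nova list` table row for the instance; the snippet below works on the captured row from this comment (in a real run you would pipe live `nova list` output instead):

```shell
#!/bin/sh
# Sketch: confirm the instance reached SHUTOFF before testing host shutdown.
# Sample row captured from the `nova list` output in the verification comment.
nova_list='| f6c6635e-a89b-407b-b99d-46093d7cebce | inst1 | SHUTOFF | - | Shutdown | internal=192.168.0.21, 10.0.0.235 |'

# Split on '|'; field 4 is the Status column. Strip padding spaces.
status=$(printf '%s\n' "$nova_list" | awk -F'|' '/f6c6635e/ {gsub(/ /, "", $4); print $4}')

if [ "$status" = "SHUTOFF" ]; then
    echo "instance stopped; safe to test host shutdown"
fi
```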
BTW, I suspect I'd previously hit this same issue in overcloud delete cases, where deletion of the whole overcloud would fail due to stalled compute nodes (deletion performs a shutdown); the root cause is likely the same as this bz. I've retested on a second deployment with a Cinder volume backed by iSCSI 3PAR storage. Again, shutdown of the compute node wasn't stalled.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0446
The needinfo request(s) on this closed bug have been removed, as they had been unresolved for 1000 days.