Bug 1316791 - Instance was deleted successfully without detaching its volume, if nova-compute was killed during running "nova delete"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 7.0 (Kilo)
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: async
Target Release: 7.0 (Kilo)
Assignee: Eric Harney
QA Contact: lkuchlan
URL:
Whiteboard:
Depends On:
Blocks: 1362682 1363625
Reported: 2016-03-11 05:53 UTC by ykawada
Modified: 2022-07-09 08:39 UTC
CC List: 23 users

Fixed In Version: openstack-cinder-2015.1.3-9.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1344177 1362682 1363625 1395996
Environment:
Last Closed: 2017-02-15 22:53:55 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | OSP-16805 | 0 | None | None | None | 2022-07-09 08:39:50 UTC
Red Hat Product Errata | RHSA-2017:0282 | 0 | normal | SHIPPED_LIVE | Moderate: openstack-cinder, openstack-glance, and openstack-nova security update | 2017-02-16 03:52:44 UTC

Comment 2 ykawada 2016-03-11 06:12:06 UTC
Created attachment 1135118
Script to work around the issue

Comment 3 ykawada 2016-03-11 06:33:50 UTC
Additional information:

NEC, the reporter of this issue, shared the results of their investigation into a workaround with us.

--------------------------------------------------------------
We had thought that calling the "os-detach" API would work around this problem.
But after re-examining the source code, we found that this workaround is insufficient if Cinder uses an EMC backend.

In _shutdown_instance(), nova calls terminate_connection() before calling detach().

    ------------------nova/compute/manager.py----------------
    def _shutdown_instance(self, context, instance,
        ......
        for bdm in vol_bdms:
            try:
                # NOTE(vish): actual driver detach done in driver.destroy, so
                #             just tell cinder that we are done with it.
                connector = self.driver.get_volume_connector(instance)
                self.volume_api.terminate_connection(context,
                                                     bdm.volume_id,
                                                     connector)
                self.volume_api.detach(context, bdm.volume_id)
    ---------------------------------------------------------

We have to call the "os-terminate_connection" API before calling "os-detach", because the EMC backend performs several operations, including removing the export for the volume, when terminate_connection() is called.

    ------------cinder/volume/drivers/emc/emc_vnx_cli.py-----
    def terminate_connection(self, volume, connector):
        """Disallow connection from connector.""" 
        @lockutils.synchronized('emc-connection-' + connector['host'],
                                "emc-connection-", True)
        def do_terminate_connection():
            hostname = connector['host']
            volume_name = volume['name']
            lun_id = self.get_lun_id(volume)
            lun_map = None
            conn_info = None
        ...
                if lun_id in lun_map:
                    self._client.remove_hlu_from_storagegroup(
                        lun_map[lun_id], hostname)
                    lun_map.pop(lun_id)
        ...
            if self.destroy_empty_sg and not lun_map:
                try:
                    LOG.info(_LI("Storage Group %s was empty."), hostname)
                    self._client.disconnect_host_from_storage_group(
                        hostname, hostname)
                    self._client.delete_storage_group(hostname)
                    if self.itor_auto_dereg:
                        self._deregister_initiators(connector)
    ---------------------------------------------------------

So we think "os-terminate_connection" should be called before "os-detach".
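To make the ordering argument concrete, here is a toy model (hypothetical classes, not the EMC VNX driver or Cinder code) of a backend that keeps one storage group per host. It shows that only marking the volume as detached leaves the host-to-LUN mapping on the array, while the terminate_connection step is what removes it and, as in the destroy_empty_sg branch above, drops the now-empty group.

```python
class ToyStorageGroups:
    """Hypothetical stand-in for an array that keeps one storage group
    per host (loosely modelled on do_terminate_connection() above)."""

    def __init__(self, destroy_empty_sg=True):
        # hostname -> {lun_id: hlu}
        self.groups = {}
        self.destroy_empty_sg = destroy_empty_sg

    def map_lun(self, hostname, lun_id, hlu):
        """Add a LUN to the host's storage group (the attach side)."""
        self.groups.setdefault(hostname, {})[lun_id] = hlu

    def terminate_connection(self, hostname, lun_id):
        """Remove the LUN from the host's group; drop the group if empty."""
        lun_map = self.groups.get(hostname)
        if lun_map is None:
            return
        lun_map.pop(lun_id, None)
        if self.destroy_empty_sg and not lun_map:
            del self.groups[hostname]


class ToyVolume:
    """Hypothetical Cinder-side record: os-detach only flips its status."""

    def __init__(self, lun_id):
        self.lun_id = lun_id
        self.status = "in-use"

    def detach(self):
        self.status = "available"


array = ToyStorageGroups()
vol = ToyVolume(lun_id=5)
array.map_lun("texsl153", vol.lun_id, hlu=1)

vol.detach()                     # os-detach alone
print(vol.status, array.groups)  # available {'texsl153': {5: 1}}

array.terminate_connection("texsl153", vol.lun_id)
print(array.groups)              # {}
```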

However, the "os-terminate_connection" API requires the connector information generated by nova-compute, for example:

  2016-01-29 10:39:42,643.643 68373 DEBUG cinder.api.openstack.wsgi
  ...
  Action body: {"os-terminate_connection": {"connector":
  {"initiator": "iqn.1994-05.com.redhat:c1e01032345d",
  "wwnns": ["20000090fa8c0e10", "20000090fa8c0e11", "20000090fa818640", "20000090fa818641"],
  "ip": "172.20.238.37",
  "wwpns": ["10000090fa8c0e10", "10000090fa8c0e11", "10000090fa818640", "10000090fa818641"],
  "platform": "x86_64", "host": "texsl153", "os_type": "linux2"}}}
  get_method /usr/lib/python2.7/site-packages/cinder/api/openstack/wsgi.py:1096

According to our investigation, the same information can be generated on the compute node using "systool":

  # systool -c fc_host -v | grep -e node_name -e port_name -e Online
    node_name           = "0xAAAAAAAAAAAAAAAA" ... wwnns of path 1
    port_name           = "0xBBBBBBBBBBBBBBBB" ... wwpns of path 1
    port_state          = "Online" 
    node_name           = "0xCCCCCCCCCCCCCCCC" ... wwnns of path 2
    port_name           = "0xDDDDDDDDDDDDDDDD" ... wwpns of path 2
    port_state          = "Online" 

  Note: nova obtains the connector information using "systool" in get_fc_hbas_info().
        Please refer to nova/virt/libvirt/utils.py.
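A minimal sketch of turning the grep-filtered systool output above into the wwnns/wwpns lists of the connector. This is not nova's get_fc_hbas_info(); the helper names are hypothetical, and the sketch assumes that each HBA block starts with node_name, that only ports in the "Online" state should be reported, and that the leading 0x should be dropped (the connector in the log has no 0x prefix).

```python
import re

# Matches e.g.:  node_name = "0x20000090fa8c0e10"  (text after the
# closing quote, such as the "... wwnns of path 1" notes, is ignored).
_ATTR = re.compile(r'^\s*(node_name|port_name|port_state)\s*=\s*"([^"]*)"')


def _strip_0x(value):
    """Drop a leading 0x/0X prefix, if present."""
    return value[2:] if value.lower().startswith("0x") else value


def parse_fc_hosts(text):
    """Group node_name/port_name/port_state lines into one dict per HBA.

    Assumes each HBA's block starts with node_name, as in the
    grep-filtered systool output above.
    """
    hbas = []
    for line in text.splitlines():
        m = _ATTR.match(line)
        if not m:
            continue
        key, value = m.groups()
        if key == "node_name" or not hbas:
            hbas.append({})
        hbas[-1][key] = value
    return hbas


def fc_connector_fields(text):
    """Return (wwnns, wwpns) for Online HBAs, without the 0x prefix."""
    wwnns, wwpns = [], []
    for hba in parse_fc_hosts(text):
        if hba.get("port_state") != "Online":
            continue
        if "node_name" not in hba or "port_name" not in hba:
            continue
        wwnns.append(_strip_0x(hba["node_name"]))
        wwpns.append(_strip_0x(hba["port_name"]))
    return wwnns, wwpns
```

On a compute node it could be fed the text from `systool -c fc_host -v`, for example via `subprocess.check_output(["systool", "-c", "fc_host", "-v"], universal_newlines=True)`. Values are kept exactly as systool prints them apart from the prefix.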

We wrote a shell script that calls "os-terminate_connection" and then "os-detach" to detach an "in-use" volume.
(Before running the script, the user needs to set the environment variables from an OpenStack RC file such as "keystone-admin".)

Usage: bash terminate_and_detach.sh <name of the compute node from which the volume should be detached> <volume id>
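The attached script is not reproduced in this bug, so the following is only a hypothetical Python sketch of the same two-step sequence. It assumes a caller-supplied `post(path, body)` function that sends JSON to the tenant-scoped Cinder endpoint with a valid token, and that the action path is `/volumes/<volume id>/action`. The connector keys and default values are copied from the cinder-api log above, with the FC lists cut to one path for brevity.

```python
def build_connector(host, ip, initiator, wwnns, wwpns,
                    platform="x86_64", os_type="linux2"):
    """Assemble a connector dict with the same keys as the one nova-compute
    sent in the cinder-api log above (defaults copied from that log)."""
    return {
        "initiator": initiator,
        "wwnns": list(wwnns),
        "ip": ip,
        "wwpns": list(wwpns),
        "platform": platform,
        "host": host,
        "os_type": os_type,
    }


def terminate_and_detach(post, volume_id, connector):
    """Call os-terminate_connection, then os-detach, on one volume.

    `post(path, body)` is a hypothetical caller-supplied function; the
    path shape is an assumption about the Cinder volume-action URL.
    """
    path = "/volumes/%s/action" % volume_id
    # 1. Let the backend unmap the LUN from the old compute host.
    post(path, {"os-terminate_connection": {"connector": connector}})
    # 2. Only then mark the volume as detached in Cinder.
    post(path, {"os-detach": {}})


# Dry run with a recording transport instead of real HTTP.
calls = []
connector = build_connector(
    host="texsl153", ip="172.20.238.37",
    initiator="iqn.1994-05.com.redhat:c1e01032345d",
    wwnns=["20000090fa8c0e10"], wwpns=["10000090fa8c0e10"])
terminate_and_detach(lambda path, body: calls.append((path, body)),
                     "19df71bf-0426-49e6-8a11-2d57edcf24a0", connector)
for path, body in calls:
    print(path, list(body))
```

Keeping the transport injectable makes the ordering easy to check without a live cloud; a real run would still depend on the connector matching the host the backend originally mapped.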

What does Red Hat think of our investigation and the script? Is it possible to work around the problem this way?
--------------------------------------------------------------

Comment 5 Lee Yarwood 2016-03-22 16:25:44 UTC
(In reply to ykawada from comment #0)
> Description of problem:
> 
> When nova-compute stopped during deleting an instance, the instance would be
> deleted successfully, but its volume wasn't detached. They are unable to
> re-use the volume anymore because the volume is still recorded as "in-use"
> in DB.
> 
> NEC checked the logs from the customer's system; the following WARNING message
> was logged in nova-compute.log.
> 
>     ----------------nova-compute.log-----------------
>     2016-02-26 16:33:22,137.137 145980 WARNING nova.compute.manager
> [req-02b101d4-5f69-4b8e-986c-d7ba636a9816 - - - - -]
>     [instance: 0da5286a-5c70-4ac5-82a9-356208affed0] Ignoring
> EndpointNotFound: The service catalog is empty.
>     -------------------------------------------------

How could this be logged by nova-compute if the service is down? 

This error actually suggests that cinder-api was down when nova-compute attempted to call terminate_connection for a volume.

I'm reassigning this bug to the Cinder team to see if they agree with the following answers:

>   1. Could Red Hat fix this issue in OSP7?

If cinder-api was down when nova-compute attempted to call terminate_connection then there really isn't anything to fix on the Nova side.

Should Cinder check for volumes that are marked as attached, with initialised connections no longer used by Nova, and clean these up once it restarts? Or should this be a manual admin operation using os-force_detach?
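For the first option, Cinder (or an admin script) would need a reconciliation check along these lines. This is a hypothetical sketch: it assumes volumes are dicts with `id`, `status` and an `attachments` list whose entries carry a `server_id`, and that the set of live instance IDs comes from Nova. The sample input reproduces the state observed in comment 25 after the instance was deleted.

```python
def find_stale_attachments(volumes, live_instance_ids):
    """Return (volume_id, server_id) pairs for volumes Cinder still records
    as attached to an instance that Nova no longer lists."""
    stale = []
    for vol in volumes:
        if vol.get("status") != "in-use":
            continue
        for att in vol.get("attachments", []):
            server_id = att.get("server_id")
            if server_id not in live_instance_ids:
                stale.append((vol["id"], server_id))
    return stale


# State seen in comment 25 after "nova delete" completed:
volumes = [{
    "id": "19df71bf-0426-49e6-8a11-2d57edcf24a0",
    "status": "in-use",
    "attachments": [{"server_id": "aaa0683d-97af-4b92-abeb-5d05e45dea33"}],
}]
print(find_stale_attachments(volumes, live_instance_ids=set()))
```

This only detects candidates. An empty instance list caused by Nova being unreachable would flag every in-use volume, so any automatic cleanup would need to guard against that case.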

>   2. Could you tell me what the customer can do to re-use the volume on their current system?

This is more of a question for the Cinder team but AFAIK it would be better if the customer could use a single call to os-force_detach.

Unfortunately, the version provided in OSP 7 / Kilo does not call terminate_connection with a valid connector, as required by the customer's backend to unmap the previous compute host from the backing LUN. The following change allowed the connector to be provided to os-force_detach in Mitaka:

force_detach terminate_connection needs connector
https://review.openstack.org/#/c/213867/
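With that change, a single admin call could carry the connector. A minimal sketch of the request body, assuming the `os-force_detach` action accepts a `connector` and an optional `attachment_id`, as the change title suggests; `force_detach_body` is a hypothetical helper, and this shape is not usable against the OSP 7 / Kilo API.

```python
def force_detach_body(connector, attachment_id=None):
    """Build an os-force_detach action body that carries a connector.

    Assumed shape: {"os-force_detach": {"connector": {...},
    "attachment_id": "..."}}, based on the linked change.
    """
    action = {"connector": dict(connector)}
    if attachment_id is not None:
        action["attachment_id"] = attachment_id
    return {"os-force_detach": action}


body = force_detach_body({"host": "texsl153",
                          "initiator": "iqn.1994-05.com.redhat:c1e01032345d"})
print(body)
```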

> How reproducible:
> It is a timing problem, but it can be reproduced frequently.
> 
> Steps to Reproduce:
> 1. Boot an instance from a bootable volume.
> 2. Delete the instance and, meanwhile, kill the nova-compute process.
> 
>    eg.)
>    # nova delete <VM>
> 
>    # kill -9 <nova-compute PID>
> 
>    (*) The customer is concerned about an emergency in which nova-compute goes down
> while an instance is being deleted.

As I've said above, I don't think these are the actual steps used here.

> Additional info:
> 
> NEC found a similar problem with the following steps.
>
> 1. Boot an instance from a bootable volume.
>
> 2. Delete the instance and, meanwhile, kill the cinder-volume process.
> 
>    The following WARNING message was output from nova-compute.log.
> 
>    --------------------nova-compute.log---------------
>    WARNING nova.compute.manager [req-...] [instance: ...] 
>    Ignoring Unknown cinder exception: The server has either erred or is
> incapable of performing the requested operation. (HTTP 500) (Request-ID: ..)
>    -------------------------------------------------

This is essentially the same as before; this time cinder-api is up and responds with an HTTP 500 error code because the volume backend is down. Again, I can't see anything for Nova to do at this point.
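The warnings quoted above suggest that nova-compute treats these Cinder errors as non-fatal during instance deletion. A toy model of that pattern (hypothetical names; the except clause is inferred from the logged "Ignoring EndpointNotFound" message, not copied from nova): the instance deletion carries on, but the volume is never detached in Cinder and stays "in-use".

```python
import logging

logging.basicConfig(level=logging.WARNING)
LOG = logging.getLogger("toy.compute")


class EndpointNotFound(Exception):
    """Hypothetical stand-in for the client exception named in the log."""


def shutdown_volumes(volume_ids, terminate_connection, detach):
    """Best-effort cleanup: errors are logged and skipped, so deletion of
    the instance carries on. Returns the volumes left attached in Cinder."""
    still_attached = []
    for vol_id in volume_ids:
        try:
            terminate_connection(vol_id)
            detach(vol_id)
        except EndpointNotFound as exc:
            LOG.warning("Ignoring EndpointNotFound: %s", exc)
            still_attached.append(vol_id)
    return still_attached


def cinder_api_down(vol_id):
    # Simulates the failure seen in nova-compute.log.
    raise EndpointNotFound("The service catalog is empty.")


detached = []
left = shutdown_volumes(["19df71bf-0426-49e6-8a11-2d57edcf24a0"],
                        cinder_api_down, detached.append)
print(left, detached)
```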

Comment 25 lkuchlan 2016-12-08 07:04:16 UTC
Tested using:
python-cinderclient-1.2.1-4.el7ost.noarch
openstack-cinder-2015.1.3-11.el7ost.noarch
python-cinder-2015.1.3-11.el7ost.noarch

Verification flow:

[root@dhcp70-45 ~(keystone_admin)]# cinder create 1 --image-id 34453f2c-3845-4963-91b3-e442589a4528
+---------------------+--------------------------------------+
|       Property      |                Value                 |
+---------------------+--------------------------------------+
|     attachments     |                  []                  |
|  availability_zone  |                 nova                 |
|       bootable      |                false                 |
|      created_at     |      2016-12-07T08:57:39.927578      |
| display_description |                 None                 |
|     display_name    |                 None                 |
|      encrypted      |                False                 |
|          id         | 19df71bf-0426-49e6-8a11-2d57edcf24a0 |
|       image_id      | 34453f2c-3845-4963-91b3-e442589a4528 |
|       metadata      |                  {}                  |
|     multiattach     |                false                 |
|         size        |                  1                   |
|     snapshot_id     |                 None                 |
|     source_volid    |                 None                 |
|        status       |               creating               |
|     volume_type     |                 None                 |
+---------------------+--------------------------------------+

[root@dhcp70-45 ~(keystone_admin)]# cinder list
+--------------------------------------+-------------+--------------+------+-------------+----------+-------------+
|                  ID                  |    Status   | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-------------+--------------+------+-------------+----------+-------------+
| 19df71bf-0426-49e6-8a11-2d57edcf24a0 | downloading |      -       |  1   |      -      |  false   |             |
+--------------------------------------+-------------+--------------+------+-------------+----------+-------------+

[root@dhcp70-45 ~(keystone_admin)]# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| 19df71bf-0426-49e6-8a11-2d57edcf24a0 | available |      -       |  1   |      -      |   true   |             |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+

[root@dhcp70-45 ~(keystone_admin)]# nova boot --flavor 1 --boot-volume 19df71bf-0426-49e6-8a11-2d57edcf24a0 --nic net-id=6c000e50-030b-4e46-ab39-625d2150e063 vm
+--------------------------------------+--------------------------------------------------+
| Property                             | Value                                            |
+--------------------------------------+--------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                           |
| OS-EXT-AZ:availability_zone          |                                                  |
| OS-EXT-SRV-ATTR:host                 | -                                                |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                                |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000001                                |
| OS-EXT-STS:power_state               | 0                                                |
| OS-EXT-STS:task_state                | scheduling                                       |
| OS-EXT-STS:vm_state                  | building                                         |
| OS-SRV-USG:launched_at               | -                                                |
| OS-SRV-USG:terminated_at             | -                                                |
| accessIPv4                           |                                                  |
| accessIPv6                           |                                                  |
| adminPass                            | Z56TqCCyKrfX                                     |
| config_drive                         |                                                  |
| created                              | 2016-12-07T09:06:05Z                             |
| flavor                               | m1.tiny (1)                                      |
| hostId                               |                                                  |
| id                                   | aaa0683d-97af-4b92-abeb-5d05e45dea33             |
| image                                | Attempt to boot from volume - no image supplied  |
| key_name                             | -                                                |
| metadata                             | {}                                               |
| name                                 | vm                                               |
| os-extended-volumes:volumes_attached | [{"id": "19df71bf-0426-49e6-8a11-2d57edcf24a0"}] |
| progress                             | 0                                                |
| security_groups                      | default                                          |
| status                               | BUILD                                            |
| tenant_id                            | e6214874b0b94356a196e505ab0ebaf8                 |
| updated                              | 2016-12-07T09:06:14Z                             |
| user_id                              | 84a2a411b7b842eeb9ecdefc3eeb3e8f                 |
+--------------------------------------+--------------------------------------------------+

[root@dhcp70-45 ~(keystone_admin)]# nova list
+--------------------------------------+------+--------+------------+-------------+------------------+
| ID                                   | Name | Status | Task State | Power State | Networks         |
+--------------------------------------+------+--------+------------+-------------+------------------+
| aaa0683d-97af-4b92-abeb-5d05e45dea33 | vm   | ACTIVE | -          | Running     | private=10.0.0.3 |
+--------------------------------------+------+--------+------------+-------------+------------------+

[root@dhcp70-45 ~(keystone_admin)]# cinder list
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
|                  ID                  | Status | Display Name | Size | Volume Type | Bootable |             Attached to              |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
| 19df71bf-0426-49e6-8a11-2d57edcf24a0 | in-use |      -       |  1   |      -      |   true   | aaa0683d-97af-4b92-abeb-5d05e45dea33 |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+

[root@dhcp70-45 ~(keystone_admin)]# openstack-service stop nova-compute
[root@dhcp70-45 ~(keystone_admin)]# openstack-service status nova-compute
MainPID=0 Id=openstack-nova-compute.service ActiveState=inactive

[root@dhcp70-45 ~(keystone_admin)]# nova delete aaa0683d-97af-4b92-abeb-5d05e45dea33
Request to delete server aaa0683d-97af-4b92-abeb-5d05e45dea33 has been accepted.
[root@dhcp70-45 ~(keystone_admin)]# nova list
+--------------------------------------+------+--------+------------+-------------+------------------+
| ID                                   | Name | Status | Task State | Power State | Networks         |
+--------------------------------------+------+--------+------------+-------------+------------------+
| aaa0683d-97af-4b92-abeb-5d05e45dea33 | vm   | ACTIVE | deleting   | Running     | private=10.0.0.3 |
+--------------------------------------+------+--------+------------+-------------+------------------+

[root@dhcp70-45 ~(keystone_admin)]# openstack-service start nova-compute
[root@dhcp70-45 ~(keystone_admin)]# openstack-service status nova-compute
MainPID=7027 Id=openstack-nova-compute.service ActiveState=active

[root@dhcp70-45 ~(keystone_admin)]# nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

[root@dhcp70-45 ~(keystone_admin)]# cinder list
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
|                  ID                  | Status | Display Name | Size | Volume Type | Bootable |             Attached to              |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
| 19df71bf-0426-49e6-8a11-2d57edcf24a0 | in-use |      -       |  1   |      -      |   true   | aaa0683d-97af-4b92-abeb-5d05e45dea33 |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+

[root@dhcp70-45 ~(keystone_admin)]# cinder --os-volume-api-version 2 reset-state --state available --attach-status detached 19df71bf-0426-49e6-8a11-2d57edcf24a0

[root@dhcp70-45 ~(keystone_admin)]# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| 19df71bf-0426-49e6-8a11-2d57edcf24a0 | available |      -       |  1   |      -      |   true   |             |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
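As far as we understand python-cinderclient, the `reset-state --state available --attach-status detached` command above is sent to Cinder as the `os-reset_status` volume action. A minimal sketch of that body, assuming the field names `status` and `attach_status`; `reset_status_body` is a hypothetical helper. As far as we know, this action only updates Cinder's database record and does not call terminate_connection, so on a backend like the EMC one discussed in comment 3 it would not by itself remove the host mapping.

```python
def reset_status_body(status, attach_status=None):
    """Build an os-reset_status admin-action body (assumed field names)."""
    fields = {"status": status}
    if attach_status is not None:
        fields["attach_status"] = attach_status
    return {"os-reset_status": fields}


print(reset_status_body("available", attach_status="detached"))
```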

[root@dhcp70-45 ~(keystone_admin)]# nova boot --flavor 1 --boot-volume 19df71bf-0426-49e6-8a11-2d57edcf24a0 --nic net-id=6c000e50-030b-4e46-ab39-625d2150e063 vm
+--------------------------------------+--------------------------------------------------+
| Property                             | Value                                            |
+--------------------------------------+--------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                           |
| OS-EXT-AZ:availability_zone          |                                                  |
| OS-EXT-SRV-ATTR:host                 | -                                                |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                                |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000002                                |
| OS-EXT-STS:power_state               | 0                                                |
| OS-EXT-STS:task_state                | scheduling                                       |
| OS-EXT-STS:vm_state                  | building                                         |
| OS-SRV-USG:launched_at               | -                                                |
| OS-SRV-USG:terminated_at             | -                                                |
| accessIPv4                           |                                                  |
| accessIPv6                           |                                                  |
| adminPass                            | z7RRh5UMQkxe                                     |
| config_drive                         |                                                  |
| created                              | 2016-12-07T09:21:16Z                             |
| flavor                               | m1.tiny (1)                                      |
| hostId                               |                                                  |
| id                                   | 909a9e8a-7453-42d4-81bb-cf9d6c629532             |
| image                                | Attempt to boot from volume - no image supplied  |
| key_name                             | -                                                |
| metadata                             | {}                                               |
| name                                 | vm                                               |
| os-extended-volumes:volumes_attached | [{"id": "19df71bf-0426-49e6-8a11-2d57edcf24a0"}] |
| progress                             | 0                                                |
| security_groups                      | default                                          |
| status                               | BUILD                                            |
| tenant_id                            | e6214874b0b94356a196e505ab0ebaf8                 |
| updated                              | 2016-12-07T09:21:25Z                             |
| user_id                              | 84a2a411b7b842eeb9ecdefc3eeb3e8f                 |
+--------------------------------------+--------------------------------------------------+

[root@dhcp70-45 ~(keystone_admin)]# nova list
+--------------------------------------+------+--------+------------+-------------+------------------+
| ID                                   | Name | Status | Task State | Power State | Networks         |
+--------------------------------------+------+--------+------------+-------------+------------------+
| 909a9e8a-7453-42d4-81bb-cf9d6c629532 | vm   | ACTIVE | -          | Running     | private=10.0.0.4 |
+--------------------------------------+------+--------+------------+-------------+------------------+

[root@dhcp70-45 ~(keystone_admin)]# cinder list
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
|                  ID                  | Status | Display Name | Size | Volume Type | Bootable |             Attached to              |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
| 19df71bf-0426-49e6-8a11-2d57edcf24a0 | in-use |      -       |  1   |      -      |   true   | 909a9e8a-7453-42d4-81bb-cf9d6c629532 |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+

Comment 28 errata-xmlrpc 2017-02-15 22:53:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0282.html

