Bug 1304683

Summary: Cannot delete volume after cinder-volume has moved to another pcmk controller node
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Version: 7.0 (Kilo)
Target Release: 7.0 (Kilo)
Target Milestone: y3
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: ZStream
Reporter: Christian Horn <chorn>
Assignee: Giulio Fidente <gfidente>
QA Contact: Udi Shkalim <ushkalim>
CC: athomas, chorn, dbecker, dmacpher, dmesser, eharney, fdinitto, gfidente, jslagle, mburns, morazi, mtessun, nlevinki, ralf.boernemeier, rhel-osp-director-maint, sgotliv, yeylon
Fixed In Version: openstack-tripleo-heat-templates-0.8.6-119.el7ost
Doc Type: Bug Fix
Doc Text:
In an Overcloud with HA Controller nodes, the 'cinder-volume' service might move to a new node. This causes problems modifying and deleting volumes, due to a different hostname for the volume service. This fix sets a consistent hostname for the 'cinder-volume' service on all Controller nodes. Users can now modify and delete volumes on an HA Overcloud without issue.
Clone Of: 1303843
Last Closed: 2016-02-18 16:52:20 UTC
Type: Bug
Bug Depends On: 1303843
Bug Blocks: 1290377

Comment 1 Sergey Gotliv 2016-02-05 09:29:36 UTC
See my comment on the original bug, https://bugzilla.redhat.com/show_bug.cgi?id=1303843#c7

Comment 2 Angus Thomas 2016-02-07 15:53:52 UTC
Hi Fabio,

What's the correct path to address the issue that Sergey is referring to in the bug linked from his comment above?

Thanks

Comment 3 Fabio Massimo Di Nitto 2016-02-07 17:25:37 UTC
(In reply to Angus Thomas from comment #2)
> Hi Fabio,
> 
> What's the correct path to address the issue that Sergey is referring to in
> the bug linked from his comment above?
> 
> Thanks

OSPd needs to set:

[DEFAULT]
host=$somevalue

in cinder.conf, and it has to be the same on all controller nodes.

That value could be anything really; my suggestion would be to keep it simple:

overcloud-$overcloudname-cinder-host
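
For a hypothetical overcloud named 'ospd', the resulting stanza on every controller would then be (placeholder value, purely to illustrate the naming scheme):

[DEFAULT]
host=overcloud-ospd-cinder-host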

Comment 4 Fabio Massimo Di Nitto 2016-02-07 17:37:43 UTC
(In reply to Fabio Massimo Di Nitto from comment #3)
> (In reply to Angus Thomas from comment #2)
> > Hi Fabio,
> > 
> > What's the correct path to address the issue that Sergey is referring to in
> > the bug linked from his comment above?
> > 
> > Thanks
> 
> OSPd needs to set:
> 
> [DEFAULT]
> host=$somevalue
> 
> in cinder.conf, and it has to be the same on all controller nodes.
> 
> That value could be anything really; my suggestion would be to keep it
> simple:
> 
> overcloud-$overcloudname-cinder-host

Correcting my reply: the above is OK for new installs.

For updates and upgrades this is going to be very complex, depending on how and where volumes were created before the process and on the current setup.

On updates/upgrades, if host is set and it's the same on all controllers, then we should be OK.

But if it's not the same, then we can't just sanitize it, otherwise current volumes will become unavailable.

I think it would be best to have Sergey's team involved to define a proper action plan.

Comment 5 Angus Thomas 2016-02-08 15:12:59 UTC
Hi Giulio,

Can you work up a patch for the simpler new-install case, to be backported to 7.3?

Thanks,

Angus

Comment 7 Giulio Fidente 2016-02-08 18:13:37 UTC
The Cinder config option was also renamed from 'host' to 'backend_host' in Kilo. Other backends in addition to NFS could be affected by the same problem, including dellsc, eqlx and netapp.

Comment 8 Ralf Boernemeier 2016-02-09 09:55:02 UTC
(In reply to Giulio Fidente from comment #7)
> The Cinder config option was also renamed from 'host' to 'backend_host' in
> Kilo. Other backends in addition to NFS could be affected by the same
> problem, including dellsc, eqlx and netapp.

Hi Giulio,

Can you please tell me the option name with regard to the different OSP releases?

Kilo (OSP 7.x): backend_host = <unique value>
Liberty (OSP 8.x): host = <unique value> or backend_host = <unique value>

Background of this question:

I have deployed OSP 8 Beta 4 with a Controller Pre-Configuration YAML script which configures the following in "/etc/cinder/cinder.conf" on all 3 Controller nodes:

[DEFAULT]
backend_host = osp8br2-controller.localdomain

The value "backend_host = osp8br2-controller.localdomain" seems to be ignored --> check cinder service-list:

[stack@osp8bdr2 ~(UC)]$ cinder service-list
+------------------+------------------------------------------------+------+---------+-------+----------------------------+-----------------+
|      Binary      |                      Host                      | Zone |  Status | State |         Updated_at         | Disabled Reason |
+------------------+------------------------------------------------+------+---------+-------+----------------------------+-----------------+
| cinder-scheduler |        osp8br2-controller-0.localdomain        | nova | enabled |   up  | 2016-02-09T08:01:22.000000 |        -        |
| cinder-scheduler |        osp8br2-controller-1.localdomain        | nova | enabled |   up  | 2016-02-09T08:01:23.000000 |        -        |
| cinder-scheduler |        osp8br2-controller-2.localdomain        | nova | enabled |   up  | 2016-02-09T08:01:22.000000 |        -        |
|  cinder-volume   | osp8br2-controller-0.localdomain@tripleo_iscsi | nova | enabled |   up  | 2016-02-09T08:01:23.000000 |        -        |
|  cinder-volume   |  osp8br2-controller-0.localdomain@tripleo_nfs  | nova | enabled |   up  | 2016-02-09T08:01:22.000000 |        -        |
+------------------+------------------------------------------------+------+---------+-------+----------------------------+-----------------+

Thanks for your help!

Regards,

Ralf

Comment 9 Giulio Fidente 2016-02-09 10:33:28 UTC
Hi Ralf, it's only the per-backend setting that was renamed from 'host' to 'backend_host'. In the [DEFAULT] stanza, 'host' is still the right key to use.

In the change at [1] I am trying to set this globally, using 'host', since one can always override backend_host via hiera for a particular backend. Do you think that could work?
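
To illustrate the distinction, here is a minimal cinder.conf sketch (the 'tripleo_nfs' stanza name matches the service-list output above; the values are placeholders):

[DEFAULT]
# Global service name; in the DEFAULT stanza the key is still 'host'.
host=hostgroup

[tripleo_nfs]
# Per-backend override, renamed from 'host' to 'backend_host' in Kilo.
# When set, it takes precedence over [DEFAULT]/host for this backend.
backend_host=hostgroup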

Comment 10 Ralf Boernemeier 2016-02-09 12:42:47 UTC
Hi Giulio,

Thanks for the clarification. I'm not a developer, so I can't really answer your question ;-) Anyway, from what I understood, it should be OK to have 'host' set in the [DEFAULT] stanza of cinder.conf, with the possibility to override it for a specific backend using extradata. Sounds reasonable to me.

Regards,

Ralf

Comment 11 Mike Orazi 2016-02-10 16:30:42 UTC
Fabio,

I think we understand the initial setup case.  How do we cover the failover case?

Comment 12 Fabio Massimo Di Nitto 2016-02-10 16:54:09 UTC
(In reply to Mike Orazi from comment #11)
> Fabio,
> 
> I think we understand the initial setup case.  How do we cover the failover
> case?

As long as host and backend_host are the same across all the nodes, there is nothing else you need to do.

ctrl1:

host=overcloud-foo
backend_host=overcloud-foo

ctrl2:

host=overcloud-foo
backend_host=overcloud-foo

... etc ...

then you are all set. cinder-volume will use those values consistently, independent of which node it is started on, migrated to, or running on.
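
A quick sanity check, as a sketch (the node names and the 'heat-admin' SSH user are typical for an OSPd deployment, but may differ):

[stack@undercloud ~]$ for n in 0 1 2; do ssh heat-admin@overcloud-controller-$n "grep -E '^(backend_)?host' /etc/cinder/cinder.conf"; done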

Comment 13 Giulio Fidente 2016-02-10 17:09:47 UTC
Fabio, the proposed patch drops use of backend_host and always sets:

  host=hostgroup

on all controllers, which I think fixes the 'new deployment' scenario for all backend drivers. Eric, can you confirm this?

backend_host is left unset; deployments which were using it will continue to see it set to its previous value after an update, because Puppet won't clean it up.

It will still be possible on update to override any previously set backend_host by providing a new value via hieradata (ExtraConfig).

Does that look like a viable plan?
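
For reference, a hypothetical environment file for the override route might look roughly like this (the hiera key is purely illustrative, not taken from the actual patch; the real key depends on the backend's Puppet module):

parameter_defaults:
  ExtraConfig:
    # Hypothetical hiera key: re-apply a previously used backend_host
    # for one backend on update.
    tripleo::cinder::nfs::backend_host: 'overcloud-controller-0.localdomain'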

Comment 14 Fabio Massimo Di Nitto 2016-02-10 17:26:05 UTC
(In reply to Giulio Fidente from comment #13)
> Fabio, the proposed patch drops use of backend_host and always sets:
> 
>   host=hostgroup
> 
> on all controllers, which I think fixes the 'new deployment' scenario for
> all backend drivers. Eric, can you confirm this?
> 
> backend_host is left unset; deployments which were using it will continue
> to see it set to its previous value after an update, because Puppet won't
> clean it up.
> 
> It will still be possible on update to override any previously set
> backend_host by providing a new value via hieradata (ExtraConfig).
> 
> Does that look like a viable plan?

I think so, but I am really not an expert in cinder stuff. So whatever the cinder guys ACK is good for me.

Comment 15 James Slagle 2016-02-10 19:41:45 UTC
We still need confirmation from the Cinder folks on what the right fix is here for the update scenario.

Comment 16 James Slagle 2016-02-10 19:45:41 UTC
The assumption we need confirmation of is:

For new installs we will set:
[DEFAULT]
host=overcloud

and will not set any backend_host values for any configured backends; this will fix the problem for new installs.

For updates, where host/backend_host are already configured differently and volumes have been created (or not, we don't really know), we will still set:

[DEFAULT]
host=overcloud

but not unset or change any configured backend_host values. The assumption is that existing volumes will continue to function, unless cinder-volume needs to fail over for some reason.
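
To make the update case concrete, a sketch of a post-update cinder.conf on a deployment that had previously customized its NFS backend (stanza name and value are illustrative):

[DEFAULT]
# Newly set by the fix; identical on all controllers.
host=overcloud

[tripleo_nfs]
# Pre-existing value that Puppet will not remove on update. Because it
# overrides [DEFAULT]/host, existing volumes keep resolving to it.
backend_host=overcloud-controller-0.localdomain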

Comment 17 James Slagle 2016-02-10 21:23:30 UTC
After discussing with Giulio, we're going to go with the patches as-is; they should be update-safe for everything but LVM, which has no failover anyway since the volumes aren't replicated.

Giulio also confirmed that backend_host will always override [DEFAULT]/host:

https://github.com/openstack/cinder/blob/c9eef31820dc385a2c9f4ba24dd1d194f9e7d088/cinder/cmd/all.py#L89
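
Paraphrasing the linked code, the precedence works roughly like this (a simplified sketch, not the verbatim cinder source):

def effective_volume_host(default_host, backend_name, backend_host=None):
    # The per-backend backend_host, when set, wins over [DEFAULT]/host;
    # the service then registers itself as "<host>@<backend_name>".
    return "%s@%s" % (backend_host or default_host, backend_name)

print(effective_volume_host("hostgroup", "tripleo_nfs"))
# -> hostgroup@tripleo_nfs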

Comment 19 Udi Shkalim 2016-02-14 16:03:27 UTC
I have recreated the original steps to reproduce and hit the same error.
Using openstack-tripleo-heat-templates-0.8.6-119.el7ost

Am I missing any steps to reproduce here?

Thanks,
Udi

Comment 22 Udi Shkalim 2016-02-16 11:12:33 UTC
Verified on: openstack-tripleo-heat-templates-0.8.6-120.el7ost.noarch

Using the original steps to reproduce:


[stack@undercloud ~]$ cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| fd603d0e-e45b-4155-b14a-bf08909b3aea | available |      -       |  1   |      -      |  false   |             |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
[stack@undercloud ~]$ cinder delete fd603d0e-e45b-4155-b14a-bf08909b3aea
[stack@undercloud ~]$ cinder list
+----+--------+--------------+------+-------------+----------+-------------+
| ID | Status | Display Name | Size | Volume Type | Bootable | Attached to |
+----+--------+--------------+------+-------------+----------+-------------+
+----+--------+--------------+------+-------------+----------+-------------+
[stack@undercloud ~]$ cinder create --display-name vol1 1
+---------------------+--------------------------------------+
|       Property      |                Value                 |
+---------------------+--------------------------------------+
|     attachments     |                  []                  |
|  availability_zone  |                 nova                 |
|       bootable      |                false                 |
|      created_at     |      2016-02-16T11:04:52.827961      |
| display_description |                 None                 |
|     display_name    |                 vol1                 |
|      encrypted      |                False                 |
|          id         | e3dc4b2a-e68c-44ac-84a5-1c3206f976df |
|       metadata      |                  {}                  |
|     multiattach     |                false                 |
|         size        |                  1                   |
|     snapshot_id     |                 None                 |
|     source_volid    |                 None                 |
|        status       |               creating               |
|     volume_type     |                 None                 |
+---------------------+--------------------------------------+
[stack@undercloud ~]$ cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| e3dc4b2a-e68c-44ac-84a5-1c3206f976df | available |     vol1     |  1   |      -      |  false   |             |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
[stack@undercloud ~]$ cinder show vol1
+---------------------------------------+--------------------------------------+
|                Property               |                Value                 |
+---------------------------------------+--------------------------------------+
|              attachments              |                  []                  |
|           availability_zone           |                 nova                 |
|                bootable               |                false                 |
|               created_at              |      2016-02-16T11:04:52.000000      |
|          display_description          |                 None                 |
|              display_name             |                 vol1                 |
|               encrypted               |                False                 |
|                   id                  | e3dc4b2a-e68c-44ac-84a5-1c3206f976df |
|                metadata               |                  {}                  |
|              multiattach              |                false                 |
|         os-vol-host-attr:host         |  hostgroup@tripleo_nfs#tripleo_nfs   |
|     os-vol-mig-status-attr:migstat    |                 None                 |
|     os-vol-mig-status-attr:name_id    |                 None                 |
|      os-vol-tenant-attr:tenant_id     |   ade04b2ae4f643ab8537074700757e8e   |
|   os-volume-replication:driver_data   |                 None                 |
| os-volume-replication:extended_status |                 None                 |
|                  size                 |                  1                   |
|              snapshot_id              |                 None                 |
|              source_volid             |                 None                 |
|                 status                |              available               |
|              volume_type              |                 None                 |
+---------------------------------------+--------------------------------------+


[root@overcloud-controller-0 ~]# crm_resource --resource openstack-cinder-volume --locate
resource openstack-cinder-volume is running on: overcloud-controller-0 
[root@overcloud-controller-0 ~]# crm_resource --resource openstack-cinder-volume --move
WARNING: Creating rsc_location constraint 'cli-ban-openstack-cinder-volume-on-overcloud-controller-0' with a score of -INFINITY for resource openstack-cinder-volume on overcloud-controller-0.
	This will prevent openstack-cinder-volume from running on overcloud-controller-0 until the constraint is removed using the 'crm_resource --clear' command or manually with cibadmin
	This will be the case even if overcloud-controller-0 is the last node in the cluster
	This message can be disabled with --quiet

[root@overcloud-controller-0 ~]# crm_resource --resource openstack-cinder-volume --locate
resource openstack-cinder-volume is NOT running
[root@overcloud-controller-0 ~]# crm_resource --resource openstack-cinder-volume --locate
resource openstack-cinder-volume is running on: overcloud-controller-1 

[stack@undercloud ~]$ . overcloudrc 
[stack@undercloud ~]$ cinder service-list
+------------------+-----------------------+------+---------+-------+----------------------------+-----------------+
|      Binary      |          Host         | Zone |  Status | State |         Updated_at         | Disabled Reason |
+------------------+-----------------------+------+---------+-------+----------------------------+-----------------+
| cinder-scheduler |       hostgroup       | nova | enabled |   up  | 2016-02-16T11:07:31.000000 |        -        |
|  cinder-volume   | hostgroup@tripleo_nfs | nova | enabled |   up  | 2016-02-16T11:07:31.000000 |        -        |
+------------------+-----------------------+------+---------+-------+----------------------------+-----------------+
[stack@undercloud ~]$ cinder show vol1
+---------------------------------------+--------------------------------------+
|                Property               |                Value                 |
+---------------------------------------+--------------------------------------+
|              attachments              |                  []                  |
|           availability_zone           |                 nova                 |
|                bootable               |                false                 |
|               created_at              |      2016-02-16T11:04:52.000000      |
|          display_description          |                 None                 |
|              display_name             |                 vol1                 |
|               encrypted               |                False                 |
|                   id                  | e3dc4b2a-e68c-44ac-84a5-1c3206f976df |
|                metadata               |                  {}                  |
|              multiattach              |                false                 |
|         os-vol-host-attr:host         |  hostgroup@tripleo_nfs#tripleo_nfs   |
|     os-vol-mig-status-attr:migstat    |                 None                 |
|     os-vol-mig-status-attr:name_id    |                 None                 |
|      os-vol-tenant-attr:tenant_id     |   ade04b2ae4f643ab8537074700757e8e   |
|   os-volume-replication:driver_data   |                 None                 |
| os-volume-replication:extended_status |                 None                 |
|                  size                 |                  1                   |
|              snapshot_id              |                 None                 |
|              source_volid             |                 None                 |
|                 status                |              available               |
|              volume_type              |                 None                 |
+---------------------------------------+--------------------------------------+
[stack@undercloud ~]$ cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| e3dc4b2a-e68c-44ac-84a5-1c3206f976df | available |     vol1     |  1   |      -      |  false   |             |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
[stack@undercloud ~]$ cinder delete vol1
[stack@undercloud ~]$ cinder list
+----+--------+--------------+------+-------------+----------+-------------+
| ID | Status | Display Name | Size | Volume Type | Bootable | Attached to |
+----+--------+--------------+------+-------------+----------+-------------+
+----+--------+--------------+------+-------------+----------+-------------+

Comment 24 errata-xmlrpc 2016-02-18 16:52:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0264.html