This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to receive updates for this issue, or to participate in it, you may do so in the Red Hat Issue Tracker.
Bug 2279757 - Server create request leaves volume in reserved state when server group quota is exceeded
Summary: Server create request leaves volume in reserved state when server group quota is exceeded
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.2 (Train)
Hardware: x86_64
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-05-08 15:20 UTC by Eric Nothen
Modified: 2025-01-17 15:12 UTC
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2025-01-17 15:12:28 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Issue Tracker OSP-32069 (last updated 2025-01-17 15:12:27 UTC)
Red Hat Issue Tracker OSP-33478 (last updated 2025-01-17 15:12:57 UTC)

Description Eric Nothen 2024-05-08 15:20:51 UTC
Description of problem:

At a customer's shift-on-stack production environment, an attempt to scale out OCP worker nodes failed, leaving Cinder root volumes in "reserved" state. Because policy prevents users with the "member" role from deleting the volumes or resetting their state, the volumes remain unusable until an administrator fixes them.

See the reproduction steps below, which do not involve OCP.


Version-Release number of selected component (if applicable):
RHOSP 16.2, 17.1

How reproducible:
Always reproducible

Steps to Reproduce:

All of the following steps are executed as a user with "member" role on the project.

1. Create a server group:
~~~
(projectmember) (overcloud) [stack.lab ~]$ openstack server group create --policy soft-anti-affinity servers-anti-1 --os-compute-api-version 2.15
+------------+--------------------------------------+
| Field      | Value                                |
+------------+--------------------------------------+
| id         | 4af0bb95-6eb0-47d7-a6fa-94992730848d |
| members    |                                      |
| name       | servers-anti-1                       |
| policies   | soft-anti-affinity                   |
| project_id | 0ab3f39165d54d85b9cecc0e5347617a     |
| user_id    | 721a956e24de48b9a92a7d8ad74b3ec4     |
+------------+--------------------------------------+
~~~

2. Server groups have a default quota of 10 members:
~~~
(projectmember) (overcloud) [stack.lab ~]$ openstack quota show -c server-group-members
+----------------------+-------+
| Field                | Value |
+----------------------+-------+
| server-group-members | 10    |
+----------------------+-------+
~~~

3. Create 11 bootable root volumes, one more than the quota allows:
~~~
(projectmember) (overcloud) [stack.lab ~]$ for x in {1..11} ;do
> openstack volume create --image cirros-0.5.2 --size 1 --bootable vol-${x}
> done
...
(projectmember) (overcloud) [stack.lab ~]$ openstack volume list
+--------------------------------------+--------+-----------+------+-------------+
| ID                                   | Name   | Status    | Size | Attached to |
+--------------------------------------+--------+-----------+------+-------------+
| b48decf3-ca54-4021-892f-1f352396befa | vol-11 | available |    1 |             |
| f73e40f0-5884-4e94-8ada-ff91ca2fe4ec | vol-10 | available |    1 |             |
| 43072eba-d079-4cd4-bd15-f26f47cde067 | vol-9  | available |    1 |             |
| 3d0c8514-ed33-4ff6-abb1-029a83673755 | vol-8  | available |    1 |             |
| e83a97bc-59a1-4f27-90f0-40af0c85a528 | vol-7  | available |    1 |             |
| 0ba0c19a-26ce-46ba-9713-7b3d21b943b1 | vol-6  | available |    1 |             |
| d56813b3-35f5-4e40-b733-415f75c93b39 | vol-5  | available |    1 |             |
| 227398c0-2aba-4cea-bf07-33d1fff979f1 | vol-4  | available |    1 |             |
| c89c858e-0c96-4540-baa2-ac361e19fcc0 | vol-3  | available |    1 |             |
| 4612197a-9f9a-4155-9b8e-b005921dae08 | vol-2  | available |    1 |             |
| e2d2dde7-53c0-45d0-9394-bbb1cff7fb91 | vol-1  | available |    1 |             |
+--------------------------------------+--------+-----------+------+-------------+
~~~

4. Request creation of 11 servers, each using one of the existing root volumes:
~~~
(projectmember) (overcloud) [stack.lab ~]$ for x in {1..11} ;do openstack server create --volume vol-${x} --flavor 1c1g --hint group=4af0bb95-6eb0-47d7-a6fa-94992730848d testvm-${x} -c id; done
...
Quota exceeded, too many servers in group (HTTP 403) (Request-ID: req-a7bd8ba7-0fe2-42e6-8b61-c04770baa10b)
(projectmember) (overcloud) [stack.lab ~]$ 
~~~

Actual results:

1. The failed request for the 11th server leaves vol-11 in "reserved" state:
~~~
(projectmember) (overcloud) [stack.lab ~]$ openstack volume list
+--------------------------------------+--------+----------+------+------------------------------------+
| ID                                   | Name   | Status   | Size | Attached to                        |
+--------------------------------------+--------+----------+------+------------------------------------+
| b48decf3-ca54-4021-892f-1f352396befa | vol-11 | reserved |    1 |                                    |
| f73e40f0-5884-4e94-8ada-ff91ca2fe4ec | vol-10 | in-use   |    1 | Attached to testvm-10 on /dev/vda  |
| 43072eba-d079-4cd4-bd15-f26f47cde067 | vol-9  | in-use   |    1 | Attached to testvm-9 on /dev/vda   |
| 3d0c8514-ed33-4ff6-abb1-029a83673755 | vol-8  | in-use   |    1 | Attached to testvm-8 on /dev/vda   |
| e83a97bc-59a1-4f27-90f0-40af0c85a528 | vol-7  | in-use   |    1 | Attached to testvm-7 on /dev/vda   |
| 0ba0c19a-26ce-46ba-9713-7b3d21b943b1 | vol-6  | in-use   |    1 | Attached to testvm-6 on /dev/vda   |
| d56813b3-35f5-4e40-b733-415f75c93b39 | vol-5  | in-use   |    1 | Attached to testvm-5 on /dev/vda   |
| 227398c0-2aba-4cea-bf07-33d1fff979f1 | vol-4  | in-use   |    1 | Attached to testvm-4 on /dev/vda   |
| c89c858e-0c96-4540-baa2-ac361e19fcc0 | vol-3  | in-use   |    1 | Attached to testvm-3 on /dev/vda   |
| 4612197a-9f9a-4155-9b8e-b005921dae08 | vol-2  | in-use   |    1 | Attached to testvm-2 on /dev/vda   |
| e2d2dde7-53c0-45d0-9394-bbb1cff7fb91 | vol-1  | in-use   |    1 | Attached to testvm-1 on /dev/vda   |
+--------------------------------------+--------+----------+------+------------------------------------+
(projectmember) (overcloud) [stack.lab ~]$ 
~~~

2. The user that created the volume can now neither reset the state nor delete it:
~~~
(projectmember) (overcloud) [stack.lab ~]$ openstack volume delete vol-11
Failed to delete volume with name or ID 'vol-11': Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots or be disassociated from snapshots after volume transfer. (HTTP 400) (Request-ID: req-260992f9-5f76-46ad-ab30-ec6090691644)
1 of 1 volumes failed to delete.
(projectmember) (overcloud) [stack.lab ~]$ 
(projectmember) (overcloud) [stack.lab ~]$ cinder delete vol-11
Delete for volume vol-11 failed: Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots or be disassociated from snapshots after volume transfer. (HTTP 400) (Request-ID: req-7f8ec1e0-a6e8-47c7-a47f-5c44e3681edc)
ERROR: Unable to delete any of the specified volumes.
(projectmember) (overcloud) [stack.lab ~]$ 
(projectmember) (overcloud) [stack.lab ~]$ cinder reset-state --state Available vol-11
Policy doesn't allow volume_extension:volume_admin_actions:reset_status to be performed. (HTTP 403) (Request-ID: req-08e650ea-a8ca-4f41-944b-7e5906965a5d)
ERROR: Unable to reset the state for the specified entity(s).
(projectmember) (overcloud) [stack.lab ~]$ 
~~~

Expected results:

1. Ideally, the failed server create request should clean up the Cinder volume reservation, or
2. A user with the member role should be able to delete the volume, or reset its state and then delete it (although, granted, there could be situations in which the root cause needs to be investigated before resetting the volume state, and therefore an admin is required).


Additional info:

Note that this seems to be specifically related to the use of server groups. A subsequent server create without a server group hint succeeds:
~~~
(projectmember) (overcloud) [stack.lab ~]$ openstack volume create --image cirros-0.5.2 --size 1 vol-12 -c id
+-------+--------------------------------------+
| Field | Value                                |
+-------+--------------------------------------+
| id    | 027646d9-8b3b-4f25-b367-c353cb82a88d |
+-------+--------------------------------------+
(projectmember) (overcloud) [stack.lab ~]$ openstack server create --volume vol-12 --flavor 1c1g testvm-12 -c id --wait

+-------+--------------------------------------+
| Field | Value                                |
+-------+--------------------------------------+
| id    | fac91ca3-a204-4ce3-ac60-6e15d6f02984 |
+-------+--------------------------------------+
(projectmember) (overcloud) [stack.lab ~]$ 
(projectmember) (overcloud) [stack.lab ~]$ openstack volume list
+--------------------------------------+--------+----------+------+------------------------------------+
| ID                                   | Name   | Status   | Size | Attached to                        |
+--------------------------------------+--------+----------+------+------------------------------------+
| 027646d9-8b3b-4f25-b367-c353cb82a88d | vol-12 | in-use   |    1 | Attached to testvm-12 on /dev/vda  |
| b48decf3-ca54-4021-892f-1f352396befa | vol-11 | reserved |    1 |                                    |
| f73e40f0-5884-4e94-8ada-ff91ca2fe4ec | vol-10 | in-use   |    1 | Attached to testvm-10 on /dev/vda  |
| 43072eba-d079-4cd4-bd15-f26f47cde067 | vol-9  | in-use   |    1 | Attached to testvm-9 on /dev/vda   |
| 3d0c8514-ed33-4ff6-abb1-029a83673755 | vol-8  | in-use   |    1 | Attached to testvm-8 on /dev/vda   |
| e83a97bc-59a1-4f27-90f0-40af0c85a528 | vol-7  | in-use   |    1 | Attached to testvm-7 on /dev/vda   |
| 0ba0c19a-26ce-46ba-9713-7b3d21b943b1 | vol-6  | in-use   |    1 | Attached to testvm-6 on /dev/vda   |
| d56813b3-35f5-4e40-b733-415f75c93b39 | vol-5  | in-use   |    1 | Attached to testvm-5 on /dev/vda   |
| 227398c0-2aba-4cea-bf07-33d1fff979f1 | vol-4  | in-use   |    1 | Attached to testvm-4 on /dev/vda   |
| c89c858e-0c96-4540-baa2-ac361e19fcc0 | vol-3  | in-use   |    1 | Attached to testvm-3 on /dev/vda   |
| 4612197a-9f9a-4155-9b8e-b005921dae08 | vol-2  | in-use   |    1 | Attached to testvm-2 on /dev/vda   |
| e2d2dde7-53c0-45d0-9394-bbb1cff7fb91 | vol-1  | in-use   |    1 | Attached to testvm-1 on /dev/vda   |
+--------------------------------------+--------+----------+------+------------------------------------+
(projectmember) (overcloud) [stack.lab ~]$ 
~~~

Moreover, a server create request that exceeds a different quota (instances) does not leave the volume in reserved state:
~~~
(projectmember) (overcloud) [stack.lab ~]$ openstack volume create --image cirros-0.5.2 --size 1 vol-13
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| attachments         | []                                   |
| availability_zone   | nova                                 |
| bootable            | false                                |
| consistencygroup_id | None                                 |
| created_at          | 2024-05-08T14:42:47.754744           |
| description         | None                                 |
| encrypted           | False                                |
| id                  | c1cd70f4-23d0-4459-a238-2f92582e6e8c |
| multiattach         | False                                |
| name                | vol-13                               |
| properties          |                                      |
| replication_status  | None                                 |
| size                | 1                                    |
| snapshot_id         | None                                 |
| source_volid        | None                                 |
| status              | creating                             |
| type                | tripleo                              |
| updated_at          | None                                 |
| user_id             | 721a956e24de48b9a92a7d8ad74b3ec4     |
+---------------------+--------------------------------------+
(projectmember) (overcloud) [stack.lab ~]$ 
(projectmember) (overcloud) [stack.lab ~]$ openstack server create --volume vol-13 --flavor 1c1g testvm-13 -c id --wait
Quota exceeded for instances: Requested 1, but already used 11 of 11 instances (HTTP 403) (Request-ID: req-1de133c1-9f4c-4a05-a4d3-e87b6125e7be)
(projectmember) (overcloud) [stack.lab ~]$ 
(projectmember) (overcloud) [stack.lab ~]$ openstack volume list
+--------------------------------------+--------+-----------+------+------------------------------------+
| ID                                   | Name   | Status    | Size | Attached to                        |
+--------------------------------------+--------+-----------+------+------------------------------------+
| c1cd70f4-23d0-4459-a238-2f92582e6e8c | vol-13 | available |    1 |                                    |
| 027646d9-8b3b-4f25-b367-c353cb82a88d | vol-12 | in-use    |    1 | Attached to testvm-12 on /dev/vda  |
| b48decf3-ca54-4021-892f-1f352396befa | vol-11 | reserved  |    1 |                                    |
| f73e40f0-5884-4e94-8ada-ff91ca2fe4ec | vol-10 | in-use    |    1 | Attached to testvm-10 on /dev/vda  |
| 43072eba-d079-4cd4-bd15-f26f47cde067 | vol-9  | in-use    |    1 | Attached to testvm-9 on /dev/vda   |
| 3d0c8514-ed33-4ff6-abb1-029a83673755 | vol-8  | in-use    |    1 | Attached to testvm-8 on /dev/vda   |
| e83a97bc-59a1-4f27-90f0-40af0c85a528 | vol-7  | in-use    |    1 | Attached to testvm-7 on /dev/vda   |
| 0ba0c19a-26ce-46ba-9713-7b3d21b943b1 | vol-6  | in-use    |    1 | Attached to testvm-6 on /dev/vda   |
| d56813b3-35f5-4e40-b733-415f75c93b39 | vol-5  | in-use    |    1 | Attached to testvm-5 on /dev/vda   |
| 227398c0-2aba-4cea-bf07-33d1fff979f1 | vol-4  | in-use    |    1 | Attached to testvm-4 on /dev/vda   |
| c89c858e-0c96-4540-baa2-ac361e19fcc0 | vol-3  | in-use    |    1 | Attached to testvm-3 on /dev/vda   |
| 4612197a-9f9a-4155-9b8e-b005921dae08 | vol-2  | in-use    |    1 | Attached to testvm-2 on /dev/vda   |
| e2d2dde7-53c0-45d0-9394-bbb1cff7fb91 | vol-1  | in-use    |    1 | Attached to testvm-1 on /dev/vda   |
+--------------------------------------+--------+-----------+------+------------------------------------+
(projectmember) (overcloud) [stack.lab ~]$ 
~~~

Comment 1 melanie witt 2024-05-13 20:12:22 UTC
Thank you for the very clear and detailed bug report! It makes it so much easier to investigate the problem.

Taking a look at the code, I see where the volume is reserved before the server group members quota is checked [1]:

    def _provision_instances(
        [...]
                block_device_mapping = (
                    self._bdm_validate_set_size_and_instance(context,
                        instance, flavor, block_device_mapping,
                        image_cache, volumes, supports_multiattach))
        [...]

and I also see that there is no cleanup of the Cinder volume attachment if quota is exceeded. This is definitely a bug.
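
Until this is fixed properly, the shape of the fix is roughly: wrap the server group member quota check and roll back the Cinder attachments on failure. The sketch below is illustrative only, not a tested patch, and the quota-check structure varies between releases; objects.Quotas.check_deltas() and self.volume_api.attachment_delete() are existing nova calls, but the surrounding control flow here is assumed:
~~~
    # Rough, untested sketch (in the context of _provision_instances): if the
    # server group member quota check fails, delete any Cinder attachments
    # that were already created for the requested BDMs before re-raising, so
    # the volumes go back to 'available'.
    try:
        objects.Quotas.check_deltas(
            context, {'server_group_members': 1},
            instance_group, context.user_id)
    except exception.OverQuota:
        for bdm in block_device_mapping:
            if bdm.obj_attr_is_set('attachment_id') and bdm.attachment_id:
                try:
                    self.volume_api.attachment_delete(
                        context, bdm.attachment_id)
                except Exception:
                    LOG.exception('Failed to clean up attachment %s',
                                  bdm.attachment_id)
        raise exception.QuotaError(
            _('Quota exceeded, too many servers in group'))
~~~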

(In reply to Eric Nothen from comment #0)
> Expected results:
> 
> 1. Ideally, the failed server create request should cleanup the cinder
> volume reservation, or 
> 2. A user with member role should be able to delete, or reset-state and then
> delete the volume (although granted, there could be situations in which the
> root cause needs to be investigated before resetting the volume state and
> therefore and admin is required).

Regarding 2., you are right that reset-state is something that should be done with careful consideration, and it has an admin-only default API policy for that reason.

In this case, the cleanup that needs to happen is deletion of the Cinder volume attachment. There was a CVE [2] around attachment deletes for volumes that are attached to a Nova server. But in your case the volume is just 'reserved' and not yet associated with an existing Nova server.

So it may already be possible for the user with the member role to unwedge the situation themselves by deleting the volume attachment, for example:

$ openstack --os-volume-api-version 3.27 --os-cloud devstack volume attachment list 
+--------------------------------------+--------------------------------------+--------------------------------------+----------+
| ID                                   | Volume ID                            | Server ID                            | Status   |
+--------------------------------------+--------------------------------------+--------------------------------------+----------+
| d4470d80-ec1d-480c-8c59-1b3ff2f3985f | a4ef4a82-6aad-4637-8759-e01bb706ef2f | c6e2f02e-a408-4a5f-a325-55481a7d5c61 | reserved |
+--------------------------------------+--------------------------------------+--------------------------------------+----------+

$ openstack --os-volume-api-version 3.27 --os-cloud devstack volume attachment delete d4470d80-ec1d-480c-8c59-1b3ff2f3985f

After that, the volume should be in 'available' state and can then be deleted. I'm not sure if you tried that as well.
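
For environments where many volumes end up stuck (e.g. after a failed OCP scale-out), the same workaround can be scripted. A minimal sketch with openstacksdk follows; it assumes an SDK recent enough to expose the volume attachment API (microversion 3.27) and a clouds.yaml entry named 'overcloud', so treat the attachment method names as assumptions and fall back to the CLI above if your SDK version differs:
~~~
# Hedged sketch: delete 'reserved' Cinder attachments left behind by failed
# server create requests so the volumes return to 'available'.
# Assumes openstacksdk exposes the block storage attachment API (3.27);
# 'overcloud' is a placeholder clouds.yaml entry.
import openstack

conn = openstack.connect(cloud='overcloud')

for vol in conn.block_storage.volumes():
    if vol.status != 'reserved':
        continue
    for att in conn.block_storage.attachments(volume_id=vol.id):
        if att.status == 'reserved':
            print('Deleting attachment %s of volume %s' % (att.id, vol.name))
            conn.block_storage.delete_attachment(att)
~~~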


[1] https://github.com/openstack/nova/blob/7096423b343ffce9622fd078fc2b3a87fd3386f7/nova/compute/api.py#L1464
[2] https://security.openstack.org/ossa/OSSA-2023-003.html

Comment 2 Eric Nothen 2024-05-14 07:23:16 UTC
~~~
(OSPCLIENT) (overcloud) [stack.lab ~]$ openstack volume show -c id -c status -c attachments vol-11
+-------------+--------------------------------------+
| Field       | Value                                |
+-------------+--------------------------------------+
| attachments | []                                   |
| id          | b48decf3-ca54-4021-892f-1f352396befa |
| status      | reserved                             |
+-------------+--------------------------------------+
(OSPCLIENT) (overcloud) [stack.lab ~]$ openstack volume attachment list --os-volume-api-version 3.27  --volume-id b48decf3-ca54-4021-892f-1f352396befa
+--------------------------------------+--------------------------------------+--------------------------------------+----------+
| ID                                   | Volume ID                            | Server ID                            | Status   |
+--------------------------------------+--------------------------------------+--------------------------------------+----------+
| 7b277469-ba1f-40ba-9fc0-474b12f2e607 | b48decf3-ca54-4021-892f-1f352396befa | cfbb0c77-36bd-4f86-b591-23a0a4f857cf | reserved |
+--------------------------------------+--------------------------------------+--------------------------------------+----------+
(OSPCLIENT) (overcloud) [stack.lab ~]$ 
(OSPCLIENT) (overcloud) [stack.lab ~]$ openstack volume attachment delete --os-volume-api-version 3.27 7b277469-ba1f-40ba-9fc0-474b12f2e607
(OSPCLIENT) (overcloud) [stack.lab ~]$ 
(OSPCLIENT) (overcloud) [stack.lab ~]$ openstack volume show -c id -c status -c attachments vol-11
+-------------+--------------------------------------+
| Field       | Value                                |
+-------------+--------------------------------------+
| attachments | []                                   |
| id          | b48decf3-ca54-4021-892f-1f352396befa |
| status      | available                            |
+-------------+--------------------------------------+
(OSPCLIENT) (overcloud) [stack.lab ~]$ 
~~~

Indeed that works. Thank you for the hint.

