Bug 1976693

Summary: Creating a manila share fails and gets stuck in "creating" status
Product: Red Hat OpenStack
Reporter: lkuchlan <lkuchlan>
Component: openstack-containers
Assignee: Victoria Martinez de la Cruz <vimartin>
Status: CLOSED CURRENTRELEASE
QA Contact: lkuchlan <lkuchlan>
Severity: urgent
Docs Contact: mmurray
Priority: urgent
Version: 16.2 (Train)
CC: alfrgarc, bhubbard, bkopilov, dhill, gcharot, gfidente, gkadam, gouthamr, idryomov, jamsmith, lmarsh, m.andre, mgarciac, mhackett, moddi, ndeevy, pdonnell, pgrist, spower, tbarron, vhariria, vimartin
Target Milestone: Alpha
Keywords: AutomationBlocker, Regression, TestOnly, Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-manila-share-container-16.2.0-54
Doc Type: Bug Fix
Doc Text:
The Shared File Systems service (manila) uses the CephFS volume client to communicate with Ceph Storage clusters. Previously, the CephFS volume client package aborted while creating or deleting file systems.

The aborted operations caused the manila-share process within the Shared File Systems service to restart, which caused shares that were being provisioned or deleted to be stuck in `creating` or `deleting` states, respectively.

With this release, the CephFS volume client package no longer aborts provisioning or deletion requests, and the manila-share process does not restart during these operations.
Last Closed: 2021-09-22 10:10:34 UTC
Type: Bug
Bug Depends On: 1978688    
Bug Blocks: 1980423    

Description lkuchlan 2021-06-28 04:16:53 UTC
Description of problem:
Creating a manila share fails in the Ceph environments
(CephFS-Native and CephFS-NFS-Ganesha).

Create-share requests reach the manila-share service, but the child
process handling the CephFS back end is then killed by signal 6 (see
the log under "Additional info") for reasons not yet known, and is
restarted and reinitialized.

Version-Release number of selected component (if applicable):
puppet-manila-15.5.0-2.20210601014536.9c6604a.el8ost.2.noarch
puppet-ceph-3.1.2-2.20210603181657.ffa80da.el8ost.1.noarch
ceph-ansible-4.0.57-1.el8cp.noarch

How reproducible:
100%

Steps to Reproduce:

(overcloud) [stack@undercloud-0 ~]$ manila type-list
+--------------------------------------+-----------+------------+------------+--------------------------------------+-------------------------------------+-------------+
| ID                                   | Name      | visibility | is_default | required_extra_specs                 | optional_extra_specs                | Description |
+--------------------------------------+-----------+------------+------------+--------------------------------------+-------------------------------------+-------------+
| 21a84950-d1d7-4bd4-9fd1-e410091e7479 | default   | public     | YES        | driver_handles_share_servers : False | snapshot_support : True             | None        |
+--------------------------------------+-----------+------------+------------+--------------------------------------+-------------------------------------+-------------+

(overcloud) [stack@undercloud-0 ~]$ manila create nfs 1
+---------------------------------------+--------------------------------------+
| Property                              | Value                                |
+---------------------------------------+--------------------------------------+
| id                                    | 4064a7e9-8827-45fe-a88c-9e1550a49cd4 |
| size                                  | 1                                    |
| availability_zone                     | None                                 |
| created_at                            | 2021-06-28T04:02:26.000000           |
| status                                | creating                             |
| name                                  | None                                 |
| description                           | None                                 |
| project_id                            | d0d3edbb67754c01a4b71f0b379ca120     |
| snapshot_id                           | None                                 |
| share_network_id                      | None                                 |
| share_proto                           | NFS                                  |
| metadata                              | {}                                   |
| share_type                            | 21a84950-d1d7-4bd4-9fd1-e410091e7479 |
| is_public                             | False                                |
| snapshot_support                      | True                                 |
| task_state                            | None                                 |
| share_type_name                       | default                              |
| access_rules_status                   | active                               |
| replication_type                      | None                                 |
| has_replicas                          | False                                |
| user_id                               | 345024b1d79c4a1780d88f796575097c     |
| create_share_from_snapshot_support    | False                                |
| revert_to_snapshot_support            | False                                |
| share_group_id                        | None                                 |
| source_share_group_snapshot_member_id | None                                 |
| mount_snapshot_support                | False                                |
| share_server_id                       | None                                 |
| host                                  |                                      |
+---------------------------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ manila list
+--------------------------------------+------+------+-------------+----------+-----------+-----------------+-------------------------+-------------------+
| ID                                   | Name | Size | Share Proto | Status   | Is Public | Share Type Name | Host                    | Availability Zone |
+--------------------------------------+------+------+-------------+----------+-----------+-----------------+-------------------------+-------------------+
| 4064a7e9-8827-45fe-a88c-9e1550a49cd4 | None | 1    | NFS         | creating | False     | default         | hostgroup@cephfs#cephfs | nova              |
+--------------------------------------+------+------+-------------+----------+-----------+-----------------+-------------------------+-------------------+

Actual results:
Creating a manila share fails and gets stuck in "creating" status.

Expected results:
Creating a manila share should be successful.

Additional info:

2021-06-27 10:22:00.278 290 DEBUG manila.share.drivers.cephfs.driver [req-d9e9bf29-7254-4ae8-8739-6256298b768d 345024b1d79c4a1780d88f796575097c d0d3edbb67754c01a4b71f0b379ca120 - - -] create_share cephfs name=8dc0239c-1aef-4dc5-8647-f7bde9db4f66 size=1 share_group_id=None create_share /usr/lib/python3.6/site-packages/manila/share/drivers/cephfs/driver.py:262
2021-06-27 10:22:00.288 290 INFO ceph_volume_client [req-d9e9bf29-7254-4ae8-8739-6256298b768d 345024b1d79c4a1780d88f796575097c d0d3edbb67754c01a4b71f0b379ca120 - - -] create_volume: /volumes/_nogroup/8dc0239c-1aef-4dc5-8647-f7bde9db4f66
2021-06-27 10:22:01.017 7 INFO oslo_service.service [req-54e69491-1385-4065-9f86-a77c606d65c6 - - - - -] Child 290 killed by signal 6
2021-06-27 10:22:01.023 7 DEBUG oslo_service.service [req-54e69491-1385-4065-9f86-a77c606d65c6 - - - - -] Started child 328 _start_child /usr/lib/python3.6/site-packages/oslo_service/service.py:579
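
For triage, it may help to pull the "Child N killed by signal M" events out of the manila-share log and translate the signal number. On Linux, signal 6 is SIGABRT, which fits a process calling abort() rather than being stopped externally. The following is only a sketch: the function name find_killed_children is hypothetical, and the pattern is based solely on the message format shown in the log excerpt above.

```python
import re
import signal

# Pattern taken from the oslo_service message in this report:
# "Child 290 killed by signal 6". Other message formats are not handled.
CHILD_KILLED = re.compile(r"Child (\d+) killed\s+by signal (\d+)")


def find_killed_children(log_text):
    """Return a list of (pid, signal_number, signal_name) tuples."""
    # Collapse whitespace so the pattern still matches if a log line is wrapped.
    flat = " ".join(log_text.split())
    results = []
    for pid, signum in CHILD_KILLED.findall(flat):
        try:
            name = signal.Signals(int(signum)).name
        except ValueError:
            # The number is not a known signal on this platform.
            name = "UNKNOWN"
        results.append((int(pid), int(signum), name))
    return results
```

Run against the excerpt above, this should report one event for child 290 with signal number 6.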

Comment 3 Yaniv Kaul 2021-06-30 12:24:21 UTC
Is that a regression? Isn't marked as such, but I assume it worked in previous releases?

Comment 4 Tom Barron 2021-06-30 14:51:35 UTC
(In reply to Yaniv Kaul from comment #3)
> Is that a regression? Isn't marked as such, but I assume it worked in
> previous releases?

Liron told me in email "I checked the earlier 16.2 phase3 run and we used ceph tag 4-44, here we used 4-55" so it used to work in 16.2 candidate code (and works in earlier versions).  Vida reported "Apparently the Ceph DFG job running the same pkg version had similar tempest failures."  I'm leaving the NEEDINFO in case QE has additional information but yeah, I think this is a regression and should be marked as such.

Comment 16 lkuchlan 2021-07-01 04:10:39 UTC
(In reply to Yaniv Kaul from comment #3)
> Is that a regression? Isn't marked as such, but I assume it worked in
> previous releases?

Right, in the previous phase 3 runs it's all been fine.

Comment 32 lkuchlan 2021-07-15 09:48:43 UTC
Verification steps:

(overcloud) [stack@undercloud-0 ~]$ manila type-list
+--------------------------------------+---------+------------+------------+--------------------------------------+-------------------------+-------------+
| ID                                   | Name    | visibility | is_default | required_extra_specs                 | optional_extra_specs    | Description |
+--------------------------------------+---------+------------+------------+--------------------------------------+-------------------------+-------------+
| 2649e34a-35b7-478f-a52f-cfe57523cedb | default | public     | YES        | driver_handles_share_servers : False | snapshot_support : True | None        |
+--------------------------------------+---------+------------+------------+--------------------------------------+-------------------------+-------------+

(overcloud) [stack@undercloud-0 ~]$ manila create nfs 1
+---------------------------------------+--------------------------------------+
| Property                              | Value                                |
+---------------------------------------+--------------------------------------+
| id                                    | b885b298-4693-416e-822f-21472cb4c622 |
| size                                  | 1                                    |
| availability_zone                     | None                                 |
| created_at                            | 2021-07-15T09:46:14.000000           |
| status                                | creating                             |
| name                                  | None                                 |
| description                           | None                                 |
| project_id                            | 5a25ffa19919411bb9c4a9405d8a677b     |
| snapshot_id                           | None                                 |
| share_network_id                      | None                                 |
| share_proto                           | NFS                                  |
| metadata                              | {}                                   |
| share_type                            | 2649e34a-35b7-478f-a52f-cfe57523cedb |
| is_public                             | False                                |
| snapshot_support                      | True                                 |
| task_state                            | None                                 |
| share_type_name                       | default                              |
| access_rules_status                   | active                               |
| replication_type                      | None                                 |
| has_replicas                          | False                                |
| user_id                               | abd23c0cdc894c35849812e9e194bc21     |
| create_share_from_snapshot_support    | False                                |
| revert_to_snapshot_support            | False                                |
| share_group_id                        | None                                 |
| source_share_group_snapshot_member_id | None                                 |
| mount_snapshot_support                | False                                |
| share_server_id                       | None                                 |
| host                                  |                                      |
+---------------------------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ manila list
+--------------------------------------+------+------+-------------+-----------+-----------+-----------------+-------------------------+-------------------+
| ID                                   | Name | Size | Share Proto | Status    | Is Public | Share Type Name | Host                    | Availability Zone |
+--------------------------------------+------+------+-------------+-----------+-----------+-----------------+-------------------------+-------------------+
| b885b298-4693-416e-822f-21472cb4c622 | None | 1    | NFS         | available | False     | default         | hostgroup@cephfs#cephfs | nova              |
+--------------------------------------+------+------+-------------+-----------+-----------+-----------------+-------------------------+-------------------+
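
The check above comes down to reading the Status column of the `manila list` output. A minimal sketch of automating that comparison follows, assuming the table text has already been captured (for example, from the CLI's stdout). The helper names parse_cli_table and shares_not_available are hypothetical and are not part of manila or its client.

```python
def parse_cli_table(text):
    """Parse a '+---+' bordered, '|'-separated CLI table into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines, shell prompts and blank lines
        # Drop the empty strings produced by the leading and trailing '|'.
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        if header is None:
            header = cells  # the first '|' row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def shares_not_available(table_text):
    """Return the IDs of shares whose Status is anything but 'available'."""
    return [
        row["ID"]
        for row in parse_cli_table(table_text)
        if row.get("Status") != "available"
    ]
```

An empty result for the verification output, and a non-empty one for the output in the original description, would match what was observed by hand. A share that is still legitimately being created would also be listed, so the check is only meaningful after allowing time for provisioning.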