Bug 2255324 - After OSP update or FFU (16.2->17.1) manila CephFS NFS shares are not accessible
Summary: After OSP update or FFU (16.2->17.1) manila CephFS NFS shares are not accessible
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: urgent
Target Milestone: z3
Target Release: 17.1
Assignee: OpenStack Manila Bugzilla Bot
QA Contact: Alfredo
Docs Contact: RHOS Documentation Team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-12-20 06:15 UTC by Itzik Brown
Modified: 2024-10-09 16:16 UTC
CC List: 19 users

Fixed In Version: tripleo-ansible-3.3.1-17.1.20231101230824.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-01-29 14:36:43 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 905008 0 None NEW Do not mount assimilate_conf during ceph-nfs deployment 2024-01-09 19:42:25 UTC
Red Hat Issue Tracker OSP-30961 0 None None None 2023-12-20 06:19:22 UTC
Red Hat Knowledge Base (Solution) 7051579 0 None None None 2024-01-09 19:46:12 UTC
Red Hat Product Errata RHBA-2024:0547 0 None None None 2024-01-29 14:36:47 UTC

Description Itzik Brown 2023-12-20 06:15:46 UTC
Description of problem:
After upgrading from OSP 16.2 to 17.1, an NFS share is not accessible.

[stack@undercloud-0 ~]$ manila list
+--------------------------------------+----------+------+-------------+-----------+-----------+-----------------+-------------------------+-------------------+
| ID                                   | Name     | Size | Share Proto | Status    | Is Public | Share Type Name | Host                    | Availability Zone |
+--------------------------------------+----------+------+-------------+-----------+-----------+-----------------+-------------------------+-------------------+
| 7dae998e-d619-4f4a-90e6-58604640e46e | share-01 | 1    | NFS         | available | False     | default         | hostgroup@cephfs#cephfs | nova              |
+--------------------------------------+----------+------+-------------+-----------+-----------+-----------------+-------------------------+-------------------+
 [stack@undercloud-0 ~]$ manila access-list 7dae998e-d619-4f4a-90e6-58604640e46e
+--------------------------------------+-------------+-----------+--------------+--------+------------+----------------------------+------------+
| id                                   | access_type | access_to | access_level | state  | access_key | created_at                 | updated_at |
+--------------------------------------+-------------+-----------+--------------+--------+------------+----------------------------+------------+
| d064eeb0-9dcf-470c-b2ee-5e64419fe609 | ip          | 0.0.0.0/0 | rw           | active | None       | 2023-12-19T13:22:16.000000 | None       |
+--------------------------------------+-------------+-----------+--------------+--------+------------+----------------------------+------------+
 [stack@undercloud-0 ~]$ manila share-export-location-list share-01
+--------------------------------------+--------------------------------------------------------------------+-----------+
| ID                                   | Path                                                               | Preferred |
+--------------------------------------+--------------------------------------------------------------------+-----------+
| 0ed3e1be-3a28-4808-a6d8-f4d666d768c8 | 172.17.5.33:/volumes/_nogroup/51c2a902-813a-4fbe-8785-ef2835f49e63 | False     |
+--------------------------------------+--------------------------------------------------------------------+-----------+

[root@vm1 ~]# mount -t nfs 172.17.5.33:/volumes/_nogroup/51c2a902-813a-4fbe-8785-ef2835f49e63  /mnt/foo
mount.nfs: mounting 172.17.5.33:/volumes/_nogroup/51c2a902-813a-4fbe-8785-ef2835f49e63 failed, reason given by server: No such file or directory


Version-Release number of selected component (if applicable):
RHOS-16.2-RHEL-8-20230510.n.1
RHOS-17.1-RHEL-8-20231215.n.1

How reproducible:


Steps to Reproduce:
1. On an OSP 16.2 deployment, create an NFS share and set the proper access rules
2. Mount the share in an instance
3. Upgrade to OSP 17.1
4. Verify that the share is no longer accessible from the instance
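For reference, steps 1-2 roughly correspond to the following manila CLI calls (a sketch only; the share name and the open 0.0.0.0/0 access rule mirror the output above, and the commands are echoed rather than executed here so the sequence can be reviewed):

```shell
# Commands behind repro steps 1-2 (run as the stack user on the undercloud);
# echoed rather than executed.
cmds=(
    "manila create NFS 1 --name share-01"
    "manila access-allow share-01 ip 0.0.0.0/0 --access-level rw"
    "manila share-export-location-list share-01"
)
printf '%s\n' "${cmds[@]}"
# Then, inside the instance, mount the reported export path on /mnt/foo.
```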

Actual results:
The share cannot be mounted; the server reports "No such file or directory".

Expected results:
The share remains accessible after the upgrade.

Additional info:

Comment 2 Itzik Brown 2023-12-20 06:55:45 UTC
The setup uses HCI (hyperconverged infrastructure).

Comment 13 Goutham Pacha Ravi 2024-01-03 23:03:24 UTC
Adding some more comments here; we were considering whether this needs to be a blocker bug.

After the failure occurred, I saw that the rados object that contains all the export object entries (%url rados://manila_data/ganesha-export-index) had been erased/reset.


We have some code that deals with this object: https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/roles/tripleo_cephadm/tasks/nfs.yaml#L26-L44
I see that the tripleo step that checks for the object omits "become: true", so I suspect this command could fail silently.

We'll try to capture logs from this playbook in a re-run of the test.
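As a quick way to confirm whether the export index object survived a run, a check along these lines could be used (a hedged sketch; the `check_index` helper and its name are mine, while the container, pool, and object names come from this bug):

```shell
# check_index: reads a rados object listing on stdin and reports whether the
# ganesha-export-index object is present (hypothetical helper name).
check_index() {
    if grep -qx ganesha-export-index; then
        echo "export index present"
    else
        echo "export index missing"
    fi
}

# On the controller, run the listing as root -- without it the rados call can
# fail, and a naive check can silently look like "no object":
#   sudo podman exec ceph-nfs-pacemaker rados -n client.manila -p manila_data ls | check_index
```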

Comment 17 Goutham Pacha Ravi 2024-01-05 22:41:04 UTC
Adding the steps to apply the workaround:


Prior to running the overcloud FFU (https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html-single/framework_for_upgrades_16.2_to_17.1/index#upgrading-a-standard-overcloud_upgrading-overcloud-standard)

Run the following on the controller node that contains the "ceph-nfs-pacemaker" service:


   # podman exec ceph-nfs-pacemaker rados -n client.manila -p manila_data get ganesha-export-index export_index_backup.txt


You may inspect the data in the "export_index_backup.txt" file. If you had created manila shares, this file will contain one or more lines, each with a RADOS URL pointing to export information. That export information is stored in RADOS and is not affected by this bug.
Once the FFU is complete, and prior to proceeding to the system upgrade steps, ensure that the ganesha-export-index object is recreated:


   # podman exec ceph-nfs-pacemaker rados -n client.manila -p manila_data put ganesha-export-index export_index_backup.txt


Verify that the object exists, and its contents match with:

   # podman exec ceph-nfs-pacemaker rados -n client.manila -p manila_data get ganesha-export-index -
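The three steps above can be wrapped into one script (a sketch only; the wrapper itself is hypothetical, while the container, pool, and object names follow the commands in this comment; run as root on the controller hosting ceph-nfs-pacemaker; DRY_RUN=1, the default, only prints each command so the flow can be reviewed first):

```shell
#!/bin/bash
set -euo pipefail
DRY_RUN="${DRY_RUN:-1}"

# Build and (optionally) run a rados command against the export index object.
rados_cmd() {
    local verb="$1"; shift
    local cmd=(podman exec ceph-nfs-pacemaker rados -n client.manila \
               -p manila_data "$verb" ganesha-export-index "$@")
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: ${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# 1. Before the overcloud FFU: back up the export index
rados_cmd get export_index_backup.txt
# 2. After the FFU, before the system upgrade steps: restore it
rados_cmd put export_index_backup.txt
# 3. Dump the restored object to stdout to compare with the backup
rados_cmd get -
```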

Comment 37 errata-xmlrpc 2024-01-29 14:36:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 17.1.2 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:0547

