Bug 2255324

Summary:	After OSP update or FFU (16.2->17.1) manila CephFS NFS shares are not accessible
Product:	Red Hat OpenStack	Reporter:	Itzik Brown <itbrown>
Component:	tripleo-ansible	Assignee:	OpenStack Manila Bugzilla Bot <openstack-manila-bugs>
Status:	CLOSED ERRATA	QA Contact:	Alfredo <alfrgarc>
Severity:	urgent	Docs Contact:	RHOS Documentation Team <rhos-docs>
Priority:	high
Version:	17.1 (Wallaby)	CC:	alfrgarc, anbs, ashrodri, astupnik, dhughes, fpantano, gouthamr, gregraka, imatza, jamsmith, jbadiapa, jelynch, jjoyce, jjung, lsvaty, mariel, mburns, mgarciac, pgrist
Target Milestone:	z3	Keywords:	AutomationBlocker, Triaged
Target Release:	17.1
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	tripleo-ansible-3.3.1-17.1.20231101230824.el9ost	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2024-01-29 14:36:43 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Itzik Brown 2023-12-20 06:15:46 UTC

Description of problem:
After upgrading from OSP 16.2 to 17.1, an existing NFS share is no longer accessible.

[stack@undercloud-0 ~]$ manila list
+--------------------------------------+----------+------+-------------+-----------+-----------+-----------------+-------------------------+-------------------+
| ID                                   | Name     | Size | Share Proto | Status    | Is Public | Share Type Name | Host                    | Availability Zone |
+--------------------------------------+----------+------+-------------+-----------+-----------+-----------------+-------------------------+-------------------+
| 7dae998e-d619-4f4a-90e6-58604640e46e | share-01 | 1    | NFS         | available | False     | default         | hostgroup@cephfs#cephfs | nova              |
+--------------------------------------+----------+------+-------------+-----------+-----------+-----------------+-------------------------+-------------------+
[stack@undercloud-0 ~]$ manila access-list 7dae998e-d619-4f4a-90e6-58604640e46e
+--------------------------------------+-------------+-----------+--------------+--------+------------+----------------------------+------------+
| id                                   | access_type | access_to | access_level | state  | access_key | created_at                 | updated_at |
+--------------------------------------+-------------+-----------+--------------+--------+------------+----------------------------+------------+
| d064eeb0-9dcf-470c-b2ee-5e64419fe609 | ip          | 0.0.0.0/0 | rw           | active | None       | 2023-12-19T13:22:16.000000 | None       |
+--------------------------------------+-------------+-----------+--------------+--------+------------+----------------------------+------------+
[stack@undercloud-0 ~]$ manila share-export-location-list share-01
+--------------------------------------+--------------------------------------------------------------------+-----------+
| ID                                   | Path                                                               | Preferred |
+--------------------------------------+--------------------------------------------------------------------+-----------+
| 0ed3e1be-3a28-4808-a6d8-f4d666d768c8 | 172.17.5.33:/volumes/_nogroup/51c2a902-813a-4fbe-8785-ef2835f49e63 | False     |
+--------------------------------------+--------------------------------------------------------------------+-----------+

[root@vm1 ~]# mount -t nfs 172.17.5.33:/volumes/_nogroup/51c2a902-813a-4fbe-8785-ef2835f49e63  /mnt/foo
mount.nfs: mounting 172.17.5.33:/volumes/_nogroup/51c2a902-813a-4fbe-8785-ef2835f49e63 failed, reason given by server: No such file or directory


Version-Release number of selected component (if applicable):
RHOS-16.2-RHEL-8-20230510.n.1
RHOS-17.1-RHEL-8-20231215.n.1

How reproducible:


Steps to Reproduce:
1. After installing OSP 16.2, create an NFS share and grant the proper access
2. Mount it in an instance
3. Upgrade to OSP17.1
4. Verify that the share is not accessible from the instance that was created

Actual results:


Expected results:


Additional info:

Comment 2 Itzik Brown 2023-12-20 06:55:45 UTC

The setup is using HCI.

Comment 13 Goutham Pacha Ravi 2024-01-03 23:03:24 UTC

Adding some more comments here; we were considering whether this needs to be a blocker bug.

After the failure occurred, I saw that the RADOS object that contains all the export object entries (%url rados://manila_data/ganesha-export-index) had been erased/reset.


We have some code that deals with this object: https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/roles/tripleo_cephadm/tasks/nfs.yaml#L26-L44
I see that the tripleo step that checks for the object omits "become: true" (I suspect this command could fail silently).

We'll try to capture logs from this playbook in a re-run of the test.

Comment 17 Goutham Pacha Ravi 2024-01-05 22:41:04 UTC

Adding the steps to apply the workaround:


Prior to running the overcloud FFU (https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html-single/framework_for_upgrades_16.2_to_17.1/index#upgrading-a-standard-overcloud_upgrading-overcloud-standard), run the following on the controller node that hosts the "ceph-nfs-pacemaker" service:


   # podman exec ceph-nfs-pacemaker rados -n client.manila -p manila_data get ganesha-export-index export_index_backup.txt


You may inspect the data in the "export_index_backup.txt" file. If you had manila shares created, this file will contain one or more lines, each with a RADOS URL pointing to export information. That export information is stored in RADOS and is not affected by this bug.
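
A quick sanity check of the backup could look like the sketch below. It assumes a copy of the backup file is readable from the shell where it runs; count_export_entries is a hypothetical helper name, not part of manila, podman or rados.

```shell
# Sketch only: count_export_entries is a hypothetical helper.
# Prints the number of lines in the given file that contain a RADOS URL.
count_export_entries() {
    # grep -c prints the count but exits non-zero when it is 0,
    # so the exit status is masked here.
    grep -c 'rados://' "$1" || true
}
```

For example, "count_export_entries export_index_backup.txt" should print at least 1 if shares existed before the upgrade. A result of 0 in that situation may mean the backup was taken after the index had already been reset.
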
Once the FFU is complete, and prior to proceeding to the system upgrade steps, ensure that the ganesha-export-index is recreated:


   # podman exec ceph-nfs-pacemaker rados -n client.manila -p manila_data put ganesha-export-index export_index_backup.txt


Verify that the object exists and that its contents match the backup:

   # podman exec ceph-nfs-pacemaker rados -n client.manila -p manila_data get ganesha-export-index -
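
The comparison can also be scripted. The following is a minimal sketch, assuming a copy of the backup is available on the host and that the command prefix is the one used in the steps above; RADOS_CMD and index_matches_backup are hypothetical names introduced only for illustration.

```shell
# Sketch only. RADOS_CMD is a hypothetical override hook; the default is
# the command prefix from the workaround steps.
RADOS_CMD=${RADOS_CMD:-"podman exec ceph-nfs-pacemaker rados -n client.manila -p manila_data"}

# index_matches_backup BACKUP_FILE
# Returns 0 only when the ganesha-export-index object can be read,
# is non-empty, and equals the content of BACKUP_FILE (on the host).
index_matches_backup() {
    # RADOS_CMD is left unquoted on purpose so it splits into words.
    current=$($RADOS_CMD get ganesha-export-index -) || return 1
    [ -n "$current" ] && [ "$current" = "$(cat "$1")" ]
}
```

Calling "index_matches_backup /path/to/backup && echo OK" prints OK only on a match. An empty or unreadable index is treated as a mismatch, since that is the symptom described in this bug.
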

Comment 37 errata-xmlrpc 2024-01-29 14:36:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 17.1.2 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:0547