Bug 2229767 - Failed to start openstack-manila-share resource in Pacemaker
Summary: Failed to start openstack-manila-share resource in Pacemaker
Keywords:
Status: POST
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z1
Target Release: 17.1
Assignee: Francesco Pantano
QA Contact: Alfredo
Docs Contact: Jenny-Anne Lynch
URL:
Whiteboard:
Depends On:
Blocks: 2229777
Reported: 2023-08-07 15:36 UTC by Francesco Pantano
Modified: 2023-08-16 12:12 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
There is currently a known issue when you upgrade Red Hat Ceph Storage 4 to 5 during the upgrade from RHOSP 16.2 to 17.1. The `ceph-nfs` resource is misconfigured and Pacemaker does not manage the resource. The overcloud upgrade fails because the containers that are associated with `ceph-nfs-pacemaker` are down, impacting the Shared File Systems service (manila). A fix is expected in RHOSP 17.1.1. Workaround: Apply the workaround from Red Hat KCS solution 7028073: link:https://access.redhat.com/solutions/7028073[Pacemaker does not manage the `ceph-nfs` resource correctly during RHOSP and RHCS upgrade].
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| OpenStack gerrit | 890682 | 0 | None | MERGED | Duplicate Manila with CephNFS to work with ceph-ansible | 2023-08-16 06:08:07 UTC |
| Red Hat Issue Tracker | OSP-26991 | 0 | None | None | None | 2023-08-07 16:23:22 UTC |
| Red Hat Issue Tracker | OSP-27269 | 0 | None | None | None | 2023-08-07 15:37:12 UTC |

Description Francesco Pantano 2023-08-07 15:36:25 UTC
Description of problem:

After Ceph is upgraded from version 4 to 5 with the existing fast-forward upgrade (FFU) procedure, ceph-nfs is misconfigured and is no longer managed by Pacemaker.
The overcloud upgrade fails because the containers associated with ceph-nfs-pacemaker cannot be found:

```
[tripleo-admin@controller-1 ~]$ sudo journalctl -xef -u ceph-nfs@pacemaker
-- Logs begin at Mon 2023-08-07 09:24:36 UTC. --
Aug 07 10:47:02 controller-1 podman[499303]: Error: no container with name or ID "ceph-nfs-pacemaker" found: no such container
Aug 07 10:47:02 controller-1 podman[499434]: Error: no container with name or ID "ceph-nfs-pacemaker" found: no such container
Aug 07 10:47:02 controller-1 podman[499519]: Error: error creating container storage: the container name "ceph-nfs-controller-1" is already in use by
bf48b6439174131620e2feedf". You have to remove that container to be able to reuse that name.: that name is already in use
Aug 07 10:47:02 controller-1 systemd[1]: ceph-nfs: Control process exited, code=exited status=125
Aug 07 10:47:02 controller-1 systemd[1]: ceph-nfs: Failed with result 'exit-code'.
-- Support: https://access.redhat.com/support
-- The unit ceph-nfs has entered the 'failed' state with result 'exit-code'.
Aug 07 10:47:02 controller-1 systemd[1]: Failed to start Cluster Controlled ceph-nfs@pacemaker.
-- Subject: Unit ceph-nfs has failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- Unit ceph-nfs has failed.
-- The result is failed.
```
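
The journal excerpt above shows two distinct podman failures: the `ceph-nfs-pacemaker` container is missing, and the name `ceph-nfs-controller-1` is already taken by another container. As a rough triage aid, the following sketch separates these two cases in saved journal text. The function name `classify_podman_errors` and the category labels are hypothetical and are not part of any OpenStack or Ceph tooling; the matching relies only on the message wording visible in the excerpt.

```python
import re

# Matches podman error lines in the format seen in the journal excerpt.
PODMAN_ERROR = re.compile(r"podman\[\d+\]: Error: (?P<message>.*)")
# Extracts the first double-quoted token, which is the container name here.
QUOTED_NAME = re.compile(r'"([^"]+)"')


def classify_podman_errors(journal_text):
    """Return a list of (kind, container_name) tuples, one per podman error line.

    kind is one of the hypothetical labels "missing", "name-conflict", "other".
    """
    findings = []
    for line in journal_text.splitlines():
        match = PODMAN_ERROR.search(line)
        if not match:
            continue
        message = match.group("message")
        name = QUOTED_NAME.search(message)
        container = name.group(1) if name else None
        if "no such container" in message:
            kind = "missing"
        elif "already in use" in message:
            kind = "name-conflict"
        else:
            kind = "other"
        findings.append((kind, container))
    return findings


if __name__ == "__main__":
    sample = "\n".join([
        'Aug 07 10:47:02 controller-1 podman[499303]: Error: no container '
        'with name or ID "ceph-nfs-pacemaker" found: no such container',
        'Aug 07 10:47:02 controller-1 podman[499519]: Error: error creating '
        'container storage: the container name "ceph-nfs-controller-1" is '
        'already in use by',
    ])
    for kind, container in classify_podman_errors(sample):
        print(kind, container)
```

A "missing" entry for `ceph-nfs-pacemaker` together with a "name-conflict" entry for a per-node name is the combination reported in this bug.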

Instead of a single container managed by Pacemaker, there are three separate ceph-nfs containers, each with a default configuration
that does not apply to the OpenStack context.
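
To check a node for this condition, one option is to compare the container names present against the single name that Pacemaker expects. The sketch below assumes the names were already collected (for example with `podman ps -a --format '{{.Names}}'`) and passed in as a list. The function name `summarize_ceph_nfs_containers` and the returned keys are hypothetical; only the names `ceph-nfs-pacemaker` and `ceph-nfs-controller-1` come from the log above.

```python
def summarize_ceph_nfs_containers(container_names, expected="ceph-nfs-pacemaker"):
    """Report whether the Pacemaker-managed container exists and list other ceph-nfs containers.

    container_names: iterable of container name strings from one node.
    expected: the container name Pacemaker is expected to manage.
    """
    # Keep only containers whose name starts with the ceph-nfs prefix.
    ceph_nfs = [name for name in container_names if name.startswith("ceph-nfs")]
    # Anything with the prefix that is not the expected name is suspicious.
    unexpected = sorted(name for name in ceph_nfs if name != expected)
    return {
        "pacemaker_container_present": expected in ceph_nfs,
        "unexpected": unexpected,
    }


if __name__ == "__main__":
    # "some-other-container" is a made-up name used only for illustration.
    names = ["ceph-nfs-controller-1", "some-other-container"]
    print(summarize_ceph_nfs_containers(names))
```

On an affected node, the expected result is that `pacemaker_container_present` is false and `unexpected` lists a per-node container such as `ceph-nfs-controller-1`.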
