Bug 2229767

Summary: Failed to start openstack-manila-share resource in Pacemaker
Product: Red Hat OpenStack
Reporter: Francesco Pantano <fpantano>
Component: openstack-tripleo-heat-templates
Assignee: Francesco Pantano <fpantano>
Status: POST
QA Contact: Alfredo <alfrgarc>
Severity: urgent
Docs Contact: Jenny-Anne Lynch <jelynch>
Priority: urgent
Version: 17.1 (Wallaby)
CC: eshames, gfidente, gouthamr, jbadiapa, jelynch, kthakre, lkuchlan, lsvaty, mburns, mkatari, pgrist, vhariria
Target Milestone: z1
Keywords: Triaged
Target Release: 17.1
Hardware: Unspecified
OS: Unspecified
Doc Type: Known Issue
Doc Text:
There is currently a known issue when you upgrade Red Hat Ceph Storage 4 to 5 during the upgrade from RHOSP 16.2 to 17.1. The `ceph-nfs` resource is misconfigured and Pacemaker does not manage the resource. The overcloud upgrade fails because the containers that are associated with `ceph-nfs-pacemaker` are down, impacting the Shared File Systems service (manila). A fix is expected in RHOSP 17.1.1. Workaround: Apply the workaround from Red Hat KCS solution 7028073: link:https://access.redhat.com/solutions/7028073[Pacemaker does not manage the `ceph-nfs` resource correctly during RHOSP and RHCS upgrade].
Type: Bug
Bug Blocks: 2229777

Description Francesco Pantano 2023-08-07 15:36:25 UTC
Description of problem:

After Ceph is upgraded from 4 to 5 using the existing FFU (fast forward upgrade) procedure, ceph-nfs is misconfigured and no longer managed by Pacemaker.
The overcloud upgrade fails because the containers associated with ceph-nfs-pacemaker cannot be found:

```
[tripleo-admin@controller-1 ~]$ sudo journalctl -xef -u ceph-nfs@pacemaker
-- Logs begin at Mon 2023-08-07 09:24:36 UTC. --
Aug 07 10:47:02 controller-1 podman[499303]: Error: no container with name or ID "ceph-nfs-pacemaker" found: no such container
Aug 07 10:47:02 controller-1 podman[499434]: Error: no container with name or ID "ceph-nfs-pacemaker" found: no such container
Aug 07 10:47:02 controller-1 podman[499519]: Error: error creating container storage: the container name "ceph-nfs-controller-1" is already in use by
bf48b6439174131620e2feedf". You have to remove that container to be able to reuse that name.: that name is already in use
Aug 07 10:47:02 controller-1 systemd[1]: ceph-nfs: Control process exited, code=exited status=125
Aug 07 10:47:02 controller-1 systemd[1]: ceph-nfs: Failed with result 'exit-code'.
-- Support: https://access.redhat.com/support
-- The unit ceph-nfs has entered the 'failed' state with result 'exit-code'.
Aug 07 10:47:02 controller-1 systemd[1]: Failed to start Cluster Controlled ceph-nfs@pacemaker.
-- Subject: Unit ceph-nfs has failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- Unit ceph-nfs has failed.
-- The result is failed.
```
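The log shows two distinct podman failures: the unit looks for a container named `ceph-nfs-pacemaker` that does not exist, and it cannot create a replacement because the name `ceph-nfs-controller-1` is already taken by another container. A minimal sketch for pulling those container names out of the journal, assuming the podman messages keep the exact wording shown above; `podman_error_names` is a hypothetical helper, not part of TripleO or Ceph tooling:

```shell
# Hypothetical helper: read journal text on stdin and print one line per
# distinct podman error, as "<kind> <container-name>":
#   missing -> 'no container with name or ID "X" found'
#   in-use  -> 'the container name "X" is already in use'
# Assumes the podman message wording matches the log in this report.
podman_error_names() {
  sed -n \
    -e 's/.*no container with name or ID "\([^"]*\)" found.*/missing \1/p' \
    -e 's/.*the container name "\([^"]*\)" is already in use.*/in-use \1/p' \
  | sort -u
}

# Usage on a controller:
#   sudo journalctl -u ceph-nfs@pacemaker --no-pager | podman_error_names
```

Against the log above this would report one `missing` entry for `ceph-nfs-pacemaker` and one `in-use` entry for `ceph-nfs-controller-1`, which is the combination that makes the unit exit with status 125.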

Instead of a single container managed by Pacemaker, there are three different ceph-nfs containers with a default configuration
that does not apply to the OpenStack context.
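One way to confirm this state on a controller is to compare the container names against the single Pacemaker-managed name. A minimal sketch, assuming `ceph-nfs-pacemaker` (the name the unit looks for in the log) is the only ceph-nfs container that should exist, and assuming the other ceph-nfs containers have names starting with `ceph` and containing `nfs`; `stray_ceph_nfs` is a hypothetical helper:

```shell
# Hypothetical helper: read container names (one per line) on stdin and
# print every ceph-nfs container other than the Pacemaker-managed one.
# Assumptions: the Pacemaker-managed container is "ceph-nfs-pacemaker" and
# other NFS containers match '^ceph.*nfs'; adjust the pattern to what
# `podman ps` actually shows.
stray_ceph_nfs() {
  grep -E '^ceph.*nfs' | grep -vxF 'ceph-nfs-pacemaker' || true  # exit 0 even when no strays are found
}

# Usage on a controller:
#   sudo podman ps -a --format '{{.Names}}' | stray_ceph_nfs
```

An empty result is consistent with the expected layout; any output lists containers that are running outside Pacemaker's control.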
