Bug 2137484 - Re-deployment fails if octavia or ceph is enabled
Summary: Re-deployment fails if octavia or ceph is enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z4
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Takashi Kajinami
QA Contact: David Rosenfeld
URL:
Whiteboard:
Duplicates: 2139138 2149283
Depends On:
Blocks: 2138184
Reported: 2022-10-25 07:52 UTC by Takashi Kajinami
Modified: 2022-12-13 15:22 UTC (History)

Fixed In Version: openstack-tripleo-common-11.7.1-2.20220923014728.el8ost
Doc Type: Bug Fix
Doc Text:
RHSA-2022:6969 introduced a process to clean up files in the /var/lib/mistral directory on the undercloud, but the process consistently failed when the Load-balancing service (octavia) or Red Hat Ceph Storage was enabled, because these services created additional directories that the cleanup process could not remove. As a result, some deployment actions, such as scale out, consistently failed if the Load-balancing service or Ceph Storage was enabled. With this update, Mistral no longer executes the cleanup, and deployment actions no longer fail because of a permission error. Users must manually delete files if they want to enforce the reduced permissions on the files in the /var/lib/mistral directory.
Clone Of:
Clones: 2138184
Environment:
Last Closed: 2022-12-07 19:25:51 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 862556 | 0 | None | stable/train: MERGED | tripleo-common: Train-only: Do not attempt to remove config-download files (Ib819c40862302065b6b52f68f0460f3d533d2194) | 2022-10-31 15:40:55 UTC
Red Hat Issue Tracker | OSP-19607 | 0 | None | None | None | 2022-10-25 08:18:41 UTC
Red Hat Product Errata | RHBA-2022:8794 | 0 | None | None | None | 2022-12-07 19:26:25 UTC

Description Takashi Kajinami 2022-10-25 07:52:46 UTC
Description of problem:

This was found in bz 2136393 initially.

Updating the overcloud by re-running the deployment command fails with the following workflow error if octavia or ceph is enabled.
~~~
Waiting for messages on queue 'tripleo' with no timeout.
The action raised an exception
[action_ex_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, msg='[Errno 13] Permission denied: 'local_dir'', action_cls='<class 'mistral.actions.action_factory.DownloadConfigAction'>', attributes='{}', params='{'work_dir': '/var/lib/mistral/overcloud', 'container_config': 'overcloud-config'}']
~~~

This is a regression caused by the fix for bz 2125078.
That change introduced a step that purges files in /var/lib/mistral/<stack name> to enforce the proper permissions,
but the deployment tasks for octavia/ceph create files/directories owned by tripleo-admin in that directory,
so the cleanup process fails with a permission error.
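
To check which entries would trip such a cleanup, the following minimal Python sketch lists everything under a directory that is not owned by a given user. It is a diagnostic illustration only, not the code in tripleo-common: the helper name find_foreign_owned is hypothetical, and it assumes the script runs as the account that performs the cleanup (or that its uid is passed explicitly).

```python
import os


def find_foreign_owned(root, uid=None):
    """Return sorted paths under root whose owner differs from uid.

    uid defaults to the uid of the current process. Directories that
    cannot be read are silently skipped by os.walk, so the result may
    be incomplete when run as an unprivileged user.
    """
    if uid is None:
        uid = os.getuid()
    foreign = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects the entry itself and does not follow symlinks
            if os.lstat(path).st_uid != uid:
                foreign.append(path)
    return sorted(foreign)
```

For example, find_foreign_owned("/var/lib/mistral/overcloud") run on the undercloud would be expected to report entries such as the ceph-ansible directory if they were created by tripleo-admin; the stack name "overcloud" is taken from the work_dir in the error above.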

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Deploy overcloud with Octavia enabled
2. Run the same deployment command

Actual results:
The 2nd deployment fails with the workflow error shown above.

Expected results:
The 2nd deployment should not fail.

Additional info:

Comment 8 Francesco Pantano 2022-11-07 17:01:00 UTC
*** Bug 2139138 has been marked as a duplicate of this bug. ***

Comment 16 David Rosenfeld 2022-11-14 17:31:15 UTC
Stack update is successful in a ceph deployment using: RHOS-16.2-RHEL-8-20221111.n.1

Comment 25 Matthew Secaur 2022-11-16 19:53:36 UTC
We deployed this fix [1] for Octavia in our lab with 16.2.3 and we were able to get a successful deployment. This was a fresh deployment and not an upgrade.

However, running the deployment again (i.e. running the exact same overcloud deploy command after the successful deployment) resulted in Permission Denied errors on the ceph-ansible directory immediately after the stack deploy and before the ansible deploy:

The action raised an exception [action_ex_id=0f0dc7ed-5415-4fca-a9bf-5828612ba391, msg='[Errno 13] Permission denied: '/var/lib/mistral/overcloud/ceph-ansible'', action_cls='<class 'mistral.actions.action_factory.DownloadConfigAction'>', attributes='{}', params='{'work_dir': '/var/lib/mistral/overcloud', 'container_config': 'overcloud-config'}']
Overcloud Endpoint: https://10.74.169.211:13000

I assume this would be a problem during an upgrade, too.

[1] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/861945/1/deployment/octavia/octavia-deployment-config.j2.yaml#310

Comment 27 Takashi Kajinami 2022-11-17 01:21:24 UTC
@Matthew

That error is definitely what we are fixing in this bug. The package including the fix has not yet been shipped to the CDN.
Note that the octavia error you mentioned earlier is also fixed in bug 2136393, which is mentioned in the problem description of this bug.

Comment 39 errata-xmlrpc 2022-12-07 19:25:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8794

Comment 41 John Fulton 2022-12-13 15:22:06 UTC
*** Bug 2149283 has been marked as a duplicate of this bug. ***

