Bug 2148494 - Ansible task 'Write group_vars file' fails during converge step
Summary: Ansible task 'Write group_vars file' fails during converge step
Keywords:
Status: CLOSED DUPLICATE of bug 2136489
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.2 (Train)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Brendan Shephard
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-11-25 16:27 UTC by Paul Jany
Modified: 2022-12-02 07:41 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-28 10:34:31 UTC
Target Upstream Version:
Embargoed:



Links
System: Red Hat Issue Tracker
ID: OSP-20473
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2022-11-25 16:35:37 UTC

Description Paul Jany 2022-11-25 16:27:43 UTC
Description of problem:
The customer is performing a minor update to OSP 16.2.
The converge step fails with the following error [1].
[1]
2022-11-25 04:30:31.152665 | 043f72de-d3f9-3533-4d43-0000000000f3 |      FATAL | Write group_vars file | undercloud | error={"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result"}
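Because 'no_log: true' hides the error detail, the only usable information in such a line is the task name and the host. A minimal sketch, assuming the log keeps the pipe-delimited layout shown above (timestamp | task UUID | status | task | host | result); the function name failed_tasks is hypothetical, and it reads a log file given as an argument or standard input:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list "host: task" for every FATAL line in a log
# that uses the " | " delimited layout shown in [1].
failed_tasks() {
    awk -F' [|] ' '$3 ~ /FATAL/ {
        task = $4; host = $5
        # Trim padding spaces around the two fields.
        gsub(/^ +| +$/, "", task)
        gsub(/^ +| +$/, "", host)
        print host ": " task
    }' "$@"
}
```

This only narrows down which task failed and where; the cause itself still has to be found by other means, since the result payload is censored.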

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.2.3 (Train)


How reproducible:
The customer has updated all overcloud nodes and is in the last stage, the overcloud converge. The error above occurs during this step.

The error seems to happen while executing this code:
https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/train/deployment/octavia/octavia-deployment-config.j2.yaml#L309-L315

This issue seems similar to BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2136393
However, the installed version of openstack-tripleo-heat-templates is later than the fixed version, 'openstack-tripleo-heat-templates-11.3.2-1.20221013153258.29a02c1.el8ost'.

$ cat installed-rpms | grep openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-11.6.1-2.20220409014870.el8ost.noarch Sun Nov 20 22:46:13 2022

Actual results:
The minor update fails.

Expected results:
The minor update completes.

Comment 2 Brendan Shephard 2022-11-27 08:56:58 UTC
I'd say the permissions on the directory are already incorrect, so mistral isn't able to access it during the execution.

What are the permissions there?
 $ sudo ls -la /var/lib/mistral/overcloud
 $ sudo ls -la /var/lib/mistral/overcloud/octavia-ansible
 $ sudo ls -la /var/lib/mistral/overcloud/octavia-ansible/group_vars
 $ sudo ls -la /var/lib/mistral/overcloud/octavia-ansible/local_dir
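For a more compact view than four ls listings, the same check can be sketched as a small helper that prints owner, group, and octal mode for each directory. This is only an illustration: the function name report_perms is hypothetical, and it assumes GNU stat (as shipped on RHEL 8).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "owner:group mode path" for each path given,
# or flag the path as MISSING if it does not exist.
report_perms() {
    local path
    for path in "$@"; do
        if [ -e "$path" ]; then
            stat -c '%U:%G %a %n' "$path"
        else
            echo "MISSING $path"
        fi
    done
}

# When the script is run with arguments, report on those paths.
if [ "$#" -gt 0 ]; then
    report_perms "$@"
fi
```

Saved as a script and run with sudo, it can be given the four directories listed above as arguments. It does not decide what the correct ownership is; it only makes the current state easy to paste into the bug.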

Try:

1. Move the existing overcloud directory to a backup directory, for example:
sudo mv /var/lib/mistral/overcloud{,-backup}

2. Re-run the update converge script.
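Step 1 can be made a little safer with a guard, so that an earlier backup is never overwritten. A rough sketch, not an official procedure: the function name backup_dir and the timestamp suffix are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move a directory to "<dir>-backup". If that name is
# already taken, append a timestamp instead of overwriting the old backup.
# Prints the backup path on success.
backup_dir() {
    local src=$1
    local dest="${src}-backup"
    if [ ! -d "$src" ]; then
        echo "nothing to move: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        dest="${dest}-$(date +%Y%m%d%H%M%S)"
    fi
    mv -- "$src" "$dest" && echo "$dest"
}
```

From a root shell on the undercloud this would be called as backup_dir /var/lib/mistral/overcloud, followed by re-running the update converge script as in step 2.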


Are you able to confirm if that resolves the issue here?

Comment 3 Takashi Kajinami 2022-11-28 04:04:00 UTC
This is definitely a duplicate of bz 2136489. I'll close this as a duplicate once we get feedback on comment 2.
Bz 2136393 is for OSP 16.1. Please make sure you check the bug for the correct version.

The build with the fix is already available. If the steps suggested by Brendan do not work, request a hotfix.
Alternatively, you can try:
 1. Downgrade mistral-executor in undercloud to 16.2.3-10
 2. Move /var/lib/mistral/overcloud
 3. Run converge again

Comment 5 Rabi Mishra 2022-11-28 10:34:31 UTC

*** This bug has been marked as a duplicate of bug 2136489 ***

