Bug 1817457 - [osp16] Undercloud update fails, ansible cannot parse the generated playbook: ERROR! no action detected in task
Summary: [osp16] Undercloud update fails, ansible cannot parse the generated playbook:...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: z2
Target Release: 16.0 (Train on RHEL 8.1)
Assignee: Sofer Athlan-Guyot
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-03-26 11:38 UTC by Sofer Athlan-Guyot
Modified: 2021-03-16 13:38 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-16 13:38:46 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1869776 | 0 | None | None | None | 2020-04-06 11:48:27 UTC
OpenStack gerrit | 717719 | 0 | None | MERGED | Update necessary packages before undercloud upgarde. | 2020-12-03 16:39:52 UTC
OpenStack gerrit | 718683 | 0 | None | MERGED | train: release tripleo-ansible 0.5.0 | 2020-12-03 16:40:19 UTC
RDO | 26358 | 0 | None | None | None | 2020-04-09 12:12:03 UTC

Description Sofer Athlan-Guyot 2020-03-26 11:38:42 UTC
Description of problem: Running the update from GA to passed_phase1
fails during the undercloud update:

    2020-03-26 02:58:57 |  Stack undercloud/c5bc28d0-b3cd-44c7-a885-871cf612b464 CREATE_COMPLETE
    2020-03-26 02:58:57 |
    2020-03-26 02:59:07 | Generating default ansible config file /home/stack/ansible.cfg
    2020-03-26 02:59:07 | ** Running ansible upgrade tasks **
    2020-03-26 02:59:08 | ERROR! no action detected in task. This often indicates a misspelled module name, or incorrect module path.
    2020-03-26 02:59:08 |
    2020-03-26 02:59:08 | The error appears to be in '/home/stack/undercloud-ansible-fgafm87l/common_deploy_steps_playbooks.yaml': line 47, column 7, but may
    2020-03-26 02:59:08 | be elsewhere in the file depending on the exact syntax problem.
    2020-03-26 02:59:08 |
    2020-03-26 02:59:08 | The offending line appears to be:
    2020-03-26 02:59:08 |
    2020-03-26 02:59:08 |   tasks:
    2020-03-26 02:59:08 |     - name: Render all_nodes data as group_vars for overcloud
    2020-03-26 02:59:08 |       ^ here
    2020-03-26 02:59:08 | Exception: Upgrade failed
    2020-03-26 02:59:08 | Traceback (most recent call last):
    2020-03-26 02:59:08 |   File "/usr/lib/python3.6/site-packages/tripleoclient/v1/tripleo_deploy.py", line 1319, in _standalone_deploy
    2020-03-26 02:59:08 |     raise exceptions.DeploymentError('Upgrade failed')
    2020-03-26 02:59:08 | tripleoclient.exceptions.DeploymentError: Upgrade failed
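To see which task Ansible could not resolve, it can help to print the lines around the reported position in the generated playbook. A minimal sketch, assuming bash; `show_context` is a hypothetical helper, and the playbook path and line number are simply the ones from the log above (the directory suffix is random per run).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines around a position reported by Ansible.
# Arguments: file, line, lines before (default 3), lines after (default 6).
show_context() {
    local file="$1" line="$2" before="${3:-3}" after="${4:-6}"
    # Clamp the start so we never ask sed for line 0 or a negative line.
    local start=$(( line > before ? line - before : 1 ))
    sed -n "${start},$(( line + after ))p" "$file"
}
```

For example, `show_context /home/stack/undercloud-ansible-fgafm87l/common_deploy_steps_playbooks.yaml 47` shows the "Render all_nodes data as group_vars for overcloud" task and the action name it calls, which is presumably the one Ansible could not find.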

Version-Release number of selected component (if applicable): This
only happens when updating from GA to 16.0-RHEL-8-20200324.n.0;
updating from z1 (16.0-RHEL-8-20200226.n.1) to phase2
(RHOS_TRUNK-16.0-RHEL-8-20200226.n.1) doesn't show any issue at that
stage.

So this is not working:

  openstack-tripleo-heat-templates-11.3.2-0.20200131125640.cc909b6 to openstack-tripleo-heat-templates-11.3.2-0.20200324120625.c3a8eb4.el8ost

while this works ...

  openstack-tripleo-heat-templates-11.3.2-0.20200211065546.d3d6dc3 to openstack-tripleo-heat-templates-11.3.2-0.20200211065546.d3d6dc3

... OK, this is the same version, but it shows that the Ansible
generated by
openstack-tripleo-heat-templates-11.3.2-0.20200211065546.d3d6dc3
works.

So there is a regression between d3d6dc3 and c3a8eb4.

A likely contender is commit 445387589f2ebad1d85456bc03f82996ca87ffcf
(Change-Id: Ib00e8aa9f7d06517290543a8aaf8a2527969bd3c).
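To narrow down the regression, one could map each package NVR to its upstream short commit and list the commits in between. A minimal sketch, assuming bash, assuming the snapshot release field has the form 0.<timestamp>.<shorthash>[.<dist>] seen in the NVRs above, and assuming a local tripleo-heat-templates git checkout that contains both commits; `nvr_commit` and `list_candidates` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the upstream short hash from a snapshot NVR,
# e.g. openstack-tripleo-heat-templates-11.3.2-0.20200324120625.c3a8eb4.el8ost -> c3a8eb4.
# Assumes the release field looks like 0.<timestamp>.<shorthash>[.<dist>].
nvr_commit() {
    local release="${1##*-}"                 # text after the last "-" (the release field)
    printf '%s\n' "$release" | cut -d. -f3   # third dot-separated field: the hash
}

# Hypothetical helper: list the upstream commits between two NVRs.
# Arguments: old NVR, new NVR, path to a tripleo-heat-templates checkout.
list_candidates() {
    local old new
    old=$(nvr_commit "$1")
    new=$(nvr_commit "$2")
    git -C "$3" log --oneline "${old}..${new}"
}
```

Usage, with a hypothetical checkout path: `list_candidates openstack-tripleo-heat-templates-11.3.2-0.20200211065546.d3d6dc3 openstack-tripleo-heat-templates-11.3.2-0.20200324120625.c3a8eb4.el8ost ~/tripleo-heat-templates`, after which `git show 445387589f2e` in the same checkout inspects the suspected commit.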

Comment 4 Sofer Athlan-Guyot 2020-03-30 17:27:40 UTC
Hi,

So the source of the issue is that GA has a version of tripleo-ansible[1]
that doesn't have the action plugin
/usr/share/ansible/plugins/action/tripleo_all_nodes_data.py.

Then, when we run the Ansible tasks generated from the (manually
updated) tht (tripleo-heat-templates), that dependency is missing.

So we need to update the documentation to reflect the fact that
tripleo-ansible must also be updated manually before running the
undercloud update.
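Since the failure comes down to the plugin file being absent, a pre-flight check can test for it before starting the update. A minimal sketch, assuming bash; `has_all_nodes_data_plugin` is a hypothetical helper, and it only checks the plugin path named in this comment, not any other Ansible plugin search paths that may be configured.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the tripleo_all_nodes_data action plugin
# is present. The directory defaults to the path named in this comment;
# other configured Ansible plugin paths are not checked.
has_all_nodes_data_plugin() {
    local dir="${1:-/usr/share/ansible/plugins/action}"
    [ -f "$dir/tripleo_all_nodes_data.py" ]
}
```

For example, `has_all_nodes_data_plugin || echo "update tripleo-ansible before the undercloud update"`. When the file is present, `rpm -qf /usr/share/ansible/plugins/action/tripleo_all_nodes_data.py` reports which package provides it.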

The obvious workaround is thus:

   sudo dnf update tripleo-ansible

Changing this to a doc bug.

Basically we need to update that section[2] and change:

   $ sudo dnf update -y python3-tripleoclient* openstack-tripleo-common openstack-tripleo-heat-templates

to

   $ sudo dnf update -y python3-tripleoclient* openstack-tripleo-common openstack-tripleo-heat-templates tripleo-ansible
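The proposed documentation change amounts to adding one package to the pre-update list. A sketch of that step as a script, assuming bash; `update_undercloud_prereqs` and the `DRY_RUN` switch are hypothetical additions for previewing the command, not part of any documented procedure.

```shell
#!/usr/bin/env bash
# Packages to update before the undercloud update, following the proposed
# doc change: the existing list plus tripleo-ansible, whose newer versions
# ship the tripleo_all_nodes_data action plugin.
# The glob is quoted so that dnf, not the shell, expands it.
UNDERCLOUD_PREREQ_PKGS=(
    'python3-tripleoclient*'
    openstack-tripleo-common
    openstack-tripleo-heat-templates
    tripleo-ansible
)

# Hypothetical helper: run the update, or only print the command when DRY_RUN=1.
update_undercloud_prereqs() {
    local -a cmd=(sudo dnf update -y "${UNDERCLOUD_PREREQ_PKGS[@]}")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

Running `DRY_RUN=1 update_undercloud_prereqs` prints the same command line as the one above without executing it.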

We need to implement it in tripleo-upgrade as well to fix the CI.  I will follow up with a review there.

[1] tripleo-ansible-0.4.2-0.20200324115450.9501781.el8os
[2] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html-single/keeping_red_hat_openstack_platform_updated/index#performing-a-minor-update-of-a-containerized-undercloud

Comment 5 Sofer Athlan-Guyot 2020-04-01 14:06:14 UTC
Hi,

So, after discussion, this should go inside python-tripleoclient and be removed from the doc.

@Dan, I hope you haven't started yet: we will need a modification, but not the one currently mentioned.

Comment 8 Sofer Athlan-Guyot 2020-04-06 11:50:08 UTC
Note that the workaround is simple: if you hit that error, just run 

 sudo dnf upgrade -y tripleo-ansible 

and re-run the undercloud upgrade.
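The workaround can be wrapped so that it is applied only when this specific failure is present. A minimal sketch, assuming bash and assuming you have saved the output of the failed `openstack undercloud upgrade` run to a file; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a saved upgrade log contains the error from this bug.
hit_no_action_error() {
    grep -q 'ERROR! no action detected in task' "$1"
}

# Hypothetical helper: the workaround from this comment, i.e. upgrade
# tripleo-ansible and, only if that succeeds, re-run the undercloud upgrade.
apply_workaround() {
    sudo dnf upgrade -y tripleo-ansible && openstack undercloud upgrade
}
```

For example, with a hypothetical log file name: `if hit_no_action_error upgrade.log; then apply_workaround; fi`.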

Comment 9 Emilien Macchi 2020-04-09 12:03:19 UTC
Note that the path forward is not https://review.opendev.org/#/c/717719.

1) We need to tag a tripleo-ansible release in stable/train that contains the code for the plugin (needed after the upgrade).
2) We need to update tripleoclient-distgit to depend on the new tripleo-ansible tag.

Comment 12 Sofer Athlan-Guyot 2020-04-16 09:03:12 UTC
Hi,

So we went through several iterations of a solution here.  Eventually it was decided to fully integrate the update of the required RPMs inside tripleoclient:

 - this one is required: https://review.opendev.org/717292
 - this one improves on the previous fix: https://review.opendev.org/718784

The usual packaging solution proposed by Emilien has nevertheless been done and is available in openstack-tripleo-heat-templates-11.3.2-0.20200413135434.cf3c03e.el8ost.noarch.rpm (the spec file has "Requires: tripleo-ansible >= 0.5.0").
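With that packaging fix, the installed tripleo-ansible should satisfy the ">= 0.5.0" requirement. A sketch of a check, assuming bash and GNU `sort -V`; note that `sort -V` only approximates RPM's own version comparison, which should be adequate for plain x.y.z versions like these. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# True if version $1 >= version $2, using GNU sort's version ordering
# (an approximation of RPM's comparison, fine for plain x.y.z versions).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical check: is the installed tripleo-ansible at least 0.5.0,
# the minimum in the spec file's "Requires: tripleo-ansible >= 0.5.0"?
tripleo_ansible_new_enough() {
    local installed
    installed=$(rpm -q --qf '%{VERSION}' tripleo-ansible) || return 1
    version_ge "$installed" 0.5.0
}
```

Separately, `rpm -q --requires openstack-tripleo-heat-templates | grep tripleo-ansible` shows whether the installed templates package carries that dependency.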

Now, given that this is not a blocker (the simple workaround is to dnf install tripleo-ansible) but it is a blocker for CI, what is our best course of action here?

 1. wait for the code to be merged;
 2. use an RPM that has the right spec file;
 3. update the doc for z2 (adding tripleo-ansible) and merge https://review.opendev.org/#/c/716034/ (in tripleo-upgrade), which would fix the CI.

Again, unless we can somehow get 2, I think our option is 3, and then to create another BZ targeting z3 for the improvement in the user interface.

Comment 15 Sofer Athlan-Guyot 2020-04-27 13:41:51 UTC
So,

this bug has been "solved" for OSP16 z2 by including tripleo-ansible among the packages to be updated manually before the update[1].  An associated review, https://review.opendev.org/#/c/716034/, has been merged in tripleo-upgrade, in train only.

I'm setting this bug as TestOnly, and I can already validate it, as we have OSP16 jobs in the CI that went past this error.

All the code that automates that part of the procedure will be included in z3.  I've created https://bugzilla.redhat.com/show_bug.cgi?id=1828273 to follow up on the documentation change that will then be necessary.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/keeping_red_hat_openstack_platform_updated/assembly-updating_the_undercloud#performing-a-minor-update-of-a-containerized-undercloud

