Bug 2149963 - ceph_spec_bootstrap not handling children groups in tripleo-ansible-inventory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 17.1
Assignee: Manoj Katari
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-12-01 13:20 UTC by pojadhav
Modified: 2023-08-16 01:13 UTC

Fixed In Version: tripleo-ansible-3.3.1-1.20230518201534.el9ost
Doc Type: Bug Fix
Doc Text:
Before this update, the cephadm utility did not process child groups when building specification files from inventory. With this update, specification file generation processes child groups.
Clone Of:
Environment:
Last Closed: 2023-08-16 01:12:55 UTC
Target Upstream Version:
Embargoed:


Attachments
changes made to inventory to make symptoms of bug go away (1.27 KB, patch)
2022-12-01 20:25 UTC, John Fulton


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1998649 | 0 | None | None | None | 2022-12-02 20:39:23 UTC
OpenStack gerrit | 866503 | 0 | None | NEW | Handle child groups when building ceph spec from inventory | 2022-12-02 21:36:44 UTC
Red Hat Issue Tracker | OSP-20618 | 0 | None | None | None | 2022-12-01 13:28:47 UTC
Red Hat Product Errata | RHEA-2023:4577 | 0 | None | None | None | 2023-08-16 01:13:40 UTC

Description pojadhav 2022-12-01 13:20:15 UTC
Description of problem:

The job pipeline_integration-pcci-17.1_dlrn-rhel-9.0-virthost-3cont_2comp_3ceph-ipv4-geneve-ceph fails consistently at the task "Create Ceph spec based on baremetal_deployed_path and tripleo_roles" with the traceback below:

2022-12-01 07:20:33.491271 | 5254000e-1e92-b58e-24eb-000000000018 |       TASK | Create Ceph spec based on baremetal_deployed_path and tripleo_roles
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: KeyError: 'hosts'
2022-12-01 07:20:34.051011 | 5254000e-1e92-b58e-24eb-000000000018 |      FATAL | Create Ceph spec based on baremetal_deployed_path and tripleo_roles | undercloud | error={"changed": false, "module_stderr": "Traceback (most recent call last):\n  File \"<stdin>\", line 107, in <module>\n  File \"<stdin>\", line 99, in _ansiballz_main\n  File \"<stdin>\", line 47, in invoke_module\n  File \"/usr/lib64/python3.9/runpy.py\", line 210, in run_module\n    return _run_module_code(code, init_globals, run_name, mod_spec)\n  File \"/usr/lib64/python3.9/runpy.py\", line 97, in _run_module_code\n    _run_code(code, mod_globals, init_globals,\n  File \"/usr/lib64/python3.9/runpy.py\", line 87, in _run_code\n    exec(code, run_globals)\n  File \"/tmp/ansible_ceph_spec_bootstrap_payload_z3x_t18l/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 576, in <module>\n  File \"/tmp/ansible_ceph_spec_bootstrap_payload_z3x_t18l/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 556, in main\n  File \"/tmp/ansible_ceph_spec_bootstrap_payload_z3x_t18l/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 179, in get_inventory_hosts_to_ips\nKeyError: 'hosts'\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
2022-12-01 07:20:34.051577 | 5254000e-1e92-b58e-24eb-000000000018 |     TIMING | Create Ceph spec based on baremetal_deployed_path and tripleo_roles | undercloud | 0:00:00.830279 | 0.56s
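
The traceback is consistent with a lookup that expects every role group in the inventory to carry a 'hosts' key. A minimal sketch of that failure mode follows. The inventory fragment, host name, IP address and the helper naive_hosts_to_ips are all hypothetical and only illustrate the shape of the problem; this is not the code of ceph_spec_bootstrap.py.

```python
# Hypothetical inventory fragment: the role group only references a
# stack-prefixed child group, and the hosts live in that child group.
inventory = {
    "CephStorage": {"children": {"overcloud_CephStorage": {}}},
    "overcloud_CephStorage": {
        "hosts": {"ceph-0": {"ansible_host": "192.168.24.10"}},
    },
}


def naive_hosts_to_ips(inventory, role):
    """Map host name to IP, assuming the role group lists hosts directly."""
    return {
        host: host_vars["ansible_host"]
        for host, host_vars in inventory[role]["hosts"].items()
    }


error = None
try:
    naive_hosts_to_ips(inventory, "CephStorage")
except KeyError as exc:
    # The role group has 'children' but no 'hosts' key.
    error = repr(exc)
    print("KeyError:", exc)
```

Calling the same helper on the child group ("overcloud_CephStorage" in this sketch) succeeds, because that group does list its hosts directly.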


Version-Release number of selected component (if applicable):

rhos17.1 on rhel9

Comment 3 John Fulton 2022-12-01 13:46:41 UTC
Looks like https://access.redhat.com/solutions/6965325

I'll look at the logs a little closer.

Comment 4 John Fulton 2022-12-01 20:22:02 UTC
I reproduced the problem:

1. download the inventory [1] as tripleo-ansible-inventory.yaml to ~
2. download the deployed metal file [2] as overcloud-baremetal-deployed.yaml to ~
3. openstack overcloud ceph spec overcloud-baremetal-deployed.yaml -y -o spec.yaml --working-dir ~

I see the exact same error when I pass the same inputs from the job to the command in step 3.

Now, let's fix the problem by editing tripleo-ansible-inventory.yaml as described in the KCS [3]. 
  
$ vim tripleo-ansible-inventory.yaml
$ diff -u tripleo-ansible-inventory.yaml.bad tripleo-ansible-inventory.yaml | curl -F 'f:1=<-' ix.io
http://ix.io/4hoP

Note the diff: I removed the 'overcloud_' prefix and now this command doesn't fail.

  openstack overcloud ceph spec overcloud-baremetal-deployed.yaml -y -o spec.yaml --working-dir .

The provided inventory violates the requirement that hostnames in the HostnameMap, in overcloud-baremetal-deployed.yaml, must match the role name in the ansible inventory.

I wonder why this changed between the 17.0 and 17.1 composes. Either something changed in IR or how the job is set up OR maybe it's related to some inventory code that changed [4].

Is "<stack>_" supposed to be prepended in the inventory now as the new normal?

[1] http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/pipeline_integration-pcci-17.1_dlrn-rhel-9.0-virthost-3cont_2comp_3ceph-ipv4-geneve-ceph/136/undercloud-0/home/stack/overcloud-deploy/overcloud/tripleo-ansible-inventory.yaml.gz

[2] http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/pipeline_integration-pcci-17.1_dlrn-rhel-9.0-virthost-3cont_2comp_3ceph-ipv4-geneve-ceph/136/undercloud-0/home/stack/templates/overcloud-baremetal-deployed.yaml.gz

[3] https://access.redhat.com/solutions/6965325

[4] https://review.opendev.org/c/openstack/tripleo-common/+/861804

Comment 5 John Fulton 2022-12-01 20:25:09 UTC
Created attachment 1929130
changes made to inventory to make symptoms of bug go away

If you remove "<stack>_" from the inventory as in this diff, then the ansible module doesn't fail.

Comment 6 Luigi Toscano 2022-12-02 15:35:27 UTC
(In reply to John Fulton from comment #4)
> The provided inventory violates the requirement that hostnames in the
> HostnameMap, in overcloud-baremetal-deployed.yaml, must match the role name
> in the ansible inventory.
> 
> I wonder why this changed between the 17.0 and 17.1 composes. Either
> something changed in IR or how the job is set up OR maybe it's related to
> some inventory code that changed [4].
> [...]
> [4] https://review.opendev.org/c/openstack/tripleo-common/+/861804

I haven't noticed any infrared change related to that between the last time that specific workflow was working (last week) and now.
I would investigate the tripleo change.

Comment 7 John Fulton 2022-12-02 21:36:44 UTC
(In reply to Luigi Toscano from comment #6)
> (In reply to John Fulton from comment #4)
> > The provided inventory violates the requirement that hostnames in the
> > HostnameMap, in overcloud-baremetal-deployed.yaml, must match the role name
> > in the ansible inventory.
> > 
> > I wonder why this changed between the 17.0 and 17.1 composes. Either
> > something changed in IR or how the job is set up OR maybe it's related to
> > some inventory code that changed [4].
> > [...]
> > [4] https://review.opendev.org/c/openstack/tripleo-common/+/861804
> 
> I haven't noticed any infrared change related to that from the last time that
> specific workflow was working (last week) and now.
> I would investigate the tripleo change.

Yes.

 https://review.opendev.org/c/openstack/tripleo-ansible/+/866503
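
A rough illustration of the idea named in that review's summary (handle child groups when building the ceph spec from inventory). This is only a sketch under assumptions, not the code from the review: the function hosts_to_ips, the inventory fragment, the host names and the IP addresses are hypothetical.

```python
def hosts_to_ips(inventory, group, _seen=None):
    """Collect host name to IP for a group, descending into child groups."""
    seen = set() if _seen is None else _seen
    if group in seen:
        # Guard against cyclic group definitions.
        return {}
    seen.add(group)

    entry = inventory.get(group) or {}
    result = {}
    # Hosts listed directly on the group.
    for host, host_vars in (entry.get("hosts") or {}).items():
        ip = (host_vars or {}).get("ansible_host")
        if ip is not None:
            result[host] = ip
    # Hosts reachable through child groups, looked up at the top level.
    for child in (entry.get("children") or {}):
        result.update(hosts_to_ips(inventory, child, seen))
    return result


# Hypothetical inventory: the role group only points at a prefixed child group.
inventory = {
    "CephStorage": {"children": {"overcloud_CephStorage": {}}},
    "overcloud_CephStorage": {
        "hosts": {
            "ceph-0": {"ansible_host": "192.168.24.10"},
            "ceph-1": {"ansible_host": "192.168.24.11"},
        },
    },
}

ceph_hosts = hosts_to_ips(inventory, "CephStorage")
print(ceph_hosts)
```

With a resolver like this, a role group that has only 'children' yields the hosts of its child groups instead of raising KeyError, and a group that lists 'hosts' directly keeps working as before.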

Comment 20 Jenny-Anne Lynch 2023-06-07 16:45:21 UTC
Hi Manoj, I edited the doc text for 17.1 beta release notes. Feel free to change if needed. Thanks.

Comment 28 errata-xmlrpc 2023-08-16 01:12:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

