
Bug 2149963

Summary: ceph_spec_bootstrap not handling children groups in tripleo-ansible-inventory
Product: Red Hat OpenStack
Reporter: pojadhav
Component: tripleo-ansible
Assignee: Manoj Katari <mkatari>
Status: CLOSED ERRATA
QA Contact: Yogev Rabl <yrabl>
Severity: medium
Docs Contact:
Priority: medium
Version: 17.1 (Wallaby)
CC: alfrgarc, jjoyce, johfulto, jschluet, ltoscano, mandreou, mburns, mgarciac, mkatari, owalsh, prgutier, shrjoshi
Target Milestone: rc
Keywords: Triaged
Target Release: 17.1
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version: tripleo-ansible-3.3.1-1.20230518201534.el9ost
Doc Type: Bug Fix
Doc Text:
Before this update, the cephadm utility did not process child groups when building specification files from inventory. With this update, specification file generation processes child groups.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-08-16 01:12:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: changes made to inventory to make symptoms of bug go away
Flags: none

Description pojadhav 2022-12-01 13:20:15 UTC
Description of problem:

The job pipeline_integration-pcci-17.1_dlrn-rhel-9.0-virthost-3cont_2comp_3ceph-ipv4-geneve-ceph fails consistently at the task "Create Ceph spec based on baremetal_deployed_path and tripleo_roles" with the traceback below:

2022-12-01 07:20:33.491271 | 5254000e-1e92-b58e-24eb-000000000018 |       TASK | Create Ceph spec based on baremetal_deployed_path and tripleo_roles
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: KeyError: 'hosts'
2022-12-01 07:20:34.051011 | 5254000e-1e92-b58e-24eb-000000000018 |      FATAL | Create Ceph spec based on baremetal_deployed_path and tripleo_roles | undercloud | error={"changed": false, "module_stderr": "Traceback (most recent call last):\n  File \"<stdin>\", line 107, in <module>\n  File \"<stdin>\", line 99, in _ansiballz_main\n  File \"<stdin>\", line 47, in invoke_module\n  File \"/usr/lib64/python3.9/runpy.py\", line 210, in run_module\n    return _run_module_code(code, init_globals, run_name, mod_spec)\n  File \"/usr/lib64/python3.9/runpy.py\", line 97, in _run_module_code\n    _run_code(code, mod_globals, init_globals,\n  File \"/usr/lib64/python3.9/runpy.py\", line 87, in _run_code\n    exec(code, run_globals)\n  File \"/tmp/ansible_ceph_spec_bootstrap_payload_z3x_t18l/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 576, in <module>\n  File \"/tmp/ansible_ceph_spec_bootstrap_payload_z3x_t18l/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 556, in main\n  File \"/tmp/ansible_ceph_spec_bootstrap_payload_z3x_t18l/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 179, in get_inventory_hosts_to_ips\nKeyError: 'hosts'\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
2022-12-01 07:20:34.051577 | 5254000e-1e92-b58e-24eb-000000000018 |     TIMING | Create Ceph spec based on baremetal_deployed_path and tripleo_roles | undercloud | 0:00:00.830279 | 0.56s
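
The KeyError: 'hosts' above is consistent with code that indexes a group's 'hosts' key directly when the group only defines 'children'. The following is a minimal sketch of that failure mode, not the module's actual code. The inventory layout, group names, host name, IP address, and the naive_hosts helper are all hypothetical and simplified for illustration.

```python
# Hypothetical, simplified inventory: the role group has only 'children',
# and the hosts live under a stack-prefixed child group.
inventory = {
    "CephStorage": {"children": {"overcloud_CephStorage": {}}},
    "overcloud_CephStorage": {
        "hosts": {"ceph-0": {"ansible_host": "192.168.24.10"}}
    },
}


def naive_hosts(inventory, group):
    # Direct indexing: raises KeyError for a group without a 'hosts' key.
    return list(inventory[group]["hosts"])


try:
    naive_hosts(inventory, "CephStorage")
    error = None
except KeyError as exc:
    # str() of a KeyError is the repr of the missing key.
    error = str(exc)

print(error)
```

In this sketch the lookup on the role group fails even though the hosts are present one level down in the child group.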


Version-Release number of selected component (if applicable):

rhos17.1 on rhel9

Comment 3 John Fulton 2022-12-01 13:46:41 UTC
Looks like https://access.redhat.com/solutions/6965325

I'll look at the logs a little closer.

Comment 4 John Fulton 2022-12-01 20:22:02 UTC
I reproduced the problem:

1. download the inventory [1] as tripleo-ansible-inventory.yaml to ~
2. download the deployed metal file [2] as overcloud-baremetal-deployed.yaml to ~
3. openstack overcloud ceph spec overcloud-baremetal-deployed.yaml -y -o spec.yaml --working-dir ~

I see the exact same error when I pass the same inputs from the job to the command in step 3.

Now, let's fix the problem by editing tripleo-ansible-inventory.yaml as described in the KCS [3]. 
  
$ vim tripleo-ansible-inventory.yaml
$ diff -u tripleo-ansible-inventory.yaml.bad tripleo-ansible-inventory.yaml | curl -F 'f:1=<-' ix.io
http://ix.io/4hoP

Note the diff: I removed the 'overcloud_' prefix, and now this command doesn't fail:

  openstack overcloud ceph spec overcloud-baremetal-deployed.yaml -y -o spec.yaml --working-dir .

The provided inventory violates the requirement that the hostnames in the HostnameMap, in overcloud-baremetal-deployed.yaml, must match the role name in the Ansible inventory.

I wonder why this changed between the 17.0 and 17.1 composes. Either something changed in IR or how the job is set up OR maybe it's related to some inventory code that changed [4].

Is "<stack>_" supposed to be prepended to the group names in the inventory now as the new normal?

[1] http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/pipeline_integration-pcci-17.1_dlrn-rhel-9.0-virthost-3cont_2comp_3ceph-ipv4-geneve-ceph/136/undercloud-0/home/stack/overcloud-deploy/overcloud/tripleo-ansible-inventory.yaml.gz

[2] http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/pipeline_integration-pcci-17.1_dlrn-rhel-9.0-virthost-3cont_2comp_3ceph-ipv4-geneve-ceph/136/undercloud-0/home/stack/templates/overcloud-baremetal-deployed.yaml.gz

[3] https://access.redhat.com/solutions/6965325

[4] https://review.opendev.org/c/openstack/tripleo-common/+/861804

Comment 5 John Fulton 2022-12-01 20:25:09 UTC
Created attachment 1929130 [details]
changes made to inventory to make symptoms of bug go away

If you remove "<stack>_" from the inventory as in this diff, then the ansible module doesn't fail.

Comment 6 Luigi Toscano 2022-12-02 15:35:27 UTC
(In reply to John Fulton from comment #4)
> The provided inventory violates that hostnames in the HostnameMap, in the
> overcloud-baremetal-deployed.yaml, must match the role name in the ansible
> inventory.
> 
> I wonder why this changed between the 17.0 and 17.1 composes. Either
> something changed in IR or how the job is set up OR maybe it's related to
> some inventory code that changed [4].
> [...]
> [4] https://review.opendev.org/c/openstack/tripleo-common/+/861804

I haven't noticed any infrared change related to that between the last time that specific workflow was working (last week) and now.
I would investigate the tripleo change.

Comment 7 John Fulton 2022-12-02 21:36:44 UTC
(In reply to Luigi Toscano from comment #6)
> (In reply to John Fulton from comment #4)
> > The provided inventory violates that hostnames in the HostnameMap, in the
> > overcloud-baremetal-deployed.yaml, must match the role name in the ansible
> > inventory.
> > 
> > I wonder why this changed between the 17.0 and 17.1 composes. Either
> > something changed in IR or how the job is set up OR maybe it's related to
> > some inventory code that changed [4].
> > [...]
> > [4] https://review.opendev.org/c/openstack/tripleo-common/+/861804
> 
> I haven't notice any infrared change related to that from the last time that
> specific workflow was working (last week) and now.
> I would investigate the tripleo change.

Yes.

 https://review.opendev.org/c/openstack/tripleo-ansible/+/866503
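
For illustration only, here is one way a host lookup could descend into child groups instead of assuming every group has a 'hosts' key. This is a sketch under stated assumptions and is not taken from the linked review: the hosts_to_ips helper, the 'ansible_host' key choice, and the sample inventory (group names, host name, IP address) are hypothetical.

```python
def hosts_to_ips(inventory, group, ip_key="ansible_host", _seen=None):
    """Map hostnames to IPs for a group, descending into child groups."""
    seen = set() if _seen is None else _seen
    # Skip unknown groups and guard against cyclic 'children' references.
    if group in seen or group not in inventory:
        return {}
    seen.add(group)
    entry = inventory[group] or {}
    # Hosts defined directly on this group, if any.
    result = {
        name: (hostvars or {}).get(ip_key)
        for name, hostvars in (entry.get("hosts") or {}).items()
    }
    # Hosts defined on child groups, resolved recursively.
    for child in entry.get("children") or {}:
        result.update(hosts_to_ips(inventory, child, ip_key, seen))
    return result


# Hypothetical inventory: the role group only points at a prefixed child group.
inventory = {
    "CephStorage": {"children": {"overcloud_CephStorage": {}}},
    "overcloud_CephStorage": {
        "hosts": {"ceph-0": {"ansible_host": "192.168.24.10"}}
    },
}

print(hosts_to_ips(inventory, "CephStorage"))
```

Using .get() with an empty default means a group with only 'children', only 'hosts', or neither is handled without raising KeyError.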

Comment 20 Jenny-Anne Lynch 2023-06-07 16:45:21 UTC
Hi Manoj, I edited the doc text for 17.1 beta release notes. Feel free to change if needed. Thanks.

Comment 28 errata-xmlrpc 2023-08-16 01:12:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577