Description of problem:

pipeline_integration-pcci-17.1_dlrn-rhel-9.0-virthost-3cont_2comp_3ceph-ipv4-geneve-ceph fails consistently at the task "Create Ceph spec based on baremetal_deployed_path and tripleo_roles" with the traceback below:

2022-12-01 07:20:33.491271 | 5254000e-1e92-b58e-24eb-000000000018 | TASK | Create Ceph spec based on baremetal_deployed_path and tripleo_roles
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: KeyError: 'hosts'
2022-12-01 07:20:34.051011 | 5254000e-1e92-b58e-24eb-000000000018 | FATAL | Create Ceph spec based on baremetal_deployed_path and tripleo_roles | undercloud | error={"changed": false, "module_stderr": "Traceback (most recent call last):\n File \"<stdin>\", line 107, in <module>\n File \"<stdin>\", line 99, in _ansiballz_main\n File \"<stdin>\", line 47, in invoke_module\n File \"/usr/lib64/python3.9/runpy.py\", line 210, in run_module\n return _run_module_code(code, init_globals, run_name, mod_spec)\n File \"/usr/lib64/python3.9/runpy.py\", line 97, in _run_module_code\n _run_code(code, mod_globals, init_globals,\n File \"/usr/lib64/python3.9/runpy.py\", line 87, in _run_code\n exec(code, run_globals)\n File \"/tmp/ansible_ceph_spec_bootstrap_payload_z3x_t18l/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 576, in <module>\n File \"/tmp/ansible_ceph_spec_bootstrap_payload_z3x_t18l/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 556, in main\n File \"/tmp/ansible_ceph_spec_bootstrap_payload_z3x_t18l/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 179, in get_inventory_hosts_to_ips\nKeyError: 'hosts'\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
2022-12-01 07:20:34.051577 | 5254000e-1e92-b58e-24eb-000000000018 | TIMING | Create Ceph spec based on baremetal_deployed_path and tripleo_roles | undercloud | 0:00:00.830279 | 0.56s

Version-Release number of selected component (if applicable):

rhos17.1 on rhel9
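The shape of this failure can be illustrated with a minimal sketch. It is not the code of ceph_spec_bootstrap.py. It assumes, hypothetically, that the lookup indexes each role's group in the inventory and reads its 'hosts' key directly, and that the role group is present in the inventory but carries no 'hosts' key. The function name hosts_for_role, the group layout, the host name and the IP address are all made up for illustration.

```python
# Hypothetical inventory fragment (assumption, not taken from the job logs):
# the role group exists but has no 'hosts' key; the hosts sit under a
# stack-prefixed group instead.
inventory = {
    "CephStorage": {"children": {"overcloud_CephStorage": {}}},
    "overcloud_CephStorage": {
        "hosts": {"ceph-0": {"ansible_host": "192.168.24.10"}}
    },
}


def hosts_for_role(inventory, role):
    """Return {hostname: ip} for a group, reading 'hosts' directly.

    This direct indexing is an assumed behaviour used only to show
    how a KeyError on 'hosts' can arise.
    """
    return {
        name: data["ansible_host"]
        for name, data in inventory[role]["hosts"].items()
    }


try:
    hosts_for_role(inventory, "CephStorage")
    error = None
except KeyError as exc:
    # The role group is found, but it has no 'hosts' entry.
    error = repr(exc)

print(error)
```

Under these assumptions the lookup succeeds for the prefixed group name and fails for the plain role name, which matches the symptom in the traceback.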
Looks like https://access.redhat.com/solutions/6965325

I'll look at the logs a little closer.
I reproduced the problem:

1. Download the inventory [1] as tripleo-ansible-inventory.yaml to ~
2. Download the deployed metal file [2] as overcloud-baremetal-deployed.yaml to ~
3. openstack overcloud ceph spec overcloud-baremetal-deployed.yaml -y -o spec.yaml --working-dir ~

I see the exact same error when I pass the same inputs from the job to the command in step 3.

Now, let's fix the problem by editing tripleo-ansible-inventory.yaml as described in the KCS [3].

$ vim tripleo-ansible-inventory.yaml
$ diff -u tripleo-ansible-inventory.yaml.bad tripleo-ansible-inventory.yaml | curl -F 'f:1=<-' ix.io
http://ix.io/4hoP

Note the diff: I removed the 'overcloud_' and now this command doesn't fail.

openstack overcloud ceph spec overcloud-baremetal-deployed.yaml -y -o spec.yaml --working-dir .

The provided inventory violates the requirement that hostnames in the HostnameMap, in overcloud-baremetal-deployed.yaml, must match the role name in the ansible inventory.

I wonder why this changed between the 17.0 and 17.1 composes. Either something changed in IR or in how the job is set up, or maybe it's related to some inventory code that changed [4]. Is "<stack>_" supposed to be prepended in the inventory now as the new normal?

[1] http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/pipeline_integration-pcci-17.1_dlrn-rhel-9.0-virthost-3cont_2comp_3ceph-ipv4-geneve-ceph/136/undercloud-0/home/stack/overcloud-deploy/overcloud/tripleo-ansible-inventory.yaml.gz
[2] http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/pipeline_integration-pcci-17.1_dlrn-rhel-9.0-virthost-3cont_2comp_3ceph-ipv4-geneve-ceph/136/undercloud-0/home/stack/templates/overcloud-baremetal-deployed.yaml.gz
[3] https://access.redhat.com/solutions/6965325
[4] https://review.opendev.org/c/openstack/tripleo-common/+/861804
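Before running the spec command, an inventory could be checked for this mismatch with something like the sketch below. It is a hypothetical helper, not part of tripleo or of the fix. It assumes the inventory has already been loaded into a dict, that a usable role group has a 'hosts' key, and that the affected groups are named "<stack>_<role>". The function name check_roles and the sample inventory are invented for illustration.

```python
def check_roles(inventory, roles, stack="overcloud"):
    """Classify each role as 'ok', 'prefixed' or 'missing'.

    ok       -> the group named after the role has a 'hosts' key
    prefixed -> only the assumed "<stack>_<role>" group has 'hosts'
    missing  -> neither group has 'hosts'
    """
    report = {}
    for role in roles:
        # An empty YAML group loads as None, so fall back to {}.
        plain = inventory.get(role) or {}
        prefixed = inventory.get(f"{stack}_{role}") or {}
        if "hosts" in plain:
            report[role] = "ok"
        elif "hosts" in prefixed:
            report[role] = "prefixed"
        else:
            report[role] = "missing"
    return report


# Hypothetical sample data; host names and addresses are made up.
sample_inventory = {
    "Controller": {"hosts": {"controller-0": {"ansible_host": "192.168.24.5"}}},
    "CephStorage": {"children": {"overcloud_CephStorage": {}}},
    "overcloud_CephStorage": {
        "hosts": {"ceph-0": {"ansible_host": "192.168.24.10"}}
    },
    "Compute": None,
}

report = check_roles(sample_inventory, ["Controller", "CephStorage", "Compute"])
for role, status in report.items():
    print(f"{role}: {status}")
```

A role reported as 'prefixed' would be a candidate for the edit described in the KCS [3], under the assumptions stated above.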
Created attachment 1929130
changes made to inventory to make symptoms of bug go away

If you remove "<stack>_" from the inventory as in this diff, then the ansible module doesn't fail.
(In reply to John Fulton from comment #4)
> The provided inventory violates that hostnames in the HostnameMap, in the
> overcloud-baremetal-deployed.yaml, must match the role name in the ansible
> inventory.
>
> I wonder why this changed between the 17.0 and 17.1 composes. Either
> something changed in IR or how the job is set up OR maybe it's related to
> some inventory code that changed [4].
> [...]
> [4] https://review.opendev.org/c/openstack/tripleo-common/+/861804

I haven't noticed any infrared change related to that between the last time that specific workflow was working (last week) and now.
I would investigate the tripleo change.
(In reply to Luigi Toscano from comment #6)
> (In reply to John Fulton from comment #4)
> > The provided inventory violates that hostnames in the HostnameMap, in the
> > overcloud-baremetal-deployed.yaml, must match the role name in the ansible
> > inventory.
> >
> > I wonder why this changed between the 17.0 and 17.1 composes. Either
> > something changed in IR or how the job is set up OR maybe it's related to
> > some inventory code that changed [4].
> > [...]
> > [4] https://review.opendev.org/c/openstack/tripleo-common/+/861804
>
> I haven't notice any infrared change related to that from the last time that
> specific workflow was working (last week) and now.
> I would investigate the tripleo change.

Yes.

https://review.opendev.org/c/openstack/tripleo-ansible/+/866503
Hi Manoj, I edited the doc text for 17.1 beta release notes. Feel free to change if needed. Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:4577