Description of problem:

RHOSP17.0 + RHCS5.x

While deploying ceph before overcloud provisioning with 'osds_per_device', the deployment fails at the initial stage itself, as ceph_spec_bootstrap.py doesn't seem to have support for it.

The deployment fails with the following traceback:

+++
Using module file /usr/share/ansible/plugins/modules/ceph_spec_bootstrap.py
Pipelining is enabled.
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: stack
<localhost> EXEC /bin/sh -c '/usr/bin/python3.9 && sleep 0'
The full traceback is:
Traceback (most recent call last):
  File "<stdin>", line 107, in <module>
  File "<stdin>", line 99, in _ansiballz_main
  File "<stdin>", line 47, in invoke_module
  File "/usr/lib64/python3.9/runpy.py", line 210, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/usr/lib64/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/tmp/ansible_ceph_spec_bootstrap_payload_i7y1vbj4/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py", line 502, in <module>
  File "/tmp/ansible_ceph_spec_bootstrap_payload_i7y1vbj4/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py", line 489, in main
  File "/tmp/ansible_ceph_spec_bootstrap_payload_i7y1vbj4/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py", line 363, in get_specs
  File "/tmp/ansible_ceph_spec_bootstrap_payload_i7y1vbj4/ansible_ceph_spec_bootstrap_payload.zip/ansible/module_utils/ceph_spec.py", line 230, in make_daemon_spec
Exception: Fatal: the spec should be composed by only allowed keywords
2022-06-27 10:05:50.647505 | 000af799-5956-c9fb-316e-00000000001e | FATAL | Create Ceph spec based on baremetal_deployed_path and tripleo_roles | undercloud | error={
    "changed": false,
    "module_stderr": "Traceback (most recent call last):\n File \"<stdin>\", line 107, in <module>\n File \"<stdin>\", line 99, in _ansiballz_main\n File \"<stdin>\", line 47, in invoke_module\n File \"/usr/lib64/python3.9/runpy.py\", line 210, in run_module\n return _run_module_code(code, init_globals, run_name, mod_spec)\n File \"/usr/lib64/python3.9/runpy.py\", line 97, in _run_module_code\n _run_code(code, mod_globals, init_globals,\n File \"/usr/lib64/python3.9/runpy.py\", line 87, in _run_code\n exec(code, run_globals)\n File \"/tmp/ansible_ceph_spec_bootstrap_payload_i7y1vbj4/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 502, in <module>\n File \"/tmp/ansible_ceph_spec_bootstrap_payload_i7y1vbj4/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 489, in main\n File \"/tmp/ansible_ceph_spec_bootstrap_payload_i7y1vbj4/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 363, in get_specs\n File \"/tmp/ansible_ceph_spec_bootstrap_payload_i7y1vbj4/ansible_ceph_spec_bootstrap_payload.zip/ansible/module_utils/ceph_spec.py\", line 230, in make_daemon_spec\nException: Fatal: the spec should be composed by only allowed keywords\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 1
}
+++

Here is the deployment command in use:

openstack overcloud ceph deploy templates-output/overcloud-node-deployed.yaml --osd-spec osd-spec.yml --network-data templates/custom_network_data.yaml --roles-data templates/roles_data_nfs.yaml --container-image-prepare container-prepare-parameter.yaml -o template-output/deployed-ceph.yaml -vv

and the osd spec file:

# cat osd-spec.yml
data_devices:
  paths:
    - /dev/nvme0n1
    - /dev/nvme1n1
    - /dev/nvme2n1
osds_per_device: 2

Version-Release number of selected component (if applicable):
tripleo-ansible-3.3.1-0.20220307013244.130185a.el9ost.noarch

How reproducible:
Deploy ceph with osds_per_device specified in the spec file and the deployment fails with the mentioned error.

Steps to Reproduce:
1.
2.
3.

Actual results:
Deployment fails.

Expected results:
Deployment should complete.

Additional info:
Deployment completes without 'osds_per_device' being used in the spec file.
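The exception text ("the spec should be composed by only allowed keywords") suggests that the module compares the keys of the OSD spec against a fixed set. The following is a minimal sketch of that kind of check, not the tripleo-ansible code: the names ALLOWED_KEYS_SKETCH and check_spec_keys, the contents of the set, and the error message are all assumptions made for illustration.

```python
# Hypothetical allow-list for illustration only; the real list in
# ceph_spec.py may contain different entries.
ALLOWED_KEYS_SKETCH = {"data_devices", "db_devices", "wal_devices", "encrypted"}


def check_spec_keys(spec, allowed=ALLOWED_KEYS_SKETCH):
    """Raise ValueError if the spec has a top-level key outside the allow-list."""
    unknown = sorted(set(spec) - set(allowed))
    if unknown:
        raise ValueError("spec keys not in allow-list: " + ", ".join(unknown))
    return True


# Same shape as the osd-spec.yml in this report, written as a Python dict.
osd_spec = {
    "data_devices": {"paths": ["/dev/nvme0n1", "/dev/nvme1n1", "/dev/nvme2n1"]},
    "osds_per_device": 2,
}

try:
    check_spec_keys(osd_spec)
    result = "accepted"
except ValueError as exc:
    result = str(exc)

print(result)
```

With a check of this shape, any key that cephadm accepts but the list omits is rejected before the spec ever reaches cephadm, which matches the behaviour reported here.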
We found the following snippet from a larger ceph spec [1] was valid when applied with 'ceph orch apply -i ':

(undercloud) [stack@undercloud ~]$ cat osd-spec.yml
data_devices:
  model: 'SAMSUNG'
osds_per_device: 2
(undercloud) [stack@undercloud ~]$

However, when we pass 'openstack overcloud ceph deploy --osd-spec osd-spec.yml ', it fails [3] on the /usr/share/ansible/plugins/modules/ceph_spec_bootstrap.py

Perhaps the ALLOWED_EXTRA_KEYS is too limited:

https://github.com/openstack/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/ansible_plugins/module_utils/ceph_spec.py#L33-L40

[1]
"""
service_type: alertmanager
service_name: alertmanager
placement:
  hosts:
  - controller-0
  - controller-1
  - controller-2
networks:
- 192.17.3.0/24
spec:
  port: 9093
---
service_type: crash
service_name: crash
placement:
  host_pattern: '*'
---
service_type: grafana
service_name: grafana
placement:
  hosts:
  - controller-0
  - controller-1
  - controller-2
networks:
- 192.17.3.0/24
spec:
  port: 3100
---
service_type: mds
service_id: mds
service_name: mds.mds
placement:
  hosts:
  - controller-0
  - controller-1
  - controller-2
---
service_type: mgr
service_name: mgr
placement:
  hosts:
  - controller-0
  - controller-1
  - controller-2
---
service_type: mon
service_name: mon
placement:
  hosts:
  - controller-0
  - controller-1
  - controller-2
---
service_type: node-exporter
service_name: node-exporter
placement:
  host_pattern: '*'
networks:
- 192.17.3.0/24
---
service_type: osd
service_id: default_drive_group
service_name: osd.default_drive_group
placement:
  hosts:
  - cephstorage-0
  - cephstorage-1
  - cephstorage-3
  - cephstorage-4
spec:
  data_devices:
    paths:
    - /dev/nvme0n1
    - /dev/nvme1n1
    - /dev/nvme2n1
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: default_drive_group_vendor
service_name: osd.default_drive_group_vendor
placement:
  hosts:
  - cephstorage-2
spec:
  data_devices:
    model: SAMSUNG
  filter_logic: AND
  objectstore: bluestore
  osds_per_device: 2
---
service_type: prometheus
service_name: prometheus
placement:
  hosts:
  - controller-0
  - controller-1
  - controller-2
networks:
- 192.17.3.0/24
spec:
  port: 9092
---
service_type: rgw
service_id: rgw
service_name: rgw.rgw
placement:
  hosts:
  - controller-0
  - controller-1
  - controller-2
networks:
- 192.17.3.0/24
spec:
  rgw_frontend_port: 8080
  rgw_realm: default
  rgw_zone: default
"""

[2]
openstack overcloud ceph deploy templates-output/overcloud-node-deployed.yaml --osd-spec osd-spec.yml --network-data templates/custom_network_data.yaml --roles-data templates/roles_data_nfs.yaml --container-image-prepare container-prepare-parameter.yaml -o templates-output/deployed-ceph.yaml -vv --skip-hosts-config --skip-user-create --skip-container-registry-config --yes

[3]
2022-06-29 12:30:35.378849 | 000af799-5956-0b22-d49e-00000000001e | TASK | Create Ceph spec based on baremetal_deployed_path and tripleo_roles
Using module file /usr/share/ansible/plugins/modules/ceph_spec_bootstrap.py
Pipelining is enabled.
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: stack
<localhost> EXEC /bin/sh -c '/usr/bin/python3.9 && sleep 0'
The full traceback is:
Traceback (most recent call last):
  File "<stdin>", line 107, in <module>
  File "<stdin>", line 99, in _ansiballz_main
  File "<stdin>", line 47, in invoke_module
  File "/usr/lib64/python3.9/runpy.py", line 210, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/usr/lib64/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/tmp/ansible_ceph_spec_bootstrap_payload__x2eulxb/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py", line 502, in <module>
  File "/tmp/ansible_ceph_spec_bootstrap_payload__x2eulxb/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py", line 489, in main
  File "/tmp/ansible_ceph_spec_bootstrap_payload__x2eulxb/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py", line 363, in get_specs
  File "/tmp/ansible_ceph_spec_bootstrap_payload__x2eulxb/ansible_ceph_spec_bootstrap_payload.zip/ansible/module_utils/ceph_spec.py", line 230, in make_daemon_spec
Exception: Fatal: the spec should be composed by only allowed keywords
2022-06-29 12:30:35.938975 | 000af799-5956-0b22-d49e-00000000001e | FATAL | Create Ceph spec based on baremetal_deployed_path and tripleo_roles | undercloud | error={
    "changed": false,
    "module_stderr": "Traceback (most recent call last):\n File \"<stdin>\", line 107, in <module>\n File \"<stdin>\", line 99, in _ansiballz_main\n File \"<stdin>\", line 47, in invoke_module\n File \"/usr/lib64/python3.9/runpy.py\", line 210, in run_module\n return _run_module_code(code, init_globals, run_name, mod_spec)\n File \"/usr/lib64/python3.9/runpy.py\", line 97, in _run_module_code\n _run_code(code, mod_globals, init_globals,\n File \"/usr/lib64/python3.9/runpy.py\", line 87, in _run_code\n exec(code, run_globals)\n File \"/tmp/ansible_ceph_spec_bootstrap_payload__x2eulxb/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 502, in <module>\n File \"/tmp/ansible_ceph_spec_bootstrap_payload__x2eulxb/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 489, in main\n File \"/tmp/ansible_ceph_spec_bootstrap_payload__x2eulxb/ansible_ceph_spec_bootstrap_payload.zip/ansible/modules/ceph_spec_bootstrap.py\", line 363, in get_specs\n File \"/tmp/ansible_ceph_spec_bootstrap_payload__x2eulxb/ansible_ceph_spec_bootstrap_payload.zip/ansible/module_utils/ceph_spec.py\", line 230, in make_daemon_spec\nException: Fatal: the spec should be composed by only allowed keywords\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 1
}
WORKAROUND:

(undercloud) [stack@undercloud ~]$ sudo vim /usr/share/ansible/plugins/module_utils/ceph_spec.py
(undercloud) [stack@undercloud ~]$ grep osds_per_device !$
grep osds_per_device /usr/share/ansible/plugins/module_utils/ceph_spec.py
    'osds_per_device'
(undercloud) [stack@undercloud ~]$ grep -B 1 -A 1 osds_per_device /usr/share/ansible/plugins/module_utils/ceph_spec.py
    'encrypted',
    'osds_per_device'
]
(undercloud) [stack@undercloud ~]$

After the above change, the same command worked.

However, the proper fix for this bug needs to go further than the above. We are hard-coding the allowed parameters, so we will always lag behind as cephadm adds new features to its spec. We need to be more flexible.
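One way to read "more flexible" is to stop filtering spec keys and forward whatever the user supplied, leaving validation to cephadm. The sketch below illustrates that idea only; it is not the fix that was merged, and the function name build_osd_service_sketch and its parameters are hypothetical. The output field names mirror the OSD entries in the spec quoted earlier in this report.

```python
import copy


def build_osd_service_sketch(hosts, osd_spec, service_id="default_drive_group"):
    """Build an OSD service entry that forwards every user-supplied spec key.

    No allow-list is applied: unknown keys are passed through unchanged,
    on the assumption that cephadm validates the spec when it is applied.
    """
    if not isinstance(osd_spec, dict):
        raise TypeError("osd_spec must be a mapping")
    return {
        "service_type": "osd",
        "service_id": service_id,
        "service_name": "osd." + service_id,
        "placement": {"hosts": list(hosts)},
        # Deep copy so later changes to the caller's dict do not leak in.
        "spec": copy.deepcopy(osd_spec),
    }


service = build_osd_service_sketch(
    ["cephstorage-2"],
    {"data_devices": {"model": "SAMSUNG"}, "osds_per_device": 2},
)
print(service["spec"]["osds_per_device"])
```

The trade-off of a pass-through design is that a typo in a key name is no longer caught at spec-generation time and only surfaces when cephadm processes the spec.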
To test that this fix works, do a deployment with the --osd-spec flag like this:

openstack overcloud ceph deploy --osd-spec osd-spec.yml

where osd-spec.yml contains the following:

(undercloud) [stack@undercloud ~]$ cat osd-spec.yml
data_devices:
  all: true
osds_per_device: 2
(undercloud) [stack@undercloud ~]$

The osds_per_device entry above is what would trigger the bug; with the fix, the ceph deployment should not fail.
Verified. The Ceph deployment completed successfully with the parameters set by the OSD spec.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543