Description of problem:
During the fast forward upgrade (FFU) to RHOSP 16.1, the Ceph upgrade_tasks and post_upgrade_tasks fail on the bootstrap node during system_upgrade if ceph-mon-${HOSTNAME} does not match the name of the running ceph-mon container. In a setup integrated with IdM, ${HOSTNAME} can return an FQDN, while the default and supported Ceph container names use short hostnames.

How reproducible:
Every time

Steps to Reproduce:
1. Deploy RHOSP 13 integrated with IdM using novajoin: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/integrate_with_identity_service/idm-novajoin
2. Follow the Framework for Upgrades (13 to 16.1) documentation.

Actual results:
2020-10-23 04:31:40,455 p=5846 u=mistral n=ansible | failed: [controller-0 -> 192.168.24.33] (item=nobackfill) => {"ansible_loop_var": "item", "changed": true, "cmd": "docker exec -u root ceph-mon-${HOSTNAME} ceph osd set nobackfill", "delta": "0:00:00.025770", "end": "2020-10-23 04:31:40.395243", "item": "nobackfill", "msg": "non-zero return code", "rc": 1, "start": "2020-10-23 04:31:40.369473", "stderr": "Error response from daemon: No such container: ceph-mon-controller-0.redhat.local", "stderr_lines": ["Error response from daemon: No such container: ceph-mon-controller-0.redhat.local"], "stdout": "", "stdout_lines": []}
2020-10-23 04:31:40,658 p=5846 u=mistral n=ansible | failed: [controller-0 -> 192.168.24.33] (item=norebalance) => {"ansible_loop_var": "item", "changed": true, "cmd": "docker exec -u root ceph-mon-${HOSTNAME} ceph osd set norebalance", "delta": "0:00:00.028196", "end": "2020-10-23 04:31:40.612835", "item": "norebalance", "msg": "non-zero return code", "rc": 1, "start": "2020-10-23 04:31:40.584639", "stderr": "Error response from daemon: No such container: ceph-mon-controller-0.redhat.local", "stderr_lines": ["Error response from daemon: No such container: ceph-mon-controller-0.redhat.local"], "stdout": "", "stdout_lines": []}
2020-10-23 04:31:40,860 p=5846 u=mistral n=ansible | failed: [controller-0 -> 192.168.24.33] (item=nodeep-scrub) => {"ansible_loop_var": "item", "changed": true, "cmd": "docker exec -u root ceph-mon-${HOSTNAME} ceph osd set nodeep-scrub", "delta": "0:00:00.025621", "end": "2020-10-23 04:31:40.821615", "item": "nodeep-scrub", "msg": "non-zero return code", "rc": 1, "start": "2020-10-23 04:31:40.795994", "stderr": "Error response from daemon: No such container: ceph-mon-controller-0.redhat.local", "stderr_lines": ["Error response from daemon: No such container: ceph-mon-controller-0.redhat.local"], "stdout": "", "stdout_lines": []}

The running Ceph containers are named after the short hostname:
[heat-admin@controller-2 ~]$ sudo podman ps |grep ceph
910bce71e597  registry.redhat.io/rhceph/rhceph-4-rhel8:4-33  4 days ago  Up 4 days ago  ceph-rgw-controller-2-rgw0
5486c84877f9  registry.redhat.io/rhceph/rhceph-4-rhel8:4-33  4 days ago  Up 4 days ago  ceph-mds-controller-2
5de932b1ef1e  registry.redhat.io/rhceph/rhceph-4-rhel8:4-33  4 days ago  Up 4 days ago  ceph-mgr-controller-2
3030ca655de4  registry.redhat.io/rhceph/rhceph-4-rhel8:4-33  4 days ago  Up 4 days ago  ceph-mon-controller-2

Expected results:
The playbook finds the ceph-mon container whether it is named after the host's short name or its FQDN.

Additional info:
This affects any deployment where ${HOSTNAME} returns an FQDN and "ceph_use_fqdn: true" is not set:
https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ceph-ansible/ceph-osd.yaml#L87
https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ceph-ansible/ceph-osd.yaml#L114

Workaround: look up the container ID in a separate task instead of relying on ${HOSTNAME}. The lookup must run with the same delegation, privileges, tags and step conditions as the command that uses its result, so both tasks are grouped in a block below. The upgrade step sets the flags (as in the failing "ceph osd set" command above) and the post-upgrade step unsets them.
~~~
upgrade_tasks:
  - name: Set Ceph OSD flags before the system upgrade
    # Block-level keywords apply to both tasks, so the lookup is never skipped
    # by --tags and always runs on the same host as the "exec".
    become: true
    delegate_to: "{{ ceph_mon_short_bootstrap_node_name }}"
    tags:
      - never
      - system_upgrade
      - system_upgrade_prepare
    when:
      - step|int == 1
      - upgrade_leapp_enabled
    vars:
      # Assumes check_docker_cli was registered by an earlier stat task (not shown).
      container_client: |-
        {% set container_client = 'podman' %}
        {% if check_docker_cli.stat.exists|bool %}
        {% set container_client = 'docker' %}
        {% endif %}
        {{ container_client }}
    block:
      - name: Get ceph-mon containers
        # Match on name prefix; works for both short-name and FQDN containers.
        shell: "{{ container_client }} ps -q -f name=ceph-mon"
        register: ceph_mon_result
        changed_when: false
      - name: Set noout flag
        shell: "{{ container_client }} exec -u root {{ ceph_mon_result.stdout }} ceph osd set {{ item }}"
        with_items:
          - noout
          - norecover
          - nobackfill
          - norebalance
          - nodeep-scrub
post_upgrade_tasks:
  - name: Unset Ceph OSD flags after the system upgrade
    # become also applies to the lookup: a non-root "podman ps" does not list
    # the root-owned Ceph containers.
    become: true
    when: step|int == 2
    block:
      - name: Get ceph-mon containers
        shell: "{{ container_cli }} ps -q -f name=ceph-mon"
        register: ceph_mon_result
        changed_when: false
      - name: Unset noout flag
        shell: "{{ container_cli }} exec -u root {{ ceph_mon_result.stdout }} ceph osd unset {{ item }}"
        with_items:
          - noout
          - norecover
          - nobackfill
          - norebalance
          - nodeep-scrub
~~~
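The expected behaviour (find the container whether it is named after the short hostname or the FQDN) can be sketched as a small shell function. This is a hypothetical helper, not part of tripleo-heat-templates: it assumes the running container names are piped in one per line, and it tries ceph-mon-<short name> before ceph-mon-<FQDN>, matching whole names only so that, for example, ceph-mon-controller-2 never matches ceph-mon-controller-20.

~~~shell
#!/usr/bin/env bash
# Hypothetical helper (not part of tripleo-heat-templates).
# Usage: <list of container names> | pick_mon_container <hostname>
# Prints the matching ceph-mon container name, preferring the short hostname
# and falling back to the FQDN; returns 1 if neither is running.
pick_mon_container() {
  local fqdn="$1"
  local short="${fqdn%%.*}"   # drop everything from the first dot onwards
  local names candidate
  names="$(cat)"              # container names read from stdin
  for candidate in "ceph-mon-${short}" "ceph-mon-${fqdn}"; do
    # -x: whole-line match, -F: literal string (no regex)
    if grep -qxF -- "$candidate" <<<"$names"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
~~~

On a controller it could be called as mon=$(sudo podman ps --format '{{.Names}}' | pick_mon_container "$HOSTNAME"), after which sudo podman exec -u root "$mon" ceph osd set noout targets the right container for either naming scheme.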
Verified on openstack-tripleo-heat-templates-11.3.2-1.20210104205661.el8ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0817