Bug 1895756 - [FFU 13-16.1][Ceph] During upgrade_tasks on bootstrap node ansible is unable to find ceph-mon if controller has FQDN hostname
Summary: [FFU 13-16.1][Ceph] During upgrade_tasks on bootstrap node ansible is unable ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z4
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Francesco Pantano
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks: 1911669
Reported: 2020-11-09 01:04 UTC by David Sedgmen
Modified: 2024-06-13 23:22 UTC
CC List: 6 users

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20201114031850.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-17 15:35:39 UTC
Target Upstream Version:
Embargoed:
yrabl: automate_bug+


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1903537 | 0 | None | None | None | 2020-11-09 14:31:59 UTC
OpenStack gerrit | 761930 | 0 | None | MERGED | Properly compute hostname when looking for the ceph-mon container | 2021-02-17 17:16:22 UTC
OpenStack gerrit | 762283 | 0 | None | MERGED | Properly compute hostname when looking for the ceph-mon container | 2021-02-17 17:16:22 UTC
OpenStack gerrit | 762284 | 0 | None | MERGED | Properly compute hostname when looking for the ceph-mon container | 2021-02-17 17:16:22 UTC
OpenStack gerrit | 762285 | 0 | None | MERGED | Properly compute hostname when looking for the ceph-mon container | 2021-02-17 17:16:22 UTC
OpenStack gerrit | 767382 | 0 | None | MERGED | Rely on the HOSTNAME var to resolve the mon container name | 2021-02-17 17:16:22 UTC
OpenStack gerrit | 767491 | 0 | None | MERGED | Rely on the HOSTNAME var to resolve the mon container name | 2021-02-17 17:16:22 UTC
Red Hat Product Errata | RHBA-2021:0817 | 0 | None | None | None | 2021-03-17 15:36:03 UTC

Description David Sedgmen 2020-11-09 01:04:39 UTC
Description of problem:
When the Ceph upgrade_tasks and post_upgrade_tasks run during the FFU to RHOSP 16.1, the system_upgrade step on the bootstrap node fails if ceph-mon-${HOSTNAME} does not match the name of the ceph-mon container.

In a deployment integrated with IdM, ${HOSTNAME} can be an FQDN, while the default and supported Ceph container names use short hostnames.
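The mismatch can be illustrated with a small bash sketch. It assumes, based on the podman ps output later in this report, that the monitor container is named ceph-mon-<short hostname>; the hostnames are example values, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: naive_mon_name and short_mon_name are hypothetical helpers.
# Assumption (from this report): the container is named ceph-mon-<short hostname>.

# What the failing task builds: ceph-mon-${HOSTNAME}, with ${HOSTNAME} used verbatim.
naive_mon_name() {
  printf 'ceph-mon-%s\n' "$1"
}

# Short-name variant: ${host%%.*} drops everything from the first dot onward.
short_mon_name() {
  local host="$1"
  printf 'ceph-mon-%s\n' "${host%%.*}"
}

naive_mon_name controller-0.redhat.local   # ceph-mon-controller-0.redhat.local (no such container)
short_mon_name controller-0.redhat.local   # ceph-mon-controller-0
short_mon_name controller-0                # ceph-mon-controller-0 (short names pass through unchanged)
```

With an FQDN in ${HOSTNAME}, only the short-name variant matches the container that actually exists.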

How reproducible:
Every time


Steps to Reproduce:
1. Deploy RHOSP 13 Integrated with IdM using novajoin https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/integrate_with_identity_service/idm-novajoin

2. Follow the Framework for Upgrades (13 to 16.1) documentation.


Actual results:

2020-10-23  04:31:40,455 p=5846 u=mistral n=ansible | failed: [controller-0 ->  192.168.24.33] (item=nobackfill) => {"ansible_loop_var": "item",  "changed": true, "cmd": "docker exec -u root ceph-mon-${HOSTNAME} ceph  osd set nobackfill", "delta":  "0:00:00.025770", "end": "2020-10-23 04:31:40.395243", "item":  "nobackfill", "msg": "non-zero return code", "rc": 1, "start":  "2020-10-23 04:31:40.369473", "stderr": "Error response from daemon: No  such container: ceph-mon-controller-0.redhat.local",  "stderr_lines": ["Error response from daemon: No such container:  ceph-mon-controller-0.redhat.local"], "stdout": "", "stdout_lines": []}
2020-10-23  04:31:40,658 p=5846 u=mistral n=ansible | failed: [controller-0 ->  192.168.24.33] (item=norebalance) => {"ansible_loop_var": "item",  "changed": true, "cmd": "docker exec -u root ceph-mon-${HOSTNAME} ceph osd set norebalance", "delta":  "0:00:00.028196", "end": "2020-10-23 04:31:40.612835", "item":  "norebalance", "msg": "non-zero return code", "rc": 1, "start":  "2020-10-23 04:31:40.584639", "stderr": "Error response from daemon: No  such container: ceph-mon-controller-0.redhat.local",  "stderr_lines": ["Error response from daemon: No such container:  ceph-mon-controller-0.redhat.local"], "stdout": "", "stdout_lines": []}
2020-10-23  04:31:40,860 p=5846 u=mistral n=ansible | failed: [controller-0 ->  192.168.24.33] (item=nodeep-scrub) => {"ansible_loop_var": "item",  "changed": true, "cmd": "docker exec -u root ceph-mon-${HOSTNAME} ceph  osd set nodeep-scrub", "delta":  "0:00:00.025621", "end": "2020-10-23 04:31:40.821615", "item":  "nodeep-scrub", "msg": "non-zero return code", "rc": 1, "start":  "2020-10-23 04:31:40.795994", "stderr": "Error response from daemon: No  such container: ceph-mon-controller-0.redhat.local",  "stderr_lines": ["Error response from daemon: No such container:  ceph-mon-controller-0.redhat.local"], "stdout": "", "stdout_lines": []}

[heat-admin@controller-2 ~]$ sudo podman ps |grep ceph
910bce71e597  registry.redhat.io/rhceph/rhceph-4-rhel8:4-33                                                     4 days ago   Up 4 days ago          ceph-rgw-controller-2-rgw0
5486c84877f9  registry.redhat.io/rhceph/rhceph-4-rhel8:4-33                                                     4 days ago   Up 4 days ago          ceph-mds-controller-2
5de932b1ef1e  registry.redhat.io/rhceph/rhceph-4-rhel8:4-33                                                     4 days ago   Up 4 days ago          ceph-mgr-controller-2
3030ca655de4  registry.redhat.io/rhceph/rhceph-4-rhel8:4-33                                                     4 days ago   Up 4 days ago          ceph-mon-controller-2


Expected results:

The playbook should find the ceph-mon container whether it is named after the host's short name or its FQDN.
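A minimal bash sketch of the expected behavior: try the short-name container first, then the FQDN one. pick_mon_container is a hypothetical helper, not part of tripleo-heat-templates. It reads running container names on stdin so the matching logic can be checked without a container engine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads running container names (one per line) on stdin
# and prints the first ceph-mon candidate that exists for the given hostname.
pick_mon_container() {
  local host="$1" name candidate
  local -a running=()
  # Collect names; the extra test also keeps a final line without a newline.
  while IFS= read -r name || [[ -n "$name" ]]; do
    running+=("$name")
  done
  # Candidates in order: short hostname, then the hostname as given (FQDN).
  for candidate in "ceph-mon-${host%%.*}" "ceph-mon-${host}"; do
    for name in "${running[@]}"; do
      if [[ "$name" == "$candidate" ]]; then
        printf '%s\n' "$candidate"
        return 0
      fi
    done
  done
  return 1   # no ceph-mon container for this host under either name
}

# Example input modelled on the podman ps output in this report.
printf '%s\n' ceph-rgw-controller-2-rgw0 ceph-mds-controller-2 \
  ceph-mgr-controller-2 ceph-mon-controller-2 |
  pick_mon_container controller-2.redhat.local   # prints ceph-mon-controller-2
```

On a live node the input could come from `sudo podman ps --format '{{.Names}}' | pick_mon_container "$HOSTNAME"` (or `docker ps` with the same `--format`). When ${HOSTNAME} is already a short name, both candidates are identical, so the check is harmless.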


Additional info:
This affects any deployment where ${HOSTNAME} returns an FQDN and "ceph_use_fqdn: true" is not set.

https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ceph-ansible/ceph-osd.yaml#L87
https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ceph-ansible/ceph-osd.yaml#L114

Work around the issue by looking up the ceph-mon container ID in a separate task and passing that ID to the exec command:

~~~
      - name: Get ceph-mon containers
        shell: "{{ container_cli }} ps -q -f name=ceph-mon"
        register: ceph_mon_result
        become: true   # root is needed to list the root-owned ceph containers
      - name: Set noout flag
        shell: "{{ container_cli }} exec -u root {{ ceph_mon_result.stdout }} ceph osd set {{ item }}"
        become: true
        with_items:
          - noout
          - norecover
          - nobackfill
          - norebalance
          - nodeep-scrub
    # Block-level vars so both tasks above use the same container CLI
    # (docker on nodes not yet upgraded, podman otherwise).
    vars:
      container_cli: |-
        {% set container_client = 'podman' %}
        {%   if check_docker_cli.stat.exists|bool %}
        {%     set container_client = 'docker' %}
        {%   endif %}
        {{ container_client }}
    delegate_to: "{{ ceph_mon_short_bootstrap_node_name }}"
    tags:
      - never
      - system_upgrade
      - system_upgrade_prepare
    when:
      - step|int == 1
      - upgrade_leapp_enabled
post_upgrade_tasks:
  - name: Get ceph-mon containers
    shell: "{{ container_cli }} ps -q -f name=ceph-mon"
    register: ceph_mon_result
    become: true
  - name: Unset noout flag
    shell: "{{ container_cli }} exec -u root {{ ceph_mon_result.stdout }} ceph osd unset {{ item }}"
    with_items:
      - noout
      - norecover
      - nobackfill
      - norebalance
      - nodeep-scrub
    when: step|int == 2
    become: true
~~~
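The workaround passes `ceph_mon_result.stdout` straight to exec, so zero matches (an empty string) or several matches (a multi-line string) would produce a confusing failure. A bash sketch of the same "look up the ID, then exec" idea with an explicit guard follows. single_id and set_osd_flags are hypothetical helper names, and the sketch assumes exactly one ceph-mon container runs on the mon node.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers mirroring the workaround above.

# Read container IDs (one per line) on stdin; print the ID only if exactly one.
single_id() {
  local line
  local -a ids=()
  while IFS= read -r line || [[ -n "$line" ]]; do
    if [[ -n "$line" ]]; then
      ids+=("$line")
    fi
  done
  if (( ${#ids[@]} != 1 )); then
    echo "expected 1 ceph-mon container, found ${#ids[@]}" >&2
    return 1
  fi
  printf '%s\n' "${ids[0]}"
}

# Run as root on the ceph-mon node.
# $1: container CLI (podman or docker); $2: "set" before the upgrade, "unset" after.
set_osd_flags() {
  local cli="${1:-podman}" action="${2:-set}" mon_id flag
  mon_id="$("$cli" ps -q -f name=ceph-mon | single_id)" || return 1
  for flag in noout norecover nobackfill norebalance nodeep-scrub; do
    "$cli" exec -u root "$mon_id" ceph osd "$action" "$flag" || return 1
  done
}
```

After sourcing the functions, `set_osd_flags docker set` corresponds to the prepare step on a node that still runs docker, and `set_osd_flags podman unset` to the post-upgrade step. Failing early on an ambiguous lookup is a design choice: it reports the actual problem instead of a "No such container" error from exec.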

Comment 17 Yogev Rabl 2021-02-12 15:39:50 UTC
Verified on openstack-tripleo-heat-templates-11.3.2-1.20210104205661.el8ost.noarch

Comment 22 errata-xmlrpc 2021-03-17 15:35:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0817

