Bug 2107914 - [Designate] ERROR: Can't run container designate_pool_manage - Overcloud error
Summary: [Designate] ERROR: Can't run container designate_pool_manage - Overcloud error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: Alpha
Target Release: 17.0
Assignee: Brent Eagles
QA Contact: Lilach Avraham
URL:
Whiteboard:
Duplicates: 2107913
Depends On: 2108651
Blocks:
Reported: 2022-07-17 14:52 UTC by Lilach Avraham
Modified: 2022-09-21 12:24 UTC (History)

Fixed In Version: tripleo-ansible-3.3.1-0.20220720020859.fa5422f.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-21 12:24:05 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-17664 0 None None None 2022-07-17 15:13:58 UTC
Red Hat Product Errata RHEA-2022:6543 0 None None None 2022-09-21 12:24:25 UTC

Internal Links: 2108651

Description Lilach Avraham 2022-07-17 14:52:03 UTC
Designate job [1] is failing in the Overcloud install stage with an Ansible error:
 
2022-07-17 10:00:11.411772 | 5254006f-799c-d818-043e-00000000b7ee |       TASK | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_5
2022-07-17 10:00:13.869057 |                                      |    WARNING | ERROR: Can't run container designate_pool_manage
stderr: time="2022-07-17T10:00:12Z" level=warning msg=" binary not found, container dns will not be enabled"
+ sudo -E kolla_set_configs
sudo: unable to send audit message: Operation not permitted
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Creating directory /etc/designate/private
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/designate/private/bind1.conf to /etc/designate/private/bind1.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/designate/private/bind2.conf to /etc/designate/private/bind2.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/designate/private/bind3.conf to /etc/designate/private/bind3.conf
INFO:__main__:Deleting /etc/designate/designate.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/designate/designate.conf to /etc/designate/designate.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/designate/pools.yaml to /etc/designate/pools.yaml
INFO:__main__:Creating directory /etc/my.cnf.d
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/my.cnf.d/tripleo.cnf to /etc/my.cnf.d/tripleo.cnf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/rndc.key to /etc/rndc.key
INFO:__main__:Writing out command to execute
INFO:__main__:Setting permission for /var/log/designate
INFO:__main__:Setting permission for /var/log/designate/designate-api.log
INFO:__main__:Setting permission for /var/log/designate/producer.log
INFO:__main__:Setting permission for /var/log/designate/central.log
INFO:__main__:Setting permission for /var/log/designate/mdns.log
INFO:__main__:Setting permission for /var/log/designate/worker.log
++ cat /run_command
+ CMD='/usr/bin/bootstrap_host_exec designate_central su designate -s /bin/bash -c '\''/bin/designate-manage pool update'\'''
+ ARGS=
+ [[ ! -n '' ]]
+ . kolla_extend_start
+ echo 'Running command: '\''/usr/bin/bootstrap_host_exec designate_central su designate -s /bin/bash -c '\''/bin/designate-manage pool update'\'''\'''
+ exec /usr/bin/bootstrap_host_exec designate_central su designate -s /bin/bash -c ''\''/bin/designate-manage' pool 'update'\'''
2022-07-17 10:00:13.872874 | 5254006f-799c-d818-043e-00000000b7ee |      FATAL | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_5 | controller-0 | error={"changed": false, "msg": "Failed containers: designate_pool_manage"}
2022-07-17 10:00:13.875127 | 5254006f-799c-d818-043e-00000000b7ee |     TIMING | tripleo_container_manage : Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_5 | controller-0 | 0:21:44.970046 | 2.46s
 
PLAY RECAP *********************************************************************
compute-0                  : ok=439  changed=183  unreachable=0    failed=0    skipped=233  rescued=0    ignored=0  
compute-1                  : ok=432  changed=183  unreachable=0    failed=0    skipped=236  rescued=0    ignored=0  
controller-0               : ok=539  changed=257  unreachable=0    failed=1    skipped=260  rescued=0    ignored=0  
controller-1               : ok=538  changed=252  unreachable=0    failed=0    skipped=267  rescued=0    ignored=0  
controller-2               : ok=537  changed=252  unreachable=0    failed=0    skipped=267  rescued=0    ignored=0  
localhost                  : ok=1    changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0  
undercloud                 : ok=97   changed=39   unreachable=0    failed=0    skipped=10   rescued=0    ignored=0
 
 
Logs:
 
undercloud-0/home/stack/overcloud_install.log.gz Attached to BZ.
undercloud-0/home/stack/overcloud-deploy/overcloud/config-download/ansible-playbook-command.sh.gz Attached to BZ.
undercloud-0/home/stack/overcloud-deploy/overcloud/config-download/stdout.gz Attached to BZ.

Comment 5 Brent Eagles 2022-07-18 13:45:22 UTC
I'm reassigning to default. The problem is a mismatch between hostnames and node names. See http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-openstack-designate-17.0_director-rhel-virthost-3cont_2comp-ipv4-geneve/95/controller-0/etc/hostname.gz - controller-0 thinks it's controller-1 and vice versa. The bootstrap logic is running properly on controller-1, which has a hostname of controller-0. I can't really explain why the pool manage container is the only one that breaks the deployment, as the db_sync container also isn't run in the right place.
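
The kind of mismatch described above can be checked mechanically. The following is a minimal sketch, not part of tripleo-ansible: the function name and the sample mapping are hypothetical, and it assumes the contents of each node's /etc/hostname have already been collected into a dictionary keyed by inventory name.

```python
from typing import Dict


def find_hostname_mismatches(reported: Dict[str, str]) -> Dict[str, str]:
    """Return {inventory_name: reported_short_hostname} for every node
    whose /etc/hostname short name differs from its inventory name."""
    mismatches = {}
    for inventory_name, hostname in reported.items():
        # Drop trailing newlines and any domain suffix before comparing.
        short_name = hostname.strip().split(".")[0]
        if short_name != inventory_name:
            mismatches[inventory_name] = short_name
    return mismatches


# Hypothetical sample mirroring the swap described in this comment:
# controller-0 reports controller-1 and vice versa.
sample = {
    "controller-0": "controller-1",
    "controller-1": "controller-0",
}
print(find_hostname_mismatches(sample))
```

An empty result would mean every collected hostname agrees with its inventory name.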

Comment 8 Brent Eagles 2022-07-18 23:44:09 UTC
Finding so far on 2107914

Some notes on my investigations so far:
- Examination of some of the other runs of the job posted in this bug shows at least one instance where the names of the log hosts match the hostnames in the logs. This suggests that the mismatch between hostnames and Ansible hosts might be unrelated. While still disconcerting, I am going to rate this a tepid lead for now and focus on other possibilities.
- To rule out a blatant coding error, I recreated a 3 controller/1 compute environment using upstream master, and all configuration files were as expected. The environment is still useful, in that examining the differences in execution environments might yield clues.
- In the three examples I've looked at where the error occurs, the information that should be for controller-0 instead contains the results for controller-2. This would normally suggest a fencepost-style error, but this code largely uses iterator-style addressing of collections, not indexing.

Some possibilities I'm examining:
- It may be that using set_fact and host_vars for dereferencing cannot be relied on "as is" when data is gathered in one play and referenced in others. Something additional might be required to make it more reliable. Note that up to now we've actually been using this same mechanism, but with some differences in what we reference (i.e. the contents of groups.designate_bind instead of designate_bind_node_ips).
- Ansible version differences and execution strategies might be the source of some issue.
- I want to find out if the actual list of groups.designate_bind is valid where it is referenced - it's just a little odd that some of the data tends to be correct.

Comment 9 Brent Eagles 2022-07-19 12:39:23 UTC
I've been able to reproduce this in a downstream development environment and have pin-pointed where the error was introduced. I'm trying out some alternate implementations to get the desired result.

Comment 10 Gregory Thiemonge 2022-07-19 14:39:40 UTC
*** Bug 2107913 has been marked as a duplicate of this bug. ***

Comment 11 Brent Eagles 2022-07-19 14:56:33 UTC
Definitely looks like an ansible issue:

- hosts: localhost
  vars:
    some_data: [ 'hey', 'ho', 'lets go']
  tasks:
    - name: try this
      set_fact:
        well_this: "{{ item.1 }}"
      delegate_to: "{{ item.0 }}"
      delegate_facts: true
      loop: "{{ groups.designate_bind|zip(some_data)|list }}"

    - debug: 
        msg: "{{ hostvars[item].well_this }}"
      loop: "{{ groups.designate_bind }}"

(undercloud) [stack@undercloud-0 ~]$ ansible-playbook -i config-download/overcloud/tripleo-ansible-inventory.yaml hunh.yaml

PLAY [localhost] ******************************************************************************************************************************

TASK [Gathering Facts] ************************************************************************************************************************
ok: [localhost]

TASK [try this] *******************************************************************************************************************************
ok: [localhost -> controller-0(192.168.24.50)] => (item=['controller-0', 'hey'])
ok: [localhost -> controller-1(192.168.24.35)] => (item=['controller-1', 'ho'])
ok: [localhost -> controller-2(192.168.24.48)] => (item=['controller-2', 'lets go'])

TASK [debug] **********************************************************************************************************************************
ok: [localhost] => (item=controller-0) => {
    "msg": "lets go"
}
ok: [localhost] => (item=controller-1) => {
    "msg": "ho"
}
ok: [localhost] => (item=controller-2) => {
    "msg": "lets go"
}

PLAY RECAP ************************************************************************************************************************************
localhost                  : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
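
For comparison, here is a plain-Python model of what the playbook is expected to produce, set against the values printed by the debug task above. This is only an illustration of the intended host-to-value pairing, not Ansible's implementation; the function names are hypothetical, and the observed values are copied from the output above.

```python
from typing import Dict, List, Tuple


def expected_delegated_facts(hosts: List[str], values: List[str]) -> Dict[str, str]:
    """Pair each host with the value at the same position, as the
    zip() filter in the loop is meant to do."""
    return dict(zip(hosts, values))


def find_wrong_facts(
    expected: Dict[str, str], observed: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return {host: (expected_value, observed_value)} for each host
    whose observed fact differs from the expected one."""
    return {
        host: (want, observed.get(host))
        for host, want in expected.items()
        if observed.get(host) != want
    }


hosts = ["controller-0", "controller-1", "controller-2"]
some_data = ["hey", "ho", "lets go"]

# Values reported by the debug task in the run above.
observed = {
    "controller-0": "lets go",
    "controller-1": "ho",
    "controller-2": "lets go",
}

expected = expected_delegated_facts(hosts, some_data)
print(find_wrong_facts(expected, observed))
# {'controller-0': ('hey', 'lets go')}
```

Only the first host is wrong: it holds the value from the last loop iteration.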

Comment 12 Cédric Jeanneret 2022-07-19 15:14:23 UTC
It seems to be a bug in the precise version of Ansible you're using:

For instance, on my fc-35:
╭─cjeanner@marvin ~/tmp 
╰─$ ansible --version
ansible [core 2.11.12] 
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/cjeanner/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/cjeanner/.local/lib/python3.10/site-packages/ansible
  ansible collection location = /home/cjeanner/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/cjeanner/.local/bin/ansible
  python version = 3.10.5 (main, Jun  9 2022, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-2)]
  jinja version = 3.0.1
  libyaml = True

TASK [try this] *******************************************************************************************************************************************************************************************************************************
ok: [localhost -> builder1] => (item=['builder1', 'hey'])
ok: [localhost -> builder2] => (item=['builder2', 'ho'])
ok: [localhost -> builder3] => (item=['builder3', 'lets go'])

TASK [debug] **********************************************************************************************************************************************************************************************************************************
ok: [localhost] => (item=builder1) => {
    "msg": "hey"
}
ok: [localhost] => (item=builder2) => {
    "msg": "ho"
}
ok: [localhost] => (item=builder3) => {
    "msg": "lets go"
}



But running the exact same code in a cs9 container, with a tweaked venv using ansible-core 2.12.2:
ansible [core 2.12.2]
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /venv/lib64/python3.9/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /venv/bin/ansible
  python version = 3.9.13 (main, Jun  9 2022, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-2)]
  jinja version = 3.1.2
  libyaml = True

TASK [try this] *******************************************************************************************************************************************************************************************************************************
ok: [localhost -> builder1(localhost)] => (item=['builder1', 'hey'])
ok: [localhost -> builder2(localhost)] => (item=['builder2', 'ho'])
ok: [localhost -> builder3(localhost)] => (item=['builder3', 'lets go'])

TASK [debug] **********************************************************************************************************************************************************************************************************************************
ok: [localhost] => (item=builder1) => {
    "msg": "lets go"
}
ok: [localhost] => (item=builder2) => {
    "msg": "ho"
}
ok: [localhost] => (item=builder3) => {
    "msg": "lets go"
}

And, with the stock ansible-core provided by cs9 repositories:
ansible [core 2.13.1]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.9/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.9.13 (main, Jun  9 2022, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-2)]
  jinja version = 3.1.2
  libyaml = True

TASK [try this] *******************************************************************************************************************************************************************************************************************************
ok: [localhost -> builder1(localhost)] => (item=['builder1', 'hey'])
ok: [localhost -> builder2(localhost)] => (item=['builder2', 'ho'])
ok: [localhost -> builder3(localhost)] => (item=['builder3', 'lets go'])

TASK [debug] **********************************************************************************************************************************************************************************************************************************
ok: [localhost] => (item=builder1) => {
    "msg": "hey"
}
ok: [localhost] => (item=builder2) => {
    "msg": "ho"
}
ok: [localhost] => (item=builder3) => {
    "msg": "lets go"
}

Comment 13 Cédric Jeanneret 2022-07-19 15:38:13 UTC
2.12.3 is fine!

ansible [core 2.12.3]
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /venv/lib64/python3.9/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /venv/bin/ansible
  python version = 3.9.13 (main, Jun  9 2022, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-2)]
  jinja version = 3.1.2
  libyaml = True


TASK [try this] *******************************************************************************************************************************************************************************************************************************
ok: [localhost -> builder1(localhost)] => (item=['builder1', 'hey'])
ok: [localhost -> builder2(localhost)] => (item=['builder2', 'ho'])
ok: [localhost -> builder3(localhost)] => (item=['builder3', 'lets go'])

TASK [debug] **********************************************************************************************************************************************************************************************************************************
ok: [localhost] => (item=builder1) => {
    "msg": "hey"
}
ok: [localhost] => (item=builder2) => {
    "msg": "ho"
}
ok: [localhost] => (item=builder3) => {
    "msg": "lets go"
}

Comment 14 Cédric Jeanneret 2022-07-19 15:47:59 UTC
2.12.3 has this bugfix: task_executor reverts the change to push facts into delegated vars on loop finalization as result managing code already handles this and was duplicating effort to wrong result.


It matches the issue we're seeing in 2.12.2 - so I guess that's really the one we want.

Commit hash: ae35fc04c3a2068b1d37efe813d1c6938b4f2634

If we could get it in a 2.12.2-2 downstream, that would be wonderful.
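
Until a fixed package is available, a deployment could guard against the affected release. The sketch below is an assumption-laden illustration: it parses the first line of `ansible --version` in the format shown in the comments above and classifies the version using only the four results reported in this bug (2.12.2 misbehaves; 2.11.12, 2.12.3 and 2.13.1 behave correctly). The helper names are hypothetical, and any other version is reported as untested rather than guessed at.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int, int]

# Results taken only from the test runs reported in this bug.
KNOWN_BAD = {(2, 12, 2)}
KNOWN_GOOD = {(2, 11, 12), (2, 12, 3), (2, 13, 1)}


def parse_core_version(version_output: str) -> Optional[Version]:
    """Extract (major, minor, patch) from 'ansible [core X.Y.Z]'."""
    match = re.search(r"ansible \[core (\d+)\.(\d+)\.(\d+)\]", version_output)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def classify(version: Version) -> str:
    """Return 'affected', 'ok', or 'untested' for a parsed version."""
    if version in KNOWN_BAD:
        return "affected"
    if version in KNOWN_GOOD:
        return "ok"
    return "untested"


print(classify(parse_core_version("ansible [core 2.12.2]")))  # affected
print(classify(parse_core_version("ansible [core 2.12.3]")))  # ok
```

The input string could come from running `ansible --version` on the undercloud; how it is captured is left to the caller.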

Comment 25 errata-xmlrpc 2022-09-21 12:24:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

