Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2034765

Summary: TripleO Ansible fails to provide IPA server failover during overcloud installation
Product: Red Hat OpenStack Reporter: Donghwi Cha <dcha>
Component: tripleo-ansible Assignee: Ade Lee <alee>
Status: CLOSED WONTFIX QA Contact: Joe H. Rahme <jhakimra>
Severity: high Docs Contact:
Priority: high    
Version: 16.2 (Train) CC: alee, astupnik, bshephar, dwilde, ggrasza, jinjli, pweeks, rhos-maint, sputhenp, udesale
Target Milestone: zstream Keywords: Triaged, ZStream
Target Release: 16.2 (Train on RHEL 8.4)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-11-25 20:30:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2033935    
Bug Blocks: 2034512    

Description Donghwi Cha 2021-12-22 02:05:12 UTC
Description of problem:

When two or more IPA servers form a replicated cluster, applying TLS-everywhere (TLSe) to RHOSP 16.2 can fail because there is no failover between the IPA servers.

Case 1) When the IPA server is pinned to a single host via the IdMServer and IdMDomain parameters, only that hostname is passed to the tripleo-ansible playbook. Because the endpoint is fixed, the playbook always fails whenever that IPA server is down.
Case 2) When IdMServer and IdMDomain are deliberately left unset so that DNS discovery is used in case of IPA server failure, gethostbyaddr can still fail, because that function does not check whether the resolved hostname is actually reachable.

TripleO Ansible needs to be improved so that the Ansible module does not end up in the try/except fallback at all, but instead selects a working IPA host from the replication cluster, avoiding failures and outages.
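The failover behavior described above can be sketched as follows. This is a minimal illustration only, not the tripleo-ansible implementation; the function name `first_reachable_ipa_server` and the injectable `probe` parameter are assumptions made for the example:

```python
import socket


def first_reachable_ipa_server(candidates, port=443, timeout=3, probe=None):
    """Return the first candidate IPA server that accepts a TCP connection.

    candidates: ordered list of FQDNs, e.g. all replicas discovered via
    DNS or supplied through an IdMServer-style parameter.  The probe is
    injectable so the selection logic can be tested without a live
    cluster.
    """
    if probe is None:
        def probe(host):
            try:
                # A plain TCP connect is used here as a cheap liveness
                # check; a real client would verify the IPA API itself.
                with socket.create_connection((host, port), timeout=timeout):
                    return True
            except OSError:
                return False
    for host in candidates:
        if probe(host):
            return host
    raise RuntimeError("no reachable IPA server among: " + ", ".join(candidates))
```

With a helper along these lines, the playbook would skip a dead replica instead of failing with "No route to host" as in the log below.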

Version-Release number of selected component (if applicable): 16.2.1


How reproducible:
Always, when TLSe is applied with two or more IPA servers and one of them is down.

Steps to Reproduce:
1. Set up a single IPA replication cluster with two or more IPA servers
2. Deploy RHOSP 16.2.1 with TLSe applied
3. Observe that the deployment fails when the Ansible playbook hits the IPA server that is down

Actual results:


"2021-12-20 11:32:56,163 p=101181 u=mistral n=ansible | 2021-12-20 11:32:56.163432 |                                      |    WARNING | Module did not set no_log for random_password
2021-12-20 11:32:56,164 p=101181 u=mistral n=ansible | 2021-12-20 11:32:56.164471 | 525400ab-bd08-bd92-0123-000000001fb7 |      FATAL | add new host with random one-time password | undercloud | error={"changed": false, "msg": "host_find: Request failed: <urlopen error [Errno 113] No route to host>"}
2021-12-20 11:32:56,272 p=101181 u=mistral n=ansible | PLAY RECAP *********************************************************************
2021-12-20 11:32:56,273 p=101181 u=mistral n=ansible | exmaple-compute-1          : ok=151  changed=83   unreachable=0    failed=0    skipped=64   rescued=0    ignored=0
2021-12-20 11:32:56,273 p=101181 u=mistral n=ansible | exmaple-compute-2          : ok=147  changed=83   unreachable=0    failed=0    skipped=64   rescued=0    ignored=0
2021-12-20 11:32:56,273 p=101181 u=mistral n=ansible | exmaple-controller-1       : ok=165  changed=100  unreachable=0    failed=0    skipped=53   rescued=0    ignored=0
2021-12-20 11:32:56,273 p=101181 u=mistral n=ansible | undercloud"


[stack@manager templates]$ dig ipa-ca.xxx.com

; <<>> DiG 9.11.26-RedHat-9.11.26-4.el8_4 <<>> ipa-ca.xxx.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25646
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 2, ADDITIONAL: 3

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: a361ab304ccb53ae7fbec63f61c1228836c91c6050a89348 (good)
;; QUESTION SECTION:
;ipa-ca.xxx.com.           IN      A

;; ANSWER SECTION:
ipa-ca.xxx.com.    86400   IN      A       10.00.00.101
ipa-ca.xxx.com.    86400   IN      A       10.00.00.91

;; AUTHORITY SECTION:
xxx.com.           86400   IN      NS      ipaserver2.xxx.com.
xxx.com.           86400   IN      NS      ipaserver.xxx.com.

;; ADDITIONAL SECTION:
ipaserver.xxx.com. 1200    IN      A       10.00.00.91
ipaserver2.xxx.com. 1200   IN      A       10.00.00.101

;; Query time: 0 msec
;; SERVER: 10.00.00.101#53(10.00.00.101)
;; WHEN: Tue Dec 21 09:40:40 KST 2021
;; MSG SIZE  rcvd: 189


Expected results:

Successful deployment without errors

Additional info:

Please find the relevant code below:

    - name: confirm that host is not already registered with current keytab
      when: '"has_keytab: TRUE" not in host_raw_data.stdout'
      block:
        - name: remove stale host if present
          when: host_raw_data.rc == 0
          ipa_host:
            fqdn: "{{ base_server_fqdn }}"
            state: absent

        - name: add new host with random one-time password
          ipa_host:
            fqdn: "{{ base_server_fqdn }}"
            random_password: true
            force: true
          register: ipa_host

        - debug: var=ipa_host.host

        - name: set otp as a host fact
          set_fact:
            ipa_host_otp: "{{ ipa_host.host.randompassword }}"
          no_log: true
          delegate_facts: true
          delegate_to: "{{ tripleo_ipa_delegate_server }}"

... 
def main():
    argument_spec = ipa_argument_spec()
    argument_spec.update(description=dict(type='str'),
                         fqdn=dict(type='str', required=True, aliases=['name']),
                         force=dict(type='bool'),
                         ip_address=dict(type='str'),
                         ns_host_location=dict(type='str', aliases=['nshostlocation']),
                         ns_hardware_platform=dict(type='str', aliases=['nshardwareplatform']),
                         ns_os_version=dict(type='str', aliases=['nsosversion']),
                         user_certificate=dict(type='list', aliases=['usercertificate']),
                         mac_address=dict(type='list', aliases=['macaddress']),
                         update_dns=dict(type='bool'),
                         state=dict(type='str', default='present', choices=['present', 'absent', 'enabled', 'disabled']),
                         random_password=dict(type='bool'),)

... 

def _env_then_dns_fallback(*args, **kwargs):
    ''' Load value from environment or DNS in that order'''
    try:
        return env_fallback(*args, **kwargs)
    except AnsibleFallbackNotFound:
        # If no host was given, we try to guess it from IPA.
        # The ipa-ca entry is a standard entry that IPA will have set for
        # the CA.
        try:
            return socket.gethostbyaddr(socket.gethostbyname('ipa-ca'))[0]
        except Exception:
            raise AnsibleFallbackNotFound

Comment 2 Grzegorz Grasza 2022-02-01 11:26:08 UTC
I can confirm that the issue here is with Ansible (community.general), which doesn't use the same mechanisms to determine the IdM server address as documented for sssd [1], while in t-h-t we use the same settings for both.

The Ansible issue can be seen in the code cited in the bug description: Ansible tries to resolve the "ipa-ca" host in the calling host's domain instead of checking SRV records.

In my opinion, the fastest way to approach this is to fix [2] first, i.e. accept multiple IdM servers, use the first one during deployment with Ansible, and pass all of them to ipa-client-install.


[1] see the FAILOVER and SERVICE DISCOVERY sections in "man sssd-ipa": https://linux.die.net/man/5/sssd-ipa
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2033935
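The sssd-style service discovery referenced in [1] looks up SRV records (e.g. _ldap._tcp.<domain>) and tries the targets in priority order, rather than resolving a single hostname as the cited Ansible code does. A minimal sketch of that ordering follows, assuming the records have already been fetched as (priority, weight, target) tuples; RFC 2782's weighted random selection within a priority level is simplified to a plain sort here:

```python
def order_srv_targets(records):
    """Order SRV records the way a failover client would try them.

    records: list of (priority, weight, target) tuples, e.g. from a
    lookup of _ldap._tcp.xxx.com.  Lower priority values are tried
    first; within the same priority, higher weight is preferred here
    (RFC 2782 actually prescribes weighted random selection within a
    priority level -- simplified for illustration).
    """
    return [target for _, _, target in
            sorted(records, key=lambda r: (r[0], -r[1]))]
```

A client iterating over this ordered list and falling through to the next target on connection failure is exactly the failover behavior that the single-hostname resolution in _env_then_dns_fallback cannot provide.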

Comment 6 Grzegorz Grasza 2023-05-08 10:47:12 UTC
*** Bug 2186152 has been marked as a duplicate of this bug. ***

Comment 11 Red Hat Bugzilla 2025-03-26 04:25:06 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days