Description of problem:
When there are more than two IPA servers in a replicated cluster, applying TLS everywhere (TLSe) to RHOSP 16.2 can fail because of a load-balancing (LB) failure.
Case 1) When the IPA server is pinned to one host through IdMServer and IdMDomain, a single IPA server hostname is passed to the TripleO Ansible playbook. Because that endpoint is fixed, the playbook always fails whenever that IPA server is down.
Case 2) When IdMServer and IdMDomain are deliberately left unset so that DNS discovery can handle an IPA server failure, the lookup can still select a server that is down. socket.gethostbyname('ipa-ca') returns only one of the ipa-ca A records, and neither it nor gethostbyaddr checks whether the resulting host is reachable.
TripleO Ansible needs to be improved so that the Ansible module does not end up in this try/except fallback, and instead finds a working IPA host in the replication cluster without failure or outage.
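To illustrate Case 2: the dig output under "Actual results" shows ipa-ca resolving to two A records, one per IPA server. socket.gethostbyname() returns a single IPv4 address, so the fallback may see only the address of the server that is down. The standard library's socket.gethostbyname_ex() returns all of the addresses. The following is a minimal sketch. The helper name all_ipv4_addresses is hypothetical, and the helper only resolves names. It does not check reachability.

```python
import socket
from typing import List


def all_ipv4_addresses(name: str) -> List[str]:
    """Return every IPv4 address the resolver reports for `name`.

    socket.gethostbyname() returns only one address, while
    socket.gethostbyname_ex() returns a (hostname, aliases, addresses)
    tuple that contains all of them.
    """
    _hostname, _aliases, addresses = socket.gethostbyname_ex(name)
    return addresses
```

In the environment from this report, all_ipv4_addresses("ipa-ca.xxx.com") would be expected to list both 10.00.00.101 and 10.00.00.91, in whatever order the resolver returns them.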
Version-Release number of selected component (if applicable): 16.2.1
How reproducible:
Every time TLSe is applied while there are more than 2 IPA servers and one of them is down.
Steps to Reproduce:
1. Set up a single IPA replication cluster with more than 2 IPA servers.
2. Deploy RHOSP 16.2.1 with TLSe enabled.
3. Observe that the deployment fails when the Ansible playbook tries to use the IPA server that is down.
Actual results:
"2021-12-20 11:32:56,163 p=101181 u=mistral n=ansible | 2021-12-20 11:32:56.163432 | | WARNING | Module did not set no_log for random_password
2021-12-20 11:32:56,164 p=101181 u=mistral n=ansible | 2021-12-20 11:32:56.164471 | 525400ab-bd08-bd92-0123-000000001fb7 | FATAL | add new host with random one-time password | undercloud | error={"changed": false, "msg": "host_find: Request failed: <urlopen error [Errno 113] No route to host>"}
2021-12-20 11:32:56,272 p=101181 u=mistral n=ansible | PLAY RECAP *********************************************************************
2021-12-20 11:32:56,273 p=101181 u=mistral n=ansible | exmaple-compute-1 : ok=151 changed=83 unreachable=0 failed=0 skipped=64 rescued=0 ignored=0
2021-12-20 11:32:56,273 p=101181 u=mistral n=ansible | exmaple-compute-2 : ok=147 changed=83 unreachable=0 failed=0 skipped=64 rescued=0 ignored=0
2021-12-20 11:32:56,273 p=101181 u=mistral n=ansible | exmaple-controller-1 : ok=165 changed=100 unreachable=0 failed=0 skipped=53 rescued=0 ignored=0
2021-12-20 11:32:56,273 p=101181 u=mistral n=ansible | undercloud"
[stack@manager templates]$ dig ipa-ca.xxx.com
; <<>> DiG 9.11.26-RedHat-9.11.26-4.el8_4 <<>> ipa-ca.xxx.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25646
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 2, ADDITIONAL: 3
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: a361ab304ccb53ae7fbec63f61c1228836c91c6050a89348 (good)
;; QUESTION SECTION:
;ipa-ca.xxx.com. IN A
;; ANSWER SECTION:
ipa-ca.xxx.com. 86400 IN A 10.00.00.101
ipa-ca.xxx.com. 86400 IN A 10.00.00.91
;; AUTHORITY SECTION:
xxx.com. 86400 IN NS ipaserver2.xxx.com.
xxx.com. 86400 IN NS ipaserver.xxx.com.
;; ADDITIONAL SECTION:
ipaserver.xxx.com. 1200 IN A 10.00.00.91
ipaserver2.xxx.com. 1200 IN A 10.00.00.101
;; Query time: 0 msec
;; SERVER: 10.00.00.101#53(10.00.00.101)
;; WHEN: Tue Dec 21 09:40:40 KST 2021
;; MSG SIZE rcvd: 189
Expected results:
Successful deployment without error
Additional info:
The relevant code is below:
- name: confirm that host is not already registered with current keytab
  when: '"has_keytab: TRUE" not in host_raw_data.stdout'
  block:
    - name: remove stale host if present
      when: host_raw_data.rc == 0
      ipa_host:
        fqdn: "{{ base_server_fqdn }}"
        state: absent
    - name: add new host with random one-time password
      ipa_host:
        fqdn: "{{ base_server_fqdn }}"
        random_password: true
        force: true
      register: ipa_host
    - debug: var=ipa_host.host
    - name: set otp as a host fact
      set_fact:
        ipa_host_otp: "{{ ipa_host.host.randompassword }}"
      no_log: true
      delegate_facts: true
      delegate_to: "{{ tripleo_ipa_delegate_server }}"
...
def main():
    argument_spec = ipa_argument_spec()
    argument_spec.update(description=dict(type='str'),
                         fqdn=dict(type='str', required=True, aliases=['name']),
                         force=dict(type='bool'),
                         ip_address=dict(type='str'),
                         ns_host_location=dict(type='str', aliases=['nshostlocation']),
                         ns_hardware_platform=dict(type='str', aliases=['nshardwareplatform']),
                         ns_os_version=dict(type='str', aliases=['nsosversion']),
                         user_certificate=dict(type='list', aliases=['usercertificate']),
                         mac_address=dict(type='list', aliases=['macaddress']),
                         update_dns=dict(type='bool'),
                         state=dict(type='str', default='present', choices=['present', 'absent', 'enabled', 'disabled']),
                         random_password=dict(type='bool'),)
...
def _env_then_dns_fallback(*args, **kwargs):
    ''' Load value from environment or DNS in that order'''
    try:
        return env_fallback(*args, **kwargs)
    except AnsibleFallbackNotFound:
        # If no host was given, we try to guess it from IPA.
        # The ipa-ca entry is a standard entry that IPA will have set for
        # the CA.
        try:
            return socket.gethostbyaddr(socket.gethostbyname('ipa-ca'))[0]
        except Exception:
            raise AnsibleFallbackNotFound
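The fallback above resolves ipa-ca to one address and reverse-resolves it without checking whether that server responds. A more tolerant variant could resolve every A record and return the first one that accepts a connection. The following is a minimal sketch, not the module's code. The names tcp_probe, first_reachable and discover_ipa_host are hypothetical. Probing TCP port 443 is an assumption, based on the failing host_find call in the log being an HTTP(S) request through urlopen. The sketch returns None where the module would instead raise AnsibleFallbackNotFound.

```python
import socket
from typing import Callable, Iterable, Optional

# Assumption: the IPA JSON API is reached over HTTPS, so port 443 is probed.
DEFAULT_PORT = 443


def tcp_probe(address: str, port: int = DEFAULT_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to address:port succeeds within timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable (e.g. Errno 113) or timed out
        return False


def first_reachable(addresses: Iterable[str],
                    probe: Callable[[str], bool] = tcp_probe) -> Optional[str]:
    """Return the first address for which `probe` succeeds, or None."""
    for address in addresses:
        if probe(address):
            return address
    return None


def discover_ipa_host(name: str = "ipa-ca",
                      probe: Callable[[str], bool] = tcp_probe) -> Optional[str]:
    """Resolve all A records of `name` and reverse-resolve the first reachable one."""
    try:
        _, _, addresses = socket.gethostbyname_ex(name)
    except OSError:  # socket.gaierror / socket.herror are OSError subclasses
        return None
    address = first_reachable(addresses, probe)
    if address is None:
        return None
    try:
        return socket.gethostbyaddr(address)[0]
    except OSError:
        return None
```

The probe is passed in as a parameter so that the selection logic can be exercised without a live IPA cluster. The probe could also be swapped for an HTTPS health check if a plain TCP connect is considered too weak.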
I can confirm that the issue here is with Ansible (community.general), which does not use the same mechanism to determine the IdM server address as the one documented for sssd [1], whereas in t-h-t (tripleo-heat-templates) we use the same settings for both.
The Ansible issue can be seen in the code cited in the bug description: Ansible tries to resolve the "ipa-ca" host in the calling host's domain instead of checking SRV records.
In my opinion, the fastest way to approach this is to fix [2] first, i.e. accept multiple IdM servers, use the first one during the Ansible part of the deployment, and pass all of them to ipa-client-install.
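SRV-based discovery, as used by sssd, yields a list of (priority, weight, port, target) candidates rather than a single A record. The following is a minimal sketch of ordering such records. It assumes the records were already fetched by some DNS library (not shown) for a name such as `_ldap._tcp.xxx.com`, and the record values are hypothetical. It simplifies RFC 2782 by sorting on ascending priority and then descending weight, instead of doing the weighted random selection the RFC specifies within a priority. It also drops targets of ".", which RFC 2782 uses to mean the service is not available.

```python
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def order_srv_targets(records: List[SrvRecord]) -> List[str]:
    """Return usable targets, lowest priority first, then highest weight.

    Simplification of RFC 2782: real clients choose randomly, in proportion
    to weight, among records that share the same priority.
    """
    usable = [r for r in records if r.target != "."]
    ordered = sorted(usable, key=lambda r: (r.priority, -r.weight))
    # DNS answers carry absolute names with a trailing dot; strip it.
    return [r.target.rstrip(".") for r in ordered]


if __name__ == "__main__":
    # Hypothetical records modelled on the two servers in the dig output.
    records = [
        SrvRecord(10, 100, 389, "ipaserver2.xxx.com."),
        SrvRecord(0, 100, 389, "ipaserver.xxx.com."),
    ]
    print(order_srv_targets(records))
```

The ordered list could then be walked with a reachability check, so that a server that is down is skipped instead of failing the deployment.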
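The proposal in the previous paragraph could look roughly like the following sketch. It assumes, hypothetically, that a multi-server IdMServer value arrives as a comma-separated string. The function names split_idm_servers and plan_enrollment are illustrative only. `--server` and `--domain` are existing ipa-client-install options, and `--server` may be given more than once to list several servers.

```python
from typing import List, Tuple


def split_idm_servers(value: str) -> List[str]:
    """Split a (hypothetical) comma-separated IdMServer value into hostnames."""
    return [s.strip() for s in value.split(",") if s.strip()]


def plan_enrollment(idm_servers: str, idm_domain: str) -> Tuple[str, List[str]]:
    """Return (server for the Ansible IPA calls, ipa-client-install argv).

    The first server is used by the deployment's Ansible tasks; every server
    is passed to ipa-client-install so that the client can fail over later.
    """
    servers = split_idm_servers(idm_servers)
    if not servers:
        raise ValueError("at least one IdM server is required")
    args = ["ipa-client-install", "--domain", idm_domain]
    for server in servers:
        args += ["--server", server]
    return servers[0], args
```

For example, plan_enrollment("ipaserver.xxx.com, ipaserver2.xxx.com", "xxx.com") selects ipaserver.xxx.com for Ansible and passes both hosts to ipa-client-install.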
[1] See the FAILOVER and SERVICE DISCOVERY sections in "man sssd-ipa": https://linux.die.net/man/5/sssd-ipa
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2033935