Description of problem:
When running the network diagnostics Ansible check playbook, the following error is reported:

Node qe-hongli-master-etcd-1: the IP address in OpenShift (172.16.120.31) does not match DNS/hosts (None)

Version-Release number of selected component (if applicable):
openshift-ansible-3.11.0-0.11.0.git.0.3c66516None.noarch.rpm

How reproducible:
always

Steps to Reproduce:
1. prepare the hosts file qe-inventory
2. ansible-playbook -i qe-inventory -l all,localhost ./openshift-ansible/playbooks/openshift-checks/adhoc.yml -e openshift_checks=sdn -e openshift_checks_output_dir=$HOME/tmp/aws/output
3.

Actual results:
Failure summary:

  1. Hosts:    host-8-253-168.host.centralci.eng.rdu2.redhat.com
     Play:     OpenShift Health Checks
     Task:     Run health checks (adhoc)
     Message:  One or more checks failed
     Details:  check "sdn":
               Node qe-hongli-master-etcd-1: the IP address in OpenShift (172.16.120.31) does not match DNS/hosts (None)
               Node qe-hongli-node-1: the IP address in OpenShift (172.16.120.106) does not match DNS/hosts (None)
               Node qe-hongli-node-registry-router-1: the IP address in OpenShift (172.16.120.88) does not match DNS/hosts (None)

Expected results:
The check should pass without this error.

Additional info:
This issue can be reproduced in GCE and OpenStack environments, but the check passes in AWS.
Here is an example on OpenStack:

[root@qe-hongli-master-etcd-1 ~]# oc get hostsubnet
NAME                               HOST                               HOST IP          SUBNET          EGRESS IPS
qe-hongli-master-etcd-1            qe-hongli-master-etcd-1            172.16.120.31    10.128.0.0/23   []
qe-hongli-node-1                   qe-hongli-node-1                   172.16.120.106   10.130.0.0/23   []
qe-hongli-node-registry-router-1   qe-hongli-node-registry-router-1   172.16.120.88    10.129.0.0/23   []
[root@qe-hongli-master-etcd-1 ~]#
[root@qe-hongli-master-etcd-1 ~]# nslookup qe-hongli-master-etcd-1
Server:   172.16.120.31
Address:  172.16.120.31#53

Non-authoritative answer:
Name:     qe-hongli-master-etcd-1.int.0809-nfj.qe.rhcloud.com
Address:  172.16.120.31

But on AWS it looks like this:

[root@ip-172-18-5-244 ~]# oc get hostsubnet
NAME                           HOST                           HOST IP        SUBNET          EGRESS IPS
ip-172-18-11-12.ec2.internal   ip-172-18-11-12.ec2.internal   172.18.11.12   10.130.0.0/23   []
ip-172-18-5-244.ec2.internal   ip-172-18-5-244.ec2.internal   172.18.5.244   10.128.0.0/23   []
ip-172-18-9-27.ec2.internal    ip-172-18-9-27.ec2.internal    172.18.9.27    10.129.0.0/23   []
[root@ip-172-18-5-244 ~]#
[root@ip-172-18-5-244 ~]# nslookup ip-172-18-11-12.ec2.internal
Server:   172.18.5.244
Address:  172.18.5.244#53

Non-authoritative answer:
Name:     ip-172-18-11-12.ec2.internal
Address:  172.18.11.12
Thanks! The check was requiring that the node's preferred host name or address match the canonical name of the node's internal address, but evidently that is not necessarily the case in a valid configuration, so we can relax that requirement. PR: https://github.com/openshift/openshift-ansible/pull/9511
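
To make the relaxed requirement concrete, here is a minimal sketch of a lookup that accepts a node as long as its name resolves to the expected internal IP, whatever canonical name the resolver reports. It is an illustration only, not the code in sdn.py: the function name resolves_to, the injectable resolver argument, and the fake_resolver table are all hypothetical, and Python's socket.gethostbyname_ex is used as a stand-in for whatever lookup the check actually performs.

```python
import socket


def resolves_to(name, expected_ip, resolver=socket.gethostbyname_ex):
    """Return True if `name` resolves to `expected_ip`.

    The resolver returns (canonical_name, aliases, addresses). The
    canonical name is deliberately ignored: only the addresses matter.
    """
    try:
        _canonical, _aliases, addresses = resolver(name)
    except OSError:
        # socket.gaierror and socket.herror are both OSError subclasses.
        return False
    return expected_ip in addresses


def fake_resolver(name):
    """Hypothetical stand-in for DNS, modelled on the OpenStack example."""
    table = {
        "qe-hongli-master-etcd-1": (
            "qe-hongli-master-etcd-1.int.0809-nfj.qe.rhcloud.com",
            [],
            ["172.16.120.31"],
        ),
    }
    if name not in table:
        raise socket.gaierror("name not found: %s" % name)
    return table[name]


# The short node name differs from the canonical name, yet the check passes
# because the expected address is among the resolved addresses.
ok = resolves_to("qe-hongli-master-etcd-1", "172.16.120.31", fake_resolver)
```

With a check of this shape, the OpenStack nodes above would pass even though nslookup returns a fully qualified name that differs from the node name shown by oc get hostsubnet.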
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/0ed00c773e29f5f79deef76bbd6fdacb9d5ae6ce
SDN check: Ignore node's canonical name

When verifying that a node's preferred host name or address resolves to its internal IP address, ignore the canonical name returned by name resolution. In some valid configurations, the node's preferred name or address differs from the canonical name.

This change makes the check more consistent with the behavior of the debug.sh script on which the check was based:

https://github.com/openshift/openshift-sdn/blob/d55c72b668018492059a862bee06f11745de1f97/hack/debug.sh#L126

This commit also fixes a problem with string encodings and comparisons that shows up on Python 3.

This commit fixes bug 1614261.

https://bugzilla.redhat.com/show_bug.cgi?id=1614261

* roles/openshift_health_checker/openshift_checks/sdn.py
(SDNCheck.read_command_output): Add an optional parameter for UTF-8 encoding (default True).
(SDNCheck.resolve_address): Use read_command_output's new parameter to disable UTF-8 encoding. Ignore canonical name in result.
* roles/openshift_health_checker/test/sdn_tests.py
(test_resolve_address): New test.
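
The commit message does not spell out the Python 3 string problem. One common form of such a problem, assumed here purely for illustration, is that command output arrives as bytes while the expected address is a str, and on Python 3 the two never compare equal. The decode_output helper below is hypothetical and is not the signature of SDNCheck.read_command_output.

```python
def decode_output(raw, encoding="utf-8"):
    """Hypothetical helper: return command output as text.

    Bytes are decoded; values that are already text pass through unchanged.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


# Assumed example values: raw output as bytes, expected address as str.
raw = b"172.16.120.31\n"
expected = "172.16.120.31"

# On Python 3, bytes and str compare unequal even with identical content.
naive_match = raw.strip() == expected

# Normalising both sides to text makes the comparison meaningful.
decoded_match = decode_output(raw).strip() == expected
```

Whichever side is normalised, the point is that both operands of the comparison need to be the same string type before an IP address match can succeed.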
Verified in openshift-ansible-3.11.0-0.20.0.git.0.ec6d8caNone.noarch.rpm; the issue has been fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652