Bug 1628233
| Summary: | openshift-ansible release-3.10 deployment fails on "Wait for all control plane pods to become ready" | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Sergii Marynenko <marynenko> |
| Component: | Installer | Assignee: | Scott Dodson <sdodson> |
| Status: | CLOSED NOTABUG | QA Contact: | Johnny Liu <jialiu> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 3.10.0 | CC: | aos-bugs, jokerman, mark.vinkx, marynenko, mmccomas, shlao, vrutkovs |
| Target Milestone: | --- | | |
| Target Release: | 3.10.z | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-10-04 16:35:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description Sergii Marynenko 2018-09-12 13:48:36 UTC
Workaround is to add `ignore_errors: true` to the TASK "Wait for all control plane pods to become ready" in /tmp/openshift/roles/openshift_control_plane/tasks/main.yml

NotFound pods like "master-controllers-testosmaster1.xxxxx.xxxxxxx.com" are actually not found; there are pods without the domain part in their names:

```
oc get pods --all-namespaces
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
kube-system   master-api-testosmaster1           1/1     Running   0          4h
kube-system   master-api-testosmaster2           1/1     Running   0          4h
kube-system   master-api-testosmaster3           1/1     Running   0          4h
kube-system   master-controllers-testosmaster1   1/1     Running   0          4h
kube-system   master-controllers-testosmaster2   1/1     Running   0          4h
kube-system   master-controllers-testosmaster3   1/1     Running   0          4h
kube-system   master-etcd-testosmaster1          1/1     Running   0          4h
kube-system   master-etcd-testosmaster2          1/1     Running   0          4h
kube-system   master-etcd-testosmaster3          1/1     Running   0          4h
```

(In reply to smarynenko from comment #2)
> NotFound pods like "master-controllers-testosmaster1.xxxxx.xxxxxxx.com"
> are actually not found, there are pods without the domain part in the names like:
> oc get pods --all-namespaces

openshift-ansible uses openshift.node.nodename to predict pod names, which is generated from `hostname -f` output on the nodes. The kubelet also reads this value and would name pods accordingly.

It seems "raw_hostname" is set to "testosmaster1", but other nodenames are set to the FQDN. openshift-ansible would use whatever FQDN is specified (since there's no cloud provider here). Could you verify the FQDN on the host AND your DNS server to return the same value (be it short or long, but pick one)?

Scott, it seems `raw_hostname` should be used in any case, as the kubelet picks it most of the time.

> Could you verify FQDN on the host AND your DNS server to return the same value
> (be it short or long, but pick one)?
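The nodename mismatch described above can be sketched as follows; the pod names are the ones from this report, and the comparison itself is illustrative, not part of openshift-ansible:

```shell
# Illustrative sketch: openshift-ansible predicts pod names from
# openshift.node.nodename (derived from `hostname -f`), while the
# kubelet on this cluster registered the short host name, so the
# predicted pod name never appears.
predicted="master-controllers-testosmaster1.xxxxx.xxxxxxx.com"
actual="master-controllers-testosmaster1"
if [ "$predicted" != "$actual" ]; then
  echo "mismatch: ansible waits for '$predicted' but the pod is '$actual'"
fi
```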
"hostname -f" on all of the nodes gives the FQDN
(a FQDN consists of a short host name and the DNS domain name), like:

```
testosmaster1.xxxxx.xxxxxxx.com
```
btw "man hostname" says:

```
-f, --fqdn, --long
    Display the FQDN (Fully Qualified Domain Name). A FQDN consists of a
    short host name and the DNS domain name. Unless you are using bind or
    NIS for host lookups you can change the FQDN and the DNS domain name
    (which is part of the FQDN) in the /etc/hosts file. See the warnings
    in section THE FQDN above, and avoid using this option; use
    hostname --all-fqdns instead.
```
So:

```
[root@testosmaster1 ~]# hostname --all-fqdns
testosmaster1.xxxxx.xxxxxxx.com testosmaster1
```
The DNS server has A records in the forward zone xxxxx.xxxxxxx.com for all nodes:

```
testosmaster1.xxxxx.xxxxxxx.com. 3600 IN A 172.16.25.205
testosmaster2.xxxxx.xxxxxxx.com. 3585 IN A 172.16.25.206
testosmaster3.xxxxx.xxxxxxx.com. 3600 IN A 172.16.25.207
```

So the DNS answer differs only by the trailing "."
The DNS server isn't configured to serve reverse lookups for those names, as that is not required by the documentation.
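The records quoted above can be sanity-checked without a live resolver by parsing them directly; a minimal sketch, assuming the records are saved to a scratch file (the file name `zone.sample` is hypothetical):

```shell
# Save the A records quoted above to a scratch file, then print each
# name (trailing dot stripped, as `hostname -f` would report it) next
# to its address.
cat > zone.sample <<'EOF'
testosmaster1.xxxxx.xxxxxxx.com. 3600 IN A 172.16.25.205
testosmaster2.xxxxx.xxxxxxx.com. 3585 IN A 172.16.25.206
testosmaster3.xxxxx.xxxxxxx.com. 3600 IN A 172.16.25.207
EOF
awk '$4 == "A" { sub(/\.$/, "", $1); print $1, "->", $5 }' zone.sample
```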
/etc/hosts on each node contains only two rows; for instance on testosmaster1:

```
127.0.0.1       localhost localhost.localdomain localhost4 localhost4.localdomain4
172.16.25.205   testosmaster1.xxxxx.xxxxxxx.com testosmaster1
```

Each node has, as its second row, its own 'IP FQDN hostname'.

Created attachment 1482757 [details]
inventory file
This problem may be caused by /etc/resolv.conf. If it contains a line like:

```
search xxx.yyy.zzz.com
```

then the command `hostname -f` will output 'testosmaster2.xxx.yyy.zzz.com'. I am checking which process modifies the file '/etc/resolv.conf'.

Solved by removing the line with the FQDN from the /etc/hosts file. After a Terraform-managed VM is created in vSphere, each VM has a line with the external IP and FQDN in /etc/hosts. Removing that line eliminates the issue.
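The fix described in the last comment can be sketched like this, run against a copy of the file rather than the live /etc/hosts (the IP and hostnames are the ones quoted in this report; `hosts.demo` is an illustrative file name):

```shell
# Work on a copy so the live /etc/hosts is untouched; contents match
# the two-row example quoted earlier in this report.
cat > hosts.demo <<'EOF'
127.0.0.1       localhost localhost.localdomain localhost4 localhost4.localdomain4
172.16.25.205   testosmaster1.xxxxx.xxxxxxx.com testosmaster1
EOF
# Drop the Terraform-added row mapping the external IP to the FQDN:
sed -i '/^172\.16\.25\.205/d' hosts.demo
cat hosts.demo
```

On a real node the same `sed` expression would be applied to /etc/hosts itself (with a backup, e.g. `sed -i.bak ...`).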