Description of problem:

Build pod fails due to a DNS resolution error:

initContainerStatuses:
- containerID: cri-o://260ff352b2204b54baacabdbb4ce6c441d7e6e6f4d4558491b4787ebfd5b2fcc
  image: registry.reg-aws.openshift.com/openshift3/ose-docker-builder:v3.10.0-0.47.0
  imageID: registry.reg-aws.openshift.com/openshift3/ose-docker-builder@sha256:7574159cec81cf724b8b502142d22e5ba01736fa4b685d794732035398a53ba0
  lastState: {}
  name: git-clone
  ready: false
  restartCount: 0
  state:
    terminated:
      containerID: cri-o://260ff352b2204b54baacabdbb4ce6c441d7e6e6f4d4558491b4787ebfd5b2fcc
      exitCode: 1
      finishedAt: 2018-05-21T22:05:33Z
      message: |
        Cloning "https://github.com/openshift/ruby-ex.git" ...
        error: fatal: unable to access 'https://github.com/openshift/ruby-ex.git/': Could not resolve host: github.com; Unknown error
      reason: Error
      startedAt: 2018-05-21T22:05:33Z

Version-Release number of selected component (if applicable):

$ oc version
oc v3.10.0-0.47.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-15-133.ec2.internal:8443
openshift v3.10.0-0.47.0
kubernetes v1.10.0+b81c8f8

How reproducible:
Every time

Steps to Reproduce:
1. Deploy a 3-node cluster with openshift_use_crio=true
2. oc new-project demo
3. oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

Actual results:

./crictl logs 260ff352b2204b54baacabdbb4ce6c441d7e6e6f4d4558491b4787ebfd5b2fcc
Cloning "https://github.com/openshift/ruby-ex.git" ...
error: fatal: unable to access 'https://github.com/openshift/ruby-ex.git/': Could not resolve host: github.com; Unknown error

Expected results:
Build pod can resolve names and complete successfully.

Additional info:
It looks like an installation/installer issue: the nameserver is set incorrectly. This is what I see inside a pod on this cluster:

[ec2-user@ip-172-18-15-133 ~]$ oc exec -it mywebpriv sh
/usr/local/apache2 # ps -ef
PID   USER     TIME   COMMAND
    1 root       0:00 httpd -DFOREGROUND
    6 daemon     0:00 httpd -DFOREGROUND
    7 daemon     0:00 httpd -DFOREGROUND
    9 daemon     0:00 httpd -DFOREGROUND
   90 root       0:00 sh
   94 root       0:00 ps -ef
/usr/local/apache2 # cat /etc/resolv.conf
search demo.svc.cluster.local svc.cluster.local cluster.local ec2.internal
nameserver 0.0.0.0
options ndots:5

On another cluster on ec2, where I was debugging a different issue, resolv.conf looks like this:

root@ip-172-31-38-99: ~ # oc rsh eap-app-mysql-1-w4pqv
sh-4.2$ cat /etc/resolv.conf
search testeap.svc.cluster.local svc.cluster.local cluster.local us-west-2.compute.internal
nameserver 172.31.31.156
options ndots:5
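The broken and healthy clusters differ only in the nameserver line, so a quick sanity check is to flag any resolv.conf whose first nameserver is 0.0.0.0 (an unroutable address that no resolver can use). A minimal sketch, using the two resolv.conf contents pasted above as samples; the check_resolv helper is my own, not part of any OpenShift tooling:

```shell
#!/bin/sh
# Flag a resolv.conf whose first nameserver is 0.0.0.0 (or missing),
# as seen in the pod on the broken cluster above.
check_resolv() {
    ns=$(awk '/^nameserver/ {print $2; exit}' "$1")
    if [ "$ns" = "0.0.0.0" ] || [ -z "$ns" ]; then
        echo "BAD nameserver: '${ns}'"
        return 1
    fi
    echo "OK nameserver: ${ns}"
    return 0
}

# Sample taken from the broken cluster above
cat > /tmp/resolv-broken.conf <<'EOF'
search demo.svc.cluster.local svc.cluster.local cluster.local ec2.internal
nameserver 0.0.0.0
options ndots:5
EOF

# Sample taken from the healthy cluster above
cat > /tmp/resolv-ok.conf <<'EOF'
search testeap.svc.cluster.local svc.cluster.local cluster.local us-west-2.compute.internal
nameserver 172.31.31.156
options ndots:5
EOF

check_resolv /tmp/resolv-broken.conf   # prints: BAD nameserver: '0.0.0.0'
check_resolv /tmp/resolv-ok.conf       # prints: OK nameserver: 172.31.31.156
```

In practice you would feed it the output of `oc exec <pod> -- cat /etc/resolv.conf` instead of the canned samples.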
We ran into this issue on https://bugzilla.redhat.com/show_bug.cgi?id=1577886#c3
This reminds me of the very same issue we hit in the past (I couldn't find the link, though), but it looks installer-related, as Mrunal said.
[root@ip-172-18-15-133 ~]# ps aux | grep cluster-dns
root 15935 1.9 1.1 435224 91440 ? Ssl May21 25:50 /usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-webhook=true --authentication-token-webhook-cache-ttl=5m --authorization-mode=Webhook --authorization-webhook-cache-authorized-ttl=5m --authorization-webhook-cache-unauthorized-ttl=5m --bootstrap-kubeconfig=/etc/origin/node/bootstrap.kubeconfig --cadvisor-port=0 --cert-dir=/etc/origin/node/certificates --cgroup-driver=systemd --client-ca-file=/etc/origin/node/client-ca.crt --cloud-config=/etc/origin/cloudprovider/aws.conf --cloud-provider=aws --cluster-dns=0.0.0.0 ...

Pretty sure this is a dupe of bug 1577886. Fixed in OCP + openshift-ansible 3.10.0-0.48.0 or newer.

*** This bug has been marked as a duplicate of bug 1577886 ***
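The 0.0.0.0 in the pods' resolv.conf comes straight from the kubelet's --cluster-dns flag shown above. A hedged sketch of pulling that flag out of a ps-style command line and flagging the bad value; the cluster_dns_of helper is my own, and the sample command line below is abbreviated from the ps output above:

```shell
#!/bin/sh
# Extract the value of --cluster-dns from a kubelet command line and
# warn when it is 0.0.0.0, as on the affected node above.
cluster_dns_of() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^--cluster-dns=//p'
}

# Abbreviated from the ps output on the affected node
cmdline='/usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --cluster-dns=0.0.0.0'

dns=$(cluster_dns_of "$cmdline")
if [ "$dns" = "0.0.0.0" ]; then
    echo "kubelet --cluster-dns is 0.0.0.0: pods will get a broken resolv.conf"
fi
```

On a healthy node (like the second cluster above) the extracted value would be a reachable DNS address such as 172.31.31.156.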
Updated the node package to the latest version and restarted the service; cluster-dns is now set properly, and the build of ruby-ex in the demo project succeeds. What I suspect happened: a newer installer, which had switched to `openshift-node-config` rather than `openshift start node --write-config`, was used together with older openshift binaries. That mismatch caused the problem.