Description of problem:

Build pod fails due to a DNS resolution error:

initContainerStatuses:
- containerID: cri-o://260ff352b2204b54baacabdbb4ce6c441d7e6e6f4d4558491b4787ebfd5b2fcc
  image: registry.reg-aws.openshift.com/openshift3/ose-docker-builder:v3.10.0-0.47.0
  imageID: registry.reg-aws.openshift.com/openshift3/ose-docker-builder@sha256:7574159cec81cf724b8b502142d22e5ba01736fa4b685d794732035398a53ba0
  lastState: {}
  name: git-clone
  ready: false
  restartCount: 0
  state:
    terminated:
      containerID: cri-o://260ff352b2204b54baacabdbb4ce6c441d7e6e6f4d4558491b4787ebfd5b2fcc
      exitCode: 1
      finishedAt: 2018-05-21T22:05:33Z
      message: |
        Cloning "https://github.com/openshift/ruby-ex.git" ...
        error: fatal: unable to access 'https://github.com/openshift/ruby-ex.git/': Could not resolve host: github.com; Unknown error
      reason: Error
      startedAt: 2018-05-21T22:05:33Z

Version-Release number of selected component (if applicable):

$ oc version
oc v3.10.0-0.47.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-15-133.ec2.internal:8443
openshift v3.10.0-0.47.0
kubernetes v1.10.0+b81c8f8

How reproducible:
Every time

Steps to Reproduce:
1. Deploy a 3-node cluster with openshift_use_crio=true
2. oc new-project demo
3. oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

Actual results:

./crictl logs 260ff352b2204b54baacabdbb4ce6c441d7e6e6f4d4558491b4787ebfd5b2fcc
Cloning "https://github.com/openshift/ruby-ex.git" ...
error: fatal: unable to access 'https://github.com/openshift/ruby-ex.git/': Could not resolve host: github.com; Unknown error

Expected results:
Build pod can resolve names and complete successfully.

Additional info:
It looks like an installation/installer issue: the nameserver is set incorrectly. This is what I see inside a pod on this cluster:

[ec2-user@ip-172-18-15-133 ~]$ oc exec -it mywebpriv sh
/usr/local/apache2 # ps -ef
PID   USER     TIME   COMMAND
    1 root       0:00 httpd -DFOREGROUND
    6 daemon     0:00 httpd -DFOREGROUND
    7 daemon     0:00 httpd -DFOREGROUND
    9 daemon     0:00 httpd -DFOREGROUND
   90 root       0:00 sh
   94 root       0:00 ps -ef
/usr/local/apache2 # cat /etc/resolv.conf
search demo.svc.cluster.local svc.cluster.local cluster.local ec2.internal
nameserver 0.0.0.0
options ndots:5

On another cluster on ec2, where I was debugging a different issue, resolv.conf looks like this:

root@ip-172-31-38-99: ~ # oc rsh eap-app-mysql-1-w4pqv
sh-4.2$ cat /etc/resolv.conf
search testeap.svc.cluster.local svc.cluster.local cluster.local us-west-2.compute.internal
nameserver 172.31.31.156
options ndots:5
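The broken and healthy clusters differ only in the nameserver line, so a quick sanity check is to flag any resolv.conf whose first nameserver is 0.0.0.0 (an unroutable address that no resolver can use). A minimal sketch, using the two resolv.conf contents pasted above as samples; the check_resolv helper is my own, not part of any OpenShift tooling:

```shell
#!/bin/sh
# Flag a resolv.conf whose first nameserver is 0.0.0.0 (or missing),
# as seen in the pod on the broken cluster above.
check_resolv() {
    ns=$(awk '/^nameserver/ {print $2; exit}' "$1")
    if [ "$ns" = "0.0.0.0" ] || [ -z "$ns" ]; then
        echo "BAD nameserver: '${ns}'"
        return 1
    fi
    echo "OK nameserver: ${ns}"
    return 0
}

# Sample taken from the broken cluster above
cat > /tmp/resolv-broken.conf <<'EOF'
search demo.svc.cluster.local svc.cluster.local cluster.local ec2.internal
nameserver 0.0.0.0
options ndots:5
EOF

# Sample taken from the healthy cluster above
cat > /tmp/resolv-ok.conf <<'EOF'
search testeap.svc.cluster.local svc.cluster.local cluster.local us-west-2.compute.internal
nameserver 172.31.31.156
options ndots:5
EOF

check_resolv /tmp/resolv-broken.conf   # prints: BAD nameserver: '0.0.0.0'
check_resolv /tmp/resolv-ok.conf       # prints: OK nameserver: 172.31.31.156
```

In practice you would feed it the output of `oc exec <pod> -- cat /etc/resolv.conf` instead of the canned samples.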
We ran into this issue on https://bugzilla.redhat.com/show_bug.cgi?id=1577886#c3
This reminds me of the very same issue we hit in the past (I couldn't find the link, though), but it looks installer-related, as Mrunal said.
[root@ip-172-18-15-133 ~]# ps aux | grep cluster-dns
root 15935 1.9 1.1 435224 91440 ? Ssl May21 25:50 /usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-webhook=true --authentication-token-webhook-cache-ttl=5m --authorization-mode=Webhook --authorization-webhook-cache-authorized-ttl=5m --authorization-webhook-cache-unauthorized-ttl=5m --bootstrap-kubeconfig=/etc/origin/node/bootstrap.kubeconfig --cadvisor-port=0 --cert-dir=/etc/origin/node/certificates --cgroup-driver=systemd --client-ca-file=/etc/origin/node/client-ca.crt --cloud-config=/etc/origin/cloudprovider/aws.conf --cloud-provider=aws --cluster-dns=0.0.0.0 ...

Pretty sure this is a dupe of bug 1577886. Fixed in OCP + openshift-ansible 3.10.0-0.48.0 or newer.

*** This bug has been marked as a duplicate of bug 1577886 ***
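The 0.0.0.0 in the pods' resolv.conf comes straight from the kubelet's --cluster-dns flag shown above. A hedged sketch of pulling that flag out of a ps-style command line and flagging the bad value; the cluster_dns_of helper is my own, and the sample command line below is abbreviated from the ps output above:

```shell
#!/bin/sh
# Extract the value of --cluster-dns from a kubelet command line and
# warn when it is 0.0.0.0, as on the affected node above.
cluster_dns_of() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^--cluster-dns=//p'
}

# Abbreviated from the ps output on the affected node
cmdline='/usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --cluster-dns=0.0.0.0'

dns=$(cluster_dns_of "$cmdline")
if [ "$dns" = "0.0.0.0" ]; then
    echo "kubelet --cluster-dns is 0.0.0.0: pods will get a broken resolv.conf"
fi
```

On a healthy node (like the second cluster above) the extracted value would be a reachable DNS address such as 172.31.31.156.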
Updated the node package to the latest version and restarted the service; cluster-dns is now set properly, and the build of ruby-ex in the demo project succeeds. What I suspect happened: a newer installer, which had switched to `openshift-node-config` rather than `openshift start node --write-config`, was used together with older openshift binaries. That mismatch caused the problem.