Description of problem:
After creating a system container install by passing the following flags to openshift-ansible:

openshift_use_system_containers: true
system_images_registry: registry.access.redhat.com

I can see the registry pod running in the default project. When I try to create a new app, the image push fails:

Pushing image docker-registry.default.svc:5000/vlaad/cakephp-mysql-example:latest ...
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: Get https://docker-registry.default.svc:5000/v1/_ping: dial tcp: lookup docker-registry.default.svc on 172.31.55.188:53: no such host

I also see a warning in the default project events:

25m 25m 1 docker-registry-1-deploy Pod Warning FailedMount kubelet, ip-172-31-42-188.us-west-2.compute.internal Unable to mount volumes for pod "docker-registry-1-deploy_default(e6d98103-82bb-11e7-a589-02618f0bef2c)": timeout expired waiting for volumes to attach/mount for pod "default"/"docker-registry-1-deploy". list of unattached/unmounted volumes=[deployer-token-j0qw7]

oc get pods -n default
docker-registry-1-nx952    1/1   Running   0   28m
registry-console-1-691hc   1/1   Running   0   27m
router-1-t3gwf             1/1   Running   0   29m

Version-Release number of selected component (if applicable):
-bash-4.2# openshift version
openshift v3.6.173.0.5
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

How reproducible:
Always

Steps to Reproduce:
1. Create a system container install on Atomic Host
2. Create the cakephp-mysql app (see the sketch below)
3. Check the build log

Actual results:
Build fails due to the registry push failure.

Expected results:
Build should pass.

Additional info:
I will attach the openshift-ansible logs and all the events from the default project.
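For reference, the reproduction amounts to roughly the following (a sketch; the project name vlaad and template name cakephp-mysql-example are taken from the push failure above, and this assumes the quickstart template is available in the cluster):

oc new-project vlaad
oc new-app cakephp-mysql-example
oc logs -f bc/cakephp-mysql-example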
-bash-4.2# oc get svc -n default
NAME               CLUSTER-IP       EXTERNAL-IP   PORT(S)                   AGE
docker-registry    172.26.84.149    <none>        5000/TCP                  16h
kubernetes         172.24.0.1       <none>        443/TCP,53/UDP,53/TCP     17h
registry-console   172.24.127.251   <none>        9000/TCP                  16h
router             172.25.82.116    <none>        80/TCP,443/TCP,1936/TCP   16h
It looks like the DNS resolved to the wrong IP address. Clayton, do you know if this is a known issue?
Moving this to the networking team to investigate the DNS issues (I don't think there is anything wrong in the Docker Registry code).
It looks like docker-registry.default.svc is just not resolvable. How is the DNS set up on that node? Is it using dnsmasq to do split horizon? If you ssh to the node and try:

- nslookup docker-registry.default.svc
- nslookup docker-registry.default.svc.cluster.local

what happens? Also, please attach /etc/resolv.conf from the node.
Also, please grab /etc/dnsmasq.d/origin-dns.conf from the node, and from any pod that is running on the node, grab /etc/resolv.conf from _inside_ the pod.
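For the in-pod resolv.conf, something like the following should work (assuming the registry pod from the earlier listing is still running):

oc exec docker-registry-1-nx952 -n default -- cat /etc/resolv.conf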
Ben,

In 3.6 the dnsmasq config for cluster.local and in-addr.arpa is dynamically added when the node starts, by sending a dbus signal to dnsmasq. In the event that they restart dnsmasq, we also create /etc/dnsmasq.d/node-dnsmasq.conf, which contains the same configuration values. This file is removed when the node service is stopped because we no longer want to intercept in-addr.arpa queries. While the node service is running, these queries should go to the node's DNS service running on 127.0.0.1:

server=/in-addr.arpa/127.0.0.1
server=/cluster.local/127.0.0.1

The node will have these configuration values specified in /etc/origin/node/node-config.yaml:

dnsBindAddress: 127.0.0.1:53
dnsRecursiveResolvConf: /etc/origin/node/resolv.conf

The second is a resolv.conf that contains the host's default resolvers, so that the node can break the loop for queries it has to recurse.

Vikas,

Can you provide all of the contents of /etc/dnsmasq.d/ by running `more /etc/dnsmasq.d/* | cat`, as well as /etc/origin/node/node-config.yaml? And a log from `journalctl --no-pager -u dnsmasq`?
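(A quick sanity check, not from the original comment: with the node service running and the default dnsBindAddress above, the node's DNS on 127.0.0.1 should answer cluster queries directly, e.g.

dig @127.0.0.1 docker-registry.default.svc.cluster.local +short

which should print the registry service's cluster IP.)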
I am running into the following bz and can't create a cluster because of it: https://bugzilla.redhat.com/show_bug.cgi?id=1489959
Also encountered this error on a system container installed 3.7 cluster. Compared with an rpm-installed env, the /etc/dnsmasq.d/node-dnsmasq.conf file is missing on the nodes.

[root@qe-gpei-node-zone1-primary-1 ~]# more /etc/dnsmasq.d/* | cat
::::::::::::::
/etc/dnsmasq.d/origin-dns.conf
::::::::::::::
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
bind-interfaces
listen-address=10.240.0.54
::::::::::::::
/etc/dnsmasq.d/origin-upstream-dns.conf
::::::::::::::
server=169.254.169.254

After copying /etc/origin/node/node-dnsmasq.conf to /etc/dnsmasq.d/node-dnsmasq.conf and restarting the dnsmasq service, docker-registry.default.svc could be resolved on the nodes. It seems the atomic-openshift-node systemd unit file is lacking the dnsmasq configuration.

[root@qe-gpei-node-zone1-primary-1 ~]# cat /etc/systemd/system/atomic-openshift-node.service
[Unit]
After=docker.service
After=openvswitch.service
Wants=docker.service
After=atomic-openshift-node-dep.service
After=atomic-openshift-master.service

[Service]
EnvironmentFile=/etc/sysconfig/atomic-openshift-node
EnvironmentFile=/etc/sysconfig/atomic-openshift-node-dep
ExecStartPre=/bin/bash -c 'export -p > /run/atomic-openshift-node-env'
ExecStart=/bin/runc --systemd-cgroup run 'atomic-openshift-node'
ExecStop=/bin/runc --systemd-cgroup kill 'atomic-openshift-node'
SyslogIdentifier=atomic-openshift-node
Restart=always
RestartSec=5s
WorkingDirectory=/sysroot/ostree/deploy/rhel-atomic-host/var/lib/containers/atomic/atomic-openshift-node.0
RuntimeDirectory=atomic-openshift-node

[Install]
WantedBy=docker.service

The service template inside the latest node image:

[root@qe-gpei-node-zone2-primary-1 ~]# docker run --entrypoint cat registry.x.com/openshift3/node:v3.7.0-0.125.0 /exports/service.template
[Unit]
After=${DOCKER_SERVICE}
After=${OPENVSWITCH_SERVICE}
Wants=${DOCKER_SERVICE}
After=$NAME-dep.service
After=${MASTER_SERVICE}

[Service]
EnvironmentFile=/etc/sysconfig/$NAME
EnvironmentFile=/etc/sysconfig/$NAME-dep
ExecStartPre=/bin/bash -c 'export -p > /run/$NAME-env'
ExecStart=$EXEC_START
ExecStop=$EXEC_STOP
SyslogIdentifier=$NAME
Restart=always
RestartSec=5s
WorkingDirectory=$DESTDIR
RuntimeDirectory=${NAME}

[Install]
WantedBy=docker.service
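For reference, the manual workaround applied above boils down to (a sketch of the steps described in this comment):

cp /etc/origin/node/node-dnsmasq.conf /etc/dnsmasq.d/node-dnsmasq.conf
systemctl restart dnsmasq
nslookup docker-registry.default.svc    # should now resolve on the node

The proper fix is to manage this file from the node's systemd unit, as proposed in the next comment.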
This issue is blocking QE's testing on system container environments.

Version-Release number of selected component (if applicable):
openshift-ansible-3.7.0-0.125.0.git.0.91043b6.el7.noarch.rpm
ansible-2.3.2.0-2.el7.noarch
openshift3/node:v3.7.0-0.125.0
Proposed PRs here:
https://github.com/openshift/origin/pull/16378
https://github.com/openshift/openshift-ansible/pull/5429

In the meanwhile, could you verify whether adding these lines to your /etc/systemd/system/atomic-openshift-node.service solves the issue for you?

ExecStartPre=/usr/bin/cp /etc/origin/node/node-dnsmasq.conf /etc/dnsmasq.d/
ExecStartPre=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string:/in-addr.arpa/127.0.0.1,/cluster.local/127.0.0.1
ExecStopPost=/usr/bin/rm /etc/dnsmasq.d/node-dnsmasq.conf
ExecStopPost=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string:

and then:

systemctl daemon-reload
systemctl restart atomic-openshift-node
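(A quick check after the restart, not part of the original comment: confirm the dropped-in file is back, dnsmasq picked up the change, and the service name resolves on the node.)

ls /etc/dnsmasq.d/node-dnsmasq.conf
journalctl --no-pager -u dnsmasq | tail
nslookup docker-registry.default.svc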
Verified in the following version:

openshift v3.7.0-0.127.0
kubernetes v1.7.0+80709908fd
etcd 3.2.1

Tested all the quickstart apps; the builds completed fine.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188