Description of problem:
With OCP version 3.6 we were not able to push to the registry.

Version-Release number of selected component (if applicable):
At least 3.6.0

How reproducible:
Install vanilla OCP 3.6 and run an example build.

Actual results:
--> Installing application source...
Pushing image docker-registry.default.svc:5000/quickdemo/quidemo:latest ...
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: Get https://docker-registry.default.svc:5000/v1/_ping: dial tcp: lookup docker-registry.default.svc on xxx.xxx.xxx.xxx:53: no such host

Expected results:
Pushed successfully

Additional info:
NetworkManager does not add the cluster.local search domain to /etc/resolv.conf.

The file below is a fixed version:
https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node_dnsmasq/files/networkmanager/99-origin-dns.sh

This is the issue:
https://github.com/openshift/openshift-ansible/issues/5372
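To illustrate why the missing search domain produces the "no such host" error, here is a minimal sketch that lists the names a resolv.conf search line makes available for a short name. It assumes the default cluster domain cluster.local, under which the registry service is published as docker-registry.default.svc.cluster.local. The function name `search_candidates` is hypothetical, and the sketch does not model the order in which a real resolver tries the names.

```shell
#!/bin/sh
# Hypothetical helper: list the fully qualified names that the "search"
# line of a resolv.conf-style file makes available for a short name.
# $1 = short name, $2 = resolv.conf-style file.
search_candidates() {
  awk -v name="$1" '
    # For every domain on a "search" line, print <name>.<domain>.
    $1 == "search" { for (i = 2; i <= NF; i++) print name "." $i }
  ' "$2"
}
```

Running `search_candidates docker-registry.default.svc /etc/resolv.conf` on a host whose resolv.conf has no search line prints nothing, so the short name is never expanded to a name that ends in cluster.local.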
Description of problem:
- When there is no `search xxx` line in /etc/resolv.conf on the host, 99-origin-dns.sh fails to add `cluster.local` to the search domains.

Version-Release number of selected component (if applicable):
- OCP 3.6

How reproducible:
- 100% (if /etc/resolv.conf doesn't have `search xxx` by default.)

Steps to reproduce:
- 0. Install OpenShift Node.
- 1. Set up an env which does not have `search xxx` in /etc/resolv.conf by default.
- 2. Restart NetworkManager.

Actual results:
- /etc/resolv.conf doesn't have "search cluster.local"

Expected results:
- /etc/resolv.conf has "search cluster.local"

Additional info:
- Current fix https://github.com/openshift/openshift-ansible/pull/5398 still misses the setting.
- Here is a proposed patch https://github.com/openshift/openshift-ansible/pull/5585
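The expected behaviour can be sketched as a small shell function. This is only an illustration under assumptions, not the code from 99-origin-dns.sh or from either pull request: the function name `ensure_cluster_local` and its file argument are hypothetical. It covers both cases, a file that already has a search line and a file that has none.

```shell
#!/bin/sh
# Hypothetical helper: make sure a resolv.conf-style file ($1) lists
# cluster.local as a search domain.
ensure_cluster_local() {
  conf="$1"
  if grep -q '^search[[:space:]]' "$conf"; then
    # A search line exists: append cluster.local unless it is already
    # listed as a separate domain (svc.cluster.local does not count).
    if ! grep -qE '^search[[:space:]](.*[[:space:]])?cluster\.local([[:space:]]|$)' "$conf"; then
      sed -E '/^search[[:space:]]/ s/[[:space:]]*$/ cluster.local/' "$conf" > "$conf.tmp" &&
        cat "$conf.tmp" > "$conf" &&
        rm -f "$conf.tmp"
    fi
  else
    # No search line at all: this is the case described in this report.
    echo 'search cluster.local' >> "$conf"
  fi
}
```

Calling the function twice on the same file leaves it unchanged the second time, which matters because NetworkManager dispatcher scripts can run repeatedly.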
@Aleks I'm sorry, I changed the subject and component of this ticket, as 99-origin-dns.sh is provided by the installer. Please feel free to update them if you have any objection.
(In reply to Kenjiro Nakayama from comment #3)
> @Aleks I'm sorry, I changed the subject and component of this ticket, as
> 99-origin-dns.sh is provided by the installer. Please feel free to update
> them if you have any objection.

Yes, it's a better subject. Thanks for adopting it.
All,

After applying the script below to my installer VM, I see that images are still not being pushed to the registry. The push is timing out. Please see the diff of my 99-origin-dns.sh script against the old one; it now has two new lines, as shown below.

DIFF ON THE 99-ORIGIN-DNS.SH SCRIPT:
[root@unknown0800276C42EA ~]# diff /usr/share/ansible/openshift-ansible/roles/openshift_node_dnsmasq/files/networkmanager/99-origin-dns.sh 99-origin-dns_bkp.sh
117,118d116
< elif ! grep -qw search ${NEW_RESOLV_CONF}; then
< echo 'search cluster.local' >> ${NEW_RESOLV_CONF}
[root@unknown0800276C42EA ~]#

After I replaced the script on my installer VM, I successfully installed an OpenShift 3.6 cluster from scratch, but when I try to start a build I see the error "error: build error: Failed to push image: After retrying 6 times, Push image still failed".

I have collected /etc/resolv.conf from my host (master) and from the builder pod container for your reference; please see below. I have also pasted the builder logs and the internal registry logs. Please let me know how to get past this error.
SCRIPT USED ON INSTALLER VM:
https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node_dnsmasq/files/networkmanager/99-origin-dns.sh

HOST (MASTER) /ETC/RESOLV.CONF FILE:
[root@ocp3 ~]# cat /etc/resolv.conf
# Generated by NetworkManager
search example.com default.svc.cluster.local svc.cluster.local cluster.local 6-master01.ocs.example.com
nameserver 192.168.1.104
[root@ocp3 ~]#

CONTAINER BUILDER POD /ETC/RESOLV.CONF FILE:
[root@ocp3 ~]# oc get pods
NAME           READY     STATUS    RESTARTS   AGE
test-2-build   1/1       Running   0          20s
[root@ocp3 tmp]#oc rsh test-2-build cat /etc/resolv.conf > /tmp/info_resolv.conf
[root@ocp3 tmp]# cat info_resolv.conf
nameserver 192.168.1.112
search test.svc.cluster.local svc.cluster.local cluster.local example.com default.svc.cluster.local 6-master01.ocs.example.com
options ndots:5
[root@ocp3 tmp]#

BUILDER LOGS:
[root@ocp3 ~]# oc logs bc/test
Cloning "https://github.com/openshift/cakephp-ex.git" ...
Commit: 7969534afdf9490ca79e37e672f0b9c81887ec28 (Merge pull request #81 from bparees/readiness)
Author: Ben Parees <bparees.github.com>
Date: Mon Sep 11 01:15:51 2017 -0400
---> Installing application source...
Found 'composer.json', installing dependencies using composer.phar...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 298k 100 298k 0 0 196k 0 0:00:01 0:00:01 --:--:-- 196k
All settings correct for using Composer
Downloading...
Composer (version 1.5.2) successfully installed to: /opt/app-root/src/composer.phar
Use it: php composer.phar
Loading composer repositories with package information
Installing dependencies (including require-dev) from lock file
Package operations: 10 installs, 0 updates, 0 removals
- Installing squizlabs/php_codesniffer (1.5.6): Downloading (100%)
- Installing cakephp/cakephp-codesniffer (1.0.2): Downloading (100%)
- Installing phpunit/php-token-stream (1.2.2): Downloading (100%)
- Installing symfony/yaml (v2.8.16): Downloading (100%)
- Installing phpunit/php-text-template (1.2.1): Downloading (100%)
- Installing phpunit/phpunit-mock-objects (1.2.3): Downloading (100%)
- Installing phpunit/php-timer (1.0.8): Downloading (100%)
- Installing phpunit/php-file-iterator (1.4.2): Downloading (100%)
- Installing phpunit/php-code-coverage (1.2.18): Downloading (100%)
- Installing phpunit/phpunit (3.7.38): Downloading (100%)
phpunit/php-code-coverage suggests installing ext-xdebug (>=2.0.5)
phpunit/phpunit suggests installing phpunit/php-invoker (~1.1)
Generating optimized autoload files
Pushing image docker-registry.default.svc:5000/test/test:latest ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: After retrying 6 times, Push image still failed
[root@ocp3 ~]#

DOCKER REGISTRY LOGS:
[root@ocp3 ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-pv1f4    1/1       Running   0          3h
registry-console-1-rfvgv   1/1       Running   0          3h
router-1-czr1n             1/1       Running   0          4h
[root@ocp3 ~]#
time="2017-10-08T17:06:07Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_PORT"
time="2017-10-08T17:06:07Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_PORT_9000_TCP"
time="2017-10-08T17:06:07Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_PORT_9000_TCP_ADDR"
time="2017-10-08T17:06:07Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_PORT_9000_TCP_PORT"
time="2017-10-08T17:06:07Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_PORT_9000_TCP_PROTO"
time="2017-10-08T17:06:07Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_SERVICE_HOST"
time="2017-10-08T17:06:07Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_SERVICE_PORT"
time="2017-10-08T17:06:07Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_SERVICE_PORT_REGISTRY_CONSOLE"
time="2017-10-08T17:06:07.818480818Z" level=info msg="version=v2.4.1+unknown"
time="2017-10-08T17:06:07.819518926Z" level=info msg="OpenShift middleware for storage driver initializing"
time="2017-10-08T17:06:07.819540839Z" level=info msg="redis not configured" go.version=go1.7.6 instance.id=3548f933-3daa-4453-828b-5e8100383a16 openshift.logger=registry
time="2017-10-08T17:06:07.833687986Z" level=info msg="Starting upload purge in 31m0s" go.version=go1.7.6 instance.id=3548f933-3daa-4453-828b-5e8100383a16 openshift.logger=registry
time="2017-10-08T17:06:07.835892641Z" level=info msg="using inmemory blob descriptor cache" go.version=go1.7.6 instance.id=3548f933-3daa-4453-828b-5e8100383a16 openshift.logger=registry
time="2017-10-08T17:06:07.835905245Z" level=info msg="OpenShift registry middleware initializing"
time="2017-10-08T17:06:07.835912344Z" level=info msg="Using Origin Auth handler" go.version=go1.7.6 instance.id=3548f933-3daa-4453-828b-5e8100383a16 openshift.logger=registry
time="2017-10-08T17:06:07.835923052Z" level=debug msg="configured \"openshift\" access controller" go.version=go1.7.6 instance.id=3548f933-3daa-4453-828b-5e8100383a16 openshift.logger=registry
time="2017-10-08T17:06:07.835949017Z" level=debug msg="configured token endpoint at \"/openshift/token\"" go.version=go1.7.6 instance.id=3548f933-3daa-4453-828b-5e8100383a16 openshift.logger=registry
time="2017-10-08T17:06:07.836315032Z" level=info msg="listening on :5000" go.version=go1.7.6 instance.id=3548f933-3daa-4453-828b-5e8100383a16 openshift.logger=registry
10.128.0.1 - - [08/Oct/2017:17:06:08 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:15 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:25 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:25 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:35 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:35 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:45 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:45 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:55 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:55 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:05 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:05 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:15 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:15 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:25 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:25 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:35 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:35 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:45 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:45 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:55 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"

////////////////////////////////////////////////////////////////////////////////////

(In reply to Kenjiro Nakayama from comment #2)
> Description of problem:
> - When there is no `search xxx` line in /etc/resolv.conf on the host,
> 99-origin-dns.sh fails to add `cluster.local` to the search domains.
>
> Version-Release number of selected component (if applicable):
> - OCP 3.6
>
> How reproducible:
> - 100% (if /etc/resolv.conf doesn't have `search xxx` by default.)
>
> Steps to reproduce:
> - 0. Install OpenShift Node.
> - 1. Set up an env which does not have `search xxx` in /etc/resolv.conf by
> default.
> - 2. Restart NetworkManager.
>
> Actual results:
> - /etc/resolv.conf doesn't have "search cluster.local"
>
> Expected results:
> - /etc/resolv.conf has "search cluster.local"
>
> Additional info:
> - Current fix https://github.com/openshift/openshift-ansible/pull/5398 still
> misses the setting.
> - Here is a proposed patch
> https://github.com/openshift/openshift-ansible/pull/5585
@puneet, I have already sent the fix as a PR here:

https://github.com/openshift/openshift-ansible/pull/5585

But it is still not merged.
Hi Kenjiro,

Thanks for your update. Is there a workaround to get past this error? If so, can someone please provide the detailed steps? I'm currently stuck in the build process and cannot proceed further. If not, I understand I have to wait for the updated 99-origin-dns.sh script.

Also, I have one question: since my error (Push failed, retrying in 5s ...) is different from the one mentioned in this ticket, I was wondering whether this is what everyone is seeing now. The reason I ask is that, as you can see below, my host's /etc/resolv.conf is correctly populated. I had to update this file manually before the cluster install and protect it with the chattr +i command, but the image push still times out for me.

[root@ocp3 ~]# cat /etc/resolv.conf
# Generated by NetworkManager
search example.com default.svc.cluster.local svc.cluster.local cluster.local 6-master01.ocs.example.com
nameserver 192.168.1.104
[root@ocp3 ~]#

(In reply to Kenjiro Nakayama from comment #7)
> @puneet, I have already sent the fix as a PR here:
>
> https://github.com/openshift/openshift-ansible/pull/5585
>
> But it is still not merged.
@puneet,

> Also, I have one question: since my error (Push failed, retrying in 5s ...) is
> different from the one mentioned in this ticket, I was wondering whether this
> is what everyone is seeing now.

I believe your guess is correct: yours is a different issue. The issue this bz addresses is that /etc/resolv.conf does not have "search cluster.local" because of the bug in 99-origin-dns.sh. You, however, added `cluster.local` to /etc/resolv.conf manually and the push still failed, so most probably your problem is different from this bz. Could you please open a support ticket or ask on the openshift-sme ML?
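The distinction drawn above can be checked mechanically. The following is a minimal sketch under assumptions: the function names `has_search_domain` and `triage_resolv_conf` and the messages they print are hypothetical, and the check only looks at the search line, not at whether the registry is reachable.

```shell
#!/bin/sh
# Hypothetical helper: succeed if domain $2 appears as a separate entry
# on a "search" line of the resolv.conf-style file $1.
has_search_domain() {
  awk -v d="$2" '
    $1 == "search" { for (i = 2; i <= NF; i++) if ($i == d) found = 1 }
    END { exit !found }
  ' "$1"
}

# Hypothetical triage: say whether a host shows the condition this bz covers.
triage_resolv_conf() {
  if has_search_domain "$1" cluster.local; then
    echo "cluster.local present: not the condition addressed by this bz"
  else
    echo "cluster.local missing: matches this bz"
  fi
}
```

On the host from the earlier comment, whose search line already contains cluster.local, `triage_resolv_conf /etc/resolv.conf` would report the first message, which is consistent with treating that push timeout as a separate problem.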
Proposed fix merged.
https://github.com/openshift/openshift-ansible/pull/5585
I am stuck trying to set up an environment that reproduces this. On OpenStack, after disabling the DHCP functionality of the subnet, I still got a search domain:

# cat /etc/resolv.conf
# Generated by NetworkManager
search localdomain

I will investigate the test scenario further in the next few days.
@Gan, I tested this on my env by adding "PEERDNS=no" to my /etc/sysconfig/network-scripts/ifcfg-eth0. Could you please try it?
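The reproduction suggested above could be scripted roughly as follows. This is a sketch under assumptions: the function name `set_peerdns_no` is hypothetical, the interface is assumed to be eth0 as in the comment, and the commented commands need root on a disposable test node.

```shell
#!/bin/sh
# Hypothetical helper: set PEERDNS=no in the ifcfg-style file given as $1,
# replacing an existing PEERDNS line or appending one if none exists.
set_peerdns_no() {
  cfg="$1"
  if grep -q '^PEERDNS=' "$cfg"; then
    # Rewrite through a temporary file, then copy back into the original
    # so the original file is kept in place rather than replaced.
    sed 's/^PEERDNS=.*/PEERDNS=no/' "$cfg" > "$cfg.tmp" &&
      cat "$cfg.tmp" > "$cfg" &&
      rm -f "$cfg.tmp"
  else
    echo 'PEERDNS=no' >> "$cfg"
  fi
}

# On a test node (as root), the reproduction would then look roughly like:
#   set_peerdns_no /etc/sysconfig/network-scripts/ifcfg-eth0
#   systemctl restart NetworkManager
#   grep '^search' /etc/resolv.conf || echo "no search line"
```

If the final command reports no search line on a node with the unfixed script, the environment matches the condition described in this report.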
Thanks Kenjiro! That works.

Reproduced with openshift-ansible-3.6.173.0.45-1.git.0.dc70c99.el7.noarch.rpm
Verified in openshift-ansible-3.7.0-0.148.0.git.0.b35eb14.el7.noarch.rpm
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188