Bug 1496593 - NetworkManager(99-origin-dns.sh) does not add cluster.local to resolv.conf if there are no `search xxx` in resolv.conf
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.6.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.7.0
Assignee: Scott Dodson
QA Contact: Gan Huang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-09-27 21:26 UTC by Aleks Lazic
Modified: 2018-06-27 03:28 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, if there was no DNS search path in /etc/resolv.conf, the NetworkManager dispatcher script did not add 'cluster.local' to the search path. The dispatcher script has been updated to ensure that if no search path exists, one is created.
Clone Of:
Environment:
Last Closed: 2017-11-28 22:13:20 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github 5372 0 None None None 2020-09-22 14:24:33 UTC
Red Hat Product Errata RHSA-2017:3188 0 normal SHIPPED_LIVE Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update 2017-11-29 02:34:54 UTC

Description Aleks Lazic 2017-09-27 21:26:16 UTC
Description of problem:

With OCP version 3.6 we were not able to push to the registry.


Version-Release number of selected component (if applicable):
At least 3.6.0

How reproducible:

Install vanilla OCP 3.6 and run an example build

Actual results:

--> Installing application source...
Pushing image docker-registry.default.svc:5000/quickdemo/quidemo:latest ...
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: Get https://docker-registry.default.svc:5000/v1/_ping: dial tcp: lookup docker-registry.default.svc on xxx.xxx.xxx.xxx:53: no such host 

Expected results:

Pushed successfully 

Additional info:

Comment 1 Aleks Lazic 2017-09-27 21:28:44 UTC
NetworkManager does not add the cluster.local search domain to resolv.conf.

The file below is a fixed version

https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node_dnsmasq/files/networkmanager/99-origin-dns.sh

This is the issue https://github.com/openshift/openshift-ansible/issues/5372
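
To illustrate why the missing search entry leads to the `no such host` error in the description, here is a minimal sketch that lists the names formed by appending each search domain from a resolv.conf file to a short name. The function name `search_candidates` is hypothetical and is not part of 99-origin-dns.sh; the sketch also ignores the `ndots` option and the order in which a real resolver tries names.

```shell
#!/bin/bash
# Hypothetical helper for illustration only; not part of 99-origin-dns.sh.
# Prints one candidate name per search domain found in the given resolv.conf.
search_candidates() {
  local name="$1" resolv_conf="$2" domain
  # Per resolv.conf(5), only the last "search" line is used.
  for domain in $(awk '$1 == "search" { $1 = ""; line = $0 } END { print line }' "${resolv_conf}"); do
    printf '%s.%s\n' "${name}" "${domain}"
  done
}
```

For example, `search_candidates docker-registry.default.svc /etc/resolv.conf` lists `docker-registry.default.svc.cluster.local` only when `cluster.local` is in the search list. With no `search` line at all, it prints nothing.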

Comment 2 Kenjiro Nakayama 2017-09-30 13:08:39 UTC
Description of problem:
- When there is no `search xxx` line in /etc/resolv.conf on the host, 99-origin-dns.sh fails to add `cluster.local` to the search domains.

Version-Release number of selected component (if applicable):
- OCP 3.6

How reproducible:
- 100% (if /etc/resolv.conf doesn't have `search xxx` by default.)

Steps to reproduce:
- 0. Install OpenShift Node.
- 1. Set up an environment which does not have `search xxx` in /etc/resolv.conf by default.
- 2. Restart NetworkManager.

Actual results:
- /etc/resolv.conf doesn't have "search cluster.local"

Expected results:
- /etc/resolv.conf has "search cluster.local"

Additional info:
- The current fix, https://github.com/openshift/openshift-ansible/pull/5398, still does not add the setting in this case.
- Here is a proposed patch: https://github.com/openshift/openshift-ansible/pull/5585
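
The expected result above can be sketched as a small shell function. This is an illustration under stated assumptions, not the code in 99-origin-dns.sh or in either pull request: the function name `ensure_cluster_local_search` and its single file-path argument are hypothetical, and it relies on GNU `sed -i`.

```shell
#!/bin/bash
# Hypothetical helper for illustration only; not the dispatcher's actual code.
# Ensures the given resolv.conf-style file lists cluster.local as a search domain.
ensure_cluster_local_search() {
  local resolv_conf="$1"
  if grep -q '^search[[:space:]]' "${resolv_conf}"; then
    # A search line exists: append cluster.local unless it is already listed
    # as its own entry (svc.cluster.local does not count).
    if ! grep -qE '^search.*[[:space:]]cluster\.local([[:space:]]|$)' "${resolv_conf}"; then
      sed -i -e '/^search[[:space:]]/ s/$/ cluster.local/' "${resolv_conf}"
    fi
  else
    # No search line at all: this is the case reported in this bug.
    echo 'search cluster.local' >> "${resolv_conf}"
  fi
}
```

Running it on a copy of /etc/resolv.conf rather than the live file is a safe way to try it; calling it twice leaves the file unchanged the second time.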

Comment 3 Kenjiro Nakayama 2017-09-30 13:13:20 UTC
@Aleks I'm sorry I changed subject of this ticket and component as 99-origin-dns.sh is provided by installer. Please feel free to update if you have any objection.

Comment 5 Aleks Lazic 2017-10-04 07:40:07 UTC
(In reply to Kenjiro Nakayama from comment #3)
> @Aleks I'm sorry I changed subject of this ticket and component as
> 99-origin-dns.sh is provided by installer. Please feel free to update if you
> have any objection.

Yes it's a better subject.
Thanks for adopting.

Comment 6 puneet 2017-10-08 20:57:09 UTC
All,


After applying the script below to my installer VM, I see that images are still not being pushed to the registry. The push is timing out.

Please see the diff between my 99-origin-dns.sh script and the old one. The new script has two additional lines, as shown below.

DIFF ON THE 99-ORIGIN-DNS.SH SCRIPT:

[root@unknown0800276C42EA ~]# diff /usr/share/ansible/openshift-ansible/roles/openshift_node_dnsmasq/files/networkmanager/99-origin-dns.sh 99-origin-dns_bkp.sh
117,118d116
<       elif ! grep -qw search ${NEW_RESOLV_CONF}; then
<         echo 'search cluster.local' >> ${NEW_RESOLV_CONF}
[root@unknown0800276C42EA ~]#



After I replaced the script on my installer VM, I successfully installed an OpenShift 3.6 cluster from scratch, but when I try to start a build I see the error "error: build error: Failed to push image: After retrying 6 times, Push image still failed".

I have collected the /etc/resolv.conf file from my host (master) and from the builder pod for your reference; please see below. I have also pasted the builder logs and the internal registry logs.
Please let me know how to get past this error.



SCRIPT USED ON INSTALLER VM:

https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node_dnsmasq/files/networkmanager/99-origin-dns.sh

HOST (MASTER) /ETC/RESOLV.CONF FILE:

[root@ocp3 ~]# cat /etc/resolv.conf
# Generated by NetworkManager
search example.com default.svc.cluster.local svc.cluster.local cluster.local 6-master01.ocs.example.com
nameserver 192.168.1.104
[root@ocp3 ~]#


CONTAINER BUILDER POD /ETC/RESOLV.CONF FILE:

[root@ocp3 ~]# oc get pods
NAME READY STATUS RESTARTS AGE
test-2-build 1/1 Running 0 20s

[root@ocp3 tmp]# oc rsh test-2-build cat /etc/resolv.conf > /tmp/info_resolv.conf
[root@ocp3 tmp]# cat info_resolv.conf
nameserver 192.168.1.112
search test.svc.cluster.local svc.cluster.local cluster.local example.com default.svc.cluster.local 6-master01.ocs.example.com
options ndots:5
[root@ocp3 tmp]#

BUILDER LOGS:

[root@ocp3 ~]# oc logs bc/test
Cloning "https://github.com/openshift/cakephp-ex.git" ...
        Commit: 7969534afdf9490ca79e37e672f0b9c81887ec28 (Merge pull request #81 from bparees/readiness)
        Author: Ben Parees <bparees.github.com>
        Date:   Mon Sep 11 01:15:51 2017 -0400
---> Installing application source...
Found 'composer.json', installing dependencies using composer.phar...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  298k  100  298k    0     0   196k      0  0:00:01  0:00:01 --:--:--  196k
All settings correct for using Composer
Downloading...
Composer (version 1.5.2) successfully installed to:
/opt/app-root/src/composer.phar
Use it: php composer.phar
Loading composer repositories with package information
Installing dependencies (including require-dev) from lock file
Package operations: 10 installs, 0 updates, 0 removals
  - Installing squizlabs/php_codesniffer (1.5.6): Downloading (100%)
  - Installing cakephp/cakephp-codesniffer (1.0.2): Downloading (100%)
  - Installing phpunit/php-token-stream (1.2.2): Downloading (100%)
  - Installing symfony/yaml (v2.8.16): Downloading (100%)
  - Installing phpunit/php-text-template (1.2.1): Downloading (100%)
  - Installing phpunit/phpunit-mock-objects (1.2.3): Downloading (100%)
  - Installing phpunit/php-timer (1.0.8): Downloading (100%)
  - Installing phpunit/php-file-iterator (1.4.2): Downloading (100%)
  - Installing phpunit/php-code-coverage (1.2.18): Downloading (100%)
  - Installing phpunit/phpunit (3.7.38): Downloading (100%)
phpunit/php-code-coverage suggests installing ext-xdebug (>=2.0.5)
phpunit/phpunit suggests installing phpunit/php-invoker (~1.1)
Generating optimized autoload files
Pushing image docker-registry.default.svc:5000/test/test:latest ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: After retrying 6 times, Push image still failed
[root@ocp3 ~]#



DOCKER REGISTRY LOGS:

[root@ocp3 ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-pv1f4    1/1       Running   0          3h
registry-console-1-rfvgv   1/1       Running   0          3h
router-1-czr1n             1/1       Running   0          4h
[root@ocp3 ~]#

time="2017-10-08T17:06:07Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_PORT"
time="2017-10-08T17:06:07Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_PORT_9000_TCP"
time="2017-10-08T17:06:07Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_PORT_9000_TCP_ADDR"
time="2017-10-08T17:06:07Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_PORT_9000_TCP_PORT"
time="2017-10-08T17:06:07Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_PORT_9000_TCP_PROTO"
time="2017-10-08T17:06:07Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_SERVICE_HOST"
time="2017-10-08T17:06:07Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_SERVICE_PORT"
time="2017-10-08T17:06:07Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_CONSOLE_SERVICE_PORT_REGISTRY_CONSOLE"
time="2017-10-08T17:06:07.818480818Z" level=info msg="version=v2.4.1+unknown"
time="2017-10-08T17:06:07.819518926Z" level=info msg="OpenShift middleware for storage driver initializing"
time="2017-10-08T17:06:07.819540839Z" level=info msg="redis not configured" go.version=go1.7.6 instance.id=3548f933-3daa-4453-828b-5e8100383a16 openshift.logger=registry
time="2017-10-08T17:06:07.833687986Z" level=info msg="Starting upload purge in 31m0s" go.version=go1.7.6 instance.id=3548f933-3daa-4453-828b-5e8100383a16 openshift.logger=registry
time="2017-10-08T17:06:07.835892641Z" level=info msg="using inmemory blob descriptor cache" go.version=go1.7.6 instance.id=3548f933-3daa-4453-828b-5e8100383a16 openshift.logger=registry
time="2017-10-08T17:06:07.835905245Z" level=info msg="OpenShift registry middleware initializing"
time="2017-10-08T17:06:07.835912344Z" level=info msg="Using Origin Auth handler" go.version=go1.7.6 instance.id=3548f933-3daa-4453-828b-5e8100383a16 openshift.logger=registry
time="2017-10-08T17:06:07.835923052Z" level=debug msg="configured \"openshift\" access controller" go.version=go1.7.6 instance.id=3548f933-3daa-4453-828b-5e8100383a16 openshift.logger=registry
time="2017-10-08T17:06:07.835949017Z" level=debug msg="configured token endpoint at \"/openshift/token\"" go.version=go1.7.6 instance.id=3548f933-3daa-4453-828b-5e8100383a16 openshift.logger=registry
time="2017-10-08T17:06:07.836315032Z" level=info msg="listening on :5000" go.version=go1.7.6 instance.id=3548f933-3daa-4453-828b-5e8100383a16 openshift.logger=registry
10.128.0.1 - - [08/Oct/2017:17:06:08 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:15 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:25 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:25 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:35 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:35 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:45 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:45 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:55 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:06:55 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:05 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:05 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:15 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:15 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:25 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:25 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:35 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:35 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:45 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:45 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
10.128.0.1 - - [08/Oct/2017:17:07:55 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"








Comment 7 Kenjiro Nakayama 2017-10-09 02:29:48 UTC
@puneet, I have already sent the fixed PR here:

https://github.com/openshift/openshift-ansible/pull/5585

But it is still not merged.

Comment 8 puneet 2017-10-09 04:47:18 UTC
Hi Kenjiro ,

Thanks for your update. Is there a workaround to get past this error? If so, can someone please provide detailed steps? I'm currently stuck on the build process and cannot proceed further. If not, I understand I have to wait for the updated 99-origin-dns.sh script.

Also, I have one question: since my error (Push failed, retrying in 5s ...) is different from what is described in this ticket, I was wondering whether this is what everyone is seeing now. The reason I ask is that, as you can see below, my host's /etc/resolv.conf is correctly populated. I had to update this file manually before the cluster install and protected it with the chattr +i command. But the image push is still timing out for me.


[root@ocp3 ~]# cat /etc/resolv.conf
# Generated by NetworkManager
search example.com default.svc.cluster.local svc.cluster.local cluster.local 6-master01.ocs.example.com
nameserver 192.168.1.104
[root@ocp3 ~]#









Comment 9 Kenjiro Nakayama 2017-10-09 05:17:17 UTC
@puneet,

> Also , i have one question since my error (Push failed, retrying in 5s ...) is
> different than what is mentioned in this ticket i was wondering if this is what
> everyone is seeing now ?

I believe your guess is correct. Your issue is different.

The issue this bz addresses is that /etc/resolv.conf does not have "search cluster.local" because of the bug in 99-origin-dns.sh.
However, you added `cluster.local` to /etc/resolv.conf manually and the push still failed. So most probably your issue is different from this bz. Could you please open a support ticket or ask on the openshift-sme ML?

Comment 10 Scott Dodson 2017-10-11 15:17:40 UTC
Proposed fix merged.

Comment 13 Gan Huang 2017-10-13 09:41:33 UTC
I am stuck trying to set up an environment that reproduces this.

On OpenStack, after disabling the DHCP functionality of the subnet, I still got a search domain:

# cat /etc/resolv.conf 
# Generated by NetworkManager
search localdomain


Will do further investigation for the test scenario in the next few days.

Comment 14 Kenjiro Nakayama 2017-10-14 11:01:30 UTC
@Gan,
I tested this on my env by adding "PEERDNS=no" to my /etc/sysconfig/network-scripts/ifcfg-eth0. Could you please try it?
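
As a rough sketch of the reproduction step above, the helper below sets `PEERDNS=no` in an ifcfg file. The function name `set_peerdns_no` is hypothetical, the ifcfg path is passed as an argument rather than assumed, and the sketch relies on GNU `sed -i`.

```shell
#!/bin/bash
# Hypothetical helper for illustration only.
# Sets PEERDNS=no in the given ifcfg file, replacing any existing PEERDNS value.
set_peerdns_no() {
  local ifcfg="$1"
  if grep -q '^PEERDNS=' "${ifcfg}"; then
    sed -i -e 's/^PEERDNS=.*/PEERDNS=no/' "${ifcfg}"
  else
    echo 'PEERDNS=no' >> "${ifcfg}"
  fi
}

# Example use on a test node (run as root), followed by the NetworkManager
# restart from the reproduction steps in comment 2:
#   set_peerdns_no /etc/sysconfig/network-scripts/ifcfg-eth0
#   systemctl restart NetworkManager
```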

Comment 15 Gan Huang 2017-10-16 08:32:27 UTC
Thanks Kenjiro! That works.

Reproduced with
openshift-ansible-3.6.173.0.45-1.git.0.dc70c99.el7.noarch.rpm

Verified in
openshift-ansible-3.7.0-0.148.0.git.0.b35eb14.el7.noarch.rpm

Comment 19 errata-xmlrpc 2017-11-28 22:13:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

