Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1662122

Summary:	OpenShift node drops DNS packets to dnsmasq, causing pods to take 5s to resolve svc hostnames
Product:	OpenShift Container Platform	Reporter:	wangzhida <zhiwang>
Component:	Networking	Assignee:	Dan Mace <dmace>
Networking sub component:	router	QA Contact:	Hongan Li <hongli>
Status:	CLOSED DUPLICATE	Docs Contact:
Severity:	high
Priority:	high	CC:	aivaraslaimikis, aos-bugs, dmace, hongli, scuppett, sreber, weliang
Version:	3.11.0
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-01-18 13:20:37 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Comment 10 Weibin Liang 2019-01-09 20:52:21 UTC

The problem can now be reproduced every time.

Set up the cluster through the QE Jenkins tool: https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Launch%20Environment%20Flexy/

The 5s curl delay is seen only with vm_type: m1.large, not with vm_type: m1.medium.

The same 5s curl delay is seen in both v3.9.51 and v3.11.67.

[root@qe-weliang-case4master-etcd-nfs-1 ~]# oc new-app https://github.com/OpenShiftDemos/os-sample-python.git
[root@qe-weliang-case4master-etcd-nfs-1 ~]# oc rsh os-sample-python-1-hv9ms
(app-root) sh-4.2$ export svc5=os-sample-python.p1.svc.cluster.local
(app-root) sh-4.2$ OUTPUT="        %{time_namelookup}     %{time_connect}        %{time_appconnect}         %{time_pretransfer}      %{time_redirect}           %{time_starttransfer}   %{time_total}\n"; echo ""; echo "                        time   namelookup   connect   appconnect   pretransfer   redirect   starttransfer   total"; echo "-----------------------------------------------------------------------------------------------------------------"; while true; do echo -n "$(date)"; curl -w "$OUTPUT" -o /dev/null -s $svc5:8080; sleep 2; echo -n "$(date)"; curl -w "$OUTPUT" -o /dev/null -s $svc5:8080; sleep 120; done

                        time   namelookup   connect   appconnect   pretransfer   redirect   starttransfer   total
-----------------------------------------------------------------------------------------------------------------
Wed Jan  9 20:35:16 UTC 2019        0.125     0.126        0.000         0.126      0.000           0.130   0.130
Wed Jan  9 20:35:19 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.016   0.016
Wed Jan  9 20:37:19 UTC 2019        5.515     5.515        0.000         5.516      0.000           5.518   5.519
Wed Jan  9 20:37:26 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.016   0.016
Wed Jan  9 20:39:26 UTC 2019        5.514     5.515        0.000         5.515      0.000           5.521   5.521
Wed Jan  9 20:39:34 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.016   0.016
Wed Jan  9 20:41:34 UTC 2019        5.513     5.514        0.000         5.514      0.000           5.519   5.519
Wed Jan  9 20:41:41 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.014   0.014
Wed Jan  9 20:43:41 UTC 2019        5.515     5.515        0.000         5.516      0.000           5.518   5.518
Wed Jan  9 20:43:49 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.014   0.014
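
In the table above, the first request after each 120s sleep spends about 5.5s in namelookup, while the request two seconds later takes about 0.012s. A delay of just over 5s is consistent with the glibc resolver's default 5s timeout, although this comment does not confirm the cause. For a longer capture, a filter along the following lines could pull out the slow rows. This is only a sketch: the function name `flag_slow_lookups` and the 1s default threshold are assumptions for illustration and were not part of the original test.

```shell
#!/bin/sh
# Print only the rows of the curl timing table whose namelookup time
# exceeds a threshold. Assumes each data row starts with the six
# whitespace-separated fields of `date` output
# (e.g. "Wed Jan  9 20:37:19 UTC 2019") followed by the seven curl
# timings, so a data row has 13 fields and namelookup is field 7.
flag_slow_lookups() {
    # $1: threshold in seconds (hypothetical default: 1)
    awk -v limit="${1:-1}" '
        NF == 13 && $7 + 0 > limit + 0 { print $4, "namelookup=" $7 "s" }
    '
}
```

Usage, assuming the loop output was saved to a hypothetical file timings.txt: `flag_slow_lookups 1 < timings.txt`. Header and separator lines are skipped because they do not have 13 fields.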

Comment 11 Weibin Liang 2019-01-10 16:34:50 UTC

Further testing gave more interesting results.

The same test environment was used for every case:
v3.9.51
Red Hat Enterprise Linux Server release 7.6 (Maipo)
openvswitch-2.9.0-83.el7fdp.1.x86_64
openvswitch-selinux-extra-policy-1.0-8.el7fdp.noarch

Case 1: Cluster created in OpenStack with instance_type: m1.large
Test result: fail

Case 2: Cluster created in OpenStack with instance_type: m1.medium
Test result: pass

Case 3: Cluster created in AWS/EC2 with instance_type: m1.large
Test result: pass

Case 4: Cluster created in AWS/EC2 with instance_type: m1.medium
Test result: pass

All of the above test results are consistent.

Comment 12 wangzhida 2019-01-15 02:52:07 UTC

(In reply to Weibin Liang from comment #11)

Hi, Weibin:

Could you add the settings below to the centos dc/pod and check whether the latency is reduced?
...
spec:
  template:
    spec:
      ...
      dnsConfig:
        options:
        - name: single-request
      dnsPolicy: ClusterFirst
...


Also, it would be good to test the same pods running with hostNetwork, because the customer reported that the issue does not occur when resolving from the nodes themselves.

Thanks a lot.

Comment 13 Weibin Liang 2019-01-15 16:14:50 UTC

Hi Wangzhida,

Editing the dc to add the dnsConfig option name: single-request works as a workaround for this bug.

Tested in oc v3.11.69

[root@qe-weliang-311master-etcd-nfs-1 ~]# oc edit deploymentconfig.apps.openshift.io/os-sample-python
spec:
  template:
    spec:
      ...
      dnsConfig:
        options:
        - name: single-request
      dnsPolicy: ClusterFirst
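
The same change could probably be applied non-interactively instead of through `oc edit`. The sketch below builds the equivalent patch and wraps `oc patch` in a helper. The helper name `apply_single_request` is hypothetical, and it assumes that `oc patch` with its default patch type is accepted for this dc; this path was not exercised in the test reported here.

```shell
#!/bin/sh
# JSON equivalent of the dnsConfig fragment shown above.
PATCH='{"spec":{"template":{"spec":{"dnsConfig":{"options":[{"name":"single-request"}]}}}}}'

apply_single_request() {
    # $1: DeploymentConfig name (os-sample-python in this report)
    # Requires a logged-in `oc` client; nothing runs until the
    # function is called.
    oc patch "dc/$1" -p "$PATCH"
}

# On a cluster: apply_single_request os-sample-python
```

Changing the pod template normally triggers a new deployment, which matches the new pod name (os-sample-python-2-...) used in the transcript that follows.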


[root@qe-weliang-311master-etcd-nfs-1 ~]# oc rsh os-sample-python-2-jdm45
(app-root) sh-4.2$ OUTPUT="        %{time_namelookup}     %{time_connect}        %{time_appconnect}         %{time_pretransfer}      %{time_redirect}           %{time_starttransfer}   %{time_total}\n"; echo ""; echo "                        time   namelookup   connect   appconnect   pretransfer   redirect   starttransfer   total"; echo "-----------------------------------------------------------------------------------------------------------------"; while true; do echo -n "$(date)"; curl -w "$OUTPUT" -o /dev/null -s $svc5:8080; sleep 2; echo -n "$(date)"; curl -w "$OUTPUT" -o /dev/null -s $svc5:8080; sleep 120; done

                        time   namelookup   connect   appconnect   pretransfer   redirect   starttransfer   total
-----------------------------------------------------------------------------------------------------------------
Tue Jan 15 16:06:29 UTC 2019        0.013     0.013        0.000         0.013      0.000           0.014   0.014
Tue Jan 15 16:06:31 UTC 2019        0.012     0.012        0.000         0.012      0.000           0.014   0.014
Tue Jan 15 16:08:31 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.014   0.014
Tue Jan 15 16:08:33 UTC 2019        0.012     0.012        0.000         0.013      0.000           0.014   0.014
Tue Jan 15 16:10:33 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.014   0.014
Tue Jan 15 16:10:35 UTC 2019        0.012     0.012        0.000         0.013      0.000           0.013   0.013
Tue Jan 15 16:12:35 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.014   0.014
Tue Jan 15 16:12:37 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.013   0.013
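
Options listed under dnsConfig are written to the pod's /etc/resolv.conf, so one way to confirm that the redeployed pod actually picked up the setting is to look for it on the `options` line. A minimal sketch, using a hypothetical helper name `has_resolver_option`, intended to be run inside the pod (for example after `oc rsh`):

```shell
#!/bin/sh
# Succeed (exit 0) if the given resolver option appears on an
# "options" line of resolv.conf, fail (exit 1) otherwise.
has_resolver_option() {
    # $1: option name, e.g. single-request
    # $2: path to resolv.conf (defaults to /etc/resolv.conf)
    awk -v opt="$1" '
        $1 == "options" { for (i = 2; i <= NF; i++) if ($i == opt) found = 1 }
        END { exit found ? 0 : 1 }
    ' "${2:-/etc/resolv.conf}"
}

# Inside the pod:
#   has_resolver_option single-request && echo "single-request is set"
```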

Comment 15 Stephen Cuppett 2019-01-18 13:20:37 UTC


*** This bug has been marked as a duplicate of bug 1661928 ***

Comment 16 Red Hat Bugzilla 2023-09-15 01:28:02 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days