Bug 1662122 - OpenShift node drops DNS packets to dnsmasq, causing pods to take 5s to resolve svc hostnames
Keywords:
Status: CLOSED DUPLICATE of bug 1661928
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Dan Mace
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-12-26 09:38 UTC by wangzhida
Modified: 2023-09-15 01:28 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-18 13:20:37 UTC
Target Upstream Version:
Embargoed:


Comment 10 Weibin Liang 2019-01-09 20:52:21 UTC
The problem can now be reproduced every time.

Set up the cluster through the QE Jenkins tool: https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Launch%20Environment%20Flexy/

The 5s curl delay is seen only with vm_type: m1.large, not with vm_type: m1.medium.

The same 5s curl delay is seen in both v3.9.51 and v3.11.67.

[root@qe-weliang-case4master-etcd-nfs-1 ~]# oc new-app https://github.com/OpenShiftDemos/os-sample-python.git
[root@qe-weliang-case4master-etcd-nfs-1 ~]# oc rsh os-sample-python-1-hv9ms
(app-root) sh-4.2$ export svc5=os-sample-python.p1.svc.cluster.local
(app-root) sh-4.2$ OUTPUT="        %{time_namelookup}     %{time_connect}        %{time_appconnect}         %{time_pretransfer}      %{time_redirect}           %{time_starttransfer}   %{time_total}\n"; echo ""; echo "                        time   namelookup   connect   appconnect   pretransfer   redirect   starttransfer   total"; echo "-----------------------------------------------------------------------------------------------------------------"; while true; do echo -n "$(date)"; curl -w "$OUTPUT" -o /dev/null -s $svc5:8080; sleep 2; echo -n "$(date)"; curl -w "$OUTPUT" -o /dev/null -s $svc5:8080; sleep 120; done

                        time   namelookup   connect   appconnect   pretransfer   redirect   starttransfer   total
-----------------------------------------------------------------------------------------------------------------
Wed Jan  9 20:35:16 UTC 2019        0.125     0.126        0.000         0.126      0.000           0.130   0.130
Wed Jan  9 20:35:19 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.016   0.016
Wed Jan  9 20:37:19 UTC 2019        5.515     5.515        0.000         5.516      0.000           5.518   5.519
Wed Jan  9 20:37:26 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.016   0.016
Wed Jan  9 20:39:26 UTC 2019        5.514     5.515        0.000         5.515      0.000           5.521   5.521
Wed Jan  9 20:39:34 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.016   0.016
Wed Jan  9 20:41:34 UTC 2019        5.513     5.514        0.000         5.514      0.000           5.519   5.519
Wed Jan  9 20:41:41 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.014   0.014
Wed Jan  9 20:43:41 UTC 2019        5.515     5.515        0.000         5.516      0.000           5.518   5.518
Wed Jan  9 20:43:49 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.014   0.014
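
For longer runs, the slow rows in output like the above can be picked out automatically. The following is a minimal sketch, not part of the original test: `flag_slow_lookups` is a hypothetical helper name, and it assumes the row layout shown above, where `date` prints six whitespace-separated fields so that namelookup is the 7th field of 13.

```shell
# flag_slow_lookups [THRESHOLD_SECONDS]
# Hypothetical helper: reads the loop output on stdin and prints, prefixed
# with "SLOW", every data row whose namelookup time exceeds the threshold.
# Assumes 6 date fields followed by 7 curl timing fields (13 in total), so
# the header and separator lines are skipped by the NF check.
flag_slow_lookups() {
  threshold="${1:-1.0}"
  awk -v t="$threshold" 'NF >= 13 && $7 + 0 > t + 0 { print "SLOW", $0 }'
}
```

Usage, assuming the loop output was saved to a file named timings.txt (an assumed name): `flag_slow_lookups 1.0 < timings.txt`. If `date` prints a different number of fields in another locale, the field index needs adjusting.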

Comment 11 Weibin Liang 2019-01-10 16:34:50 UTC
Further testing produced more interesting results:

Same testing env setup:
v3.9.51
Red Hat Enterprise Linux Server release 7.6 (Maipo)
openvswitch-2.9.0-83.el7fdp.1.x86_64
openvswitch-selinux-extra-policy-1.0-8.el7fdp.noarch

Case1: Cluster created in OpenStack with instance_type: m1.large
Test result: fail

Case2: Cluster created in OpenStack with instance_type: m1.medium
Test result: pass

Case3: Cluster created in AWS/EC2 with instance_type: m1.large
Test result: pass

Case4: Cluster created in AWS/EC2 with instance_type: m1.medium
Test result: pass

All of the above test results are consistent across repeated runs.

Comment 12 wangzhida 2019-01-15 02:52:07 UTC
(In reply to Weibin Liang from comment #11)

Hi, Weibin:

Could you help test adding the settings below to the centos dc/pod and check whether the latency is reduced?
...
spec:
  template:
    spec:
      ...
      dnsConfig:
        options:
        - name: single-request
      dnsPolicy: ClusterFirst
...


Also, it would be good to test the same pods running with hostNetwork, because the customer said there is no issue when resolving from the nodes.

Thanks a lot.

Comment 13 Weibin Liang 2019-01-15 16:14:50 UTC
Hi Wangzhida,

Editing the dc to add the dnsConfig option name: single-request can serve as a workaround for this bug.

Tested in oc v3.11.69

[root@qe-weliang-311master-etcd-nfs-1 ~]# oc edit deploymentconfig.apps.openshift.io/os-sample-python
spec:
  template:
    spec:
      ...
      dnsConfig:
        options:
        - name: single-request
      dnsPolicy: ClusterFirst
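
The same change could probably also be applied without an interactive edit by using `oc patch`. This is only a sketch and was not part of the test above: `dns_option_patch` is a hypothetical helper that prints a patch body mirroring the YAML fields shown, and the `oc patch` line in the comment assumes a logged-in session and the dc name used in this comment.

```shell
# dns_option_patch OPTION_NAME
# Hypothetical helper: prints a JSON patch body that sets a single resolver
# option under spec.template.spec.dnsConfig.options, mirroring the YAML above.
dns_option_patch() {
  printf '{"spec":{"template":{"spec":{"dnsConfig":{"options":[{"name":"%s"}]}}}}}\n' "$1"
}

# Assumed usage (needs cluster access, so it is left commented out):
# oc patch dc/os-sample-python -p "$(dns_option_patch single-request)"
```

Changing the pod template normally triggers a new deployment when the dc has a config change trigger, which matches the new pod name seen in the next step.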


[root@qe-weliang-311master-etcd-nfs-1 ~]# oc rsh os-sample-python-2-jdm45
(app-root) sh-4.2$ OUTPUT="        %{time_namelookup}     %{time_connect}        %{time_appconnect}         %{time_pretransfer}      %{time_redirect}           %{time_starttransfer}   %{time_total}\n"; echo ""; echo "                        time   namelookup   connect   appconnect   pretransfer   redirect   starttransfer   total"; echo "-----------------------------------------------------------------------------------------------------------------"; while true; do echo -n "$(date)"; curl -w "$OUTPUT" -o /dev/null -s $svc5:8080; sleep 2; echo -n "$(date)"; curl -w "$OUTPUT" -o /dev/null -s $svc5:8080; sleep 120; done

                        time   namelookup   connect   appconnect   pretransfer   redirect   starttransfer   total
-----------------------------------------------------------------------------------------------------------------
Tue Jan 15 16:06:29 UTC 2019        0.013     0.013        0.000         0.013      0.000           0.014   0.014
Tue Jan 15 16:06:31 UTC 2019        0.012     0.012        0.000         0.012      0.000           0.014   0.014
Tue Jan 15 16:08:31 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.014   0.014
Tue Jan 15 16:08:33 UTC 2019        0.012     0.012        0.000         0.013      0.000           0.014   0.014
Tue Jan 15 16:10:33 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.014   0.014
Tue Jan 15 16:10:35 UTC 2019        0.012     0.012        0.000         0.013      0.000           0.013   0.013
Tue Jan 15 16:12:35 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.014   0.014
Tue Jan 15 16:12:37 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.013   0.013
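
To confirm that the option actually reached the pod, the resolver configuration inside the container can be checked, since dnsConfig options are written to the pod's /etc/resolv.conf. A minimal sketch follows; `has_resolver_option` is a hypothetical helper name, and the check was not part of the original test run.

```shell
# has_resolver_option FILE OPTION
# Hypothetical helper: succeeds (exit 0) if FILE contains an "options" line
# that lists OPTION as a separate word, and fails (exit 1) otherwise.
has_resolver_option() {
  awk -v opt="$2" '
    $1 == "options" { for (i = 2; i <= NF; i++) if ($i == opt) found = 1 }
    END { exit !found }
  ' "$1"
}
```

Inside the pod this could be run as `has_resolver_option /etc/resolv.conf single-request && echo present`. With `single-request`, glibc sends the A and AAAA queries one after the other instead of in parallel, which is why it avoids the delay seen earlier.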

Comment 15 Stephen Cuppett 2019-01-18 13:20:37 UTC

*** This bug has been marked as a duplicate of bug 1661928 ***

Comment 16 Red Hat Bugzilla 2023-09-15 01:28:02 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days.
