Bug 1662982 - SkyDNS not responding on parallel requests from applications inside pods
Summary: SkyDNS not responding on parallel requests from applications inside pods
Keywords:
Status: CLOSED DUPLICATE of bug 1661928
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.9.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Dan Mace
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-01-02 16:03 UTC by Dmitry Zhukovski
Modified: 2023-03-24 14:28 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-23 20:39:55 UTC
Target Upstream Version:
Embargoed:


Description Dmitry Zhukovski 2019-01-02 16:03:12 UTC
Description of problem:
SkyDNS not responding on parallel requests from applications inside pods

Version-Release number of selected component (if applicable):
3.9

How reproducible:
Every time

Steps to Reproduce:
1. Create a pod running a .NET app and have it connect to another service within the same project.

Actual results:
The app first needs to resolve the DNS name of the service.
It queries DNS for the svc.cluster.local name by sending A and AAAA requests in parallel, and it gets no reply to the AAAA request.

After several attempts, the app sends the DNS requests sequentially, gets correct answers, and is able to proceed with the connection.

Expected results:
The app resolves the DNS name on the first attempt.

Additional info:
More information is in the next comment. A pcap is also attached.
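
A minimal sketch for checking whether the parallel A/AAAA lookup is the trigger, assuming the pod image resolves names through the glibc resolver (the .NET app itself may use a different path). It times one `getent ahosts` lookup with the default resolver options and one with the glibc `single-request` option, which makes the A and AAAA queries go out sequentially. `time_lookup` and `LOOKUP_NAME` are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# time_lookup NAME [RES_OPTIONS]
# Print the elapsed whole seconds for one getaddrinfo() lookup of NAME.
# `getent ahosts` resolves through getaddrinfo(), the glibc call that
# issues both the A and the AAAA query.
time_lookup() {
    start=$(date +%s)
    # Lookup failures are ignored on purpose: only the elapsed time matters.
    RES_OPTIONS="${2:-}" getent ahosts "$1" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}

# LOOKUP_NAME is a hypothetical variable for this sketch; inside a pod it
# would be set to a service name such as <service>.<project>.svc.cluster.local.
name=${LOOKUP_NAME:-localhost}
echo "default resolver options: $(time_lookup "$name")s"
echo "single-request:           $(time_lookup "$name" single-request)s"
```

If the first line reports a multi-second delay and the second does not, that would be consistent with the AAAA reply being lost when both queries are sent in parallel.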

Comment 20 Weibin Liang 2019-01-09 20:51:12 UTC
The problem can now be reproduced every time.

Set up the cluster with the QE Jenkins tool: https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Launch%20Environment%20Flexy/

The same 5 s curl delay is seen only with vm_type: m1.large, not with vm_type: m1.medium.

The same 5 s curl delay is seen in both v3.9.51 and v3.11.67.

[root@qe-weliang-case4master-etcd-nfs-1 ~]# oc new-app https://github.com/OpenShiftDemos/os-sample-python.git
[root@qe-weliang-case4master-etcd-nfs-1 ~]# oc rsh os-sample-python-1-hv9ms
(app-root) sh-4.2$ export svc5=os-sample-python.p1.svc.cluster.local
(app-root) sh-4.2$ OUTPUT="        %{time_namelookup}     %{time_connect}        %{time_appconnect}         %{time_pretransfer}      %{time_redirect}           %{time_starttransfer}   %{time_total}\n"; echo ""; echo "                        time   namelookup   connect   appconnect   pretransfer   redirect   starttransfer   total"; echo "-----------------------------------------------------------------------------------------------------------------"; while true; do echo -n "$(date)"; curl -w "$OUTPUT" -o /dev/null -s $svc5:8080; sleep 2; echo -n "$(date)"; curl -w "$OUTPUT" -o /dev/null -s $svc5:8080; sleep 120; done

                        time   namelookup   connect   appconnect   pretransfer   redirect   starttransfer   total
-----------------------------------------------------------------------------------------------------------------
Wed Jan  9 20:35:16 UTC 2019        0.125     0.126        0.000         0.126      0.000           0.130   0.130
Wed Jan  9 20:35:19 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.016   0.016
Wed Jan  9 20:37:19 UTC 2019        5.515     5.515        0.000         5.516      0.000           5.518   5.519
Wed Jan  9 20:37:26 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.016   0.016
Wed Jan  9 20:39:26 UTC 2019        5.514     5.515        0.000         5.515      0.000           5.521   5.521
Wed Jan  9 20:39:34 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.016   0.016
Wed Jan  9 20:41:34 UTC 2019        5.513     5.514        0.000         5.514      0.000           5.519   5.519
Wed Jan  9 20:41:41 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.014   0.014
Wed Jan  9 20:43:41 UTC 2019        5.515     5.515        0.000         5.516      0.000           5.518   5.518
Wed Jan  9 20:43:49 UTC 2019        0.012     0.013        0.000         0.013      0.000           0.014   0.014
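
In the output above, from the second iteration onward, each request made after the 120 s sleep spends about 5.5 s in name lookup, while the request 2 s later takes about 0.012 s. A small filter for pulling the slow rows out of a longer run, assuming the row layout shown above (six timestamp fields from `date` followed by the seven curl timings); `slow_lookups` is a hypothetical helper name, not part of any tool.

```shell
# slow_lookups [THRESHOLD]
# Read curl timing rows on stdin and print the timestamp and namelookup time
# of every row whose namelookup exceeds THRESHOLD seconds (default: 1).
slow_lookups() {
    awk -v limit="${1:-1}" '
        # Data rows have 13 fields: 6 from the date timestamp, then 7 timings.
        # Field 7 is time_namelookup. Header and separator rows are skipped.
        NF == 13 && $7 + 0 > limit + 0 {
            printf "%s %s %s %s namelookup=%s\n", $2, $3, $4, $5, $7
        }'
}
```

For example, with the loop output saved to a file (here a hypothetical timings.log), `slow_lookups 1 < timings.log` lists only the delayed requests.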

Comment 21 Weibin Liang 2019-01-10 16:34:27 UTC
Further testing produced more interesting results.

Same test environment for all cases:
v3.9.51
Red Hat Enterprise Linux Server release 7.6 (Maipo)
openvswitch-2.9.0-83.el7fdp.1.x86_64
openvswitch-selinux-extra-policy-1.0-8.el7fdp.noarch

Case 1: Cluster created in OpenStack with instance_type: m1.large
Test result: fail

Case 2: Cluster created in OpenStack with instance_type: m1.medium
Test result: pass

Case 3: Cluster created in AWS/EC2 with instance_type: m1.large
Test result: pass

Case 4: Cluster created in AWS/EC2 with instance_type: m1.medium
Test result: pass

All of the above test results are consistent.

Comment 23 Dan Mace 2019-01-23 20:39:55 UTC
This bug is almost certainly a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1661928, so I'm consolidating into the other bug (which predates this one).

*** This bug has been marked as a duplicate of bug 1661928 ***

