Bug 1860200

Summary: local (/etc/hosts) name resolution failing intermittently in application pods
Product: OpenShift Container Platform Reporter: Anand Paladugu <apaladug>
Component: Containers Assignee: Tom Sweeney <tsweeney>
Status: CLOSED DUPLICATE QA Contact: Sunil Choudhary <schoudha>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.11.0 CC: aos-bugs, dwalsh, jokerman
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-07-24 18:52:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Anand Paladugu 2020-07-24 01:18:11 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. From inside the application pod, run a curl command against the endpoint URL in a loop.
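To separate name resolution from the rest of the curl request, the loop can also be run against the resolver alone. The following is a minimal sketch, not something taken from the case: it assumes Python is available in the pod image and that it goes through the same libc resolver as curl. The function names are made up for illustration.

```python
import socket
from collections import Counter


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def sample_resolution(hostname, attempts, resolver=resolve_ipv4):
    """Resolve hostname repeatedly and count how often each distinct answer appears."""
    counts = Counter()
    for _ in range(attempts):
        try:
            answer = tuple(resolver(hostname))
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            answer = ("error: %s" % exc,)
        counts[answer] += 1
    return counts
```

Calling sample_resolution("proxy.example.internal", 1000) (the hostname is a placeholder, not the customer's proxy name) returns a Counter keyed by the set of addresses returned. If more than one key shows up, the resolver itself is giving different answers between calls.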

Actual results:

curl fails intermittently

Expected results:

curl should succeed every time

Additional info:

The customer is seeing intermittent pod egress connectivity issues on an OCP 3.11 cluster that uses an egress router in a proxied environment.

Use Case: An application pod connects to an external endpoint URL (via a proxy) to upload files.

The proxy has multiple interfaces. It is defined as 148.171.179.249 in the corporate DNS and as 192.168.219.120 in /etc/hosts in the app pod. The app pod needs to connect to the proxy at 192.168.219.120 to establish a session with the endpoint URL; otherwise, the endpoint URL blocks the connection.

The app pod's /etc/nsswitch.conf has "hosts: files dns", so /etc/hosts should be consulted before DNS.
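For reference, the lookup order implied by "hosts: files dns" can be sketched as below. This is a simplified model, not the glibc implementation: it returns only the first matching address, and parse_hosts, lookup, and dns_lookup are hypothetical names. It shows why a read of /etc/hosts that lacks the proxy line would fall through to the corporate DNS answer.

```python
def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its first listed address."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) < 2:
            continue  # blank line, comment-only line, or address without a name
        address, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name.lower(), address)  # first match wins
    return table


def lookup(name, hosts_text, dns_lookup):
    """Mimic 'hosts: files dns': consult the hosts text first, then fall back to DNS."""
    address = parse_hosts(hosts_text).get(name.lower())
    if address is not None:
        return address, "files"
    return dns_lookup(name), "dns"
```

With a hosts text that contains the 192.168.219.120 line, lookup returns that address with source "files". With the same text minus that line, it returns whatever dns_lookup yields, which in this environment would be 148.171.179.249.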

Issue: 10% of app pod requests to the endpoint URL fail.

Observations:

1. A tcpdump shows that the proxy name is sometimes resolved through upstream DNS (as if local /etc/hosts resolution had failed), so the pod connects to 148.171.179.249 and then fails to connect to the endpoint URL.

2. An strace of curl in the pod shows that /etc/hosts is not read the same way every time. Sometimes the contents read from /etc/hosts do not include the proxy line, which results in DNS resolution and subsequent failures.

3. No resolution-related errors are seen in the node's sosreport.
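One way to follow up on observation 2 is to poll /etc/hosts from inside the pod and record when the proxy line is absent, together with the inode, mtime, and a content hash. That might help tell whether the file is being replaced or rewritten between reads. This is only a sketch under that assumption; snapshot_hosts and watch_hosts are hypothetical helper names, and the hostname passed in would be the customer's proxy name.

```python
import hashlib
import os
import time


def snapshot_hosts(path, expected_ip, expected_name):
    """Read the hosts file once and report whether the expected entry is present."""
    with open(path, "rb") as handle:
        data = handle.read()
        info = os.fstat(handle.fileno())  # stat the same file that was just read
    present = False
    for line in data.decode("utf-8", errors="replace").splitlines():
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == expected_ip and expected_name in fields[1:]:
            present = True
            break
    return {
        "present": present,
        "inode": info.st_ino,
        "mtime": info.st_mtime,
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def watch_hosts(path, expected_ip, expected_name, samples, interval=0.1):
    """Take several snapshots and return those where the entry was missing."""
    missing = []
    for _ in range(samples):
        snap = snapshot_hosts(path, expected_ip, expected_name)
        if not snap["present"]:
            missing.append(snap)
        time.sleep(interval)
    return missing
```

A changing inode between snapshots would suggest the file is being replaced rather than edited in place. A changing hash with the same inode would suggest an in-place rewrite.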



The sosreport, tcpdump, and strace outputs are available in the case.

Comment 1 Tom Sweeney 2020-07-24 18:52:32 UTC
This appears to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1860201.  As such, I'm going to close this one.  If I'm mistaken, or information that should have been entered into the BZ is missing, please feel free to reopen this BZ and update it.

*** This bug has been marked as a duplicate of bug 1860201 ***