Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Run a curl command in a loop from inside the pod to reach the endpoint

Actual results:
curl fails sometimes

Expected results:
curl should pass all the time

Additional info:
Customer is having intermittent pod egress connectivity issues on OCP 3.11 with an egress router in a proxied environment.

Use Case:
The application pod connects to an external endpoint URL (via a proxy) to upload files. The proxy has multiple interfaces; it is defined as <internal ip> in the corporate DNS and as 192.168.219.120 in /etc/hosts in the app pod. The app pod needs to connect to the proxy at 192.168.219.120 to establish a session with the endpoint URL; otherwise, the endpoint URL blocks the connection. The app pod's /etc/nsswitch.conf has "hosts: files dns".

Issue:
10% of the app pod's requests to the endpoint URL are failing.

Observations:
1. A tcpdump shows that sometimes the DNS resolution for the proxy happens upstream (as if local /etc/hosts resolution is failing), which results in the pod connecting to 148.171.179.249 and subsequently failing to connect to the endpoint URL.
2. An strace of curl in the pod shows that /etc/hosts is not read the same way every time. Sometimes the contents of /etc/hosts do not include the proxy line, which results in upstream DNS resolution and the subsequent failures.
3. No resolution-related errors are seen in the node's sosreport.

The sosreport, tcpdump, and strace outputs are available in the case.
*** Bug 1860200 has been marked as a duplicate of this bug. ***
Customer is requesting an escalation as more teams are complaining about this issue, and they have increased the case severity to 1.
Chatting with Anand, it seems the CU is building this /etc/hosts into their images. The kubelet projects a mount into the pod for /etc/hosts if the pod doesn't define one, and this would overwrite any baked-in /etc/hosts. However, I don't have an explanation for why the application seems to be seeing the baked-in /etc/hosts 90% of the time and the kubelet-projected /etc/hosts the other 10%.

If the CU is willing, the proper way to do this is with hostAliases on the pod spec (see the sketch below):
https://kubernetes.io/docs/concepts/services-networking/add-entries-to-pod-etc-hosts-with-host-aliases/
https://github.com/kubernetes/kubernetes/blob/release-1.11/pkg/kubelet/kubelet_pods.go#L128-L254
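For reference, a minimal hostAliases sketch; the hostname and image below are placeholders, and the IP is the one from the case description:

apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  hostAliases:
  # Entries listed here are added by the kubelet to the /etc/hosts file it manages for the pod.
  - ip: "192.168.219.120"
    hostnames:
    - "proxy.example.com"    # placeholder; substitute the real proxy hostname
  containers:
  - name: app
    image: registry.example.com/app:latest    # placeholder image
    command: ["sleep", "3600"]

With this in place the proxy entry survives container restarts, because the kubelet itself writes it into the managed file.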
Customer is not willing to change, as their env is pretty large and the solution has been working for the past 2 years. He did say that he may try hostAliases just to see, but is not willing to take that as a workaround or solution. Also, from the call today, I understood that they are not including any /etc/hosts at image build time. They are adding egress rules to the project settings, which populates /etc/hosts appropriately during deployment.
One other thing they brought up today: it seems the init container is restarting regularly, about every 8 minutes, in almost all pods. Would that have any bearing on the /etc/hosts file getting temporarily mounted from the node? From what we could see, when the init container restarts, the other containers in the pod come back up pretty quickly, as if only the state has changed and the containers have not really restarted.
It is true that the /etc/hosts file is shared by all containers in the pod. It may be that the init container restarts are causing the kubelet to rewrite the file, and perhaps it starts with an incorrect version of the file before correcting it. I'll take a look to see how that process works and whether it could explain this. In the meantime, it would be great if the customer could verify whether the wrong-read events correspond to the timing of the init container (or other container) restarts, or if that was just a coincidence.
Joel, per our earlier testing the timings were not quite matching, but I will ask the customer to confirm again. Please continue your investigation. Is there any way to have the init container restarted in our local environments so we can reproduce the issue?
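One way to exercise this locally, sketched under the assumption that the kubelet rewrites /etc/hosts whenever any container in the pod starts (the image and names are placeholders, and this restarts a regular container rather than an init container): a throwaway pod whose container prints /etc/hosts, appends a proxy entry, and then exits so that it keeps restarting.

apiVersion: v1
kind: Pod
metadata:
  name: hosts-rewrite-test
spec:
  restartPolicy: Always
  containers:
  - name: main
    image: busybox
    # Print the current file, append a fake proxy entry, then exit after a
    # minute; if the kubelet rewrites /etc/hosts on every container start,
    # the appended line should be gone from the next run's output.
    command:
    - sh
    - -c
    - "cat /etc/hosts; echo '192.168.219.120 proxy.example.com' >> /etc/hosts; sleep 60"

Comparing the output of successive runs (oc logs hosts-rewrite-test and oc logs -p hosts-rewrite-test) should show whether the appended line survives a restart.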
It seems to me that it should append onto the image's hosts file rather than replace it.
I have re-familiarized myself with the code that manages /etc/hosts.

The kubelet, when writing the file, always delimits with a tab, not a space, so it's clear that one of the containers is adding the proxy line to /etc/hosts for the pod, or that they have some process running on the node to add it to the files in /var/lib/origin/openshift.local.volumes/pods/*/etc-hosts.

The kubelet expects to be fully managing this file. Every time an init container or a regular container from the pod is started, the file will be overwritten with the content that the kubelet expects it to have.

There is a message that will show up in the node logs if logging at level >= 3. The log message will end with "creating hosts mount: true" if the pod's /etc/hosts will be written. Here's what I ran on my node to debug it:

$ grep ^DEBUG_LOGLEVEL= /etc/sysconfig/atomic-openshift-node
DEBUG_LOGLEVEL=2
$ sudo sed -i 's/^DEBUG_LOGLEVEL=[0-9]*/DEBUG_LOGLEVEL=3/' /etc/sysconfig/atomic-openshift-node
$ sudo systemctl restart atomic-openshift-node.service
$ journalctl -f -u atomic-openshift-node.service | grep "creating hosts mount: true"
Jul 31 13:06:58 rhel7-node3.test1 atomic-openshift-node[16111]: I0731 13:06:58.573914   16111 kubelet_pods.go:138] container: default/busybox/busybox podIP: "10.131.0.2" creating hosts mount: true

The system does provide a supported means of injecting custom content into the file, and that is with hostAliases, as mentioned in previous comments.
https://access.redhat.com/solutions/3696301

Any other method of adding content to /etc/hosts is unsupported, and the customer should expect that any restart of a container in the pod (whether an init container or a regular container) will cause the file to be reverted to the state that Kubernetes expects it to be in.

The system also provides a method for automatically editing a pod prior to creation. A mutating admission webhook could be created by the customer which would check a new pod's hostAliases and add one if not present. However, in 3.11 this feature is only considered TechnologyPreview.
https://docs.openshift.com/container-platform/3.11/architecture/additional_concepts/dynamic_admission_controllers.html
I'm not sure, but I believe that this feature is considered GA in 4.x.

Finally, the customer could provide their own /etc/hosts file for each container, which Kubernetes will completely ignore. Any container that mounts /etc/hosts directly will not get the node-generated file and will not have the file overwritten by the node upon container starts. But this seems like more work than adding hostAliases to pods. (A sketch of this approach follows below.)

This bug should probably be closed with NOTABUG, pending confirmation that the customer is rewriting the file and that their containers are restarting, causing the overwrites by the kubelet.
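For completeness, a sketch of that last option: a ConfigMap mounted over /etc/hosts with subPath. The ConfigMap name, pod name, and image are placeholders, and this assumes the customer includes every entry they need, since the kubelet will no longer generate the file for that container:

apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-hosts
data:
  hosts: |
    127.0.0.1 localhost
    192.168.219.120 proxy.example.com
---
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest    # placeholder image
    volumeMounts:
    # Because this container mounts /etc/hosts itself, the kubelet does not
    # project its managed etc-hosts file into it and will not overwrite it.
    - name: hosts
      mountPath: /etc/hosts
      subPath: hosts
  volumes:
  - name: hosts
    configMap:
      name: custom-hosts

Note that a subPath mount is not refreshed when the ConfigMap changes, which is another reason hostAliases is the simpler option.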