Description of problem:
Iptables shows an incorrect IP address for an endpoint. The iptables rules are not being updated. Restarting the openvswitch service fixes the issue.

Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.6

How reproducible:
Cannot reproduce.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Please grab the node logs and the output from iptables-save.
Also, please grab the logs from the affected node.
Hello, I got a reply from the customer. The iptables rules are not being updated. The main reason appears to be that the nodes somehow lose commands from the master. The nodes do not update the iptables rules for pods, even though the system knows what should be updated. etcd stores the information correctly, and it can be retrieved correctly with the client, but all the nodes stop receiving iptables updates. The pod works and gets the right IP, and local traffic to it works, but iptables does not implement the forward/NAT rules needed to forward traffic to and from the pod. The logs contain a message about the iptables version not being detected, which could perhaps be the cause:

Oct 26 08:49:35 oc-master-1-0 atomic-openshift-node: I1026 08:49:35.510552 120469 iptables.go:562] couldn't get iptables-restore version; assuming it doesn't support --wait
Hello, I may have found the issue. I have a lab running 3.5 with iptables-1.4.21-17.el7.x86_64, and there iptables-restore --version does not work, the same as on the customer's system. I checked on 3.6, where iptables is at iptables-1.4.21-18.el7.x86_64, and there it works. I suggested updating. However, shouldn't 3.6 make that iptables version mandatory? Thx
Hello Ben, it is definitely the iptables version. After updating iptables to release 18 (iptables-1.4.21-18), it works. The next step is to add a dependency from OpenShift 3.6 on iptables release 18. Thx
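The problem was the deadlock that Miheer identified. A thread acquired a lock and never released it, because the release was deferred and a for loop never triggers deferred actions: in Go they run only when the enclosing function returns, not at the end of each loop iteration.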
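The general pattern behind this kind of deadlock can be sketched as below. This is a minimal illustration, not the actual OpenShift code. The store type and both methods are hypothetical, and TryLock (Go 1.18 or later) is used only so the buggy variant reports the stuck lock instead of hanging.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for shared state guarded by a lock.
type store struct {
	mu    sync.Mutex
	items []string
}

// processBuggy defers Unlock inside the loop. The deferred call runs only
// when processBuggy returns, so the lock is still held on the second
// iteration. With a plain Lock() that iteration would block forever.
func (s *store) processBuggy(updates []string) (applied int, stuck bool) {
	for _, u := range updates {
		if !s.mu.TryLock() {
			return applied, true // lock never released by the previous iteration
		}
		defer s.mu.Unlock()
		s.items = append(s.items, u)
		applied++
	}
	return applied, false
}

// processFixed wraps each iteration in a function literal, so the deferred
// Unlock runs at the end of every iteration.
func (s *store) processFixed(updates []string) int {
	applied := 0
	for _, u := range updates {
		func() {
			s.mu.Lock()
			defer s.mu.Unlock()
			s.items = append(s.items, u)
			applied++
		}()
	}
	return applied
}

func main() {
	updates := []string{"a", "b", "c"}

	var s1 store
	applied, stuck := s1.processBuggy(updates)
	fmt.Println("buggy:", applied, stuck) // buggy: 1 true

	var s2 store
	fmt.Println("fixed:", s2.processFixed(updates)) // fixed: 3
}
```

Calling Unlock explicitly at the end of the loop body would work as well. The function literal keeps the defer, which still releases the lock if the body panics.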
Verified in atomic-openshift-3.6.173.0.94-1.git.0.8525e8f.el7.x86_64; the issue has been fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0113