Bug 1269454

Summary: openshift-node should wait for xtables lock to be released
Product: OpenShift Container Platform Reporter: Evgheni Dereveanchin <ederevea>
Component: Containers Assignee: Paul Weil <pweil>
Status: CLOSED DUPLICATE QA Contact: Chao Yang <chaoyang>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.0.0 CC: aos-bugs, jokerman, mmccomas, pep
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-10-14 11:31:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Evgheni Dereveanchin 2015-10-07 11:53:16 UTC
Description of problem:
openshift-node uses the iptables binary to set up rules at boot. Since RHEL 7.1, a locking mechanism protects against two instances of iptables running simultaneously. Currently, when the lock is held by another process, the iptables invocation fails, producing inconsistent rulesets. This can happen when another script or service (firewalld, iptables, etc.) starts at the same time as openshift-node.

Version-Release number of selected component (if applicable):
3.0.2

How reproducible:
Rarely, under conditions where some other script that uses the iptables binary starts at the same time as openshift-node.

Steps to Reproduce:
No clear reproducer at the moment. In some configurations, another script that also invokes iptables may be running.

Actual results:

Sep 30 15:15:15 node1.demo.lan openshift-node[2587]: ++ iptables -nvL INPUT --line-numbers
Sep 30 15:46:27 node1.demo.lan openshift-node[2587]: ++ grep 'state RELATED,ESTABLISHED'
Sep 30 15:46:27 node1.demo.lan openshift-node[2587]: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Sep 30 15:46:27 node1.demo.lan openshift-node[2587]: ++ awk '{print $1}'
Sep 30 15:46:27 node1.demo.lan openshift-node[2587]: + lineno=
Sep 30 15:46:27 node1.demo.lan openshift-node[2587]: + iptables -I INPUT -p udp -m multiport --dports 4789 -m comment --comment '001 vxlan incoming' -j ACCEPT
Sep 30 15:46:27 node1.demo.lan openshift-node[2587]: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Sep 30 15:46:27 node1.demo.lan openshift-node[2587]: E0930 15:46:27.927232    2587 kube.go:39] Error executing setup script.

Expected results:

Sep 30 15:15:15 node1.demo.lan openshift-node[2200]: ++ iptables -nvL INPUT --line-numbers
Sep 30 15:15:15 node1.demo.lan openshift-node[2200]: ++ grep 'state RELATED,ESTABLISHED'
Sep 30 15:15:15 node1.demo.lan openshift-node[2200]: ++ awk '{print $1}'
Sep 30 15:15:15 node1.demo.lan openshift-node[2200]: + lineno=1
Sep 30 15:15:15 node1.demo.lan openshift-node[2200]: + iptables -I INPUT 1 -p udp -m multiport --dports 4789 -m comment --comment '001 vxlan incoming' -j ACCEPT
Sep 30 15:15:15 node1.demo.lan openshift-node[2200]: + iptables -I INPUT 2 -i tun0 -m comment --comment 'traffic from docker for internet' -j ACCEPT

Additional info:

Note that, because of the lock, the rule listing and its subsequent parsing fail, so lineno is not set, which breaks the subsequent rules.
A solution (as noted in the error message) is to add the -w option to wait for the lock to be released, or -w2 to wait for 2 seconds and then fail. Another option would be to catch these types of errors and either wait or stop processing completely instead of running incorrect iptables commands.
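
A rough sketch of how both suggestions above could be combined in a bash setup script: pass -w so iptables waits for the xtables lock, and refuse to insert the rule when the line number cannot be determined. The function names and the IPTABLES_CMD variable are hypothetical and are not taken from the actual openshift-node script; only the iptables arguments come from the log above.

```shell
#!/bin/bash
# Hypothetical sketch, not the actual openshift-node setup script.
# IPTABLES_CMD is an assumed override hook so the logic can be tried without root.
IPTABLES_CMD="${IPTABLES_CMD:-iptables}"

# Print the number of the first INPUT rule matching RELATED,ESTABLISHED.
# Returns non-zero if the listing fails or no such rule is found.
established_lineno() {
    local listing lineno
    # -w makes iptables wait for the xtables lock instead of failing immediately.
    listing=$("$IPTABLES_CMD" -w -nvL INPUT --line-numbers) || return 1
    lineno=$(printf '%s\n' "$listing" | awk '/state RELATED,ESTABLISHED/ {print $1; exit}')
    [ -n "$lineno" ] || return 1
    printf '%s\n' "$lineno"
}

# Insert the VXLAN rule only when the target position is known;
# otherwise stop instead of running an incorrect iptables command.
add_vxlan_rule() {
    local lineno
    if ! lineno=$(established_lineno); then
        echo "could not determine rule position; not modifying INPUT" >&2
        return 1
    fi
    "$IPTABLES_CMD" -w -I INPUT "$lineno" -p udp -m multiport --dports 4789 \
        -m comment --comment '001 vxlan incoming' -j ACCEPT
}
```

With this shape, a failed or empty listing makes add_vxlan_rule return an error before any insert is attempted, so the "lineno=" case from the actual results cannot lead to a rule being inserted at the wrong position.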

Comment 1 Josep 'Pep' Turro Mauri 2015-10-14 11:31:48 UTC

*** This bug has been marked as a duplicate of bug 1267670 ***