Bug 1616150

Summary: [3.7] [RHEL-7.6] Failed to execute iptables-restore: exit status 1 (iptables-restore: invalid option -- '5'
Product: OpenShift Container Platform
Component: Networking
Version: 3.7.1
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Release: 3.7.z
Hardware: Unspecified
OS: Unspecified
Reporter: Gan Huang <ghuang>
Assignee: Jacob Tanenbaum <jtanenba>
QA Contact: Weihua Meng <wmeng>
CC: ademaria, aos-bugs, bbeaudoi, bbennett, cshereme, danw, dapark, egarver, gferrazs, hchatter, hgomes, jialiu, jkaur, jmalde, lstanton, mhepburn, openshift-bugs-escalate, psutter, rkshirsa, salmy, skulkarn, tomek, wmeng
Type: Bug
Last Closed: 2018-11-21 11:56:23 UTC
Bug Blocks: 1632744

Description Gan Huang 2018-08-15 06:17:43 UTC
Description of problem:
Installing OCP v3.7.61 on RHEL-7.6 beta fails because the router pod cannot start:

<--snip-->
  37s		37s		1	kubelet, qe-ghuang-merrn-1			Warning		FailedCreatePodSandBox	Failed create pod sandbox: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "router-1-deploy_default" network: CNI request failed with status 400: 'Failed to execute iptables-restore: exit status 1 (iptables-restore: invalid option -- '5'
Try `iptables-restore -h' for more information.

<--snip-->

Version-Release number of selected component (if applicable):
openshift v3.7.61
kubernetes v1.7.6+a08f5eeb62
iptables-1.4.21-28.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Trigger a 3.7 installation on RHEL-7.6 beta (the result is the same with either firewalld or iptables)


Actual results:
Installation failed because the router pod failed to start:

  37s		37s		1	kubelet, qe-ghuang-merrn-1			Warning		FailedCreatePodSandBox	Failed create pod sandbox: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "router-1-deploy_default" network: CNI request failed with status 400: 'Failed to execute iptables-restore: exit status 1 (iptables-restore: invalid option -- '5'
Try `iptables-restore -h' for more information.

Expected results:


Additional info:
Tested against OCP 3.6/3.9/3.10; none of them hit the issue.

Tested with OCP 3.7 + RHEL-7.5 (iptables-1.4.21-24.1.el7_5.x86_64), also worked fine.

Comment 1 Gan Huang 2018-08-15 06:21:24 UTC
The router pod can be redeployed successfully after downgrading iptables to iptables-1.4.21-24.1.el7_5.x86_64.

Adding test blocker, as this is blocking OCP 3.7 testing against RHEL-7.6.

Comment 3 Eric Garver 2018-08-16 15:47:45 UTC
Can you show what is being passed to iptables-restore on stdin?

Comment 4 Phil Sutter 2018-08-16 16:23:44 UTC
This looks like fallout from Bug 1465078, for which the iptables-restore argument parser was changed to no longer ignore unknown parameters given on the command line.

Gan Huang, could you please find out how exactly iptables-restore is being called, i.e. what parameters are passed to the command?

Thanks, Phil

Comment 5 Gan Huang 2018-08-17 01:37:29 UTC
Unfortunately, I don't know how to check that; the error was thrown by Kubernetes.

A very similar issue was reported upstream:
https://github.com/kubernetes/kubernetes/issues/58956

The OpenShift networking team should be able to provide proper input here.

Comment 7 Meng Bo 2018-08-17 02:36:15 UTC
# iptables-restore -w5
iptables-restore: invalid option -- '5'
Try `iptables-restore -h' for more information.

I can get the same error with the command above.

And from the openshift iptables.go code here:
https://github.com/openshift/origin/blob/release-3.7/vendor/k8s.io/kubernetes/pkg/util/iptables/iptables.go#L123

It seems OpenShift failed to detect the iptables version correctly.

Comment 8 Phil Sutter 2018-08-17 12:44:12 UTC
Hi Meng Bo,

(In reply to Meng Bo from comment #7)
> # iptables-restore -w5

Ah yes, that's what I suspected. All iptables tools use getopt(), so '-w5' is equivalent to '-w -5' and there is no '-5' flag.

> iptables-restore: invalid option -- '5'
> Try `iptables-restore -h' for more information.
> 
> I can get the same error with the command above.
> 
> And from the openshift iptables.go code here:
> https://github.com/openshift/origin/blob/release-3.7/vendor/k8s.io/kubernetes/pkg/util/iptables/iptables.go#L123

Looking at the comments on that PR, it seems like people are starting to forget how Unix program parameters typically work. :)

An easier solution than the one pointed out there would be to just pass '--wait=5' instead of '-w5'. That should reduce the change set considerably.

> It seems OpenShift failed to detect the iptables version correctly.

Maybe I don't get your point here, but '-w5' has never worked and will never work. It's just wrong syntax.

Cheers, Phil
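
The getopt() behaviour described in comment 8 can be illustrated with a small Go sketch. It is not the iptables parser and not the OpenShift code; `expandShort` and the `takesArg` table are hypothetical names introduced only for this illustration. It assumes, as comment 8 implies, that `w` is declared as an option without an attached argument, so the cluster `-w5` is read as `-w` followed by `-5`. A long option such as `--wait=5` is not a short-option cluster and passes through unchanged.

```go
package main

import "fmt"

// expandShort mimics how a getopt()-style parser reads one clustered
// short-option argument. takesArg lists the option letters that consume
// the rest of the cluster as their value (hypothetical table).
func expandShort(arg string, takesArg map[byte]bool) []string {
	if len(arg) < 2 || arg[0] != '-' || arg[1] == '-' {
		return []string{arg} // not a short-option cluster, e.g. "--wait=5"
	}
	var out []string
	for i := 1; i < len(arg); i++ {
		c := arg[i]
		if takesArg[c] && i+1 < len(arg) {
			// The remainder of the cluster is this option's value.
			return append(out, "-"+string(c), arg[i+1:])
		}
		// Every other character is treated as its own flag.
		out = append(out, "-"+string(c))
	}
	return out
}

func main() {
	// Assumption: 'w' takes no attached argument, as comment 8 implies.
	noArg := map[byte]bool{}
	fmt.Println(expandShort("-w5", noArg))      // prints [-w -5]
	fmt.Println(expandShort("--wait=5", noArg)) // prints [--wait=5]
}
```

If `w` were instead declared with a required argument (`takesArg['w'] = true` in this sketch), the same input would yield `-w` with the value `5`, which is why the outcome depends on how the tool declares the option.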

Comment 9 Casey Callendrello 2018-08-17 12:56:12 UTC
Not only that, but iptables-restore didn't get "--wait" support until v1.6.2. The version logic linked is for iptables, not iptables-restore.

Comment 10 Phil Sutter 2018-08-17 13:07:19 UTC
(In reply to Casey Callendrello from comment #9)
> Not only that, but iptables-restore didn't get "--wait" support until
> v1.6.2. The version logic linked is for iptables, not iptables-restore.

In fact, RHEL7 has supported the --wait option for iptables-restore since iptables-1.4.21-18.el7. The relevant bug is 1438597.

Cheers, Phil
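
Taken together, comments 8 to 10 suggest passing the long form `--wait=5` and only where iptables-restore accepts it. A minimal Go sketch of that idea follows. It is not the upstream fix; `restoreArgs` and the `supportsWait` flag are hypothetical, and how the caller determines `supportsWait` (version check, probing, or distribution knowledge) is deliberately left out.

```go
package main

import (
	"fmt"
	"strconv"
)

// restoreArgs builds an argument list for iptables-restore.
// supportsWait is a hypothetical capability flag supplied by the caller;
// this sketch does not detect it.
func restoreArgs(base []string, supportsWait bool, waitSeconds int) []string {
	// Copy base so the caller's slice is never modified.
	args := append([]string{}, base...)
	if supportsWait {
		// Long form with an attached value, instead of the clustered "-w5".
		args = append(args, "--wait="+strconv.Itoa(waitSeconds))
	}
	return args
}

func main() {
	base := []string{"--noflush", "--counters"}
	fmt.Println(restoreArgs(base, true, 5))  // prints [--noflush --counters --wait=5]
	fmt.Println(restoreArgs(base, false, 5)) // prints [--noflush --counters]
}
```

Keeping the capability decision separate from argument construction means a wrong version check cannot produce a malformed flag; at worst the wait option is omitted.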

Comment 11 Casey Callendrello 2018-09-19 11:35:34 UTC
CC Dan Winship.

Dan, should we backport https://github.com/kubernetes/kubernetes/pull/60978 ?

Comment 12 Dan Winship 2018-09-19 13:04:22 UTC
Doh. Yes. We backported the fix as far back as OCP 3.9 because the bug was introduced in kube 1.9, but I forgot that we had backported the buggy kube 1.9 code into OCP 3.7 too.

Comment 13 Casey Callendrello 2018-09-20 13:09:40 UTC
Assigning to Jacob.

Comment 16 Weihua Meng 2018-10-08 10:18:30 UTC
fixed.

openshift v3.7.65

iptables-1.4.21-28.el7.x86_64

Kernel Version: 3.10.0-957.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.6 (Maipo)

Comment 22 Casey Callendrello 2018-11-20 10:45:14 UTC
*** Bug 1651436 has been marked as a duplicate of this bug. ***

Comment 24 Dan Winship 2018-11-20 13:42:56 UTC
3.10 and later never had the bug; it was fixed upstream before that.

Comment 25 errata-xmlrpc 2018-11-21 11:56:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2906