Description of problem:
Installation against RHEL-7.6 beta and OCP v3.7.61 fails because the router pod cannot start up:

<--snip-->
37s 37s 1 kubelet, qe-ghuang-merrn-1 Warning FailedCreatePodSandBox Failed create pod sandbox: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "router-1-deploy_default" network: CNI request failed with status 400: 'Failed to execute iptables-restore: exit status 1 (iptables-restore: invalid option -- '5'
Try `iptables-restore -h' for more information.
<--snip-->

Version-Release number of selected component (if applicable):
openshift v3.7.61
kubernetes v1.7.6+a08f5eeb62
iptables-1.4.21-28.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Trigger an OCP 3.7 installation on RHEL-7.6 beta (same result whether firewalld or iptables is used)

Actual results:
Installation failed because the router pod failed to start:

37s 37s 1 kubelet, qe-ghuang-merrn-1 Warning FailedCreatePodSandBox Failed create pod sandbox: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "router-1-deploy_default" network: CNI request failed with status 400: 'Failed to execute iptables-restore: exit status 1 (iptables-restore: invalid option -- '5'
Try `iptables-restore -h' for more information.

Expected results:
Installation succeeds and the router pod starts.

Additional info:
Tested against OCP 3.6 / 3.9 / 3.10: the issue is not hit.
Tested with OCP 3.7 + RHEL-7.5 (iptables-1.4.21-24.1.el7_5.x86_64): also works fine.
The router pod can be re-deployed successfully after downgrading iptables to iptables-1.4.21-24.1.el7_5.x86_64.

Adding TestBlocker, as this is blocking OCP 3.7 testing against RHEL-7.6.
Can you show what is being passed to iptables-restore on stdin?
This looks like fallout from Bug 1465078, for which the iptables-restore argument parser was changed to no longer ignore unknown parameters given on the command line.

Gan Huang, could you please find out how exactly iptables-restore is being called, i.e. what parameters are passed to the command?

Thanks, Phil
Unfortunately I don't know how to check that; the error was thrown by Kubernetes.

A very similar issue was reported upstream: https://github.com/kubernetes/kubernetes/issues/58956

The OpenShift networking team should have proper input here.
# iptables-restore -w5
iptables-restore: invalid option -- '5'
Try `iptables-restore -h' for more information.

I can get the same error with the command above.

And from the OpenShift iptables.go code here:
https://github.com/openshift/origin/blob/release-3.7/vendor/k8s.io/kubernetes/pkg/util/iptables/iptables.go#L123
it seems OpenShift fails to judge the iptables version correctly.
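For context, the pattern that version check implements is roughly the sketch below. This is a simplified, hypothetical Go illustration, not the actual Kubernetes code: the function names, the parsing, and the version thresholds are assumptions. The key point is that the wait flag is derived from the iptables version, and the resulting "-w5" then ends up on the iptables-restore command line, which is what the RHEL 7.6 build rejects.

// Hypothetical sketch of version-gated wait-flag selection; thresholds
// and names are assumptions for illustration, not the kube code.
package main

import (
	"fmt"
	"os/exec"
	"regexp"
	"strconv"
)

// parseVersion extracts the "vX.Y.Z" triple from "iptables --version" output.
func parseVersion(out []byte) ([3]int, error) {
	var v [3]int
	m := regexp.MustCompile(`v(\d+)\.(\d+)\.(\d+)`).FindSubmatch(out)
	if m == nil {
		return v, fmt.Errorf("cannot parse version from %q", out)
	}
	for i := 0; i < 3; i++ {
		v[i], _ = strconv.Atoi(string(m[i+1]))
	}
	return v, nil
}

// atLeast reports whether version a is >= b, comparing numerically.
func atLeast(a, b [3]int) bool {
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return true
}

// waitFlag picks a lock-wait flag from the *iptables* version. The pitfall
// in this bug: the same flag is then reused when calling iptables-restore,
// whose RHEL 1.4.21 build does not accept "-w5".
func waitFlag(v [3]int) []string {
	switch {
	case atLeast(v, [3]int{1, 4, 22}): // assumed threshold for illustration
		return []string{"-w5"}
	case atLeast(v, [3]int{1, 4, 20}): // assumed threshold for illustration
		return []string{"-w"}
	default:
		return nil
	}
}

func main() {
	out, err := exec.Command("iptables", "--version").CombinedOutput()
	if err != nil {
		fmt.Println("iptables --version failed:", err)
		return
	}
	v, err := parseVersion(out)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("iptables %v -> wait flag %v\n", v, waitFlag(v))
}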
Hi Meng Bo,

(In reply to Meng Bo from comment #7)
> # iptables-restore -w5

Ah yes, that's what I suspected. All iptables tools use getopt(), so '-w5' is equivalent to '-w -5', and there is no '-5' flag.

> iptables-restore: invalid option -- '5'
> Try `iptables-restore -h' for more information.
>
> I can get the same error with the command above.
>
> And from the openshift iptables.go code here:
> https://github.com/openshift/origin/blob/release-3.7/vendor/k8s.io/
> kubernetes/pkg/util/iptables/iptables.go#L123

Looking at the comments to that PR, it seems like people are starting to forget how Unix program parameters typically work. :) An easier solution than the one pointed out there would be to just pass '--wait=5' instead of '-w5'. That should reduce the change set considerably.

> Seems the openshift failed to judge the iptables version correctly.

Maybe I don't get your point here, but '-w5' has never worked and will never work. It's just wrong syntax.

Cheers, Phil
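To make that suggestion concrete, here is a hedged sketch of the two invocations at the exec level; the helper, the tiny ruleset, and the 5-second value are made up for illustration and this is not the real caller (it also needs root to actually commit anything):

// Illustrative only: the exec-level difference between "-w5" and "--wait=5".
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// restore feeds rules to iptables-restore with the given lock-wait arguments.
func restore(rules string, waitArgs ...string) error {
	args := append(waitArgs, "--noflush")
	cmd := exec.Command("iptables-restore", args...)
	cmd.Stdin = bytes.NewBufferString(rules)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("%v: %s", err, out)
	}
	return nil
}

func main() {
	rules := "*filter\nCOMMIT\n" // minimal no-op payload, made up for the demo

	// Fails on iptables-1.4.21-28.el7: getopt treats "-w5" as "-w" plus an
	// unknown "-5", hence "invalid option -- '5'".
	if err := restore(rules, "-w5"); err != nil {
		fmt.Println("-w5:", err)
	}

	// Expected to work where --wait support exists (see the follow-up
	// comments on the RHEL backport and upstream 1.6.2).
	if err := restore(rules, "--wait=5"); err != nil {
		fmt.Println("--wait=5:", err)
	}
}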
Not only that, but iptables-restore didn't get "--wait" support until v1.6.2. The version logic linked is for iptables, not iptables-restore.
(In reply to Casey Callendrello from comment #9)
> Not only that, but iptables-restore didn't get "--wait" support until
> v1.6.2. The version logic linked is for iptables, not iptables-restore.

In fact, RHEL7 has supported the --wait option for iptables-restore since iptables-1.4.21-18.el7. The relevant bug is 1438597.

Cheers, Phil
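Given that distro backports like this make straight version comparisons unreliable, one illustrative alternative is to probe the installed iptables-restore for --wait support directly. The sketch below only demonstrates that idea; it is not necessarily how the upstream Kubernetes fix works.

// Illustrative feature probe: ask the binary, not the version number.
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// restoreSupportsWait probes "iptables-restore --wait=5 --test" with a
// trivial ruleset. If the binary rejects the flag itself, getopt reports
// an unknown/invalid option and we treat --wait as unsupported.
func restoreSupportsWait() (bool, error) {
	cmd := exec.Command("iptables-restore", "--wait=5", "--test")
	cmd.Stdin = strings.NewReader("*filter\nCOMMIT\n")
	var stderr bytes.Buffer
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		msg := stderr.String()
		if strings.Contains(msg, "unrecognized option") ||
			strings.Contains(msg, "invalid option") {
			return false, nil
		}
		// Failed for some other reason; the probe is inconclusive.
		return false, fmt.Errorf("probe failed: %v: %s", err, msg)
	}
	return true, nil
}

func main() {
	ok, err := restoreSupportsWait()
	if err != nil {
		fmt.Println(err)
		return
	}
	if ok {
		fmt.Println("iptables-restore accepts --wait; safe to pass --wait=5")
	} else {
		fmt.Println("iptables-restore lacks --wait; call it without a lock-wait flag")
	}
}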
CC Dan Winship. Dan, should we backport https://github.com/kubernetes/kubernetes/pull/60978 ?
Doh. Yes. We backported the fix as far back as OCP 3.9 because the bug was introduced in kube 1.9, but I forgot that we had backported the buggy kube 1.9 code into OCP 3.7 too.
Assigning to Jacob.
3.8 PR: https://github.com/openshift/ose/pull/1418
3.7 PR: https://github.com/openshift/ose/pull/1419
Fixed.

openshift v3.7.65
iptables-1.4.21-28.el7.x86_64
Kernel Version: 3.10.0-957.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.6 (Maipo)
*** Bug 1651436 has been marked as a duplicate of this bug. ***
3.10 and later never had the bug; it was fixed upstream before that
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2906