Description of problem:
Installation against RHEL-7.6 beta and OCP v3.7.61 fails because the router pod cannot start up:

<--snip-->
37s 37s 1 kubelet, qe-ghuang-merrn-1 Warning FailedCreatePodSandBox Failed create pod sandbox: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "router-1-deploy_default" network: CNI request failed with status 400: 'Failed to execute iptables-restore: exit status 1 (iptables-restore: invalid option -- '5'
Try `iptables-restore -h' for more information.
<--snip-->

Version-Release number of selected component (if applicable):
openshift v3.7.61
kubernetes v1.7.6+a08f5eeb62
iptables-1.4.21-28.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Trigger an OCP 3.7 installation on RHEL-7.6 beta (same result whether firewalld or iptables is used)

Actual results:
Installation failed because the router pod failed to start:

37s 37s 1 kubelet, qe-ghuang-merrn-1 Warning FailedCreatePodSandBox Failed create pod sandbox: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "router-1-deploy_default" network: CNI request failed with status 400: 'Failed to execute iptables-restore: exit status 1 (iptables-restore: invalid option -- '5'
Try `iptables-restore -h' for more information.

Expected results:
Installation succeeds and the router pod starts.

Additional info:
Tested against OCP 3.6 / 3.9 / 3.10: the issue is not hit.
Tested with OCP 3.7 + RHEL-7.5 (iptables-1.4.21-24.1.el7_5.x86_64): also works fine.
The router pod can be re-deployed successfully after downgrading iptables to iptables-1.4.21-24.1.el7_5.x86_64.

Adding TestBlocker, as this is blocking OCP 3.7 testing against RHEL-7.6.
Can you show what is being passed to iptables-restore on stdin?
This looks like fallout from Bug 1465078, for which the iptables-restore argument parser was changed to no longer ignore unknown parameters given on the command line.

Gan Huang, could you please find out how exactly iptables-restore is being called, i.e. what parameters are passed to the command?

Thanks, Phil
Unfortunately I don't know how to check that; the error was thrown by Kubernetes.

A very similar issue was reported upstream: https://github.com/kubernetes/kubernetes/issues/58956

The OpenShift networking team should have proper input here.
# iptables-restore -w5
iptables-restore: invalid option -- '5'
Try `iptables-restore -h' for more information.

I can get the same error with the command above.

And from the OpenShift iptables.go code here:
https://github.com/openshift/origin/blob/release-3.7/vendor/k8s.io/kubernetes/pkg/util/iptables/iptables.go#L123
it seems OpenShift fails to judge the iptables version correctly.
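For context, the pattern that version check implements is roughly the sketch below. This is a simplified, hypothetical Go illustration, not the actual Kubernetes code: the function names, the parsing, and the version thresholds are assumptions. The key point is that the wait flag is derived from the iptables version, and the resulting "-w5" then ends up on the iptables-restore command line, which is what the RHEL 7.6 build rejects.

// Hypothetical sketch of version-gated wait-flag selection; thresholds
// and names are assumptions for illustration, not the kube code.
package main

import (
	"fmt"
	"os/exec"
	"regexp"
	"strconv"
)

// parseVersion extracts the "vX.Y.Z" triple from "iptables --version" output.
func parseVersion(out []byte) ([3]int, error) {
	var v [3]int
	m := regexp.MustCompile(`v(\d+)\.(\d+)\.(\d+)`).FindSubmatch(out)
	if m == nil {
		return v, fmt.Errorf("cannot parse version from %q", out)
	}
	for i := 0; i < 3; i++ {
		v[i], _ = strconv.Atoi(string(m[i+1]))
	}
	return v, nil
}

// atLeast reports whether version a is >= b, comparing numerically.
func atLeast(a, b [3]int) bool {
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return true
}

// waitFlag picks a lock-wait flag from the *iptables* version. The pitfall
// in this bug: the same flag is then reused when calling iptables-restore,
// whose RHEL 1.4.21 build does not accept "-w5".
func waitFlag(v [3]int) []string {
	switch {
	case atLeast(v, [3]int{1, 4, 22}): // assumed threshold for illustration
		return []string{"-w5"}
	case atLeast(v, [3]int{1, 4, 20}): // assumed threshold for illustration
		return []string{"-w"}
	default:
		return nil
	}
}

func main() {
	out, err := exec.Command("iptables", "--version").CombinedOutput()
	if err != nil {
		fmt.Println("iptables --version failed:", err)
		return
	}
	v, err := parseVersion(out)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("iptables %v -> wait flag %v\n", v, waitFlag(v))
}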
Hi Meng Bo,

(In reply to Meng Bo from comment #7)
> # iptables-restore -w5

Ah yes, that's what I suspected. All iptables tools use getopt(), so '-w5' is equivalent to '-w -5', and there is no '-5' flag.

> iptables-restore: invalid option -- '5'
> Try `iptables-restore -h' for more information.
>
> I can get the same error with the command above.
>
> And from the openshift iptables.go code here:
> https://github.com/openshift/origin/blob/release-3.7/vendor/k8s.io/
> kubernetes/pkg/util/iptables/iptables.go#L123

Looking at the comments to that PR, it seems like people are starting to forget how Unix program parameters typically work. :) An easier solution than the one pointed out there would be to just pass '--wait=5' instead of '-w5'. That should reduce the change set considerably.

> Seems the openshift failed to judge the iptables version correctly.

Maybe I don't get your point here, but '-w5' has never worked and will never work. It's just wrong syntax.

Cheers, Phil
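To make that suggestion concrete, here is a hedged sketch of the two invocations at the exec level; the helper, the tiny ruleset, and the 5-second value are made up for illustration and this is not the real caller (it also needs root to actually commit anything):

// Illustrative only: the exec-level difference between "-w5" and "--wait=5".
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// restore feeds rules to iptables-restore with the given lock-wait arguments.
func restore(rules string, waitArgs ...string) error {
	args := append(waitArgs, "--noflush")
	cmd := exec.Command("iptables-restore", args...)
	cmd.Stdin = bytes.NewBufferString(rules)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("%v: %s", err, out)
	}
	return nil
}

func main() {
	rules := "*filter\nCOMMIT\n" // minimal no-op payload, made up for the demo

	// Fails on iptables-1.4.21-28.el7: getopt treats "-w5" as "-w" plus an
	// unknown "-5", hence "invalid option -- '5'".
	if err := restore(rules, "-w5"); err != nil {
		fmt.Println("-w5:", err)
	}

	// Expected to work where --wait support exists (see the follow-up
	// comments on the RHEL backport and upstream 1.6.2).
	if err := restore(rules, "--wait=5"); err != nil {
		fmt.Println("--wait=5:", err)
	}
}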
Not only that, but iptables-restore didn't get "--wait" support until v1.6.2. The version logic linked is for iptables, not iptables-restore.
(In reply to Casey Callendrello from comment #9)
> Not only that, but iptables-restore didn't get "--wait" support until
> v1.6.2. The version logic linked is for iptables, not iptables-restore.

In fact, RHEL7 has supported the --wait option for iptables-restore since iptables-1.4.21-18.el7. The relevant bug is 1438597.

Cheers, Phil
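Given that distro backports like this make straight version comparisons unreliable, one illustrative alternative is to probe the installed iptables-restore for --wait support directly. The sketch below only demonstrates that idea; it is not necessarily how the upstream Kubernetes fix works.

// Illustrative feature probe: ask the binary, not the version number.
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// restoreSupportsWait probes "iptables-restore --wait=5 --test" with a
// trivial ruleset. If the binary rejects the flag itself, getopt reports
// an unknown/invalid option and we treat --wait as unsupported.
func restoreSupportsWait() (bool, error) {
	cmd := exec.Command("iptables-restore", "--wait=5", "--test")
	cmd.Stdin = strings.NewReader("*filter\nCOMMIT\n")
	var stderr bytes.Buffer
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		msg := stderr.String()
		if strings.Contains(msg, "unrecognized option") ||
			strings.Contains(msg, "invalid option") {
			return false, nil
		}
		// Failed for some other reason; the probe is inconclusive.
		return false, fmt.Errorf("probe failed: %v: %s", err, msg)
	}
	return true, nil
}

func main() {
	ok, err := restoreSupportsWait()
	if err != nil {
		fmt.Println(err)
		return
	}
	if ok {
		fmt.Println("iptables-restore accepts --wait; safe to pass --wait=5")
	} else {
		fmt.Println("iptables-restore lacks --wait; call it without a lock-wait flag")
	}
}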
CC Dan Winship. Dan, should we backport https://github.com/kubernetes/kubernetes/pull/60978 ?
Doh. Yes. We backported the fix as far back as OCP 3.9 because the bug was introduced in kube 1.9, but I forgot that we had backported the buggy kube 1.9 code into OCP 3.7 too.
Assigning to Jacob.
3.8 PR: https://github.com/openshift/ose/pull/1418
3.7 PR: https://github.com/openshift/ose/pull/1419
Fixed.

openshift v3.7.65
iptables-1.4.21-28.el7.x86_64
Kernel Version: 3.10.0-957.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.6 (Maipo)
*** Bug 1651436 has been marked as a duplicate of this bug. ***
3.10 and later never had the bug; it was fixed upstream before that
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2906