Bug 1822945
| Summary: | Egress Router pod is stuck in Init:CrashLoopBackOff |
|---|---|
| Product: | OpenShift Container Platform |
| Component: | Networking |
| Networking sub component: | openshift-sdn |
| Reporter: | Weibin Liang <weliang> |
| Assignee: | Dan Winship <danw> |
| QA Contact: | Weibin Liang <weliang> |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | high |
| CC: | aconstan, anbhat, anusaxen, bbennett, ckoep, danw, dcbw, maupadhy, mifiedle, nraghava, pstrick, rkhan |
| Version: | 4.1.z |
| Keywords: | ServiceDeliveryImpact |
| Target Release: | 4.6.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | SDN-CI-IMPACT,SDN-STALE |
| Doc Type: | Bug Fix |
| Last Closed: | 2020-10-27 15:57:47 UTC |
| Type: | Bug |
| Bug Blocks: | 1855894 |

Doc Text:
Cause:
Consequence: Egress router pods could not be run in OCP 4.x
Fix: A modification was made to the RHCOS image to allow containers to use legacy iptables binaries in their own network namespace
Result: Egress router pods can be run in OCP 4.x
Description
Weibin Liang
2020-04-10 15:29:56 UTC
> [weliang@weliang FILE]$ oc logs egress-redirect-pod
> Error from server (BadRequest): container "egressrouter-redirect" in pod "egress-redirect-pod" is waiting to start: PodInitializing
hm... does it work if you do "oc logs -c egress-router egress-redirect-pod" ? Or failing that, try modifying the egress-router initContainer's definition to include "terminationMessagePolicy: FallbackToLogsOnError" so that the logs will be captured into the pod status.
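The second suggestion above can be sketched as a small manifest edit. This is a minimal illustration only: the helper name `with_log_fallback` and the image placeholders are hypothetical, and the manifest is trimmed far below what a real egress router pod needs. The only parts taken from the report are the pod name, the two container names, and the `terminationMessagePolicy: FallbackToLogsOnError` setting.

```python
import copy
import json


def with_log_fallback(pod, init_container_name):
    """Return a copy of the pod manifest in which the named init container
    uses its log output as the termination message when it fails."""
    patched = copy.deepcopy(pod)
    for container in patched["spec"].get("initContainers", []):
        if container["name"] == init_container_name:
            container["terminationMessagePolicy"] = "FallbackToLogsOnError"
            return patched
    raise KeyError(f"no init container named {init_container_name!r}")


# Hypothetical, trimmed-down manifest; the image values are placeholders.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "egress-redirect-pod"},
    "spec": {
        "initContainers": [
            {"name": "egress-router", "image": "EGRESS_ROUTER_IMAGE"}
        ],
        "containers": [
            {"name": "egressrouter-redirect", "image": "POD_IMAGE"}
        ],
    },
}

patched = with_log_fallback(pod, "egress-router")
print(json.dumps(patched, indent=2))
```

With the policy set, the tail of the init container's log should show up in the pod status after a failure, so the iptables error would be visible without a separate `oc logs -c` call.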
Created attachment 1678542: Testing log
> [weliang@weliang FILE]$ oc logs -c egress-router egress-redirect-pod
> iptables v1.4.21: can't initialize iptables table `nat': Table does not exist (do you need to insmod?)
Ah. We are not loading the iptables legacy kernel modules by default, so pods that try to use legacy iptables in their own network namespace will fail.
I think we had agreed that we wanted this to work, right? So we should fix RHCOS (or something) to ensure that the legacy iptables modules are loaded no matter what?
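One way to check the diagnosis above on a node is to look for the legacy iptables modules in `/proc/modules`. The sketch below is an assumption-laden illustration, not the check used in this bug: the module names `ip_tables` and `iptable_nat` are my guess at what the `nat` table needs, and the sample text is invented to show the format (module name in the first column).

```python
# Assumed module names for the legacy iptables backend; not taken from the report.
LEGACY_IPTABLES_MODULES = ("ip_tables", "iptable_nat")


def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_modules(proc_modules_text, required=LEGACY_IPTABLES_MODULES):
    """Return the required modules that are not currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]


# Invented sample in /proc/modules format, for illustration only.
SAMPLE = """\
nf_tables 143360 0 - Live 0x0000000000000000
ip_tables 28672 0 - Live 0x0000000000000000
"""

print(missing_modules(SAMPLE))
```

On a real node the same function could be fed `open("/proc/modules").read()`; a non-empty result would be consistent with the "Table does not exist" error from the init container.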
Yes, that was what we decided: privileged pods (et al.) can use legacy iptables in their network namespace. Among other things, istio makes use of this. Of course, we have no way of enforcing that they don't insert rules into the root network namespace. Perhaps we should write an alert for that.

This blocks the egress router feature on all versions (4.1/4.2/4.3/4.4) as well.

MCO folks say it would make sense to add a file to the default template for this (e.g., something in /etc/modules-load.d/ to get systemd to load the modules). It will need some testing though (e.g., to confirm that it doesn't break the logic of containers using https://github.com/kubernetes-sigs/iptables-wrappers).

Dan: Was that something that the MCO team was going to do, or that they wanted the CNO to do? It seems a little weird for the CNO to do it if we expect the platform to be able to use iptables.

Network team was going to submit a patch to MCO (not CNO), after testing that it doesn't break various scenarios (comment 6).

Pushing to 4.6 since this has been in every 4.y release.

Tested and verified in 4.6.0-0.nightly-2020-06-30-112422. "oc logs -c egress-router egress-redirect-pod" no longer shows the error below:
> iptables v1.4.21: can't initialize iptables table `nat': Table does not exist (do you need to insmod?)

Ben, can this be backported to 4.4, please? We have at least one OSD customer who has hit this and we do not have a workaround for them.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:4196
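For reference, the `/etc/modules-load.d/` idea raised in the thread relies on a very simple file format: one kernel module name per line, with `#` starting a comment. The sketch below renders such a file body. It is only an illustration: the function name, the comment text, and the module names are assumptions, and the thread does not say which modules or file name the eventual patch used.

```python
def render_modules_load_conf(modules, comment=None):
    """Render the body of a modules-load.d file: an optional comment line
    followed by one kernel module name per line."""
    lines = []
    if comment:
        lines.append(f"# {comment}")
    for name in modules:
        # Module names are single tokens; reject empty or whitespace-containing values.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid module name: {name!r}")
        lines.append(name)
    return "\n".join(lines) + "\n"


# Assumed module names, for illustration only.
body = render_modules_load_conf(
    ["ip_tables", "iptable_nat"],
    comment="Load legacy iptables modules at boot",
)
print(body, end="")
```

systemd-modules-load reads files of this shape at boot, which is why a file shipped in the default template would make the modules available before any pod starts.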