Bug 1717639
Summary: | Random outages with egressIP | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Pedro Amoedo <pamoedom> | |
Component: | Networking | Assignee: | Casey Callendrello <cdc> | |
Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | high | |||
Priority: | high | CC: | anusaxen, aos-bugs, danw, erich, jack.ottofaro, jcrumple, misalunk, nchavan, openshift-bugs-escalate, weliang | |
Version: | 3.11.0 | |||
Target Milestone: | --- | |||
Target Release: | 4.2.0 | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Cause: If a pod using an egress IP tries to contact an external host that is not responding, the egress IP monitoring code may mistakenly interpret that as meaning that the node hosting the egress IP is not responding.
Consequence: High-availability egress IPs might get switched from one node to another spuriously.
Fix: The monitoring code now distinguishes the case of "egress node not responding" from "final destination not responding".
Result: High-availability egress IPs will not be switched between nodes unnecessarily.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1718541 1718542 1732486 (view as bug list) | Environment: | ||
Last Closed: | 2019-10-16 06:31:27 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1718541, 1718542, 1728342, 1732486 |
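
The distinction described in the Doc Text can be illustrated with a minimal sketch. This is not the code from the linked PRs: the `Sample` fields, the `classify` function, and the idea of pairing traffic counters with a direct probe of the node are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    HEALTHY = "healthy"
    NODE_DOWN = "egress node not responding"
    DESTINATION_DOWN = "final destination not responding"


@dataclass(frozen=True)
class Sample:
    """One hypothetical monitoring sample for a node hosting an egress IP."""
    packets_out: int      # cumulative packets forwarded to the egress node
    packets_in: int       # cumulative reply packets seen coming back
    node_reachable: bool  # result of a direct probe of the node itself (assumed)


def classify(prev: Sample, curr: Sample) -> Verdict:
    """Compare two consecutive samples and decide what, if anything, is down."""
    sent = curr.packets_out - prev.packets_out
    received = curr.packets_in - prev.packets_in

    # No new outbound traffic, or replies are arriving: no evidence of a problem.
    if sent <= 0 or received > 0:
        return Verdict.HEALTHY

    # Traffic went out and nothing came back. Before the fix this alone was
    # read as "node down"; here the node is checked separately.
    if curr.node_reachable:
        return Verdict.DESTINATION_DOWN
    return Verdict.NODE_DOWN


def should_move_egress_ip(verdict: Verdict) -> bool:
    """Only a dead egress node justifies moving the egress IP elsewhere."""
    return verdict is Verdict.NODE_DOWN
```

With this shape, a pod pinging a blocked external address produces `DESTINATION_DOWN`, which does not trigger a move of the egress IP.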
Description
Pedro Amoedo
2019-06-05 20:42:35 UTC
A PR for this issue has been sent by rkojedzinszky. PTAL: https://github.com/openshift/origin/pull/23069

Steps to reproduce, for verifying this bug:
1. Create a cluster on 3.11 with the networkpolicy plugin.
2. Create a new project.
3. Add an egress IP to the namespace, e.g.: oc patch netnamespace z1 -p '{"egressIPs":["10.0.76.100"]}'
4. Add the egress IP to one node, e.g.: oc patch hostsubnet preserve-zzhao-311nrr-1 -p '{"egressIPs":["10.0.76.100"]}'
5. Create a test pod and make sure it is scheduled to a node other than the egress IP node.
6. rsh into the test pod and ping a blocked IP.
7. Check the SDN logs of the node that hosts the test pod.

The above two PRs are for v3.11; I cannot find the PR number for v4.2. Moving the bug from ON_QA back to ASSIGNED.

(In reply to Weibin Liang from comment #24)
> Above two PRs are for v3.11, can not find the PR number for v4.2

The PRs in the comments are the customer's original 3.11 PR; the PR linked from the "External Trackers" table is correct: https://github.com/openshift/origin/pull/23089

Following the steps in comment 19, testing passed in 4.2.0-0.nightly-2019-06-21-041727.

Created bug 1732486 for the 3.10 backport.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:2922
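
The two `oc patch` commands in the reproduction steps above can be generated rather than typed by hand when testing several namespaces or nodes. The following is a rough sketch only: the helper names (`egress_ip_patch`, `oc_patch_argv`, `shell_line`) are hypothetical, and the resource names and IP are the examples from the comment, not values that will exist in another cluster.

```python
import json
import shlex


def egress_ip_patch(ips):
    """Build the JSON patch body used in the reproduction steps."""
    return json.dumps({"egressIPs": list(ips)}, separators=(",", ":"))


def oc_patch_argv(resource, name, ips):
    """Return the argv for `oc patch <resource> <name> -p <patch>`."""
    return ["oc", "patch", resource, name, "-p", egress_ip_patch(ips)]


def shell_line(argv):
    """Render an argv as a copy-pasteable, safely quoted shell command."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Example values taken from the reproduction steps; adjust for your cluster.
    print(shell_line(oc_patch_argv("netnamespace", "z1", ["10.0.76.100"])))
    print(shell_line(oc_patch_argv("hostsubnet", "preserve-zzhao-311nrr-1", ["10.0.76.100"])))
```

The argv lists can also be passed directly to `subprocess.run` on a host where `oc` is installed and logged in.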