Bug 1851182
Summary: | OVS connection timeout leading to un-programmed OVS flows by the SDN | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Alexander Constantinescu <aconstan> |
Component: | Networking | Assignee: | Alexander Constantinescu <aconstan> |
Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | | |
Priority: | urgent | CC: | aaleman, aarapov, akonarde, aos-bugs, apahim, asegundo, aweiteka, bbennett, cattias, cblecker, ccoleman, cdc, dcbw, dhansen, jaharrin, jbeakley, jchevret, jeder, kbsingh, lmohanty, markmc, marobrie, mcambria, nmalik, pbergene, scuppett, sdodson, tparikh, trankin, tsmetana, vrutkovs, wking, zzhao |
Version: | 4.3.0 | Keywords: | ServiceDeliveryBlocker, Upgrades |
Target Milestone: | --- | | |
Target Release: | 4.6.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | No Doc Update |
Doc Text: | | Story Points: | --- |
Clone Of: | 1838007 | | |
: | 1853193 | Environment: | |
Last Closed: | 2020-10-27 16:09:42 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 1838007, 1853193 | | |
Description
Alexander Constantinescu
2020-06-25 17:21:30 UTC
Are there metrics we need to add (like iptables) that ensure we eventually converge? Everything should retry everywhere at some interval, but if we add a retry we also need to measure how often it happens.

https://search.apps.build01.ci.devcluster.openshift.com/?search=ovs-ofctl%3A+br0%3A+failed+to+connect+to+socket&maxAge=48h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

Not sure all of these are the same (could be races during shutdown), but this is showing up in ~0.64% of failing CI runs in the last 2 days, and 1.7% of failing CI runs in the last 14 days.

In response to comment #1: added a commit with a metric for failed OVS commits by openshift-sdn (see referenced PR).

Hmm, the PR merged... the bot should have updated this to MODIFIED, so I am doing that manually.

This issue cannot be reproduced in version 4.6. Moving this bug to 'verified'.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel like this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475
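
For readers following the thread, the request in the first comment (retry at some interval, but also measure how often the retry fires) can be illustrated with a small sketch. This is not the openshift-sdn change from the referenced PR. The function name `run_with_retry`, the `METRICS` counter, and the metric names are all hypothetical, chosen only for illustration, and the in-process `Counter` stands in for whatever metrics library a real component would use.

```python
import time
from collections import Counter
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Hypothetical in-process metric store (assumption for this sketch);
# a real component would export these through its metrics library.
METRICS: Counter = Counter()


def run_with_retry(
    op: Callable[[], T],
    name: str,
    attempts: int = 5,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op until it succeeds, counting every failed attempt."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except OSError as err:  # e.g. a refused or timed-out socket connection
            last_error = err
            # Count each failure so the retry rate is observable.
            METRICS[f"{name}_failures_total"] += 1
            if attempt < attempts:
                sleep(interval_s)
    # All attempts failed: record that separately, then surface the error.
    METRICS[f"{name}_exhausted_total"] += 1
    raise RuntimeError(f"{name}: gave up after {attempts} attempts") from last_error


# Usage: a stand-in operation that fails twice before succeeding.
calls = {"n": 0}


def flaky_ovs_commit() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionRefusedError("br0: failed to connect to socket")
    return "flows programmed"


result = run_with_retry(flaky_ovs_commit, "ovs_commit", sleep=lambda _: None)
```

Counting failed attempts separately from exhausted retries makes it possible to tell a transient connection problem that converges from one that never does, which is the distinction the first comment asks for.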