Bug 1544903 - Push image still failed, no route to host
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.7.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.9.0
Assigned To: Dan Winship
QA Contact: Meng Bo
Keywords: OpsBlocker
Depends On:
Blocks: 1546169 1546170 1547599
Reported: 2018-02-13 12:38 EST by Max Whittingham
Modified: 2018-03-28 10:28 EDT
CC List: 7 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: In some (as-yet-undetermined) circumstances, nodes were apparently receiving a duplicate out-of-order HostSubnet "deleted" event from the master.
Consequence: When processing the duplicate event, the node could end up deleting OVS flows corresponding to an active node, causing pods on the two nodes to be unable to communicate with each other. (This was most noticeable when it happened to a node hosting the registry.)
Fix: The HostSubnet event-processing code will now notice that the event is a duplicate and ignore it.
Result: OVS flows are not deleted, and pods can communicate.
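To illustrate the idea in the Doc Text, here is a minimal sketch of ignoring a duplicate "deleted" event. It is not the code from the actual fix. The `HostSubnet` fields, the `NodeFlowTable` class, and the choice to key flows by node IP and compare a `uid` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSubnet:
    """Hypothetical stand-in for a HostSubnet record (not the real API type)."""
    uid: str       # unique ID of this particular object instance
    host: str      # node name
    host_ip: str   # node IP address
    subnet: str    # pod subnet assigned to the node


class NodeFlowTable:
    """Toy model of the flows a node has installed, keyed by peer node IP."""

    def __init__(self):
        self.by_ip = {}  # host_ip -> HostSubnet whose flows are installed

    def on_added(self, hs):
        # Install (or replace) the flows for this node IP.
        self.by_ip[hs.host_ip] = hs

    def on_deleted(self, hs):
        """Handle a "deleted" event; return True only if flows were removed."""
        current = self.by_ip.get(hs.host_ip)
        if current is None or current.uid != hs.uid:
            # Duplicate or stale delete: the flows are already gone, or they
            # now belong to a different, active HostSubnet. Ignore the event.
            return False
        del self.by_ip[hs.host_ip]
        return True


if __name__ == "__main__":
    table = NodeFlowTable()
    old = HostSubnet("uid-1", "node-a", "10.0.0.5", "10.128.2.0/23")
    new = HostSubnet("uid-2", "node-b", "10.0.0.5", "10.128.2.0/23")
    table.on_added(old)
    table.on_deleted(old)          # first delete removes the flows
    table.on_added(new)            # a new, active node takes over the IP
    print(table.on_deleted(old))   # replayed delete is ignored
```

Without the `uid` comparison, the replayed delete for `old` would remove the flows that now belong to `new`, which is the failure mode the Consequence line describes.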
Story Points: ---
Clone Of:
Clones: 1546169
Environment:
Last Closed: 2018-03-28 10:28:16 EDT
Type: Bug


Attachments
check-sdn.sh (1.79 KB, text/plain)
2018-02-13 21:29 EST, Eric Paris
flush-infra.sh (1.81 KB, text/plain)
2018-02-13 21:33 EST, Eric Paris


External Trackers
Tracker ID Priority Status Summary Last Updated
Origin (Github) 18617 None None None 2018-02-14 09:55 EST
Red Hat Product Errata RHBA-2018:0489 None None None 2018-03-28 10:28 EDT

Description Max Whittingham 2018-02-13 12:38:46 EST
Description of problem:
We've been seeing periodic but fairly consistent failures both pushing to and pulling from the registry, with the error 'No route to host'.

Version-Release number of selected component (if applicable):
3.7.23-1

Comment 2 Eric Paris 2018-02-13 13:20:18 EST
us-west-2 should now be working fine. We are still working on the root cause and a final solution, and will update with how we got it working.
Comment 3 Eric Paris 2018-02-13 21:29 EST
Created attachment 1395746
check-sdn.sh

I run this script with:
ansible 'starter*infra*' -u root -m script -a check-sdn.sh
If the script exits with a 'FAIL', that means the OVS rules are messed up. This can affect other communication paths, but since the most commonly affected path involves the infra nodes, making sure those stay clean is more important. On us-west-2 we saw compute nodes unable to pull from the registry because the infra nodes' rule sets were messed up.
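The attached script is not reproduced here. As a rough illustration only, the kind of consistency check such a script might perform can be sketched as a set comparison between the subnets the master knows about and the subnets that have flows on a node. The function name `check_flows` and both inputs are assumptions for this example; gathering them from the cluster is left out.

```python
def check_flows(expected_subnets, installed_subnets):
    """Compare expected node subnets against those with flows installed.

    Both arguments are iterables of subnet strings (hypothetical inputs).
    Returns (status, missing, stale), where status is "OK" or "FAIL".
    """
    expected = set(expected_subnets)
    installed = set(installed_subnets)
    missing = sorted(expected - installed)  # active nodes with no flows
    stale = sorted(installed - expected)    # flows for nodes that are gone
    status = "FAIL" if missing or stale else "OK"
    return status, missing, stale


if __name__ == "__main__":
    status, missing, stale = check_flows(
        ["10.128.0.0/23", "10.128.2.0/23"],
        ["10.128.0.0/23"],
    )
    print(status, missing, stale)
```

A missing entry corresponds to the situation in this bug: an active node whose flows were deleted on a peer.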
Comment 4 Eric Paris 2018-02-13 21:33 EST
Created attachment 1395747
flush-infra.sh

When run from a master with affected infra nodes, this script will drain the infra node, delete all of the containers and cruft left behind, and then start the infra node again. This results in a new, clean OVS ruleset.
Comment 6 Dan Winship 2018-02-16 08:10:57 EST
Fixed by https://github.com/openshift/origin/pull/18617
Comment 10 Meng Bo 2018-03-06 22:06:36 EST
Tested on v3.9.3.
There is no replay of the DeleteHostSubnetRules event when deleting the node.
Comment 13 errata-xmlrpc 2018-03-28 10:28:16 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
